MaskCollection-class {Biostrings} | R Documentation |
The MaskCollection class is a container for storing a collection of masks that can be used to mask regions in a sequence.
In the context of the Biostrings package, a mask is a set of regions in a sequence that need to be excluded from some computation. For example, when calling alphabetFrequency or matchPattern on a chromosome sequence, you might want to exclude some regions like the centromere or the repeat regions. This can be achieved by putting one or several masks on the sequence before calling alphabetFrequency on it.
A MaskCollection object is a vector-like object that represents such a set of masks. Like standard R vectors, it has a "length", which is the number of masks contained in it. But unlike standard R vectors, it also has a "width", which determines the length of the sequences it can be "put on". For example, a MaskCollection object of width 20000 can only be put on an XString object of exactly 20000 letters.
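To make the difference between "length" and "width" concrete, here is a minimal sketch that builds a single mask with Mask() and puts it on a sequence of matching length. The 20-letter sequence and the masked ranges are made up for the illustration.

```r
library(Biostrings)

## One mask of width 20 that masks positions 3-5 and 10-12
## (illustrative values, not taken from a real data set).
m <- Mask(mask.width=20, start=c(3, 10), end=c(5, 12))
length(m)   # number of masks in the collection: 1
width(m)    # length of the sequences it can be put on: 20

## The sequence has 20 letters, so the mask can be put on it.
s <- DNAString("ACGTACGTACGTACGTACGT")
masks(s) <- m
s
```

A collection holding several masks would have a larger length but still a single, common width.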
Each mask in a MaskCollection object x is just a finite set of integers that are >= 1 and <= width(x). When "put on" a sequence, these integers indicate the positions of the letters to mask.
Internally, each mask is represented by a NormalIRanges object.
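The following sketch shows one way to look at a mask as a set of positions. It extracts the underlying NormalIRanges object with [[ and expands its ranges into individual integers; the variable names and values are illustrative only.

```r
library(Biostrings)

## Illustrative mask of width 20 masking positions 3-5 and 10-12.
m <- Mask(mask.width=20, start=c(3, 10), end=c(5, 12))

## The mask itself, as a NormalIRanges object.
nir <- m[[1]]
is(nir, "NormalIRanges")   # TRUE

## Expand each range into the integer positions it covers.
positions <- unlist(Map(seq, start(nir), end(nir)))
positions                  # 3 4 5 10 11 12
```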
In the code snippets below, x is a MaskCollection object.
length(x): The number of masks in x.
width(x): The common width of all the masks in x. This determines the length of the sequences that x can be "put on".
active(x): A logical vector of the same length as x where each element indicates whether the corresponding mask is active or not.
names(x): NULL or a character vector of the same length as x.
nir_list(x): A list of the same length as x, where each element is a NormalIRanges object representing a mask in x.
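As a brief sketch of these accessors in use, the code below combines two masks (the same values as mask1 and mask2 in the examples for this page) and queries the resulting collection. The name mc is chosen for the illustration.

```r
library(Biostrings)

mask1 <- Mask(mask.width=29, start=c(11, 25, 28), width=c(5, 2, 2))
mask2 <- Mask(mask.width=29, start=c(3, 10, 27), width=c(5, 8, 1))
mc <- append(mask1, mask2)

length(mc)     # 2 masks
width(mc)      # common width: 29
active(mc)     # masks returned by Mask() are active
names(mc)      # NULL or a character vector
nir_list(mc)   # list of 2 NormalIRanges objects

## Deactivate the second mask without removing it.
active(mc)[2] <- FALSE
active(mc)
```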
Mask(mask.width, start=NULL, end=NULL, width=NULL): Return a single mask (i.e. a MaskCollection object of length 1) of width mask.width (a single integer >= 1) and masking the ranges of positions specified by start, end and width. See the IRanges constructor (?IRanges) for how start, end and width can be specified. Note that the returned mask is active and unnamed.
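For instance, the same ranges can be given either as start/width or as start/end. The sketch below builds the mask both ways and checks that the masked ranges agree; the values are illustrative.

```r
library(Biostrings)

## Ranges 3-7 and 10-17, given as start + width ...
a <- Mask(mask.width=29, start=c(3, 10), width=c(5, 8))
## ... and as start + end.
b <- Mask(mask.width=29, start=c(3, 10), end=c(7, 17))

## Both calls mask the same positions.
identical(start(a[[1]]), start(b[[1]]))   # TRUE
identical(end(a[[1]]), end(b[[1]]))       # TRUE

## The returned mask is a collection of length 1, active and unnamed.
length(a)
active(a)
names(a)
```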
In the code snippets below, x is a MaskCollection object.
isEmpty(x): Return a logical vector of the same length as x, indicating, for each mask in x, whether it's empty or not.
max(x): The greatest (or last, or rightmost) masked position for each mask. This is a numeric vector of the same length as x.
min(x): The smallest (or first, or leftmost) masked position for each mask. This is a numeric vector of the same length as x.
maskedwidth(x): The number of masked positions for each mask. This is an integer vector of the same length as x where all values are >= 0 and <= width(x).
maskedratio(x): maskedwidth(x) / width(x)
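A short sketch of these summaries, using the same two masks as mask1 and mask2 in the examples for this page. The values in the comments follow from the ranges given to Mask(): mask1 covers 11-15, 25-26 and 28-29, and mask2 covers 3-7, 10-17 and 27.

```r
library(Biostrings)

mask1 <- Mask(mask.width=29, start=c(11, 25, 28), width=c(5, 2, 2))
mask2 <- Mask(mask.width=29, start=c(3, 10, 27), width=c(5, 8, 1))
mc <- append(mask1, mask2)

isEmpty(mc)       # neither mask is empty
min(mc)           # leftmost masked positions: 11 and 3
max(mc)           # rightmost masked positions: 29 and 27
maskedwidth(mc)   # 5+2+2 = 9 and 5+8+1 = 14 masked positions
maskedratio(mc)   # 9/29 and 14/29

## Inverting a mask that covers every position leaves nothing masked.
full <- Mask(mask.width=29, start=1, end=29)
isEmpty(gaps(full))
```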
In the code snippets below, x is a MaskCollection object.
x[i]: Return a new MaskCollection object made of the selected masks. i can be a numeric vector, a logical vector, NULL or missing.
append(x, values, after=length(x)): Add masks to x.
x[[i]]: Extract the i-th mask as a NormalIRanges object.
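The sketch below contrasts [ (which returns a MaskCollection) with [[ (which returns a NormalIRanges object), using the three masks from the examples for this page. The name mc is chosen for the illustration.

```r
library(Biostrings)

mask1 <- Mask(mask.width=29, start=c(11, 25, 28), width=c(5, 2, 2))
mask2 <- Mask(mask.width=29, start=c(3, 10, 27), width=c(5, 8, 1))
mask3 <- Mask(mask.width=29, start=c(7, 12), width=c(2, 4))

## append() adds masks one collection at a time.
mc <- append(append(mask1, mask2), mask3)
length(mc)                    # 3

## [ keeps the result a MaskCollection.
mc[2]                         # only the second mask
mc[-1]                        # everything but the first mask
mc[c(TRUE, FALSE, TRUE)]      # first and third masks

## [[ drops down to the ranges of a single mask.
mc[[2]]
start(mc[[2]])                # 3 10 27
```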
In the code snippets below, x is a MaskCollection object.
narrow(x, start=NA, end=NA, width=NA, use.names=TRUE): Narrow the masks in x.
reduce(x): Return a MaskCollection object of length 1 made of the union (or merging, or collapsing) of all the active masks in x.
gaps(x): Invert the masks in x.
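As a rough sketch of how these three operations behave, the code below applies them to the two masks used in the examples for this page. The numbers in the comments are worked out by hand from the masked ranges (mask1: 11-15, 25-26, 28-29; mask2: 3-7, 10-17, 27), and the expectation that narrowing from position 8 leaves a width of 22 assumes that narrow() keeps positions 8 to 29.

```r
library(Biostrings)

mask1 <- Mask(mask.width=29, start=c(11, 25, 28), width=c(5, 2, 2))
mask2 <- Mask(mask.width=29, start=c(3, 10, 27), width=c(5, 8, 1))
mc <- append(mask1, mask2)

## Union of the active masks: ranges 3-7, 10-17 and 25-29.
u <- reduce(mc)
length(u)          # 1
maskedwidth(u)     # 5 + 8 + 5 = 18

## Inactive masks are left out of the union.
active(mc)[2] <- FALSE
maskedwidth(reduce(mc))   # only mask1 contributes: 9

## gaps() masks exactly the positions that were not masked.
maskedwidth(gaps(mask1))  # 29 - 9 = 20

## narrow() restricts the masks to a window of the original width.
width(narrow(mc, start=8))
```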
H. Pages
MaskedXString-class, maskMotif, alphabetFrequency, reverse, matchPattern, NormalIRanges-class
library(Biostrings)

## Making a MaskCollection object:
mask1 <- Mask(mask.width=29, start=c(11, 25, 28), width=c(5, 2, 2))
mask2 <- Mask(mask.width=29, start=c(3, 10, 27), width=c(5, 8, 1))
mask3 <- Mask(mask.width=29, start=c(7, 12), width=c(2, 4))
mymasks <- append(append(mask1, mask2), mask3)
mymasks
length(mymasks)
width(mymasks)
reduce(mymasks)
gaps(mymasks)

## Putting a MaskCollection object on a sequence:
x <- DNAString("ACACAACTAGATAGNACTNNGAGAGACGC")
x
length(x)  # same as width(mymasks)
nchar(x)   # same as length(x)
masks(x) <- mymasks
x
length(x)  # has not changed
nchar(x)   # has changed
alphabetFrequency(x)

## Removing the masks:
masks(x) <- NULL
x
alphabetFrequency(x)

## Active/inactive masks:
reduce(mymasks)
active(mymasks)[2] <- FALSE
mymasks
reduce(mymasks)

## Other advanced operations:
mymasks[[2]]
length(mymasks[[2]])
mymasks[[2]][-3]
append(mymasks[-2], gaps(mymasks[2]))
mymasks2 <- narrow(mymasks, start=8)
mymasks2
mymasks2[[2]]