XString-class {Biostrings} | R Documentation |
The BString class is a general container for storing a big string (a long sequence of characters) and for making its manipulation easy and efficient.
The DNAString, RNAString and AAString classes are similar containers but with the more biology-oriented purpose of storing a DNA sequence (DNAString), an RNA sequence (RNAString), or a sequence of amino acids (AAString).
All these containers derive directly (and with no additional slots) from the XString virtual class. They are also called XString subtypes.
The 2 main differences between an XString object and a standard character vector are: (1) the data stored in an XString object are not copied on object duplication and (2) an XString object can only store a single string (see the XStringSet container for an efficient way to store a big collection of strings in a single object).
Unlike the DNAString, RNAString and AAString containers, which accept only a predefined set of letters (the alphabet), a BString object can store any single string based on a single-byte character set.
In the code snippet below, x can be a single string (character vector of length 1) or an XString object.
BString(x, start=1, nchar=NA, check=TRUE):
Tries to convert x into a BString object by reading nchar letters starting at position start in x.
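As a rough illustration of the constructor described above, the sketch below builds a BString from a plain character string, first in full and then from a window selected with start and nchar. The variable names (s, b, b_part, b_from_dna) are made up for this example, and it assumes the Biostrings package is installed.

```r
library(Biostrings)

s <- "I am a BString object"

# Convert the whole string (start and nchar left at their defaults)
b <- BString(s)

# Read 7 letters starting at position 8 of s
b_part <- BString(s, start=8, nchar=7)
as.character(b_part)      # "BString"

# x may also be an XString object, here a DNAString
b_from_dna <- BString(DNAString("ACGT"))
as.character(b_from_dna)  # "ACGT"
```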
In the code snippets below, x is an XString object.
alphabet(x):
NULL for a BString object. See the corresponding man pages when x is a DNAString, RNAString or AAString object.
length(x) or nchar(x):
Get the length of an XString object, i.e., its number of letters.
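The accessors above can be tried on a small object. This is only a sketch, assuming Biostrings is installed; b and d are illustrative names, and the output of alphabet(d) is left unspecified since it is documented on the DNAString man page.

```r
library(Biostrings)

b <- BString("I am a BString object")

alphabet(b)  # NULL: a BString has no predefined alphabet
length(b)    # 21
nchar(b)     # 21, same as length(b)

# For a DNAString, alphabet() returns the set of accepted letters
d <- DNAString("ACGT")
alphabet(d)
length(d)    # 4
```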
In the code snippets below, x is an XString object.
as.character(x):
Converts x to a character string.
toString(x):
Equivalent to as.character(x).
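A short sketch of the two coercion functions, assuming Biostrings is installed (s1 and s2 are names chosen for this example only):

```r
library(Biostrings)

b <- BString("I am a BString object")

s1 <- as.character(b)  # ordinary character vector of length 1
s2 <- toString(b)      # documented as equivalent to as.character(b)

is.character(s1)       # TRUE
identical(s1, s2)      # TRUE
```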
In the code snippets below, x is an XString object.
subseq(x, start=NA, end=NA, width=NA):
Extract the subsequence from x specified by start, end and width.
At least one of start, end and width must be NA and the other ones must be single numeric values.
If at least two of them are NAs, then start=NA is interpreted as start=1 and end=NA is interpreted as end=length(x).
A negative value for start or end is interpreted relative to the end of x, e.g. start=-1 is equivalent to start=length(x).
Finally, if width is not NA, then start and end cannot both be NA.
A note about performance: subseq does NOT copy the sequence data, hence it is very efficient and is the recommended way to extract a subsequence (i.e. a set of consecutive letters) from an XString object.
For example, extracting a 100Mb subsequence from Human chromosome 1 (250Mb) with subseq is (almost) instantaneous and has (almost) no memory footprint (the cost in time and memory does not depend on the length of the original sequence or on the length of the subsequence to extract).
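The rules for start, end and width can be checked on a short string. The following is a minimal sketch, assuming Biostrings is installed; the expected values in the comments follow from the rules stated above applied to a 21-letter string.

```r
library(Biostrings)

b <- BString("I am a BString object")  # 21 letters

# Only start given: end defaults to length(b)
as.character(subseq(b, start=3))           # "am a BString object"

# Negative start counts from the end: -3 means position 19
as.character(subseq(b, start=-3))          # "ect"

# Negative end: -3 means position 19, start defaults to 1
as.character(subseq(b, end=-3))            # "I am a BString obje"

# end and width given: start is derived as 19 - 5 + 1 = 15
as.character(subseq(b, end=-3, width=5))   # " obje"

# start and width given: end is derived as 8 + 7 - 1 = 14
as.character(subseq(b, start=8, width=7))  # "BString"
```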
x[i]:
Return a new XString object made of the selected letters (subscript i must be an NA-free numeric vector specifying the positions of the letters to select).
The returned object belongs to the same class (i.e. same XString subtype) as x.
Note that, unlike subseq, x[i] does copy the sequence data and therefore will be very inefficient for extracting a large number of letters (e.g. when i contains millions of positions).
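To make the contrast with subseq concrete, here is a small sketch of subsetting with [ (assuming Biostrings is installed; picked and b_rev are illustrative names). For a contiguous range, both approaches give equal results, but only x[i] copies the selected letters.

```r
library(Biostrings)

b <- BString("I am a BString object")

# Select arbitrary, non-contiguous positions
picked <- b[c(1, 3, 4)]
as.character(picked)    # "Iam"
is(picked, "BString")   # TRUE: same XString subtype as b

# Reversing via [ works, but reverse(b) is the better tool
b_rev <- b[length(b):1]
b_rev == reverse(b)     # TRUE

# For consecutive letters, subseq() gives an equal result without copying
b[8:14] == subseq(b, start=8, end=14)  # TRUE
```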
In the code snippets below, e1 and e2 are XString objects.
e1 == e2:
TRUE if e1 is equal to e2, FALSE otherwise.
Comparison between two XString objects of different subtypes (e.g. a BString object and a DNAString object) is not supported with one exception: a DNAString object and an RNAString object can be compared (see RNAString-class for more details about this).
Comparison between a BString object and a character string is also supported (see examples below).
e1 != e2:
Equivalent to !(e1 == e2).
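The comparison operators can be exercised as follows. This is a sketch only, assuming Biostrings is installed; b1, b2 and b3 are names invented for the example, and the DNAString/RNAString case is left out here because its semantics are documented on the RNAString-class page.

```r
library(Biostrings)

b1 <- BString("abc")
b2 <- BString("abc")
b3 <- BString("abd")

# Two BString objects with the same letters are equal
b1 == b2     # TRUE
b1 == b3     # FALSE

# != is the negation of ==
b1 != b3     # TRUE

# A BString can also be compared with a character string
b1 == "abc"  # TRUE
```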
H. Pages
See also: letter, DNAString-class, RNAString-class, AAString-class, XStringSet-class, XStringViews-class, reverse
library(Biostrings)

b <- BString("I am a BString object")
b
length(b)

## Subsequence extraction
subseq(b)
subseq(b, start=3)
subseq(b, start=-3)
subseq(b, end=-3)
subseq(b, end=-3, width=5)

## Subsetting
b2 <- b[length(b):1]               # better done with reverse(b)
as.character(b2)
b2 == b                            # FALSE
b2 == as.character(b2)             # TRUE

## b[1:length(b)] is equal but not identical to b!
b == b[1:length(b)]                # TRUE
identical(b, b[1:length(b)])       # FALSE
## This is because subsetting an XString object with [ makes a copy
## of part or all of its sequence data. Hence, for the resulting object,
## the internal slot containing the memory address of the sequence
## data differs from the original. This is enough for identical() to
## see the 2 objects as different.