XString-class {Biostrings}    R Documentation

BString objects

Description

The BString class is a general container for storing a big string (a long sequence of characters) and for making its manipulation easy and efficient.

The DNAString, RNAString and AAString classes are similar containers but with the more biology-oriented purpose of storing a DNA sequence (DNAString), an RNA sequence (RNAString), or a sequence of amino acids (AAString).

All those containers derive directly (and with no additional slots) from the XString virtual class. They are also said to be XString subtypes.

Details

The two main differences between an XString object and a standard character vector are: (1) the data stored in an XString object are not copied when the object is duplicated, and (2) an XString object can only store a single string (see the XStringSet container for an efficient way to store a big collection of strings in a single object).

Unlike the DNAString, RNAString and AAString containers that accept only a predefined set of letters (the alphabet), a BString object can be used for storing any single string based on a single-byte character set.
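The alphabet restriction described above can be sketched as follows. This is a minimal illustration, assuming the Biostrings package is installed; the input strings and variable names are made up for the example.

```r
library(Biostrings)

# BString accepts any single-byte characters.
free_text <- BString("Hello, world! 123")

# The same text is rejected by DNAString because it contains characters
# that are not in the DNA alphabet; try() captures the error.
rejected <- try(DNAString("Hello, world! 123"), silent = TRUE)
inherits(rejected, "try-error")

# A string made only of DNA letters is accepted.
dna <- DNAString("ACGTTGCA")
```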

Constructor-like functions and generics

In the code snippet below, x can be a single string (character vector of length 1) or an XString object.

BString(x, start=1, nchar=NA, check=TRUE): Tries to convert x into a BString object by reading nchar letters starting at position start in x.
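As a rough illustration of the start and nchar arguments, the sketch below builds BString objects from an arbitrary example string. The check argument is left at its default, and the variable names are illustrative only.

```r
library(Biostrings)

txt <- "abcdefghij"

whole <- BString(txt)                     # all 10 letters
part  <- BString(txt, start=3, nchar=4)   # 4 letters starting at position 3
again <- BString(part)                    # x can also be an XString object

as.character(part)
```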

Accessor methods

In the code snippets below, x is an XString object.

alphabet(x): NULL for a BString object. See the corresponding man pages when x is a DNAString, RNAString or AAString object.
length(x) or nchar(x): Get the length of an XString object, i.e., its number of letters.
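A short sketch of the accessors on a BString object, assuming the behavior described above; the example string is the one used in the Examples section.

```r
library(Biostrings)

b <- BString("I am a BString object")

a <- alphabet(b)   # a BString has no predefined alphabet
n1 <- length(b)    # number of letters
n2 <- nchar(b)     # same value as length(b)
```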

Coercion

In the code snippets below, x is an XString object.

as.character(x): Converts x to a character string.
toString(x): Equivalent to as.character(x).
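The two coercion functions can be compared directly. This is a minimal sketch; s1 and s2 are illustrative names.

```r
library(Biostrings)

b <- BString("I am a BString object")

s1 <- as.character(b)   # ordinary character vector of length 1
s2 <- toString(b)       # described above as equivalent to as.character(b)

identical(s1, s2)
```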

Subsequence extraction and subsetting

In the code snippets below, x is an XString object.

subseq(x, start=NA, end=NA, width=NA): Extract the subsequence of x specified by start, end and width. At least one of start, end and width must be NA, and the others must be single numeric values. If at least two of them are NA, then start=NA is interpreted as start=1 and end=NA is interpreted as end=length(x). A negative value for start or end is interpreted relative to the end of x, e.g. start=-1 is equivalent to start=length(x). Finally, if width is not NA, then start and end cannot both be NA.

A note about performance: subseq does NOT copy the sequence data, hence it's very efficient and is the recommended way to extract a subsequence (i.e. a set of consecutive letters) from an XString object. For example, extracting a 100Mb subsequence from Human chromosome 1 (250Mb) with subseq is (almost) instantaneous and has (almost) no memory footprint (the cost in time and memory does not depend on the length of the original sequence or on the length of the subsequence to extract).
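The rules for negative and missing arguments can be checked on a small string. The sketch below assumes the semantics stated above; variable names are illustrative.

```r
library(Biostrings)

b <- BString("I am a BString object")
n <- length(b)

# start=-3 counts from the end of b, so it matches start=n-2.
tail3 <- subseq(b, start=-3)
same_tail <- subseq(b, start=n-2)

# end=-3 with start left as NA: start defaults to 1.
head_part <- subseq(b, end=-3)

# end and width given: start is derived as end - width + 1.
window <- subseq(b, end=-3, width=5)

as.character(tail3)
as.character(window)
```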

x[i]: Return a new XString object made of the selected letters (subscript i must be an NA-free numeric vector specifying the positions of the letters to select). The returned object belongs to the same class (i.e. same XString subtype) as x.

Note that, unlike subseq, x[i] does copy the sequence data and will therefore be very inefficient for extracting a large number of letters (e.g. when i contains millions of positions).
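For consecutive positions the two approaches give equal results, so subseq is usually the better choice; [ remains useful for scattered positions. A minimal sketch with illustrative variable names:

```r
library(Biostrings)

b <- BString("I am a BString object")

by_range <- subseq(b, start=3, end=6)   # no copy of the sequence data
by_index <- b[3:6]                      # copies the selected letters

by_range == by_index                    # equal content

# Non-consecutive positions cannot be expressed with subseq.
scattered <- b[c(1, 8, 16)]
as.character(scattered)
```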

Equality

In the code snippets below, e1 and e2 are XString objects.

e1 == e2: TRUE if e1 is equal to e2. FALSE otherwise.

Comparison between two XString objects of different subtypes (e.g. a BString object and a DNAString object) is not supported with one exception: a DNAString object and an RNAString object can be compared (see RNAString-class for more details about this).

Comparison between a BString object and a character string is also supported (see examples below).

e1 != e2: Equivalent to !(e1 == e2).
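The comparison rules above can be sketched as follows. The sequences are arbitrary examples. The last comparison mixes a BString with a DNAString and, per the text above, is expected to fail, so it is wrapped in try().

```r
library(Biostrings)

b1 <- BString("ACGT")
b2 <- BString("ACGA")

same    <- b1 == b1        # two BString objects with equal content
differ  <- b1 != b2        # equivalent to !(b1 == b2)
vs_char <- b1 == "ACGT"    # BString compared with a character string

# DNAString and RNAString are the one supported cross-subtype pair.
dna <- DNAString("ACGT")
rna <- RNAString(dna)      # T is rendered as U in the RNA object
dna_rna <- dna == rna

# Unsupported pair: capture whatever happens instead of stopping.
mixed <- try(b1 == dna, silent = TRUE)
```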

Author(s)

H. Pages

See Also

letter, DNAString-class, RNAString-class, AAString-class, XStringSet-class, XStringViews-class, reverse

Examples

  b <- BString("I am a BString object")
  b
  length(b)

  ## Subsequence extraction
  subseq(b)
  subseq(b, start=3)
  subseq(b, start=-3)
  subseq(b, end=-3)
  subseq(b, end=-3, width=5)

  ## Subsetting
  b2 <- b[length(b):1]       # better done with reverse(b)

  as.character(b2)

  b2 == b                    # FALSE
  b2 == as.character(b2)     # TRUE

  ## b[1:length(b)] is equal but not identical to b!
  b == b[1:length(b)]        # TRUE
  identical(b, b[1:length(b)])  # FALSE
  ## This is because subsetting an XString object with [ makes a copy
  ## of part or all its sequence data. Hence, for the resulting object,
  ## the internal slot containing the memory address of the sequence
  ## data differs from the original. This is enough for identical() to
  ## see the 2 objects as different.

[Package Biostrings version 2.8.18]