BSgenome-class {BSgenome}R Documentation

The BSgenome class

Description

A container for the complete genome sequence of a given specie.

Details

[TODO: Put some details here]

Accesor methods

In the code snippets below, x is a BSgenome object.

seqnames(x): Returns the index of the single sequences contained in x. Each single sequence is stored in a BString (or derived) object and comes from a source file (FASTA) with a single record. The names returned by seqnames(x) usually reflect the names of those source files but a common prefix or suffix was eventually removed in order to keep them as short as possible.
mseqnames(x): Returns the index of the multiple sequences contained in x. Each multiple sequence is stored in a BStringViews object and comes from a source file (FASTA) with multiple records. The names returned by mseqnames(x) usually reflect the names of those source files but a common prefix or suffix was eventually removed in order to keep them as short as possible.
names(x): Returns the index of all sequences contained in x. This is the same as c(seqnames(x), mseqnames(x)).

Standard generic methods

In the code snippets below, x is a BSgenome object and name is the name of a sequence (character-string).

length(x): Returns the length of x, i.e., the number of all sequences that it contains. This is the same as length(names(x)).
x[[name]]: [TODO: Document me]
x$name: [TODO: Document me]

Other functions and generics

In the code snippets below, x is a BSgenome object and name is the name of a sequence (character-string).

unload(x, name): [TODO: Document me]

Author(s)

H. Pages

See Also

available.genomes, BString, DNAString, BStringViews, getSeq, matchPattern, rm, gc

Examples

  library(BSgenome.Celegans.UCSC.ce2)   # This doesn't load the chromosome 
                                        # sequences into memory.
  length(Celegans)                      # Number of sequences in this genome.
  Celegans                              # Displays index of all the sequences
                                        # in this genome.
  mem0 <- gc()["Vcells", "(Mb)"]        # Current amount of data in memory (in
                                        # Mb).
  Celegans[["chrV"]]                    # Loads chromosome V into memory (hence
                                        # takes a long time).
  gc()["Vcells", "(Mb)"] - mem0         # Chromosome V occupies 20Mb of memory.
  Celegans[["chrV"]]                    # Much faster (sequence is already in
                                        # memory, hence it's not loaded again).
  Celegans$chrV                         # Equivalent to Celegans[["chrV"]].
  class(Celegans$chrV)                  # Chromosome V (like any other
                                        # chromosome sequence) is a DNAString
                                        # object.
  nchar(Celegans$chrV)                  # Its has 20922231 letters (nucleotides).
  x <- Celegans$chrV                    # Very fast because a BString object
                                        # doesn't contain the sequence, only a
                                        # pointer to the sequence, hence chrV
                                        # seq is not duplicated in memory. But
                                        # we now have 2 objects pointing to the
                                        # same place in memory.
  y <- substr(x, 10, 100)               # A 3rd object pointing to chrV seq.
  
  ## We must remove all references to chrV seq if we want the 20Mb of memory
  ## used by it to be freed (note that it can be hard to keep track of all the
  ## references to a given sequence).
  ## IMPORTANT: The 1st reference to this seq (Celegans$chrV) should be removed
  ## last. This is achieved with unload(). All other references are removed by
  ## just removing the referencing object.
  rm(x)
  rm(y)
  unload(Celegans, "chrV")
  gc()["Vcells", "(Mb)"]

[Package BSgenome version 1.4.1 Index]