BSgenome-class {BSgenome} | R Documentation |
A container for the complete genome sequence of a given species.
[TODO: Put some details here]
In the code snippets below, x is a BSgenome object.
seqnames(x):
Returns the index of the single sequences contained in x.
Each single sequence is stored in a BString (or derived)
object and comes from a source file (FASTA) with a single record.
The names returned by seqnames(x) usually reflect the names
of those source files, but a common prefix or suffix may have been
removed to keep them as short as possible.
mseqnames(x):
Returns the index of the multiple sequences contained in x.
Each multiple sequence is stored in a BStringViews
object and comes from a source file (FASTA) with multiple records.
The names returned by mseqnames(x) usually reflect the names
of those source files, but a common prefix or suffix may have been
removed to keep them as short as possible.
names(x):
Returns the index of all sequences contained in x.
This is the same as c(seqnames(x), mseqnames(x)).
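The relationship between the three accessors can be checked directly. The following is a minimal sketch, assuming the BSgenome.Celegans.UCSC.ce2 data package used in the examples on this page is installed; the requireNamespace() guard and the variable names single, multi and all_names are illustrative and not part of the BSgenome API.

```r
# Sketch only: the body runs when the ce2 data package is available.
if (requireNamespace("BSgenome.Celegans.UCSC.ce2", quietly = TRUE)) {
  library(BSgenome.Celegans.UCSC.ce2)

  single    <- seqnames(Celegans)   # names of the single sequences
  multi     <- mseqnames(Celegans)  # names of the multiple sequences
  all_names <- names(Celegans)      # names of all sequences

  # Per the description above, names(x) should equal
  # c(seqnames(x), mseqnames(x)).
  print(identical(all_names, c(single, multi)))
}
```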
In the code snippets below, x is a BSgenome object
and name is the name of a sequence (character-string).
length(x):
Returns the length of x, i.e., the number of all sequences
that it contains. This is the same as length(names(x)).
x[[name]]:
[TODO: Document me]
x$name:
[TODO: Document me]
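Until these two entries are documented, the examples on this page are the only guide to their behavior: they use Celegans[["chrV"]] and Celegans$chrV interchangeably to obtain a DNAString. The sketch below assumes the BSgenome.Celegans.UCSC.ce2 data package is installed and shows the [[ form with the sequence name held in a variable; the names chrom and seq_v are hypothetical.

```r
# Sketch only: the body runs when the ce2 data package is available.
if (requireNamespace("BSgenome.Celegans.UCSC.ce2", quietly = TRUE)) {
  library(BSgenome.Celegans.UCSC.ce2)

  chrom <- "chrV"             # sequence name stored as a character string
  seq_v <- Celegans[[chrom]]  # first access loads the sequence into memory

  print(class(seq_v))         # class of the returned sequence object
  print(nchar(seq_v))         # number of letters (nucleotides)
}
```

The [[ form is convenient when the name is computed at run time, for example when looping over seqnames(Celegans); keep in mind that each sequence accessed this way stays in memory until it is unloaded.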
In the code snippets below, x is a BSgenome object
and name is the name of a sequence (character-string).
unload(x, name):
[TODO: Document me]
H. Pages
available.genomes, BString, DNAString, BStringViews, getSeq, matchPattern, rm, gc
library(BSgenome.Celegans.UCSC.ce2)  # This doesn't load the chromosome
                                     # sequences into memory.
length(Celegans)                     # Number of sequences in this genome.
Celegans                             # Displays index of all the sequences
                                     # in this genome.
mem0 <- gc()["Vcells", "(Mb)"]       # Current amount of data in memory (in
                                     # Mb).
Celegans[["chrV"]]                   # Loads chromosome V into memory (hence
                                     # takes a long time).
gc()["Vcells", "(Mb)"] - mem0        # Chromosome V occupies 20Mb of memory.
Celegans[["chrV"]]                   # Much faster (sequence is already in
                                     # memory, hence it's not loaded again).
Celegans$chrV                        # Equivalent to Celegans[["chrV"]].
class(Celegans$chrV)                 # Chromosome V (like any other
                                     # chromosome sequence) is a DNAString
                                     # object.
nchar(Celegans$chrV)                 # It has 20922231 letters (nucleotides).
x <- Celegans$chrV                   # Very fast because a BString object
                                     # doesn't contain the sequence, only a
                                     # pointer to the sequence, hence chrV
                                     # seq is not duplicated in memory. But
                                     # we now have 2 objects pointing to the
                                     # same place in memory.
y <- substr(x, 10, 100)              # A 3rd object pointing to chrV seq.

## We must remove all references to chrV seq if we want the 20Mb of memory
## used by it to be freed (note that it can be hard to keep track of all the
## references to a given sequence).
## IMPORTANT: The 1st reference to this seq (Celegans$chrV) should be removed
## last. This is achieved with unload(). All other references are removed by
## just removing the referencing object.
rm(x)
rm(y)
unload(Celegans, "chrV")
gc()["Vcells", "(Mb)"]