IRanges-class {Biostrings} | R Documentation |
The IRanges class is a simple container for storing a set of integer ranges.
A NormalIRanges object is an IRanges object that is "normal". See the Normality section below for the definition and properties of normal IRanges objects.
An IRanges object is a data frame-like object where each row describes a "range" of integers.
A "range" of integers is a finite set of consecutive integer values. Each range can be fully described with exactly 2 integers, chosen arbitrarily from the following 3: its "start" i.e. its smallest (or first, or leftmost) value; its "end" i.e. its greatest (or last, or rightmost) value; and its "width" i.e. the number of values in the range. For example the set of integers that are greater than or equal to -20 and less than or equal to 400 is the range of integers that starts at -20 and has a width of 421.
The start can be any integer (see start below) but the width must be a nonnegative integer (see width below).
The end of a range is its start plus its width minus one (see end below).
An "empty" range is a range that contains no value i.e. a range that has a width of zero. Note that for an empty range, the end is smaller than the start.
Two ranges are considered equal iff they share the same start and width. Note that with this definition, 2 empty ranges are generally not equal (they need to share the same start to be considered equal).
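The arithmetic and the equality rule above can be checked with plain integer vectors. The following is a minimal base R sketch that does not use the IRanges class itself; the helpers make_range, range_end and ranges_equal are hypothetical names introduced only for illustration.

```r
# Hypothetical helpers (not part of Biostrings): a range is stored as a
# named integer vector holding its start and its width.
make_range <- function(start, width)
    c(start = as.integer(start), width = as.integer(width))
range_end <- function(r) unname(r["start"] + r["width"] - 1L)
ranges_equal <- function(a, b) all(a == b)

# The range of integers from -20 to 400 starts at -20 and has width 421
r <- make_range(-20, 400 - (-20) + 1)
range_end(r)                               # 400

# Empty ranges: the end is smaller than the start
empty_a <- make_range(5, 0)
empty_b <- make_range(9, 0)
range_end(empty_a)                         # 4

# Two empty ranges are equal only if they share the same start
ranges_equal(empty_a, empty_b)             # FALSE
ranges_equal(empty_a, make_range(5, 0))    # TRUE
```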
The length of an IRanges object is the number of ranges in it i.e. the number of rows in the object.
An IRanges object is considered empty iff all its ranges are empty.
Note that it is unlikely that the user will have to create or directly manipulate an IRanges instance when using the Biostrings package. However, because the IRanges class is a superclass of the XStringViews class, any XStringViews object is also an IRanges object and can be manipulated as such. Therefore all the methods described here also work with an XStringViews object.
An important difference with standard R data frames is that IRanges objects only support single subscript subsetting i.e. subsetting by row, whereas standard R data frames can be subsetted by row and by column. As a consequence, the length of an IRanges object is its number of rows, whereas the length of a standard R data frame object is its number of columns.
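The data frame side of this comparison can be seen with base R alone. In the short sketch below, the data frame df is made up for illustration; the length of an IRanges object holding the same ranges would correspond to nrow(df), not to length(df).

```r
# Three ranges stored as rows of a standard data frame
df <- data.frame(start = c(1L, 10L, 20L), width = c(5L, 3L, 0L))

length(df)   # 2: the number of columns
nrow(df)     # 3: the number of rows, i.e. the number of ranges
```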
In the code snippets below, x is an IRanges object.
length(x): The number of ranges in x.
start(x): The start values of the ranges. This is a vector of integers of the same length as x.
width(x): The number of integers in each range. This is a vector of nonnegative integers of the same length as x.
end(x): start(x) + width(x) - 1L
names(x): NULL or a character vector of the same length as x.
desc(x): desc is an alias for names.
IRanges(start=NULL, end=NULL, width=NULL): Return the IRanges object containing the ranges specified by start, end and width. Exactly two of the start, end and width arguments must be specified as integer vectors (with no NAs) and the other argument must be NULL. If start and end are specified, then they must be vectors of the same length. If start and width (or end and width) are specified, then the length of width must be <= the length of start (or end) and, if it is <, then width is expanded cyclically to the length of start (or end).
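The cyclic expansion of width can be pictured with base R recycling. The sketch below only mimics the documented behaviour on plain integer vectors; it does not call the constructor, whether the constructor uses rep internally is not stated here, and the variable names are illustrative.

```r
# Four starts but only two widths: the widths are reused cyclically
range_starts <- c(1L, 10L, 20L, 30L)
range_widths <- c(5L, 2L)
stopifnot(length(range_widths) <= length(range_starts))

expanded_widths <- rep(range_widths, length.out = length(range_starts))
expanded_widths                                  # 5 2 5 2

# The corresponding ends follow from end = start + width - 1
range_ends <- range_starts + expanded_widths - 1L
range_ends                                       # 5 11 24 31
```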
In the code snippet below, x is an IRanges object.
x[i]: Return a new IRanges object (of the same type as x) made of the selected ranges. i can be a numeric vector, a logical vector, NULL or missing. If x is a NormalIRanges object and i a positive numeric subscript (i.e. a numeric vector of positive values), then i must be strictly increasing.
In the code snippets below, x is an IRanges object.
isEmpty(x): Return a logical value indicating whether x is empty or not.
as.data.frame(x, row.names=NULL, optional=FALSE, ...): Converts x into a standard R data frame object. row.names must be NULL or a character vector giving the row names for the data frame; optional and any additional arguments (...) are ignored. See ?as.data.frame for more information about these arguments.
duplicated(x): Determines which elements of x are equal to elements with smaller subscripts, and returns a logical vector indicating which elements are duplicates. It is semantically equivalent to duplicated(as.data.frame(x)) (see ?duplicated for more information).
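Since duplicated(x) is described as equivalent to duplicated(as.data.frame(x)), its behaviour can be previewed on an ordinary data frame. The ranges below are made up for illustration; base R's duplicated compares whole rows, so two ranges count as duplicates only when both start and width match.

```r
# Row 2 repeats row 1; rows 3 and 4 each differ in start or width
df <- data.frame(start = c(1L, 1L, 2L, 1L),
                 width = c(3L, 3L, 3L, 4L))
dups <- duplicated(df)
dups    # FALSE TRUE FALSE FALSE
```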
as.matrix(x, ...): Converts x into a 2-column integer matrix containing start(x) and width(x). Extra arguments (...) are ignored.
A NormalIRanges object is an IRanges object that is "normal".
An IRanges object is said to be "normal" when its ranges are:
(a) not empty (i.e. they have a nonzero width);
(b) not overlapping;
(c) ordered from left to right;
(d) not even adjacent (i.e. there must be a nonempty gap between 2 consecutive ranges).
If x is an IRanges object with more than one element (i.e. length(x) >= 2), then x is normal iff:
start(x)[i] <= end(x)[i] and end(x)[i] + 1 < start(x)[i+1] and start(x)[i+1] <= end(x)[i+1] for every 1 <= i < length(x). The middle condition is strict enough to exclude adjacent ranges, as required by (d).
If length(x) == 1, then x is normal iff width(x)[1] >= 1.
If length(x) == 0, then x is normal.
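Conditions (a) to (d) can be turned into a small check on plain start and width vectors. The function below is a hypothetical base R sketch, not the isNormal method of the package; it assumes that "not adjacent" means a gap of at least one integer between consecutive ranges.

```r
# Hypothetical helper: TRUE when the ranges given by start and width
# satisfy conditions (a) to (d)
is_normal_sketch <- function(start, width) {
    n <- length(start)
    if (any(width < 1L))        # (a) no empty range
        return(FALSE)
    if (n < 2L)                 # zero or one nonempty range is normal
        return(TRUE)
    end <- start + width - 1L
    # (b), (c), (d): each range must start at least 2 past the previous end
    all(start[-1L] - end[-n] >= 2L)
}

is_normal_sketch(c(1L, 10L), c(3L, 2L))   # TRUE: ranges 1..3 and 10..11
is_normal_sketch(c(1L, 4L), c(3L, 2L))    # FALSE: 1..3 and 4..5 are adjacent
is_normal_sketch(c(1L, 2L), c(3L, 3L))    # FALSE: 1..3 and 2..4 overlap
is_normal_sketch(5L, 0L)                  # FALSE: empty range
is_normal_sketch(integer(0), integer(0))  # TRUE: no ranges at all
```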
An IRanges object can be used to represent an arbitrary finite set of integers (that are not necessarily consecutive).
Now the 2 most interesting properties of normal IRanges objects are that: (1) they are the "best" (in terms of storage space) IRanges objects for representing arbitrary finite sets of integers and (2) the mapping between finite sets of integers and normal IRanges objects is one-to-one.
More precisely, if x is an IRanges object, then it can be seen as representing the set of integers obtained by taking the union of all its ranges. Conversely, since any finite set of integers can be obtained by a finite union of ranges, it can be represented by an IRanges object, but this representation is clearly not unique. However, among all the IRanges objects that represent (or map) the same finite set of integers, only one is normal, and this normal representation is minimal in terms of length (and therefore in terms of storage space).
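The mapping from a finite set of integers to its unique normal representation can be sketched in base R by collapsing runs of consecutive values. The function normalize_set_sketch is a hypothetical helper written for this illustration and returns a plain data frame rather than a NormalIRanges object.

```r
# Hypothetical helper: collapse a set of integers into maximal runs of
# consecutive values, returned as (start, width) rows
normalize_set_sketch <- function(values) {
    v <- sort(unique(as.integer(values)))
    # TRUE wherever the gap to the next value is larger than 1
    breaks <- diff(v) > 1L
    starts <- v[c(TRUE, breaks)]   # first value of each run
    ends <- v[c(breaks, TRUE)]     # last value of each run
    data.frame(start = starts, width = ends - starts + 1L)
}

res <- normalize_set_sketch(c(7, 1, 2, 3, 8, 20))
res$start   # 1 7 20
res$width   # 3 2 1
```

The six integers are stored as three ranges, and no representation with fewer ranges exists because each run is separated from the next by a gap.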
Subsetting x is currently not supported. It could be, but it should then only accept strictly increasing subscripts in order to preserve normality.
Use the isNormal method to check whether an IRanges object is normal or not. In the code snippets below, x is an IRanges object.
isNormal(x): Return a logical value indicating whether x is normal or not.
whichFirstNotNormal(x): Return NA if x is normal, or the smallest valid index i in x for which x[1:i] is not normal.
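The index returned by whichFirstNotNormal can be reasoned about on plain vectors: x[1:i] stops being normal at the first range that is empty or that does not leave a gap after its predecessor. The function below is a hypothetical base R sketch of that reasoning, not the package implementation, and assumes the same normality conditions (a) to (d) given above.

```r
# Hypothetical helper: index of the first range that breaks normality,
# or NA if every prefix of the ranges is normal
which_first_not_normal_sketch <- function(start, width) {
    n <- length(start)
    end <- start + width - 1L
    ok <- width >= 1L                    # a range must not be empty
    if (n >= 2L)                         # and must leave a gap after the previous one
        ok[-1L] <- ok[-1L] & (start[-1L] - end[-n] >= 2L)
    bad <- which(!ok)
    if (length(bad) == 0L) NA_integer_ else bad[1L]
}

# Ranges 1..3, 10..11, 12..16, 30..30: the 3rd is adjacent to the 2nd
which_first_not_normal_sketch(c(1L, 10L, 12L, 30L), c(3L, 2L, 5L, 1L))  # 3
which_first_not_normal_sketch(c(1L, 10L), c(3L, 2L))                    # NA
```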
max(x): (Defined for NormalIRanges objects only.) The maximum value in the finite set of integers represented by x.
min(x): (Defined for NormalIRanges objects only.) The minimum value in the finite set of integers represented by x.
first(x): Deprecated. Use start instead.
last(x): Deprecated. Use end instead.
H. Pages
IRanges-utils, XStringViews-class, as.data.frame, duplicated, as.matrix
x <- IRanges(start=c(2:-1, 13:15), width=c(0:3, 2:0))
x
length(x)
start(x)
width(x)
end(x)
isEmpty(x)
as.data.frame(x)
as.matrix(x)

## Subsetting:
x[4:2]                  # 3 ranges
x[-1]                   # 6 ranges
x[FALSE]                # 0 range
x0 <- x[width(x) == 0]  # 2 ranges
isEmpty(x0)

## Unlock the IRanges instance and use replacement methods to slide
## or resize its elements:
x <- as(x, "UnlockedIRanges")
width(x) <- width(x) * 2 + 1  # resize elements
x
start(x) <- end(x)            # slide elements
x
start(x)[4] <- end(x)[4]      # slide the 4th element
x
end(x)[1] <- start(x)[3]      # slide the first element
x
width(x) <- c(2, 0)           # resize elements
x
duplicated(x)

## Name the elements:
names(x)
names(x) <- c("range1", "range2")
x
x[names(x) == ""]  # 5 ranges
x[names(x) != ""]  # 2 ranges

## Using an IRanges object for storing a big set of ranges is more
## efficient than using a standard R data frame:
N <- 2000000L  # nb of ranges
W <- 180L      # width of each range
start <- 1L
end <- 50000000L
set.seed(777)
range_starts <- sort(sample(end-W+1L, N))
range_widths <- rep.int(W, N)
## Instantiation is faster
system.time(x <- IRanges(start=range_starts, width=range_widths))
system.time(y <- data.frame(start=range_starts, width=range_widths))
## Subsetting is faster
system.time(x16 <- x[c(TRUE, rep.int(FALSE, 15))])
system.time(y16 <- y[c(TRUE, rep.int(FALSE, 15)), ])
## Internal representation is more compact
object.size(x16)
object.size(y16)

## Normality:
isNormal(x16)                        # FALSE
if (interactive())
    x16 <- as(x16, "NormalIRanges")  # Error!
whichFirstNotNormal(x16)             # 57
isNormal(x16[1:56])                  # TRUE
xx <- as(x16[1:56], "NormalIRanges")
class(xx)
max(xx)
min(xx)