IRanges-class {Biostrings}    R Documentation

IRanges and NormalIRanges objects

Description

The IRanges class is a simple container for storing a set of integer ranges.

A NormalIRanges object is an IRanges object that is "normal". See the Normality section below for the definition and properties of normal IRanges objects.

Details

An IRanges object is a data frame-like object where each row describes a "range" of integers.

A "range" of integers is a finite set of consecutive integer values. Each range can be fully described with exactly 2 integers, chosen arbitrarily among the following 3: its "start" i.e. its smallest (or first, or leftmost) value; its "end" i.e. its greatest (or last, or rightmost) value; and its "width" i.e. the number of values in the range. For example the set of integers that are greater than or equal to -20 and less than or equal to 400 is the range of integers that starts at -20 and has a width of 421.

The start can be any integer (see start below) but the width must be a nonnegative integer (see width below). The end of a range is its start plus its width minus one (see end below). An "empty" range is a range that contains no value i.e. a range whose width is zero. Note that for an empty range, the end is smaller than the start (it is equal to the start minus one).
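
The relation between start, end and width described above can be checked with plain integer vectors. The following is a minimal base R sketch that does not use the IRanges class itself; the variable names (range_start, range_width, range_end) are illustrative only.

```r
## Three ranges described by their start and width; the last one is empty.
range_start <- c(-20L, 5L, 7L)
range_width <- c(421L, 3L, 0L)

## The end of a range is its start plus its width minus one.
range_end <- range_start + range_width - 1L
range_end                # 400 7 6

## For an empty range (width 0), the end is smaller than the start.
range_end < range_start  # FALSE FALSE TRUE
```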

Two ranges are considered equal iff they share the same start and width. Note that with this definition, 2 empty ranges are generally not equal (they need to share the same start to be considered equal).

The length of an IRanges object is the number of ranges in it i.e. the number of rows in the object.

An IRanges object is considered empty iff all its ranges are empty.

Note that users of the Biostrings package are unlikely to have to create or directly manipulate an IRanges instance. However, because the IRanges class is a superclass of the XStringViews class, any XStringViews object is also an IRanges object and can be manipulated as such. Therefore all the methods described here also work with an XStringViews object.

IRanges object vs data frame

An important difference from standard R data frames is that IRanges objects only support single subscript subsetting i.e. subsetting by row, whereas standard R data frames can be subset by row and by column. As a consequence, the length of an IRanges object is its number of rows, whereas the length of a standard R data frame object is its number of columns.
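
The data frame half of this difference is easy to verify in base R. The sketch below uses an ordinary data frame with made-up values; according to the text above, an IRanges object holding the same three ranges would report a length of 3 instead.

```r
## A standard data frame with 3 rows (ranges) and 2 columns.
df <- data.frame(start = c(1L, 5L, 9L), width = c(2L, 2L, 2L))

length(df)  # 2, the number of columns
nrow(df)    # 3, the number of rows
```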

Accessor methods

In the code snippets below, x is an IRanges object.

length(x): The number of ranges in x.
start(x): The start values of the ranges. This is a vector of integers (not necessarily positive) of the same length as x.
width(x): The number of integers in each range. This is a vector of nonnegative integers of the same length as x.
end(x): start(x) + width(x) - 1L
names(x): NULL or a character vector of the same length as x.
desc(x): desc is an alias for names.

Constructor

IRanges(start=NULL, end=NULL, width=NULL): Return the IRanges object containing the ranges specified by start, end and width. Exactly two of the start, end and width arguments must be specified as integer vectors (with no NAs) and the other argument must be NULL. If start and end are specified, then they must be vectors of the same length. If start and width (or end and width) are specified, then the length of width must be less than or equal to the length of start (or end, respectively) and, if it is smaller, width is expanded cyclically to that length.
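
The argument rules above can be illustrated with a small base R sketch. The helper resolve_ranges is hypothetical (it is not part of Biostrings and is not how the constructor is implemented); it only mimics the documented rules and returns a plain data frame. The check for NAs is omitted for brevity.

```r
## Hypothetical helper (not part of Biostrings): derive the missing
## component from the two that are supplied, following the rules above.
resolve_ranges <- function(start = NULL, end = NULL, width = NULL) {
    n_null <- is.null(start) + is.null(end) + is.null(width)
    if (n_null != 1L)
        stop("exactly two of 'start', 'end' and 'width' must be specified")
    if (is.null(width)) {
        if (length(start) != length(end))
            stop("'start' and 'end' must have the same length")
        width <- end - start + 1L
    } else {
        start_given <- is.null(end)
        anchor <- if (start_given) start else end
        if (length(width) > length(anchor))
            stop("'width' cannot be longer than 'start' (or 'end')")
        ## Expand 'width' cyclically to the length of the other vector.
        width <- rep(width, length.out = length(anchor))
        if (start_given) {
            end <- start + width - 1L
        } else {
            start <- end - width + 1L
        }
    }
    data.frame(start = start, end = end, width = width)
}

## 'width' has length 2 and is recycled over the 4 start values.
resolve_ranges(start = c(1L, 10L, 20L, 30L), width = c(5L, 2L))
```

In this call the widths become 5, 2, 5, 2, so the ends are 5, 11, 24 and 31.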

Subsetting

In the code snippet below, x is an IRanges object.

x[i]: Return a new IRanges object (of the same type as x) made of the selected ranges. i can be a numeric vector, a logical vector, NULL or missing. If x is a NormalIRanges object and i a positive numeric subscript (i.e. a numeric vector of positive values), then i must be strictly increasing.

Other methods

In the code snippets below, x is an IRanges object.

isEmpty(x): Return a logical value indicating whether x is empty or not.
as.data.frame(x, row.names=NULL, optional=FALSE, ...): Converts x into a standard R data frame object. row.names must be NULL or a character vector giving the row names for the data frame. optional and any additional arguments (...) are ignored. See ?as.data.frame for more information about these arguments.
duplicated(x): Determines which elements of x are equal to elements with smaller subscripts, and returns a logical vector indicating which elements are duplicates. It is semantically equivalent to duplicated(as.data.frame(x)) (see ?duplicated for more information).
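
Since duplicated(x) is described as semantically equivalent to duplicated(as.data.frame(x)), its behavior can be previewed on an ordinary data frame. The sketch below uses a made-up data frame of (start, width) rows as a stand-in for an IRanges object.

```r
## Rows of a data frame standing in for ranges (start, width).
df <- data.frame(start = c(2L, 5L, 2L, 2L), width = c(3L, 0L, 3L, 0L))

## Only the 3rd row repeats an earlier row (the 1st one). The 2nd and
## 4th rows are both empty ranges but have different starts, so they
## are not equal.
duplicated(df)  # FALSE FALSE TRUE FALSE
```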
as.matrix(x, ...): Converts x into a 2-column integer matrix containing start(x) and width(x). Extra arguments (...) are ignored.

Normality

A NormalIRanges object is an IRanges object that is "normal".

An IRanges object is said to be "normal" when its ranges are: (a) not empty (i.e. they have a nonzero width); (b) not overlapping; (c) ordered from left to right; (d) not even adjacent (i.e. there must be a non-empty gap between 2 consecutive ranges). If x is an IRanges object with more than one element (i.e. length >= 2), then x is normal iff:

  start(x)[i] <= end(x)[i]  and  end(x)[i] + 1 < start(x)[i+1] <= end(x)[i+1]
for every 1 <= i < length(x). If length(x) == 1, then x is normal iff width(x)[1] >= 1. If length(x) == 0, then x is normal.
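
Conditions (a) to (d) can be sketched in base R on plain start and width vectors. The function is_normal_sketch is a hypothetical illustration, not the isNormal method of the package.

```r
## Hypothetical check of conditions (a) to (d) on plain integer vectors.
is_normal_sketch <- function(start, width) {
    n <- length(start)
    if (n == 0L) return(TRUE)           # no ranges: normal
    if (any(width < 1L)) return(FALSE)  # (a) no empty range allowed
    if (n == 1L) return(TRUE)
    end <- start + width - 1L
    ## (b), (c), (d): each range must start strictly after the end of
    ## the previous one, with a gap of at least one integer in between.
    all(start[-1L] > end[-n] + 1L)
}

is_normal_sketch(c(1L, 5L, 10L), c(2L, 3L, 1L))  # TRUE
is_normal_sketch(c(1L, 3L), c(2L, 2L))           # FALSE (adjacent)
is_normal_sketch(c(1L, 2L), c(3L, 1L))           # FALSE (overlapping)
```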

An IRanges object can be used to represent an arbitrary finite set of integers (that are not necessarily consecutive). The 2 most interesting properties of normal IRanges objects are that: (1) they are the "best" (in terms of storage space) IRanges objects for representing arbitrary finite sets of integers and (2) the mapping between finite sets of integers and normal IRanges objects is one-to-one. More precisely, if x is an IRanges object, then it can be seen as representing the set of integers obtained by taking the union of all its ranges. Conversely, since any finite set of integers can be obtained as a finite union of ranges, it can be represented by an IRanges object, but this representation is clearly not unique. However, among all the IRanges objects that represent (or map) the same finite set of integers, only one is normal, and this normal representation is minimal in terms of length (and therefore in terms of storage space).
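
The mapping from a finite set of integers to its unique normal representation can be sketched in base R. The function set_to_normal_ranges is a hypothetical helper that returns the ranges as a plain data frame rather than as a NormalIRanges object.

```r
## Hypothetical helper: collapse a finite set of integers into the
## maximal runs of consecutive values, i.e. its normal representation.
set_to_normal_ranges <- function(values) {
    v <- sort(unique(as.integer(values)))
    if (length(v) == 0L)
        return(data.frame(start = integer(0), width = integer(0)))
    ## A new range begins wherever the gap to the previous value exceeds 1.
    is_first <- c(TRUE, diff(v) > 1L)
    ## A range ends just before the next one begins, or at the last value.
    is_last <- c(is_first[-1L], TRUE)
    start <- v[is_first]
    end <- v[is_last]
    data.frame(start = start, width = end - start + 1L)
}

## The set {3, 4, 5, 7, 10, 11} maps to 3 ranges.
set_to_normal_ranges(c(7, 3, 4, 5, 10, 11, 3))
```

Here the result has starts 3, 7, 10 and widths 3, 1, 2. Input order and repeated values do not matter, which reflects the one-to-one property described above.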

Subsetting a NormalIRanges object x is currently not supported. It could be, but it should then only accept strictly increasing subscripts in order to preserve normality.

Use the isNormal method to check whether an IRanges object is normal or not. In the code snippets below, x is an IRanges object.

isNormal(x): Return a logical value indicating whether x is normal or not.
whichFirstNotNormal(x): Return NA if x is normal, or the smallest valid index i in x for which x[1:i] is not normal.
max(x): (Defined for NormalIRanges objects only.) The maximum value in the finite set of integers represented by x.
min(x): (Defined for NormalIRanges objects only.) The minimum value in the finite set of integers represented by x.
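
The behavior described for whichFirstNotNormal can be sketched in base R on plain start and width vectors. The function which_first_not_normal_sketch is hypothetical and is not the package's implementation; it assumes the definition of normality given in this section (non-empty ranges separated by non-empty gaps).

```r
## Hypothetical sketch: index of the first range that breaks normality,
## or NA if there is none.
which_first_not_normal_sketch <- function(start, width) {
    n <- length(start)
    end <- start + width - 1L
    ## A range breaks normality if it is empty ...
    bad <- width < 1L
    ## ... or if it overlaps, precedes or is adjacent to the previous one.
    if (n >= 2L)
        bad[-1L] <- bad[-1L] | (start[-1L] <= end[-n] + 1L)
    if (any(bad)) which(bad)[1L] else NA_integer_
}

## The 3rd range starts at 8, right after the 2nd one ends at 7.
which_first_not_normal_sketch(c(1L, 5L, 8L, 20L), c(2L, 3L, 1L, 4L))  # 3
```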

Deprecated methods

first(x): deprecated. Use start instead.
last(x): deprecated. Use end instead.

Author(s)

H. Pages

See Also

IRanges-utils, XStringViews-class, as.data.frame, duplicated, as.matrix

Examples

  x <- IRanges(start=c(2:-1, 13:15), width=c(0:3, 2:0))
  x
  length(x)
  start(x)
  width(x)
  end(x)
  isEmpty(x)
  as.data.frame(x)
  as.matrix(x)

  ## Subsetting:
  x[4:2]                  # 3 ranges
  x[-1]                   # 6 ranges
  x[FALSE]                # 0 ranges
  x0 <- x[width(x) == 0]  # 2 ranges
  isEmpty(x0)

  ## Unlock the IRanges instance and use replacement methods to slide
  ## or resize its elements:
  x <- as(x, "UnlockedIRanges")
  width(x) <- width(x) * 2  + 1  # resize elements
  x
  start(x) <- end(x)             # slide elements
  x
  start(x)[4] <- end(x)[4]       # slide the 4th element
  x
  end(x)[1] <- start(x)[3]       # slide the first element
  x
  width(x) <- c(2, 0)            # resize elements
  x
  duplicated(x)

  ## Name the elements:
  names(x)
  names(x) <- c("range1", "range2")
  x
  x[names(x) == ""]  # 5 ranges
  x[names(x) != ""]  # 2 ranges

  ## Using an IRanges object for storing a big set of ranges is more
  ## efficient than using a standard R data frame:
  N <- 2000000L  # number of ranges
  W <- 180L      # width of each range
  start <- 1L
  end <- 50000000L
  set.seed(777)
  range_starts <- sort(sample(end-W+1L, N))
  range_widths <- rep.int(W, N)
  ## Instantiation is faster
  system.time(x <- IRanges(start=range_starts, width=range_widths))
  system.time(y <- data.frame(start=range_starts, width=range_widths))
  ## Subsetting is faster
  system.time(x16 <- x[c(TRUE, rep.int(FALSE, 15))])
  system.time(y16 <- y[c(TRUE, rep.int(FALSE, 15)), ])
  ## Internal representation is more compact
  object.size(x16)
  object.size(y16)

  ## Normality:
  isNormal(x16)                        # FALSE
  if (interactive())
      x16 <- as(x16, "NormalIRanges")  # Error!
  whichFirstNotNormal(x16)             # 57
  isNormal(x16[1:56])                  # TRUE
  xx <- as(x16[1:56], "NormalIRanges")
  class(xx)
  max(xx)
  min(xx)

[Package Biostrings version 2.8.18 Index]