| Type: | Package |
| Title: | File-Backed Matrix Class with Convenient Read and Write Access |
| Version: | 1.3 |
| Date: | 2018-02-26 |
| Description: | Interface for working with large matrices stored in files, not in computer memory. Supports multiple non-character data types (double, integer, logical and raw) of various sizes (e.g. 8 and 4 byte real values). Access to parts of the matrix is done by indexing, exactly as with usual R matrices. Supports very large matrices. Tested on multi-terabyte matrices. Allows for more than 2^32 rows or columns. Allows for quick addition of extra columns to a filematrix. Cross-platform as the package has R code only. |
| BugReports: | https://github.com/andreyshabalin/filematrix/issues |
| URL: | https://github.com/andreyshabalin/filematrix |
| License: | LGPL-3 |
| Depends: | methods, utils |
| VignetteBuilder: | knitr |
| Suggests: | knitr, rmarkdown, RSQLite |
| NeedsCompilation: | no |
| Packaged: | 2018-02-27 06:24:35 UTC; Andrey |
| Author: | Andrey A Shabalin |
| Maintainer: | Andrey A Shabalin <andrey.shabalin@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2018-02-27 16:38:01 UTC |
File-backed numeric matrix.
Description
File-Backed Matrix Class with Convenient Read and Write Access
Details
Interface for working with large matrices stored in files, not in computer memory. Supports multiple non-character data types (double, integer, logical and raw) of various sizes (e.g. 8 and 4 byte real values). Access to parts of the matrix is done by indexing (e.g. fm[,1]), exactly as with usual R matrices. Supports very large matrices. Tested on multi-terabyte matrices. Allows for more than 2^32 rows or columns. Allows for quick addition of extra columns to a filematrix. Cross-platform as the package has R code only.
A new filematrix object can be created with fm.create
and fm.create.from.matrix. Existing filematrix
files can be opened with fm.open.
Once a filematrix is created or opened, it can be accessed
as a regular matrix object in R.
All changes to a filematrix object are written to the data files
without extra buffering.
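The round trip described above can be sketched as follows. This is a minimal example that assumes the filematrix package is installed. The temporary base name from tempfile() and the variable names are illustrative only.

```r
library(filematrix)

# A small in-memory matrix to copy to disk
mat = matrix(as.double(1:12), nrow = 3, ncol = 4)

# Copy it into a new filematrix stored under a temporary base name
base = tempfile()
fm = fm.create.from.matrix(filenamebase = base, mat = mat)

# Indexing works as with an ordinary matrix
second_col = fm[, 2]

# Modify one element; the change is written to the data file
fm[1, 1] = 100
close(fm)

# Load the whole matrix back into memory as an ordinary R matrix
loaded = fm.load(base)
print(loaded)

# Reopen and delete the files once they are no longer needed
closeAndDeleteFiles(fm.open(base))
```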
Note
Due to the lack of a 64-bit integer data type in R, the package uses double values to calculate indices. A double represents every integer up to 2^53 exactly, which is sufficient for indexing matrices up to 8,192 terabytes in size.
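The arithmetic behind this limit can be checked in base R. The sketch assumes that 1 terabyte means 2^40 bytes, and the column-major linear index formula below is an illustration of why doubles are needed, not code taken from the package.

```r
# Every integer up to 2^53 is exactly representable as a double
max_exact = 2^53

# Past that point, adding 1 is lost to rounding, so distinct indices would collide
collision = (max_exact + 1) == max_exact

# 2^53 bytes expressed in terabytes, taking 1 TB = 2^40 bytes
tb_limit = max_exact / 2^40

# R's native integer type stops at 2^31 - 1
int_limit = .Machine$integer.max

# Illustrative column-major linear index of element (i, j) in a matrix
# with more rows than an R integer can hold, computed in double precision
nrows = 3e9
i = 3e9
j = 2
linear_index = (j - 1) * nrows + i

cat("collision:", collision, "\n")
cat("limit in TB:", tb_limit, "\n")
cat("linear index:", format(linear_index, scientific = FALSE), "\n")
```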
Author(s)
Andrey A Shabalin andrey.shabalin@gmail.com
See Also
See fm.create and filematrix
for reference.
Run browseVignettes("filematrix") for the list of vignettes.
Manipulating file matrices (class "filematrix")
Description
filematrix is a class for working with very large matrices
stored in files, not held in computer memory.
It is intended as a simple, efficient solution to handling big numeric data
(i.e., datasets larger than memory capacity) in R.
A new filematrix can be created with fm.create.
It can be created from an existing R matrix
with fm.create.from.matrix.
A text file with a matrix can be scanned and converted into a filematrix
with fm.create.from.text.file.
An existing filematrix can be opened for read/write access
with fm.open or loaded fully in memory
with fm.load.
A filematrix can be handled as an ordinary matrix in R.
It can be read from and written to via usual indexing
with possible omission of indices.
For example: fm[1:3,2:4] and fm[,2:4].
The values can also be accessed as a vector
with single indexing.
For example: fm[3:7] and fm[4:7] = 1:4.
A whole filematrix can be read into memory as an ordinary R matrix
with the as.matrix function or empty indexing fm[].
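The indexing forms above can be compared against an in-memory copy of the same matrix. This is a minimal sketch assuming the filematrix package is installed; ref is an ordinary R matrix kept only for comparison.

```r
library(filematrix)

# Reference matrix kept in memory for comparison
ref = matrix(as.double(1:50), nrow = 5, ncol = 10)
fm = fm.create.from.matrix(tempfile(), ref)

# Matrix indexing, with and without omitted indices
block = fm[1:3, 2:4]
cols = fm[, 2:4]

# Single (vector) indexing reads elements in column-major order
vals = fm[3:7]

# Vector assignment; mirror it on the reference matrix
fm[4:7] = 1:4
ref[4:7] = 1:4

# Read the whole matrix into memory in two equivalent ways
whole1 = fm[]
whole2 = as.matrix(fm)

closeAndDeleteFiles(fm)
```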
The dimensions of a filematrix can be obtained via the dim,
nrow and ncol functions and
modified with the dim function.
For example: dim(fm) and dim(fm) = c(10,100).
The number of elements in a filematrix is returned by the length function.
A filematrix can have row and column names.
They can be accessed using the standard functions
rownames, colnames, and dimnames.
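The dimension and name functions can be exercised as below. This sketch assumes the filematrix package is installed, and it keeps the element count unchanged when reshaping, since the text above does not say whether dim can also grow or shrink a filematrix.

```r
library(filematrix)

fm = fm.create(tempfile(), nrow = 10, ncol = 2)
fm[] = as.double(1:20)

n_el = length(fm)
d_before = dim(fm)
n_rows = nrow(fm)

# Reshape to 5 x 4; the element count (20) is unchanged, so the
# column-major data is reinterpreted with the new dimensions
dim(fm) = c(5, 4)
d_after = dim(fm)
elem = fm[1, 2]   # the 6th element in column-major order

# Attach row and column names
rownames(fm) = paste0("r", 1:5)
colnames(fm) = paste0("c", 1:4)
dn = dimnames(fm)

closeAndDeleteFiles(fm)
```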
A filematrix can be closed after use with the close command.
Note, however, that there is no risk of losing modifications
to a filematrix if an object is not closed,
as all changes are written to disk without delay.
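Because writes go straight to disk, a closed filematrix can be reopened later with its contents intact. A minimal sketch, assuming the filematrix package is installed; the base name comes from tempfile() for illustration.

```r
library(filematrix)

base = tempfile()

# Create an integer filematrix and fill its first column
fm = fm.create(base, nrow = 4, ncol = 3, type = "integer")
fm[, 1] = 1:4
close(fm)

# Reopen the same files; readonly = TRUE guards against accidental writes
ro = fm.open(base, readonly = TRUE)
first_col = ro[, 1]
ro_dim = dim(ro)
close(ro)

# Remove the files once they are no longer needed
closeAndDeleteFiles(fm.open(base))
```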
Usage
## S3 method for class 'filematrix'
x[i,j]
## S3 replacement method for class 'filematrix'
x[i,j] <- value
## S4 method for signature 'filematrix'
as.matrix(x)
## S4 method for signature 'filematrix'
dim(x)
## S4 replacement method for signature 'filematrix'
dim(x) <- value
## S4 method for signature 'filematrix'
length(x)
## S4 method for signature 'filematrix'
rownames(x)
## S4 replacement method for signature 'filematrix'
rownames(x) <- value
## S4 method for signature 'filematrix'
colnames(x)
## S4 replacement method for signature 'filematrix'
colnames(x) <- value
## S4 method for signature 'filematrix'
dimnames(x)
## S4 replacement method for signature 'filematrix'
dimnames(x) <- value
Arguments
| x | A filematrix object. |
| i, j | Row/column indices specifying elements to extract or replace. |
| value | A new value to replace the indexed element(s). |
Value
The length function returns the number of elements in the filematrix.
The functions colnames, rownames, and dimnames return
the same values as their counterparts for regular R matrices.
Methods
isOpen: Returns TRUE if the filematrix is open.
readAll(): Returns the whole matrix.
Same as fm[] or as.matrix(fm).
writeAll(value): Fills in the whole matrix.
Same as fm[] = value.
readSubCol(i, j, num): Reads num values in column j starting with row i.
Same as fm[i:(i+num-1), j].
writeSubCol(i, j, value): Writes values in column j starting with row i.
Same as fm[i:(i+length(value)-1), j] = value.
readCols(start, num): Reads num columns starting with column start.
Same as fm[, start:(start+num-1)].
writeCols(start, value): Writes columns starting with column start.
Same as fm[, start:(start+ncol(value)-1)] = value.
readSeq(start, len): Reads len values from the matrix starting with the start-th value.
Same as fm[start:(start+len-1)].
writeSeq(start, value): Writes values in the matrix starting with the start-th value.
Same as fm[start:(start+length(value)-1)] = value.
appendColumns(mat): Increases the filematrix by adding columns to the right side of the matrix. Matrix mat must have the same number of rows.
Same as fm = cbind(fm, mat) for ordinary matrices.
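The methods above can be checked against their indexing equivalents. This sketch assumes the filematrix package is installed and that the methods are invoked with the $ operator of R reference classes; the comments state the expected equivalent expression for each call.

```r
library(filematrix)

# 6 x 3 matrix holding 1:18 in column-major order
fm = fm.create(tempfile(), nrow = 6, ncol = 3)
fm$writeAll(matrix(as.double(1:18), nrow = 6))

a = fm$readSubCol(2, 3, 4)          # same as fm[2:5, 3]
fm$writeSubCol(1, 1, c(-1, -2))     # same as fm[1:2, 1] = c(-1, -2)
b = fm$readCols(2, 2)               # same as fm[, 2:3]
fm$writeCols(3, matrix(0, 6, 1))    # same as fm[, 3] = 0
s = fm$readSeq(5, 3)                # same as fm[5:7]
fm$writeSeq(7, c(70, 80))           # same as fm[7:8] = c(70, 80)

# Add a fourth column on the right
fm$appendColumns(matrix(as.double(101:106), nrow = 6))
d = dim(fm)
full = fm[]

closeAndDeleteFiles(fm)
```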
Author(s)
Andrey A Shabalin andrey.shabalin@gmail.com
See Also
For functions creating and opening file matrices, see
fm.create.
Run browseVignettes("filematrix") for the list of vignettes.
Functions to Create a New, or Open an Existing Filematrix
Description
Create a new or open existing filematrix object.
fm.create creates a new filematrix.
If a filematrix with this name exists, it is overwritten (destroyed).
fm.create.from.matrix creates a new filematrix copy of
an existing R matrix.
fm.open opens an existing filematrix for read/write access.
fm.load loads an entire existing filematrix
into memory as an ordinary R matrix.
fm.create.from.text.file reads a matrix from a text file
into a new filematrix.
The rows in the text file become columns in the filematrix.
The transposition happens because text files store data by rows and
filematrices store data by columns.
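The transposition can be seen on a tiny tab-delimited file with one header row and one row-name column. A minimal sketch, assuming the filematrix package is installed; the file contents and the names id, s1 to s3, g1 and g2 are made up for illustration.

```r
library(filematrix)

# Two data rows (g1, g2) with three values each, plus a header row
txt = tempfile(fileext = ".txt")
writeLines(c("id\ts1\ts2\ts3",
             "g1\t1\t2\t3",
             "g2\t4\t5\t6"), txt)

fm = fm.create.from.text.file(
  textfilename = txt,
  filenamebase = tempfile(),
  skipRows = 1,
  skipColumns = 1,
  delimiter = "\t")

# Each text row becomes a filematrix column, so the result is 3 x 2
d = dim(fm)
vals = fm[]

closeAndDeleteFiles(fm)
unlink(txt)
```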
Usage
fm.create(
filenamebase,
nrow = 0,
ncol = 1,
type = "double",
size = NULL,
lockfile = NULL)
fm.create.from.matrix(
filenamebase,
mat,
size = NULL,
lockfile = NULL)
fm.open(
filenamebase,
readonly = FALSE,
lockfile = NULL)
fm.load(filenamebase, lockfile = NULL)
fm.create.from.text.file(
textfilename,
filenamebase,
skipRows = 1,
skipColumns = 1,
sliceSize = 1000,
omitCharacters = "NA",
delimiter = "\t",
rowNamesColumn = 1,
type = "double",
size = NULL)
## S4 method for signature 'filematrix'
close(con)
closeAndDeleteFiles(con)
Arguments
| filenamebase | Name without extension for the files storing the filematrix. |
| nrow | Number of rows in the matrix. Values over 2^32 are supported. |
| ncol | Number of columns in the matrix. Values over 2^32 are supported. |
| type | The type of values stored in the matrix. Can be "double", "integer", "logical", or "raw". |
| size | Size of each item of the matrix in bytes. |
| mat | Regular R matrix, to be copied into a new filematrix. |
| readonly | If TRUE, the filematrix is opened in read-only mode. |
| textfilename | Name of the text file with matrix data, to be copied into a new filematrix. |
| skipRows | Number of rows with column names. The matrix values are expected after the first skipRows rows. |
| skipColumns | Number of columns before matrix values begin. Can be zero. |
| sliceSize | The text file with the matrix is read in chunks of sliceSize rows. |
| omitCharacters | The text string representing missing values. Default value is "NA". |
| delimiter | The delimiter separating values in the text matrix file. |
| rowNamesColumn | The row names are taken from the rowNamesColumn-th column of the text file. |
| con | A filematrix object. |
| lockfile | Optional. Name of a lock file (the file is overwritten). Used to avoid simultaneous operations by multiple R instances accessing the same filematrix or different filematrices on the same hard drive. Do not use if unsure. |
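The size argument trades precision for disk space. The sketch below assumes the filematrix package is installed and that size = 4 with type = "double" stores single-precision values, as the package description's "8 and 4 byte real values" suggests.

```r
library(filematrix)

# 4-byte real values use half the disk space of the default 8-byte doubles
fm4 = fm.create(tempfile(), nrow = 3, ncol = 1, type = "double", size = 4)
fm4[, 1] = c(0.5, 0.1, 1/3)
back = fm4[, 1]
closeAndDeleteFiles(fm4)

# 0.5 is exactly representable in single precision; 0.1 and 1/3 are rounded
print(back, digits = 17)
```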
Details
Once created or opened, a filematrix object can be accessed
as an ordinary matrix using both matrix fm[,] and
vector fm[] indexing.
The indices can be integer (no zeros) or logical vectors.
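Logical indices work for both matrix and vector indexing. A minimal sketch, assuming the filematrix package is installed; ref is an in-memory copy used to build the logical masks and check the results.

```r
library(filematrix)

ref = matrix(as.double(1:12), nrow = 4)
fm = fm.create.from.matrix(tempfile(), ref)

# Logical row selection: rows whose first-column value exceeds 2
keep = ref[, 1] > 2
sel = fm[keep, ]

# Logical vector indexing over all 12 elements (column-major order)
even = fm[as.vector(ref %% 2 == 0)]

closeAndDeleteFiles(fm)
```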
Value
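fm.create, fm.create.from.matrix, fm.open, and fm.create.from.text.file
return a filematrix object; fm.load returns an ordinary R matrix.
A filematrix object can be closed with the close command or
closed and deleted from disk with the closeAndDeleteFiles command.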
Author(s)
Andrey A Shabalin andrey.shabalin@gmail.com
See Also
For more on the use of filematrices see filematrix.
Run browseVignettes("filematrix") for the list of vignettes.
Examples
# Create a 10x10 matrix
fm = fm.create(filenamebase=tempfile(), nrow=10, ncol=10)
# Change values in the top 3x3 corner
fm[1:3,1:3] = 1:9
# View the values in the top 4x4 corner
fm[1:4,1:4]
# Close and delete the filematrix
closeAndDeleteFiles(fm)