kallisto-wrapper {scater} | R Documentation |
Run the abundance quantification tool kallisto on a set of FASTQ files. Requires kallisto (http://pachterlab.github.io/kallisto/) to be installed, and a kallisto feature index must have been generated prior to using this function. See the kallisto website for installation and basic usage instructions.
Read kallisto results for a single sample into a list
After generating transcript/feature abundance results using kallisto for a batch of samples, read these abundance values into a SingleCellExperiment object.
runKallisto(targets_file, transcript_index, single_end = TRUE,
  output_prefix = "output", fragment_length = NULL,
  fragment_standard_deviation = NULL, n_cores = 2,
  n_bootstrap_samples = 0, bootstrap_seed = NULL, correct_bias = TRUE,
  plaintext = FALSE, kallisto_version = "current", verbose = TRUE,
  dry_run = FALSE, kallisto_cmd = "kallisto")

readKallistoResultsOneSample(directory, read_h5 = FALSE,
  kallisto_version = "current")

readKallistoResults(kallisto_log = NULL, samples = NULL,
  directories = NULL, read_h5 = FALSE, kallisto_version = "current",
  verbose = TRUE)
targets_file |
character string giving the path to a tab-delimited text file with either 2 columns (single-end reads) or 3 columns (paired-end reads) that gives the sample names (first column) and FASTQ file names (column 2 and, if applicable, column 3). The file is assumed to have column headers, although these are not used. |
transcript_index |
character string giving the path to the kallisto index to be used for the feature abundance quantification. |
single_end |
logical; TRUE if single-end reads are used, FALSE if paired-end reads are used. |
output_prefix |
character string giving the prefix for the output folder that will contain the kallisto results. The default is "output". |
fragment_length |
scalar integer or numeric giving the estimated average fragment length. Required argument if single_end is TRUE. |
fragment_standard_deviation |
scalar numeric giving the estimated standard deviation of read fragment length. Required argument if single_end is TRUE. |
n_cores |
integer giving the number of cores (nodes/threads) to use for
the kallisto jobs. The package |
n_bootstrap_samples |
integer giving the number of bootstrap samples that kallisto should use (default is 0). With bootstrap samples, uncertainty in abundance can be quantified. |
bootstrap_seed |
scalar integer or numeric giving the seed to use for the bootstrap sampling (default used by kallisto is 42). Optional argument. |
correct_bias |
logical, should kallisto's option to model and correct abundances for sequence-specific bias be used? Requires kallisto version 0.42.2 or higher. |
plaintext |
logical, if |
kallisto_version |
character string indicating whether or not the
version of kallisto to be used is |
verbose |
logical, should timings for the run be printed? |
dry_run |
logical, if |
kallisto_cmd |
(optional) string giving the full command used to call kallisto, for use when simply typing "kallisto" at the command line does not give the required version of kallisto or does not work. The default is simply "kallisto". If used, this argument should give the full path to the desired kallisto binary. |
directory |
character string giving the path to the directory containing the kallisto results for the sample. |
read_h5 |
logical, if |
kallisto_log |
list, generated by runKallisto. |
samples |
character vector providing a set of sample names to use for the abundance results. |
directories |
character vector providing a set of directories containing kallisto abundance results to be read in. |
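The targets_file layout described above can be produced with base R. As a minimal sketch, the code below writes a paired-end targets file (sample name plus two FASTQ files per row) and a single-end one (sample name plus one FASTQ file) to a temporary directory, then reads them back to check their shape. The sample and FASTQ file names are hypothetical placeholders.

```r
# Hypothetical sample and FASTQ names, for illustration only.
paired <- data.frame(
  Sample = c("sample1", "sample2"),
  File1  = c("s1_R1.fastq.gz", "s2_R1.fastq.gz"),
  File2  = c("s1_R2.fastq.gz", "s2_R2.fastq.gz"),
  stringsAsFactors = FALSE
)
# Single-end layout: keep only the sample name and the first FASTQ column.
single <- paired[, c("Sample", "File1")]

paired_file <- file.path(tempdir(), "targets_paired.txt")
single_file <- file.path(tempdir(), "targets_single.txt")

# Tab-delimited with a header row (assumed present but not used),
# no quoting and no row names.
write.table(paired, file = paired_file, quote = FALSE, row.names = FALSE, sep = "\t")
write.table(single, file = single_file, quote = FALSE, row.names = FALSE, sep = "\t")

# Read back: 3 columns for paired-end, 2 for single-end.
targets <- read.delim(paired_file, stringsAsFactors = FALSE)
ncol(targets)
```

Either file path could then be passed as targets_file, with single_end set to match the number of FASTQ columns.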
A kallisto transcript index can be built from a FASTA file:

    kallisto index [arguments] FASTA-file

See the kallisto documentation for further details.
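As a hedged sketch, the index can also be built from within R by calling the kallisto binary with system2(). The FASTA and index file names below are assumptions, and only the -i option (which names the index file to write) is used; other kallisto index arguments are left out. The command is only executed when kallisto is found on the PATH and the FASTA file exists; otherwise it is just printed.

```r
# Assumed file names; replace with your own transcriptome FASTA and index path.
fasta_file <- "transcripts.fasta.gz"
index_file <- "transcripts.idx"

# Arguments for `kallisto index`: -i gives the name of the index file to write.
index_args <- c("index", "-i", index_file, fasta_file)

# Sys.which() returns "" when the program is not on the PATH.
kallisto_bin <- Sys.which("kallisto")
if (nzchar(kallisto_bin) && file.exists(fasta_file)) {
  status <- system2(kallisto_bin, args = index_args)
} else {
  # Otherwise just show the command that would be run.
  message("Would run: kallisto ", paste(index_args, collapse = " "))
}
```

A full path such as the one returned by Sys.which("kallisto") is the kind of value the kallisto_cmd argument expects when plain "kallisto" does not resolve to the desired binary.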
The directory is expected to contain results for just a single sample. Putting more than one sample's results in the directory will result in unpredictable behaviour with this function. The function looks for the files (with the default names given by kallisto) 'abundance.txt', 'run_info.json' and (if read_h5 = TRUE) 'abundance.h5'. If these files are missing, or if the results files have different names, then this function will not find them.
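The file lookup described above can be sketched in base R. The helper read_abundance_sketch() below is hypothetical and is not scater's implementation: it assumes the 'abundance.txt' file name and the column names listed under Value, reads only the abundance table (not run_info.json or the HDF5 file), and fails loudly when the file or a column is missing. The toy directory and its values are made up for illustration.

```r
# Hypothetical helper, not scater's implementation: read the abundance table
# kallisto wrote into a single-sample output directory.
read_abundance_sketch <- function(directory, file_name = "abundance.txt") {
  path <- file.path(directory, file_name)
  if (!file.exists(path)) {
    stop("no '", file_name, "' found in ", directory)
  }
  abundance <- read.delim(path, stringsAsFactors = FALSE)
  expected <- c("target_id", "length", "eff_length", "est_counts", "tpm")
  missing_cols <- setdiff(expected, colnames(abundance))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  abundance
}

# Toy results directory with made-up values (tpm is consistent with
# est_counts / eff_length, scaled to sum to one million).
toy_dir <- file.path(tempdir(), "toy_kallisto_output")
dir.create(toy_dir, showWarnings = FALSE)
toy <- data.frame(target_id = c("tx1", "tx2"), length = c(1100, 2100),
                  eff_length = c(1000, 2000), est_counts = c(10, 60),
                  tpm = c(250000, 750000))
write.table(toy, file.path(toy_dir, "abundance.txt"),
            quote = FALSE, row.names = FALSE, sep = "\t")

ab <- read_abundance_sketch(toy_dir)
```

Checking for the file explicitly mirrors the caveat above: a renamed or missing results file is reported instead of being silently skipped.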
This function expects to find only one set of kallisto abundance results per directory; multiple abundance results in a given directory will be problematic.
A list containing three elements for each sample for which feature abundance has been quantified: (1) kallisto_call, the call used for kallisto, (2) kallisto_log, the log generated by kallisto, and (3) output_dir, the directory in which the kallisto results can be found.
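The per-sample structure described above can be inspected with base R. This sketch assumes the return value is a list with one sub-list per sample, named by sample; the mock object, its placeholder calls and logs, and the "output_<sample>" directory names are all hypothetical.

```r
# Mock return value with the documented per-sample elements; the names,
# calls, logs and directories below are hypothetical placeholders.
kallisto_log <- list(
  sample1 = list(kallisto_call = "<call for sample1>",
                 kallisto_log  = "<log for sample1>",
                 output_dir    = "output_sample1"),
  sample2 = list(kallisto_call = "<call for sample2>",
                 kallisto_log  = "<log for sample2>",
                 output_dir    = "output_sample2")
)

# Collect each sample's output directory into a named character vector.
output_dirs <- vapply(kallisto_log, function(x) x$output_dir, character(1))
output_dirs
```

Under the same assumption, names(output_dirs) and unname(output_dirs) are the kind of values the samples and directories arguments of readKallistoResults take when no kallisto_log is supplied.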
A list with two elements: (1) a data.frame abundance with columns for 'target_id' (feature, transcript, gene etc.), 'length' (feature length), 'eff_length' (effective feature length), 'est_counts' (estimated feature counts), 'tpm' (transcripts per million) and possibly many columns containing bootstrap estimated counts; and (2) a list run_info with details about the kallisto run that generated the results.
a SingleCellExperiment object
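To illustrate how per-sample results relate to the feature-by-sample layout of a SingleCellExperiment, the sketch below binds the est_counts column of several mock results (shaped like the readKallistoResultsOneSample value above) into a matrix. The helper abundance_matrix() and all values are hypothetical; this is not how scater builds its object. If the SingleCellExperiment package is installed, a matrix like this could be wrapped with SingleCellExperiment::SingleCellExperiment(assays = list(counts = counts)).

```r
# Mock per-sample results: an 'abundance' data.frame (only the columns used
# here) plus an empty 'run_info'; the counts are made up.
make_result <- function(counts) {
  list(abundance = data.frame(target_id = c("tx1", "tx2", "tx3"),
                              est_counts = counts,
                              stringsAsFactors = FALSE),
       run_info = list())
}
results <- list(sample1 = make_result(c(5, 0, 15)),
                sample2 = make_result(c(2, 8, 10)))

# Hypothetical helper: bind one abundance column across samples into a
# feature-by-sample matrix, checking that all samples share the same features.
abundance_matrix <- function(results, column) {
  ids <- results[[1]]$abundance$target_id
  mat <- vapply(results, function(res) {
    stopifnot(identical(res$abundance$target_id, ids))
    res$abundance[[column]]
  }, numeric(length(ids)))
  rownames(mat) <- ids
  mat
}

counts <- abundance_matrix(results, "est_counts")
counts
```

The identical() check guards against silently misaligning features when samples were quantified against different indices.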
## Not run:
## If in kallisto's 'test' directory, then try these calls:
## Generate a paired-end 'targets.txt' file (sample name plus two FASTQ files):
write.table(data.frame(Sample = "sample1", File1 = "reads_1.fastq.gz",
                       File2 = "reads_2.fastq.gz"),
            file = "targets.txt", quote = FALSE, row.names = FALSE, sep = "\t")
kallisto_log <- runKallisto("targets.txt", "transcripts.idx", single_end = FALSE,
                            output_prefix = "output", verbose = TRUE,
                            n_bootstrap_samples = 10, dry_run = FALSE)
## End(Not run)

# If kallisto results are in the directory "output", then call:
# readKallistoResultsOneSample("output")

## Not run:
kallisto_log <- runKallisto("targets.txt", "transcripts.idx", single_end = FALSE,
                            output_prefix = "output", verbose = TRUE,
                            n_bootstrap_samples = 10)
sceset <- readKallistoResults(kallisto_log)
## End(Not run)