countShiftReads {RiboProfiling} | R Documentation |
Applies an offset to the read start along the transcript and returns the coverage on the 5pUTR, CDS, and 3pUTR, as well as a matrix of codon coverage per ORF.
countShiftReads(exonGRanges, cdsPosTransc, alnGRanges, shiftValue, motifSize)
exonGRanges |
a GRangesList. It contains the exon coordinates grouped by transcript. |
cdsPosTransc |
a list. It contains the relative positions of the start and end of the ORFs. The transcript names in exonGRanges and cdsPosTransc should be the same. |
alnGRanges |
a GRanges object containing the alignment information. To improve performance, the GAlignments object read from the BAM file should be transformed into a GRanges object with cigar match size metadata. |
shiftValue |
an integer. The offset for recalibrating reads on transcripts when computing coverage. The default value is 0, meaning that no offset is applied. |
motifSize |
an integer. The number of nucleotides in each motif on which to compute coverage and usage. Either 3, 6, or 9. Default 3 nucleotides (codon). |
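As a rough illustration of what the shiftValue argument does, the base R sketch below shifts a handful of read start positions and assigns each one to a transcript region. It is a toy example and not the RiboProfiling implementation: the transcript coordinates and read starts are made up, and the sign convention (a shiftValue of -14 meaning that the position of interest lies 14 nt downstream of the read start) is an assumption.

```r
# Toy illustration only: not the RiboProfiling implementation.
# Hypothetical transcript with a CDS at positions 16..45 (1-based).
cdsStart <- 16L
cdsEnd <- 45L

# Hypothetical read start positions along the transcript.
readStarts <- c(1L, 2L, 2L, 5L, 20L, 31L, 40L)

# ASSUMED sign convention: with shiftValue = -14 the position of
# interest lies 14 nt downstream of the read start.
shiftValue <- -14L
shifted <- readStarts - shiftValue

# Assign each shifted position to a transcript region.
region <- cut(
  shifted,
  breaks = c(-Inf, cdsStart - 1, cdsEnd, Inf),
  labels = c("5pUTR", "CDS", "3pUTR")
)

# Count the shifted read starts per region.
regionCounts <- table(region)
print(regionCounts)
```

With a shiftValue of 0 the same reads would be counted at their original start positions, so most of them would fall in the 5pUTR of this toy transcript.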
a list with 2 objects.
The first object is a data.frame containing information on the ORFs (names, chromosomal position, length) as well as the counts on the 5pUTR, CDS and 3pUTR once the offset is applied.
The second object is itself a list. For each ORF in cdsPosTransc, it contains, for each codon, the sum of read starts covering the 3 codon nucleotides. For motifs of 6 nucleotides, the motif coverage is computed only for the first codon in the motif, which is considered to be the codon in the P-site. For motifs of 9 nucleotides, the motif coverage is computed only for the second codon in the motif, which is considered to be the codon in the P-site. This per-codon coverage contains no information on the codon type, only its position in the ORF and its coverage.
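The per-codon coverage described above can be pictured with a small base R sketch. It is a toy example for the default motifSize of 3, not the package's code: the ORF length, the read start positions, and the column names codonPosition and coverage are hypothetical.

```r
# Toy illustration only: not the RiboProfiling implementation.
# Hypothetical ORF of 4 codons (12 nt); positions are relative to the ORF start.
orfLength <- 12L
motifSize <- 3L

# Hypothetical (already shifted) read start positions inside the ORF.
posInOrf <- c(1L, 1L, 3L, 4L, 7L, 8L, 9L, 12L)

# Count read starts at every nucleotide of the ORF.
perNucleotide <- tabulate(posInOrf, nbins = orfLength)

# Group nucleotides into codons and sum the counts within each codon.
codonIndex <- rep(seq_len(orfLength / motifSize), each = motifSize)
codonCoverage <- tapply(perNucleotide, codonIndex, sum)

# Keep only the codon position and its coverage, no codon identity.
# The column names are hypothetical.
codonTable <- data.frame(
  codonPosition = as.integer(names(codonCoverage)),
  coverage = as.vector(codonCoverage)
)
print(codonTable)
```

The resulting table has one row per codon of the ORF, which mirrors the statement that the coverage carries position and count but not the codon type.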
# read the BAM file into a GAlignments object using
# GenomicAlignments::readGAlignments
# the GAlignments object should be similar to ctrlGAlignments
data(ctrlGAlignments)
aln <- ctrlGAlignments

# transform the GAlignments object into a GRanges object (faster processing)
alnGRanges <- readsToStartOrEnd(aln, what="start")

# make a txdb object containing the annotations for the specified species.
# In this case hg19.
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene

# Please make sure that the seqnames of txdb correspond to
# the seqnames of the alignment files ("chr" particle).
# If not, rename the txdb seqlevels:
# renameSeqlevels(txdb, sub("chr", "", seqlevels(txdb)))

# get all CDSs by transcript
cds <- GenomicFeatures::cdsBy(txdb, by="tx", use.names=TRUE)

# get all exons by transcript
exonGRanges <- GenomicFeatures::exonsBy(txdb, by="tx", use.names=TRUE)

# get the per transcript relative position of start and end codons
# cdsPosTransc <- orfRelativePos(cds, exonGRanges)
data(cdsPosTransc)

# compute the counts on the different features after applying
# the specified shift value on the read start along the transcript
countsData <- countShiftReads(exonGRanges[names(cdsPosTransc)], cdsPosTransc, alnGRanges, -14)