annotateTissueDisease {Onassis} | R Documentation |
annotateTissueDisease
annotateTissueDisease is a function that automates the annotation of samples with tissue and disease concepts. It connects to the GEOmetadb database stored in the directory given by the geo_metadb_path parameter and retrieves the metadata of the samples listed in the gsm_list parameter.
A dictionary of tissues/cell lines is built from the tissue_obo file provided as a parameter. The metadata of all samples are annotated with tissue concepts from tissue_obo, and the samples are then clustered according to the semantic similarity of their semantic annotation sets.
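The annotate-then-cluster step can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the Onassis implementation: the dictionary, the concept IDs and the metadata strings are invented, whole-word regular-expression matching stands in for the real annotator, and the Jaccard index stands in for the semantic similarity measures Onassis actually uses.

```r
# Hypothetical dictionary: concept IDs mapped to term names. In Onassis the
# dictionary is built from the tissue_obo file; these entries are invented.
dictionary <- c(CL_1 = "hepatocyte", CL_2 = "liver",
                CL_3 = "neuron",     CL_4 = "brain")

# Hypothetical free-text metadata for four samples.
metadata <- c(GSM1 = "primary hepatocyte from adult liver",
              GSM2 = "liver biopsy",
              GSM3 = "cortical neuron culture",
              GSM4 = "neuron from brain cortex")

# Naive whole-word matching: return the IDs of the terms found in a text.
annotate <- function(text) {
  hits <- vapply(dictionary, function(term)
    grepl(paste0("\\b", term, "\\b"), text, ignore.case = TRUE, perl = TRUE),
    logical(1))
  names(dictionary)[hits]
}
annotation_sets <- lapply(metadata, annotate)

# Jaccard index as a simple stand-in for semantic similarity
# (undefined if both sets are empty).
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))
n <- length(annotation_sets)
sim <- outer(seq_len(n), seq_len(n), Vectorize(function(i, j)
  jaccard(annotation_sets[[i]], annotation_sets[[j]])))
dimnames(sim) <- list(names(annotation_sets), names(annotation_sets))

# Hierarchical clustering on the distance 1 - similarity.
hc <- hclust(as.dist(1 - sim), method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

With these toy inputs the liver samples (GSM1, GSM2) and the neural samples (GSM3, GSM4) fall into two separate clusters.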
To reduce the number of tissue semantic sets, similar sets are merged according to the semantic similarity threshold given in the height_threshold parameter. Within each semantic set, samples are then annotated with disease concepts from the dictionary built from the disease_obo parameter. For each disease, the function retrieves the corresponding columns of score_matrix and organizes them in a nested list of tissues, diseases and scores.
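The final organization step (tissue, then disease, then the matching columns of score_matrix) can be sketched with split() and lapply(). This is an illustration only: the score matrix and the one-tissue, one-disease annotation of each sample are invented, and the real function may assign several concepts per sample.

```r
# Hypothetical score matrix: rows are units (e.g. genes), columns are GSMs.
score_matrix <- matrix(seq_len(12), nrow = 3,
                       dimnames = list(paste0("gene", 1:3),
                                       paste0("GSM", 1:4)))

# Hypothetical annotations: one tissue set and one disease per sample.
annotations <- data.frame(
  gsm     = paste0("GSM", 1:4),
  tissue  = c("liver", "liver", "brain", "brain"),
  disease = c("hepatitis", "cirrhosis", "glioma", "glioma"),
  stringsAsFactors = FALSE
)

# Nested list: tissue -> disease -> columns of score_matrix for those samples.
by_tissue <- split(annotations, annotations$tissue)
nested <- lapply(by_tissue, function(tab) {
  gsms_by_disease <- split(tab$gsm, tab$disease)
  lapply(gsms_by_disease, function(gsms) score_matrix[, gsms, drop = FALSE])
})

str(nested)
```

Using drop = FALSE keeps single-sample groups as one-column matrices, so every leaf of the list has the same type.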
annotateTissueDisease(geo_metadb_path, gsm_list, tissue_obo, disease_obo, outdir, height_threshold, score_matrix)
geo_metadb_path | The full path of the directory where the GEOmetadb.sqlite file is stored |
gsm_list | A vector of GEO sample IDs (GSMs) to annotate with tissue and disease concepts |
tissue_obo | The path to the OBO ontology file containing the concepts used to identify tissues/cell lines |
disease_obo | The path to the OBO ontology file containing the concepts used to identify diseases |
outdir | The directory where the results will be stored |
height_threshold | The fraction of clusters to merge, based on the heights of the dendrogram produced by the hclust method. It must lie in the range [0, 1]. |
score_matrix | A matrix whose rows represent units (GRanges or genes) and whose columns represent GSMs |
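One plausible reading of height_threshold is to cut the dendrogram at height_threshold times its maximum height and merge every semantic set joined below that cut. The sketch below assumes exactly that rule; the exact criterion used by Onassis is not stated here, and the distances and the helper merge_sets() are hypothetical.

```r
# Hypothetical distances between five tissue semantic sets (invented values).
d <- as.dist(matrix(
  c(0.0, 0.1, 0.6, 0.7, 0.9,
    0.1, 0.0, 0.5, 0.6, 0.9,
    0.6, 0.5, 0.0, 0.2, 0.8,
    0.7, 0.6, 0.2, 0.0, 0.8,
    0.9, 0.9, 0.8, 0.8, 0.0),
  nrow = 5, byrow = TRUE,
  dimnames = list(LETTERS[1:5], LETTERS[1:5])
))
hc <- hclust(d, method = "average")

# Hypothetical helper (assumed rule): merge all sets joined below
# height_threshold * max(dendrogram height).
merge_sets <- function(hc, height_threshold) {
  stopifnot(height_threshold >= 0, height_threshold <= 1)
  cutree(hc, h = height_threshold * max(hc$height))
}

merge_sets(hc, 0.4)
```

Under this rule a threshold of 0 keeps every set separate, 1 merges everything into a single set, and 0.4 yields three sets here: {A, B}, {C, D} and {E}.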
The function returns a list of the tissue semantic sets defined by Onassis. For each tissue, it contains a list of diseases, and for each disease, the columns of score_matrix for the samples annotated with that tissue and that disease.
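A returned list of this shape can be summarized with nested lapply() calls. The sketch assumes that each disease entry is a sub-matrix of score_matrix (columns = GSMs); the data below are invented stand-ins for a real result_list.

```r
# A small list shaped like the value described above (invented data).
result_list <- list(
  liver = list(cirrhosis = matrix(1:6, nrow = 3,
                                  dimnames = list(paste0("gene", 1:3),
                                                  c("GSM1", "GSM2")))),
  brain = list(glioma = matrix(7:9, nrow = 3,
                               dimnames = list(paste0("gene", 1:3), "GSM3")))
)

# One row per tissue/disease pair with the number of annotated samples.
summary_df <- do.call(rbind, lapply(names(result_list), function(tissue) {
  diseases <- result_list[[tissue]]
  data.frame(tissue = tissue,
             disease = names(diseases),
             n_samples = vapply(diseases, ncol, integer(1)),
             row.names = NULL,
             stringsAsFactors = FALSE)
}))
print(summary_df)

# Mean score of each unit within each tissue/disease pair.
mean_scores <- lapply(result_list, lapply, rowMeans)
```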
if (!file.exists(file.path(getwd(), 'GEOmetadb.sqlite'))) {
  message('To run this example please copy GEOmetadb.sqlite into your current working directory')
} else {
  # Directory containing GEOmetadb.sqlite
  geo_metadb_path <- getwd()
  # Example score matrix shipped with Onassis; its columns are the GSMs to annotate
  score_matrix <- readRDS(system.file('extdata', 'score_matrix.rds', package = 'Onassis'))
  gsm_list <- colnames(score_matrix)
  # Sample tissue and disease OBO files shipped with OnassisJavaLibs
  tissue_obo <- system.file('extdata', 'sample.cs.obo', package = 'OnassisJavaLibs')
  disease_obo <- system.file('extdata', 'sample.do.obo', package = 'OnassisJavaLibs')
  outdir <- getwd()
  height_threshold <- 0.4
  result_list <- annotateTissueDisease(geo_metadb_path, gsm_list, tissue_obo,
                                       disease_obo, outdir, height_threshold,
                                       score_matrix)
}