annotateTissueDisease {Onassis}    R Documentation

annotateTissueDisease

Description

annotateTissueDisease automates the annotation of samples with tissue and disease concepts. It connects to GEOmetadb through the geo_metadb_path parameter and retrieves the metadata of the samples listed in the gsm_list parameter. A dictionary of tissues/cell lines is built from the tissue_obo file, and the metadata of every sample is annotated with tissue concepts from that dictionary. Samples are then clustered by the semantic similarity of their semantic annotation sets. To reduce the number of tissue semantic sets, similar sets are merged according to the semantic similarity threshold given in the height_threshold parameter. Within each tissue semantic set, samples are annotated with disease concepts from the dictionary built from the disease_obo parameter. Finally, for each disease the function retrieves the corresponding columns of score_matrix and organizes them in a list of tissues, diseases and scores.

Usage

annotateTissueDisease(geo_metadb_path, gsm_list, tissue_obo, disease_obo,
  outdir, height_threshold, score_matrix)

Arguments

geo_metadb_path

The full path of the directory where the GEOmetadb.sqlite file is stored

gsm_list

A list of GEO sample IDs (GSMs) to annotate with tissue and disease concepts

tissue_obo

The obo ontology containing concepts to identify tissues/cell lines

disease_obo

The obo ontology containing concepts to identify diseases

outdir

The directory where the results will be stored

height_threshold

The percentage of clusters to merge, based on the height of the dendrogram produced by the hclust method. The value of height_threshold must lie in the range [0, 1].

score_matrix

A matrix where rows represent units (GRanges or genes) and columns represent GSMs.
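The relationship between height_threshold and the hclust dendrogram can be sketched with base R. This is only an illustration under stated assumptions, not the Onassis implementation: the toy similarity matrix sim is invented, the conversion to a distance as 1 - similarity is assumed, and height_threshold is assumed to act as a fraction of the tallest merge in the dendrogram.

```r
# Toy semantic similarity between four hypothetical tissue semantic sets
# (values invented for illustration only).
sim <- matrix(c(1.0, 0.9, 0.2, 0.1,
                0.9, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.8,
                0.1, 0.2, 0.8, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("set", 1:4), paste0("set", 1:4)))

# Assumption: similarity is turned into a distance as 1 - similarity.
hc <- hclust(as.dist(1 - sim), method = "average")

# Assumption: height_threshold is a fraction of the tallest merge height.
height_threshold <- 0.4
cut_height <- height_threshold * max(hc$height)

# Sets joined below cut_height end up in the same merged cluster.
merged <- cutree(hc, h = cut_height)
print(merged)
```

With these toy values, set1 and set2 fall into one cluster and set3 and set4 into another. A larger height_threshold would merge more sets, and a value of 1 would collapse everything into a single cluster.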

Value

A list of the tissue semantic sets defined by Onassis. Each tissue contains a list of diseases, and each disease contains the columns of score_matrix for the samples annotated with that tissue and that disease.
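As a rough guide to working with such a nested list, the sketch below builds a small stand-in by hand and counts the samples under each tissue/disease pair. The tissue and disease names, the GSM identifiers and the exact element types are hypothetical; the real return value may differ in detail.

```r
# Hypothetical score matrix: 3 units (rows) by 4 GSM samples (columns).
score_matrix <- matrix(1:12, nrow = 3,
                       dimnames = list(paste0("unit", 1:3),
                                       c("GSM1", "GSM2", "GSM3", "GSM4")))

# Assumed shape of the result: tissue -> disease -> columns of score_matrix.
# Tissue and disease names are invented for illustration.
result_list <- list(
  liver = list(
    carcinoma = score_matrix[, c("GSM1", "GSM2"), drop = FALSE],
    healthy   = score_matrix[, "GSM3", drop = FALSE]
  ),
  lung = list(
    carcinoma = score_matrix[, "GSM4", drop = FALSE]
  )
)

# Count the samples annotated with each tissue/disease pair.
sample_counts <- lapply(result_list,
                        function(diseases) vapply(diseases, ncol, integer(1)))
print(sample_counts)
```

Using drop = FALSE keeps single-sample entries as one-column matrices, so every leaf of the list can be handled the same way.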

Examples

if (!file.exists(file.path(getwd(), 'GEOmetadb.sqlite'))) {
  message('To run this example please copy GEOmetadb.sqlite to your current working directory')
} else {
  # Directory that contains GEOmetadb.sqlite
  geo_metadb_path <- getwd()
  # Example score matrix shipped with Onassis; its columns are GSMs
  score_matrix <- readRDS(system.file('extdata', 'score_matrix.rds', package = 'Onassis'))
  gsm_list <- colnames(score_matrix)
  # Sample OBO ontologies shipped with OnassisJavaLibs
  tissue_obo <- system.file('extdata', 'sample.cs.obo', package = 'OnassisJavaLibs')
  disease_obo <- system.file('extdata', 'sample.do.obo', package = 'OnassisJavaLibs')
  outdir <- getwd()
  height_threshold <- 0.4
  result_list <- annotateTissueDisease(geo_metadb_path, gsm_list, tissue_obo,
    disease_obo, outdir, height_threshold, score_matrix)
}

[Package Onassis version 1.2.7 Index]