semsim {ontoTools}                R Documentation

Compute semantic similarity measure for terms in an object-ontology complex

Description

semsim computes a semantic similarity measure for a pair of terms in an object-ontology complex (OOC). conceptProbs, subsumers, pms and usageCount are supporting functions.

Usage

semsim(c1, c2, ooc, acc=NULL, pc=NULL)
conceptProbs(ooc, acc=NULL, inds=NULL)
subsumers(c1, c2, ont, acc=NULL)
pms(c1, c2, ooc, acc=NULL, pc=NULL)
usageCount(map, acc, inds)

Arguments

c1, c2   "character" terms to be compared
ooc      an object of class "OOC": object-ontology complex
ont      an object of class "ontology": annotated rooted DAG
acc      optional (sparse) accessibility matrix for the ontology
pc       optional vector of concept probabilities, if pre-computed
map      OOmap component of an ooc
inds     numeric vector of row indices of the object-ontology map to be processed

Details

For large ontologies, computation of the term accessibility relationships and term probabilities can be costly. Once these are computed to support one semsim calculation, they should be saved. The acc and pc parameters allow use of this saved information.
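
One way to keep these results between sessions is to serialize them with base R. The following is a minimal sketch, not part of ontoTools: the small matrix and vector are stand-ins for a real accessibility matrix and concept probability vector, and cache_file is a hypothetical location.

```r
# Stand-ins for the expensive results (assumption: in practice these would be
# the ontology's accessibility matrix and the output of conceptProbs()).
acc <- diag(3)
dimnames(acc) <- list(c("t1", "t2", "t3"), c("t1", "t2", "t3"))
pc <- c(t1 = 1, t2 = 0.5, t3 = 0.25)

# Hypothetical cache location; a real workflow would use a persistent path.
cache_file <- tempfile(fileext = ".rds")
saveRDS(list(acc = acc, pc = pc), cache_file)

# In a later session, reload the cached objects instead of recomputing them.
cached <- readRDS(cache_file)
```

The reloaded cached$acc and cached$pc could then be supplied through the acc and pc arguments.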

Value

semsim returns the measure of semantic similarity cited by Lord et al (2003).
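
As a rough illustration of the kind of measure involved: Lord et al. (2003) work with an information-content similarity in the style of Resnik, the negative log of the smallest concept probability among the terms that subsume both inputs. The toy sketch below uses base R only. The five-term ontology, the annotation counts, and the toy_* helpers are invented for illustration and are not the ontoTools implementation.

```r
# Toy ontology (assumption): root -> A, root -> B, A -> A1, A -> A2
terms <- c("root", "A", "B", "A1", "A2")

# anc[i, j] is TRUE when term j subsumes term i (j is i itself or an ancestor)
anc <- matrix(FALSE, 5, 5, dimnames = list(terms, terms))
diag(anc) <- TRUE
anc[, "root"] <- TRUE
anc[c("A1", "A2"), "A"] <- TRUE

# Hypothetical numbers of objects annotated directly to each term
direct <- c(root = 0, A = 2, B = 4, A1 = 1, A2 = 3)

# A term counts as used whenever it or any of its descendants is annotated
usage <- colSums(anc * direct[rownames(anc)])
probs <- usage / usage[["root"]]

# Terms that subsume both c1 and c2
toy_subsumers <- function(c1, c2) terms[anc[c1, ] & anc[c2, ]]

# Probability of the least probable (most informative) common subsumer
toy_pms <- function(c1, c2) min(probs[toy_subsumers(c1, c2)])

# Similarity as the negative log of that probability
toy_semsim <- function(c1, c2) -log(toy_pms(c1, c2))

toy_semsim("A1", "A2")  # -log(0.6): the siblings share the subsumer A
toy_semsim("A1", "B")   # 0: only the root subsumes both terms
```

In this sketch a pair whose only common subsumer is the root scores 0, and the score grows as the shared subsumer becomes rarer.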

Author(s)

Vince Carey <stvjc@channing.harvard.edu>

References

Lord PW et al. (2003). Bioinformatics 19(10):1275.

Examples

#
# we are given a graph of GOMF and the OOmap between LL and GOMF
# derived from humanLLMappings and stored as data resources in
# ontoTools -- these will have to be updated regularly
#
data(goMFgraph.1.5)
data(LL2GOMFooMap.1.5)
#
# build the rooted DAG, the ontology, and the OOC objects
#
gomfrDAG <- new("rootedDAG", root="GO:0003674", DAG=goMFgraph.1.5)
GOMFonto <- new("ontology", name="GOMF", version="bioc GO 1.5", rDAG=gomfrDAG)
LLGOMFOOC <- makeOOC(GOMFonto, LL2GOMFooMap.1.5)
#
# we are given the accessibility matrix for the GO MF graph as a 
# data resource, and we can compute some term probabilities
#
data(goMFamat.1.5)
pc <- conceptProbs(LLGOMFOOC, goMFamat.1.5, inds=1:20)
#
# now we will get a sample of GO MF terms and compute the
# semantic similarities of pairs of terms in the sample
#
data(LL2GOMFcp.1.5) # full set of precomputed concept probabilities
library(GO)
library(Biobase)
library(combinat)
GO() # get the GO environments
GOtags <- ls(env=GOTERM)
GOlabs <- multiget(GOtags, env=GOTERM)
GOMFtags <- GOtags[sapply(GOlabs,names)=="MF"]
GOMFterms <- unlist(multiget(GOMFtags,env=GOTERM))
ntags <- length(GOMFtags)
if (any(duplicated(GOMFterms)))
 {
 dups <- which(duplicated(GOMFterms))  # positions of repeated term labels
 GOMFterms[dups] <- paste(GOMFterms[dups],".2",sep="")  # tag the repeats
 }
names(GOMFterms) <- GOMFtags
set.seed(1234)
st <- sample(names(GOMFterms),size=50) # take the sample
st <- head(intersect(st, names(LL2GOMFcp.1.5)), 10) # use only those terms available in GO 1.5 (at most 10)
pst <- combn(st,2)   # get a matrix with the pairs of terms in columns
npst <- ncol(pst)
ss <- rep(NA_real_,npst)
for (i in seq_len(npst))  # compute semantic similarities
  {
  cat(i, "")  # progress indicator
  ss[i] <- semsim( pst[1,i], pst[2,i], ooc=LLGOMFOOC, acc=goMFamat.1.5, pc=LL2GOMFcp.1.5 )
  }
cat("\n")
print(summary(ss))
ord <- order(ss, decreasing=TRUE)  # rank the pairs by similarity
             # note -- the NAs in ss are not yet understood; order() places them last
top <- ord[1]  # index of the most similar pair
print( GOMFterms[ as.character(pst[,top]) ] )
pen <- ord[2]  # index of the second most similar pair (never the same as top)
print( GOMFterms[ as.character(pst[,pen]) ] )
