GOXMLParser {AnnBuilder} | R Documentation |
These functions are used by GO-class to read and parse the
Gene Ontology data file (in XML format) and to figure out the
parent-child relations.
GOXMLParser(fileName)
getChildNodes(goid, goData)
getOffspringNodes(goid, goData, keepTree = FALSE)
getParentNodes(goid, goData, sep = ";")
getAncestors(goid, goData, sep = ";", keepTree = FALSE, top = "GO:0003673")
getTopGOid(what = c("MF", "BP", "CC", "GO"))
mapGO2Category(goData)
getGOGroupIDs(onto = FALSE)
mapGO2AllProbe(go2Probe, goData, goid = "", sep = ";", all = TRUE)
fileName | a character string for the name of the locally stored file of Gene Ontology XML data |
goData | a matrix with three columns for GO ids, parent GO ids, and the ontology terms |
goid | a character string for the id of a Gene Ontology term (e.g. GO:006742) |
keepTree | a boolean indicating whether the tree structure showing parent-child relations will be preserved |
sep | a character string for the separator used to separate multiple entries |
top | a character string for the GO id that is the root of all the other GO ids along the parent-child relation tree |
what | a character string that has to be one of "mf", "bp", "cc", "go" |
onto | a boolean that is set to TRUE if the GO id for the topmost node is to be returned, or FALSE if the GO ids for the three categories (BP, MF, and CC) are to be returned |
go2Probe | a matrix that maps GO ids to probe ids |
all | a boolean indicating whether to map all the GO ids contained in goData to probe ids (TRUE) or just the GO ids specified by goid (FALSE) |
The GO site provides an XML document for the molecular function,
biological process, and cellular component of genes. The basic XML
structure is something like:
<go:term>
<go:accession>GO:000xxx</go:accession>
<go:name>a string for the function, process, or component</go:name>
<go:isa rdf:resource="http://www.geneontology.org/go#GO:000xxxx" />
<go:part-of rdf:resource="http://www.geneontology.org/go#GO:000xxxx" />
.
.
</go:term>
The XML document read from the Gene Ontology site does not distinguish
among the molecular function, biological process, and cellular component
of genes, because the same go:name tag is used for all three. To
determine whether a go:name tag describes the function, process, or
component for a given GO accession number, the go:isa and go:part-of
tags that record the parent-child relationships have to be retained so
that the tree can later be climbed to find the correct category. As a
result, the matrix returned by GOXMLParser has columns for the GO ids,
the GO ids of the direct parents (a ";" is used to separate multiple GO
ids), and the ontology term defined, together with some columns for
other data.
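The walk up the tree described above can be sketched in base R. This is only an illustration under stated assumptions, not the AnnBuilder implementation: the toy matrix, its column names (goid, parents, term), the made-up ids, and the helper findCategory are all hypothetical and introduced here for the example.

```r
# Toy parent-child matrix. Ids and column names are made up for illustration;
# the real matrix returned by GOXMLParser may be laid out differently.
toy <- matrix(c("GO:A", "",          "root",
                "GO:B", "GO:A",      "molecular_function",
                "GO:C", "GO:A",      "biological_process",
                "GO:D", "GO:B",      "some function",
                "GO:E", "GO:D;GO:C", "some other term"),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("goid", "parents", "term")))

# Hypothetical helper: climb from `id` through the parent column until one
# of the category roots is reached, and return every root that was found.
findCategory <- function(id, goData, roots, sep = ";") {
    seen <- character(0)
    queue <- id
    found <- character(0)
    while (length(queue) > 0) {
        current <- queue[1]
        queue <- queue[-1]
        if (current %in% seen) next        # already visited
        seen <- c(seen, current)
        if (current %in% roots) {          # reached a category root
            found <- c(found, current)
            next
        }
        parents <- goData[goData[, "goid"] == current, "parents"]
        parents <- unlist(strsplit(parents, sep, fixed = TRUE))
        queue <- c(queue, parents[nzchar(parents)])
    }
    unique(found)
}

findCategory("GO:D", toy, roots = c("GO:B", "GO:C"))  # reaches only GO:B
findCategory("GO:E", toy, roots = c("GO:B", "GO:C"))  # reaches both roots
```

A term with several parents can reach more than one root, which is why the sketch returns a vector rather than a single id.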
getChildNodes finds the direct children of a given GO id
based on a matrix containing the parent-child relationships (e.g. the
one returned by GOXMLParser).
getOffspringNodes finds all the direct or indirect children
of a given GO id based on a matrix containing the parent-child
relationships (e.g. the one returned by GOXMLParser).
getParentNodes finds the direct parents of a given GO id
based on a matrix containing the parent-child relationships (e.g. the
one returned by GOXMLParser).
getAncestors finds all the direct or indirect parents
of a given GO id based on a matrix containing the parent-child
relationships (e.g. the one returned by GOXMLParser).
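The difference between direct children and all offspring can be illustrated with a small base R sketch. It is not the package's code: the toy matrix, its column names, the made-up ids, and the helpers directChildren and allOffspring are assumptions used only for this example. Ancestors can be collected the same way by following the parent column instead.

```r
# Toy parent-child matrix (hypothetical ids and column names).
toy <- matrix(c("GO:A", "",
                "GO:B", "GO:A",
                "GO:C", "GO:A",
                "GO:D", "GO:B",
                "GO:E", "GO:D;GO:C"),
              ncol = 2, byrow = TRUE,
              dimnames = list(NULL, c("goid", "parents")))

# Hypothetical helper: ids whose parent column lists `id`.
directChildren <- function(id, goData, sep = ";") {
    parentList <- strsplit(goData[, "parents"], sep, fixed = TRUE)
    isChild <- vapply(parentList, function(p) id %in% p, logical(1))
    unname(goData[isChild, "goid"])
}

# Hypothetical helper: expand level by level until no new ids appear.
allOffspring <- function(id, goData, sep = ";") {
    found <- character(0)
    frontier <- id
    while (length(frontier) > 0) {
        kids <- lapply(frontier, directChildren, goData = goData, sep = sep)
        kids <- unique(as.character(unlist(kids)))
        frontier <- setdiff(kids, found)   # keep only ids not seen before
        found <- c(found, frontier)
    }
    found
}

directChildren("GO:A", toy)  # one level down only
allOffspring("GO:A", toy)    # every level below GO:A
```

Tracking the ids already found keeps a term with several parents, such as GO:E here, from being reported twice.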
getTopGOid figures out the root GO id for "mf" (molecular
function), "bp" (biological process), "cc" (cellular component), or
"go" (the whole Gene Ontology tree).
mapGO2Category maps GO ids to the three categories (MF,
BP, CC) they belong to.
getGOGroupIDs returns the GO id of the topmost node or the GO ids of the
three nodes corresponding to the three categories (MF, BP, and CC).
mapGO2AllProbe maps each GO id to the probe ids that are related
to that GO id and to all of its offspring.
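The idea of pooling probe ids over a GO id and its offspring can be sketched as follows. Everything here is an assumption made for illustration: the layout of the go2Probe matrix, the column names, the made-up GO and probe ids, the precomputed offspring list, and the helper collectProbes are hypothetical and do not describe how mapGO2AllProbe is implemented.

```r
# Hypothetical GO-to-probe matrix; multiple probes are joined by ";".
go2Probe <- matrix(c("GO:B", "p1;p2",
                     "GO:D", "p2;p3",
                     "GO:E", "p4"),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(NULL, c("goid", "probes")))

# Hypothetical precomputed offspring for each GO id.
offspring <- list("GO:B" = c("GO:D", "GO:E"),
                  "GO:D" = "GO:E",
                  "GO:E" = character(0))

# Hypothetical helper: pool the probes of `id` and of all its offspring,
# dropping duplicates, and join them again with `sep`.
collectProbes <- function(id, go2Probe, offspring, sep = ";") {
    ids <- c(id, offspring[[id]])
    hits <- go2Probe[go2Probe[, "goid"] %in% ids, "probes"]
    probes <- unique(unlist(strsplit(hits, sep, fixed = TRUE)))
    paste(probes, collapse = sep)
}

collectProbes("GO:B", go2Probe, offspring)  # probes of GO:B, GO:D and GO:E
collectProbes("GO:E", go2Probe, offspring)  # GO:E has no offspring
```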
GOXMLParser returns a matrix.
getChildNodes returns a vector of character strings.
getOffspringNodes returns a vector or a list of vectors,
depending on whether the parent-child tree structure is preserved.
getParentNodes returns a vector of character strings.
getAncestors returns a vector or a list of vectors,
depending on whether the parent-child tree structure is preserved.
mapGO2Category returns a matrix with two columns
containing GO ids and letters representing one of the three categories
(MF, BP, and CC).
getGOGroupIDs returns a vector of string(s) for GO id(s).
mapGO2AllProbe returns a matrix with GO ids as one
column and, as the other column, the probe ids related to each GO id
and all of its offspring.
getTopGOid returns a character string for a GO id.
This function is part of the Bioconductor project, within a package developed at the Dana-Farber Cancer Institute to provide bioinformatics functionality through R.
Jianhua (John) Zhang
# Load the package that provides the functions used below
library(AnnBuilder)

# Create the XML doc
cat(paste("<?xml version='1.0'?>",
          "<!-- A test file for the examples in GOXMLParser.R Doc -->",
          "<go>",
          "<go:term>",
          "<go:accession>GO:0003674</go:accession>",
          "<go:name>molecular_function</go:name>",
          "<go:is_a rdf='http://wwww.myurl.org/go#GO:0003673' />",
          "<go:part_of rdf = 'http://wwww.myurl.org/go#GO:0003672' />",
          "</go:term>",
          "<go:term>",
          "<go:accession>GO:0005575</go:accession>",
          "<go:name>cellular_component</go:name>",
          "<go:is_a rdf= 'http://wwww.myurl.org/go#GO:0003673'/>",
          "<go:part_of rdf = 'http://wwww.myurl.org/go#GO:0003674' />",
          "</go:term>",
          "</go>"), file = "testDoc")

# Parse the dummy file using GOXMLParser
goData <- GOXMLParser("testDoc")

# Get the child nodes for a GO id
getChildNodes("GO:0003674", goData)
# Get all the offspring of a GO id without keeping the tree structure
getOffspringNodes("GO:0003673", goData, FALSE)
# Get the direct parents of a GO id
getParentNodes("GO:0005575", goData)
# Get all the ancestors of a GO id, stopping at the given top node
getAncestors("GO:0005575", goData, ";", FALSE, "GO:0003674")
# Get the root GO id for the whole Gene Ontology tree
getTopGOid("GO")

# Remove the dummy file
unlink("testDoc")