GOXMLParser {AnnBuilder}    R Documentation

Functions to read/parse the XML document of Gene Ontology data

Description

These functions are used by GO-class to read and parse the Gene Ontology data file (in XML format) and to figure out the parent-child relations.

Usage

GOXMLParser(fileName)
getChildNodes(goid, goData)
getOffspringNodes(goid, goData, keepTree = FALSE)
getParentNodes(goid, goData, sep = ";")
getAncestors(goid, goData, sep = ";", keepTree = FALSE, top = "GO:0003673")
getTopGOid(what = c("MF", "BP", "CC", "GO"))
mapGO2Category(goData)
getGOGroupIDs(onto = FALSE)
mapGO2AllProbe(go2Probe, goData, goid = "", sep = ";", all = TRUE)

Arguments

fileName  a character string for the name of the locally stored file that contains the Gene Ontology XML data
goData  a matrix with three columns for GO ids, parent GO ids, and the ontology terms
goid  a character string for the id of a Gene Ontology term (e.g. GO:006742)
keepTree  a boolean indicating whether the tree structure showing parent-child relations will be preserved
sep  a character string used as the separator between multiple entries
top  a character string for the GO id that is the root of all the other GO ids in the parent-child relation tree
what  a character string that has to be one of "MF", "BP", "CC", or "GO"
onto  a boolean: TRUE if the GO id for the topmost node is to be returned, or FALSE if the GO ids for the three categories (BP, MF, and CC) are to be returned
go2Probe  a matrix that maps GO ids to probe ids
all  a boolean indicating whether to map all the GO ids contained in goData to probe ids (TRUE) or just the GO ids specified by goid (FALSE)

Details

The GO site provides an XML document for the molecular function, biological process, and cellular component of genes. The basic XML structure is something like:

    <go:term>
      <go:accession>GO:000xxx</go:accession>
      <go:name>a string for the function, process, or component</go:name>
      <go:isa rdf:resource="http://www.geneontology.org/go#GO:000xxxx" />
      <go:part-of rdf:resource="http://www.geneontology.org/go#GO:000xxxx" />
      .
      .
    </go:term>

The XML document read from the Gene Ontology site does not differentiate among the molecular function, biological process, and cellular component of genes, because the same go:name tag is used for the function, process, and component. To determine whether a go:name tag describes the function, process, or component of a gene identified by a GO accession number, the go:isa and go:part-of tags, which record the parent-child relationships, have to be retained so that the tree can later be climbed to find the correct category. As a result, the matrix returned by GOXMLParser has columns for the GO ids, the GO ids of the direct parents (with ";" separating multiple GO ids), and the ontology term, together with some columns for other data.
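
A minimal base-R sketch of how one <go:term> element could be reduced to a row of that matrix is shown next. It uses regular expressions instead of a real XML parser and only handles the simple layout described above; the helper names extractTag and extractParents are hypothetical and are not part of AnnBuilder.

```r
# Toy <go:term> element following the layout described above
term <- paste0(
  "<go:term>",
  "<go:accession>GO:0005575</go:accession>",
  "<go:name>cellular_component</go:name>",
  "<go:isa rdf:resource=\"http://www.geneontology.org/go#GO:0003673\" />",
  "</go:term>"
)

# Hypothetical helper: text enclosed by <tag>...</tag> (first occurrence only)
extractTag <- function(xml, tag) {
  pattern <- paste0("<", tag, ">([^<]*)</", tag, ">")
  sub(pattern, "\\1", regmatches(xml, regexpr(pattern, xml)))
}

# Hypothetical helper: every GO id referenced as "go#GO:nnnnnnn" (the parent links)
extractParents <- function(xml) {
  refs <- regmatches(xml, gregexpr("go#GO:[0-9]+", xml))[[1]]
  sub("go#", "", refs, fixed = TRUE)
}

# One matrix row: GO id, ";"-separated parent ids, ontology term
row <- c(extractTag(term, "go:accession"),
         paste(extractParents(term), collapse = ";"),
         extractTag(term, "go:name"))
print(row)
```

The parent ids are kept as one ";"-joined string so that each term still occupies a single row, matching the column layout described above.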

getChildNodes finds the direct children of a given GO id based on a matrix containing the parent-child relationships (e.g. the one returned by GOXMLParser).
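
As an illustration only, direct children can be found by splitting each row's parent column on the separator and keeping the rows whose parent list contains the given id. The sketch assumes the GO id is in column 1 and the parents in column 2; the ids starting with GO:99999 and the function childNodesSketch are made up for this example.

```r
# Toy parent-child matrix: column 1 = GO id, column 2 = direct parents
# separated by ";", column 3 = term. Ids starting with GO:99999 are made up.
goData <- matrix(c(
  "GO:0003674", "GO:0003673",            "molecular_function",
  "GO:9999991", "GO:0003674",            "toy_term_a",
  "GO:9999992", "GO:0003674;GO:9999991", "toy_term_b",
  "GO:9999993", "GO:9999992",            "toy_term_c"
), ncol = 3, byrow = TRUE)

# Hypothetical helper: ids whose parent list contains goid
childNodesSketch <- function(goid, goData, sep = ";") {
  parents <- strsplit(goData[, 2], sep, fixed = TRUE)
  isChild <- vapply(parents, function(p) goid %in% p, logical(1))
  goData[isChild, 1]
}

childNodesSketch("GO:0003674", goData)
```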

getOffspringNodes finds all the direct and indirect children (the offspring) of a given GO id based on a matrix containing the parent-child relationships (e.g. the one returned by GOXMLParser).
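
The offspring search can be pictured as a breadth-first walk down the tree. This sketch corresponds to keepTree = FALSE (a flat vector) and assumes the same three-column layout; offspringSketch is a hypothetical name, and the order in which it returns ids may differ from getOffspringNodes.

```r
# Same toy matrix: GO id, ";"-separated direct parents, term (GO:99999xx are made up)
goData <- matrix(c(
  "GO:0003674", "GO:0003673",            "molecular_function",
  "GO:9999991", "GO:0003674",            "toy_term_a",
  "GO:9999992", "GO:0003674;GO:9999991", "toy_term_b",
  "GO:9999993", "GO:9999992",            "toy_term_c"
), ncol = 3, byrow = TRUE)

# Hypothetical helper: breadth-first walk down the tree, flat result
offspringSketch <- function(goid, goData, sep = ";") {
  parents <- strsplit(goData[, 2], sep, fixed = TRUE)
  childrenOf <- function(id) {
    goData[vapply(parents, function(p) id %in% p, logical(1)), 1]
  }
  found <- character(0)
  queue <- childrenOf(goid)
  while (length(queue) > 0) {
    id <- queue[1]
    queue <- queue[-1]
    if (id %in% found) next  # already reached through another parent
    found <- c(found, id)
    queue <- c(queue, childrenOf(id))
  }
  found
}

offspringSketch("GO:0003674", goData)
```

Because a GO term can have several parents, the membership check prevents a node reached along two paths from being listed twice.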

getParentNodes finds the direct parents of a given GO id based on a matrix containing the parent-child relationships (e.g. the one returned by GOXMLParser).
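
Since the parents are stored in the row of the GO id itself, finding them only requires splitting one string. A minimal sketch, assuming the GO id in column 1 and the separator-joined parents in column 2; parentNodesSketch and the GO:99999xx ids are hypothetical.

```r
# Toy matrix: GO id, ";"-separated direct parents, term (GO:99999xx are made up)
goData <- matrix(c(
  "GO:9999991", "GO:0003674",            "toy_term_a",
  "GO:9999992", "GO:0003674;GO:9999991", "toy_term_b"
), ncol = 3, byrow = TRUE)

# Hypothetical helper: split the parent column of the row for goid
parentNodesSketch <- function(goid, goData, sep = ";") {
  parentStr <- goData[goData[, 1] == goid, 2]
  if (length(parentStr) == 0) return(character(0))  # goid not in goData
  unlist(strsplit(parentStr, sep, fixed = TRUE))
}

parentNodesSketch("GO:9999992", goData)
```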

getAncestors finds all the direct and indirect parents (the ancestors) of a given GO id based on a matrix containing the parent-child relationships (e.g. the one returned by GOXMLParser).
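
Ancestors can be collected by repeatedly splitting parent strings and climbing until the id given as top is reached, which mirrors the role of the top argument in Usage. This is a sketch under that assumption and corresponds to keepTree = FALSE; ancestorsSketch and the GO:99999xx ids are hypothetical.

```r
# Toy matrix: GO id, ";"-separated direct parents, term (GO:99999xx are made up)
goData <- matrix(c(
  "GO:0003674", "GO:0003673",            "molecular_function",
  "GO:9999991", "GO:0003674",            "toy_term_a",
  "GO:9999992", "GO:0003674;GO:9999991", "toy_term_b",
  "GO:9999993", "GO:9999992",            "toy_term_c"
), ncol = 3, byrow = TRUE)

# Hypothetical helper: breadth-first walk up the tree, stopping at top
ancestorsSketch <- function(goid, goData, sep = ";", top = "GO:0003673") {
  parentsOf <- function(id) {
    p <- goData[goData[, 1] == id, 2]
    if (length(p) == 0) character(0) else unlist(strsplit(p, sep, fixed = TRUE))
  }
  found <- character(0)
  queue <- parentsOf(goid)
  while (length(queue) > 0) {
    id <- queue[1]
    queue <- queue[-1]
    if (id %in% found) next
    found <- c(found, id)
    # Do not climb above the designated root
    if (id != top) queue <- c(queue, parentsOf(id))
  }
  found
}

ancestorsSketch("GO:9999993", goData)
ancestorsSketch("GO:9999993", goData, top = "GO:0003674")
```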

getTopGOid figures out the root GO id for "MF" (molecular function), "BP" (biological process), "CC" (cellular component), or "GO" (the whole Gene Ontology tree).
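
The root lookup can be thought of as a small named table. In the sketch below, GO:0003674, GO:0008150, and GO:0005575 are the standard GO root terms for molecular_function, biological_process, and cellular_component, and GO:0003673 is the default top shown in Usage; whether getTopGOid stores them this way is an assumption, and topGOidSketch is a hypothetical name.

```r
# Hypothetical lookup table of root ids, keyed by the codes accepted by `what`
topGOidSketch <- function(what = c("MF", "BP", "CC", "GO")) {
  roots <- c(MF = "GO:0003674",  # molecular_function
             BP = "GO:0008150",  # biological_process
             CC = "GO:0005575",  # cellular_component
             GO = "GO:0003673")  # default `top` in Usage (whole tree)
  # Take the first element, ignore case, and reject unknown codes
  what <- match.arg(toupper(what[1]), names(roots))
  unname(roots[what])
}

topGOidSketch("GO")
```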

mapGO2Category maps GO ids to the three categories (MF, BP, CC) they belong to.
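
One way to assign a category is to climb from each GO id through its parents until one of the three category roots is reached. The sketch assumes the standard root ids for MF, BP, and CC and labels the result "MF", "BP", or "CC"; the letters that mapGO2Category actually returns may differ, and categorySketch and the GO:99999xx ids are hypothetical.

```r
categoryRoots <- c(MF = "GO:0003674", BP = "GO:0008150", CC = "GO:0005575")

# Toy matrix: GO id, ";"-separated direct parents, term (GO:99999xx are made up)
goData <- matrix(c(
  "GO:0003674", "GO:0003673", "molecular_function",
  "GO:0005575", "GO:0003673", "cellular_component",
  "GO:9999991", "GO:0003674", "toy_term_a",
  "GO:9999992", "GO:9999991", "toy_term_b",
  "GO:9999994", "GO:0005575", "toy_term_d"
), ncol = 3, byrow = TRUE)

# Hypothetical helper: climb level by level until a category root appears
categorySketch <- function(goid, goData, roots = categoryRoots) {
  seen <- character(0)
  frontier <- goid
  while (length(frontier) > 0) {
    hit <- names(roots)[roots %in% frontier]
    if (length(hit) > 0) return(hit[1])
    seen <- c(seen, frontier)
    parentStr <- goData[goData[, 1] %in% frontier, 2]
    frontier <- setdiff(unlist(strsplit(parentStr, ";", fixed = TRUE)), seen)
  }
  NA_character_  # no category root above this id
}

cats <- vapply(goData[, 1], categorySketch, character(1),
               goData = goData, USE.NAMES = FALSE)
goCategory <- cbind(GOID = goData[, 1], Category = cats)
goCategory
```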

getGOGroupIDs returns the GO id(s) for the topmost node or for the three nodes corresponding to the three categories (MF, BP, and CC).

mapGO2AllProbe maps GO ids to the probe ids that are related to each GO id and all of its offspring.
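
Mapping a GO id to all of its probes amounts to collecting the probes of the id itself plus those of every offspring. The sketch assumes go2Probe holds the GO id in column 1 and ";"-separated probe ids in column 2, which is a guess at its layout; allProbesSketch, offspringOf, the GO:99999xx ids, and the probe names are all hypothetical.

```r
# Toy tree: GO:9999991 is the parent of GO:9999992 (made-up ids)
goData <- matrix(c(
  "GO:9999991", "GO:0003674", "toy_term_a",
  "GO:9999992", "GO:9999991", "toy_term_b"
), ncol = 3, byrow = TRUE)

# Hypothetical go2Probe layout: GO id, ";"-separated probe ids
go2Probe <- matrix(c(
  "GO:9999991", "probe_1",
  "GO:9999992", "probe_2;probe_3"
), ncol = 2, byrow = TRUE)

# Hypothetical helper: all offspring of goid (breadth-first)
offspringOf <- function(goid, goData) {
  parents <- strsplit(goData[, 2], ";", fixed = TRUE)
  found <- character(0)
  queue <- goid
  while (length(queue) > 0) {
    id <- queue[1]
    queue <- queue[-1]
    kids <- goData[vapply(parents, function(p) id %in% p, logical(1)), 1]
    kids <- setdiff(kids, found)
    found <- c(found, kids)
    queue <- c(queue, kids)
  }
  found
}

# Hypothetical helper: probes of goid and all its offspring, joined by sep
allProbesSketch <- function(goid, goData, go2Probe, sep = ";") {
  ids <- c(goid, offspringOf(goid, goData))
  probes <- unlist(strsplit(go2Probe[go2Probe[, 1] %in% ids, 2], sep, fixed = TRUE))
  paste(unique(probes), collapse = sep)
}

goids <- goData[, 1]
go2AllProbe <- cbind(GOID = goids,
                     Probes = vapply(goids, allProbesSketch, character(1),
                                     goData = goData, go2Probe = go2Probe,
                                     USE.NAMES = FALSE))
go2AllProbe
```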

Value

GOXMLParser returns a matrix.
getChildNodes returns a vector of character strings.
getOffspringNodes returns a vector or a list of vectors, depending on whether the tree structure of parent-child relations is preserved.
getParentNodes returns a vector of character strings.
getAncestors returns a vector or a list of vectors, depending on whether the tree structure of parent-child relations is preserved.
mapGO2Category returns a matrix with two columns containing GO ids and letters representing one of the three categories (MF, BP, and CC).
getGOGroupIDs returns a vector of string(s) for GO id(s).
mapGO2AllProbe returns a matrix with GO ids in one column and, in the other column, the probe ids related to each GO id and all of its offspring.
getTopGOid returns a character string for a GO id.

Note

This function is part of the Bioconductor project within a package at the Dana-Farber Cancer Institute to provide bioinformatics functionality through R.

Author(s)

Jianhua (John) Zhang

References

http://www.geneontology.org

See Also

GO-class

Examples


  # Create a small dummy XML document (the URLs are placeholders)
  cat(paste("<?xml version='1.0'?>",
         "<!-- A test file for the examples in GOXMLParser.R Doc -->",
         "<go>",
             "<go:term>",
                 "<go:accession>GO:0003674</go:accession>",
                 "<go:name>molecular_function</go:name>",
                 "<go:is_a rdf='http://wwww.myurl.org/go#GO:0003673' />",
                 "<go:part_of rdf = 'http://wwww.myurl.org/go#GO:0003672' />",
             "</go:term>",
             "<go:term>",
                 "<go:accession>GO:0005575</go:accession>",
                 "<go:name>cellular_component</go:name>",
                 "<go:is_a rdf= 'http://wwww.myurl.org/go#GO:0003673'/>",
                 "<go:part_of rdf = 'http://wwww.myurl.org/go#GO:0003674' />",
             "</go:term>",
          "</go>"), file = "testDoc")

  # Parse the dummy file using GOXMLParser
  goData <- GOXMLParser("testDoc")
  # Get the direct children of a GO id
  getChildNodes("GO:0003674", goData)
  # Get all offspring as a flat vector (keepTree = FALSE)
  getOffspringNodes("GO:0003673", goData, FALSE)
  # Get the direct parents of a GO id
  getParentNodes("GO:0005575", goData)
  # Get all ancestors, using GO:0003674 as the root instead of the default
  getAncestors("GO:0005575", goData, ";", FALSE, "GO:0003674")
  # Get the root GO id of the whole Gene Ontology tree
  getTopGOid("GO")
  # Remove the temporary file
  unlink("testDoc")

[Package AnnBuilder version 1.4.21 Index]