ABPkgBuilder {AnnBuilder}                R Documentation

Functions that support a single API for building data packages

Description

These functions support a single API represented by ABPkgBuilder to allow users to build annotation data packages by providing a limited number of parameters. Other parameters will be figured out by the supporting functions.

Usage

ABPkgBuilder(baseName, srcUrls, baseMapType = c("gb", "ug", "ll"),
otherSrc = NULL, pkgName, pkgPath, organism = c("human", "mouse",
"rat"), version = "1.1.0", makeXML = TRUE, author = list(author = "who",
maintainer = "who@email.com"), fromWeb = TRUE)
getBaseParsers(baseMapType = c("gb", "ug"))
createEmptyDPkg(pkgName, pkgPath, folders, force = TRUE)
getDirContent(dirName, exclude = NULL)
getMultiColNames()
getUniColNames()
getTypeColNames()
splitEntry(dataRow, sep = ";", asNumeric = FALSE)
twoStepSplit(dataRow, entrySep = ";", eleSep = "@", asNumeric = FALSE)

Arguments

baseName a character string for the name of the file to be used as the base file for the source data. The file is assumed to have two columns (separated by tabs, "\t"), with the first one being the names of the genes (probes) to be annotated and the second one being the mappings to GenBank accession numbers, UniGene ids, or LocusLink ids
srcUrls a vector of named character strings for the URLs from which the source data files will be retrieved. Valid sources are LocusLink, UniGene, Golden Path, Gene Ontology, and KEGG. The names for the character strings should be LL, UG, GP, GO, and KEGG, respectively. LL and UG are required
baseMapType a character string that is either "gb", "ug", or "ll" to indicate whether the probe ids in baseName are mapped to GenBank accession numbers, UniGene ids, or LocusLink ids
otherSrc a vector of named character strings for the names of files that contain mappings between the probe ids of baseName and LocusLink ids. These files are used, together with the other sources, to obtain the unified mappings between the probe ids of baseName and LocusLink ids. The strings should not contain any number, and the files must have the same structure as baseName
pkgName a character string for the name of the data package to be built (e.g. hgu95a, rgu34a)
pkgPath a character string for the full path of an existing directory where the built package will be stored
organism a character string for the name of the organism of concern (currently only "human", "mouse", or "rat")
version a character string for the version number
makeXML a boolean to indicate whether an XML version will also be generated
author a list of character strings with an author element for the name of the author and a maintainer element for the email address of the author
force a boolean that is set to TRUE if the package to be created will replace an existing package with the same name
dirName a character string for the name of a directory whose contents are of interest
exclude a character string for a pattern matching parameter that will be used to exclude the contents of a directory that match the pattern
dataRow a character string containing data elements, with elements separated by sep or entrySep and a descriptive string attached to each element following eleSep
sep a character string for a separator
entrySep a character string for a separator
eleSep a character string for a separator
asNumeric a boolean that is TRUE when the split values will be returned as numeric values
fromWeb a boolean to indicate whether the source data will be downloaded from the web or read from a local file
folders a vector of character strings for the names of folders to be created within the package that is going to be created
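
The naming rules for srcUrls can be checked before a build is started. The following is a minimal sketch, assuming a helper named checkSrcUrls that is hypothetical and not part of AnnBuilder; the URLs are placeholders, not real data sources.

```r
# Hypothetical helper (not an AnnBuilder function): verify that srcUrls is a
# named character vector that uses only the documented source names and
# includes the two required ones, LL and UG.
checkSrcUrls <- function(srcUrls) {
    valid <- c("LL", "UG", "GP", "GO", "KEGG")
    required <- c("LL", "UG")
    if (!is.character(srcUrls) || is.null(names(srcUrls))) {
        stop("srcUrls must be a named character vector")
    }
    unknown <- setdiff(names(srcUrls), valid)
    if (length(unknown) > 0) {
        stop("unknown source name(s): ", paste(unknown, collapse = ", "))
    }
    absent <- setdiff(required, names(srcUrls))
    if (length(absent) > 0) {
        stop("required source(s) not supplied: ", paste(absent, collapse = ", "))
    }
    invisible(TRUE)
}

# Placeholder URLs for illustration only
okUrls <- c(LL = "http://example.org/ll.gz", UG = "http://example.org/ug.gz")
checkSrcUrls(okUrls)
```

A vector that omits UG, such as c(LL = "http://example.org/ll.gz"), makes the sketch stop with an error instead of reaching the download step.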

Details

These functions are the result of an effort to make data package building easier for users. As a result, users may not have much control over the process or the inputs. Additionally, some of the built-in functions that figure out the URLs for source data may fail when maintainers of the data source web sites change the name, structure, etc. of the source data. When such an event occurs, users may have to follow the instructions contained in a vignette named AnnBuilder to build data packages.

getBaseParsers figures out which of the built in parsers to use to parse the source data based on the type of the mappings done for the probes.

createEmptyDPkg creates an empty package with the required subdirectories for data to be stored.
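
The idea of an empty package skeleton can be illustrated with base R alone. This is a rough sketch under stated assumptions: makeEmptyPkgDirs is a hypothetical stand-in, and the real createEmptyDPkg may do more or behave differently. The folder names data, man, and R are the ones listed in the Examples section.

```r
# Hypothetical stand-in for createEmptyDPkg (behavior of the real function
# is assumed, not confirmed): create pkgPath/pkgName with the given folders.
makeEmptyPkgDirs <- function(pkgName, pkgPath, folders, force = TRUE) {
    pkgDir <- file.path(pkgPath, pkgName)
    if (file.exists(pkgDir)) {
        if (!force) {
            stop("package ", pkgName, " already exists in ", pkgPath)
        }
        # force = TRUE: replace the existing package directory
        unlink(pkgDir, recursive = TRUE)
    }
    dir.create(pkgDir)
    for (folder in folders) {
        dir.create(file.path(pkgDir, folder))
    }
    invisible(pkgDir)
}

# Build the skeleton in a scratch directory, record its content, clean up
root <- tempfile("pkgroot")
dir.create(root)
pkgDir <- makeEmptyPkgDirs("myPkg", root, folders = c("data", "man", "R"))
created <- list.files(pkgDir)
unlink(root, recursive = TRUE)
```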

getMultiColNames figures out which data elements for annotation have a many-to-one relationship with a probe. The multiple values are separated by a separator in the parsed annotation data.

getUniColNames figures out which data elements for annotation have a one-to-one relationship with a probe.

getTypeColNames figures out which data elements for annotation have a many-to-one relationship with a probe and carry additional information appended to the end of each element following a separator. The multiple values are also separated by a separator in the parsed annotation data.

splitEntry splits entries by a separator.
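
The splitting step can be sketched with strsplit from base R. The function below, splitEntrySketch, is a hypothetical illustration that mirrors the documented arguments of splitEntry; it is not the AnnBuilder implementation, and the input strings are made up.

```r
# Hypothetical sketch of splitting one entry by a separator; the separator
# is treated as a literal string (fixed = TRUE), which is an assumption.
splitEntrySketch <- function(dataRow, sep = ";", asNumeric = FALSE) {
    parts <- unlist(strsplit(dataRow, sep, fixed = TRUE))
    if (asNumeric) {
        parts <- as.numeric(parts)
    }
    parts
}

chars <- splitEntrySketch("a;b;c")
nums <- splitEntrySketch("1576;10;89", asNumeric = TRUE)
```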

twoStepSplit splits entries by the separator specified by entrySep and then separates the descriptive information from each element by eleSep.
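
A two-step split of this kind might look like the sketch below. twoStepSplitSketch is a hypothetical function written for illustration, not the AnnBuilder code, and the input string (identifiers followed by "@" and a short label) is invented sample data.

```r
# Hypothetical sketch: first split the row into entries, then split each
# entry into a value and the descriptive label that follows eleSep.
twoStepSplitSketch <- function(dataRow, entrySep = ";", eleSep = "@",
                               asNumeric = FALSE) {
    entries <- unlist(strsplit(dataRow, entrySep, fixed = TRUE))
    pieces <- strsplit(entries, eleSep, fixed = TRUE)
    values <- vapply(pieces, function(p) p[1], character(1))
    # Entries without a label get NA as their name (an assumption)
    labels <- vapply(pieces,
                     function(p) if (length(p) > 1) p[2] else NA_character_,
                     character(1))
    if (asNumeric) {
        values <- as.numeric(values)
    }
    names(values) <- labels
    values
}

res <- twoStepSplitSketch("GO:0004497@IEA;GO:0005489@TAS")
```

In this sketch the descriptive strings become the names of the returned vector, which matches the Value description for twoStepSplit.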

Value

getBaseParsers returns a named vector for the names of the parsers to use to parse the source data.
getDirContent returns a vector of character strings for the content of a directory of interest.
getMultiColNames returns a vector of character strings.
getUniColNames returns a vector of character strings.
getTypeColNames returns a vector of character strings.
splitEntry returns a vector of character strings.
twoStepSplit returns a named vector of character strings. The names are the descriptive information appended to each element by eleSep.

Note

The functions are part of the Bioconductor project at Dana-Farber Cancer Institute to provide Bioinformatics functionalities through R

Author(s)

Jianhua Zhang

References

HowTo and AnnBuilder vignettes

See Also

GOPkgBuilder, KEGGPkgBuilder

Examples

# Create a temporary directory for the data
myDir <- tempdir()
# Create a temp base data file
geneNMap <- matrix(c("32468_f_at", "D90278", "32469_at", "L00693",
                     "32481_at", "AL031663", "33825_at", "X68733",
                     "35730_at", "X03350", "36512_at", "L32179",
                     "38912_at", "D90042", "38936_at", "M16652",
                     "39368_at", "AL031668"), ncol = 2, byrow = TRUE)
write.table(geneNMap, file = file.path(myDir, "geneNMap"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
# Urls for truncated versions of source data
mySrcUrls <- c(
    LL = "http://www.bioconductor.org/datafiles/wwwsources/Tll_tmpl.gz",
    UG = "http://www.bioconductor.org/datafiles/wwwsources/Ths.data.gz",
    GO = "http://www.bioconductor.org/datafiles/wwwsources/Tgo.xml")
# Create temp files for other sources
temp <- matrix(c("32468_f_at", NA, "32469_at", "2",
                 "32481_at", NA, "33825_at", "9",
                 "35730_at", "1576", "36512_at", NA,
                 "38912_at", "10", "38936_at", NA,
                 "39368_at", NA), ncol = 2, byrow = TRUE)
write.table(temp, file = file.path(myDir, "srcone"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
temp <- matrix(c("32468_f_at", NA, "32469_at", NA,
                   "32481_at", "7051", "33825_at", NA,
                   "35730_at", NA, "36512_at", "1084",
                   "38912_at", NA, "38936_at", NA,
                   "39368_at", "89"), ncol = 2, byrow = TRUE)
write.table(temp, file = file.path(myDir, "srctwo"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
otherMapping <- c(srcone = file.path(myDir, "srcone"),
                  srctwo = file.path(myDir, "srctwo"))
# Runs only upon user's request
if (interactive()) {
    # author is passed as a list, matching the documented argument type
    ABPkgBuilder(baseName = file.path(myDir, "geneNMap"),
                 srcUrls = mySrcUrls, baseMapType = "gb",
                 otherSrc = otherMapping, pkgName = "myPkg",
                 pkgPath = myDir, organism = "human", version = "1.1.0",
                 makeXML = TRUE,
                 author = list(author = "myname",
                               maintainer = "myname@myemail.com"))
    # Output files
    list.files(myDir)
    # Content of the data package
    list.files(file.path(myDir, "myPkg"))
    list.files(file.path(myDir, "myPkg", "data"))
    list.files(file.path(myDir, "myPkg", "man"))
    list.files(file.path(myDir, "myPkg", "R"))
    # Remove the package directory and the XML files
    unlink(file.path(myDir, "myPkg"), recursive = TRUE)
    unlink(file.path(myDir, "myPkg.xml"))
    unlink(file.path(myDir, "myPkgByNum.xml"))
}
# Remove the temporary input files
unlink(c(file.path(myDir, "geneNMap"), file.path(myDir, "srcone"),
         file.path(myDir, "srctwo")))
