readGEOAnn {annotate}    R Documentation

Functions to extract data from the GEO web site


Data files available at the GEO web site are identified by GEO accession numbers. Given the URL of the CGI script at GEO and a GEO accession number, these functions extract data from the web site and return it as a matrix or, for getGPLNames, as a named vector.


readGEOAnn(GEOAccNum, url = "")
readIDNAcc(GEOAccNum, url = "")
getGPLNames(url = "")
getSAGEFileInfo(url =
getSAGEGPL(organism = "Homo sapiens", enzyme = c("NlaIII", "Sau3A"))


url          the URL of the CGI script at GEO
GEOAccNum    a character string giving the GEO accession number of the desired file (e.g., GPL97)
organism     a character string giving the name of the organism of interest
enzyme       a character string, either NlaIII or Sau3A, naming the enzyme used to create the SAGE tags
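
A vector default such as enzyme = c("NlaIII", "Sau3A") is commonly resolved in R with match.arg. The sketch below shows that idiom in isolation. Whether getSAGEGPL actually uses it is an assumption, and pickEnzyme is a hypothetical helper, not part of the package.

```r
# Hypothetical helper illustrating how a vector default is usually resolved.
# It is NOT a function of the annotate package.
pickEnzyme <- function(enzyme = c("NlaIII", "Sau3A")) {
  # match.arg returns the first choice when the argument is omitted and
  # signals an error for a value outside the allowed set.
  match.arg(enzyme)
}

print(pickEnzyme())          # argument omitted: falls back to the first choice
print(pickEnzyme("Sau3A"))   # an explicitly supplied, valid choice

# An unsupported enzyme is rejected with an error.
bad <- try(pickEnzyme("EcoRI"), silent = TRUE)
print(inherits(bad, "try-error"))
```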


url is the address of the CGI script that processes the user's request. readGEOAnn invokes the CGI script by passing it a GEO accession number and then processes the data file it receives.
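
The request-and-parse pattern described above can be sketched without network access. Everything specific here is an assumption for illustration: the helper buildGEOQuery, the base address, the query parameter names, and the tab-delimited reply are mock stand-ins, not the actual GEO interface or the package's implementation.

```r
# Hypothetical helper: combine a CGI address with an accession number.
# The parameter names ("acc", "view", "form") are assumed, not taken from GEO.
buildGEOQuery <- function(GEOAccNum, url) {
  stopifnot(is.character(GEOAccNum), length(GEOAccNum) == 1L, nzchar(url))
  paste0(url, "?acc=", utils::URLencode(GEOAccNum, reserved = TRUE),
         "&view=data&form=text")
}

query <- buildGEOQuery("GPL96", url = "https://example.org/geo/query.cgi")
print(query)

# Mock tab-delimited reply standing in for the downloaded data file.
reply <- c("ID\tGB_ACC",
           "1007_s_at\tU48705",
           "1053_at\tM87338")

# Split each line on tabs; the first line supplies the column names.
fields <- strsplit(reply, "\t", fixed = TRUE)
annot <- do.call(rbind, fields[-1])
colnames(annot) <- fields[[1]]
print(annot)
```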

readIDNAcc calls readGEOAnn to read the data and then extracts the columns for probe ids and accession numbers. GEOAccNum must be the id of an Affymetrix chip.
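
The column extraction step can be illustrated on a small mock matrix. The column names "ID", "GB_ACC" and "Symbol" and the row values are assumptions made for this sketch; the real annotation file may label its columns differently.

```r
# Mock annotation matrix standing in for the result of readGEOAnn.
annot <- matrix(c("1007_s_at", "U48705", "DDR1",
                  "1053_at",   "M87338", "RFC2"),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("ID", "GB_ACC", "Symbol")))

# Keep only the probe id and accession number columns.
# drop = FALSE guarantees a matrix even if only one row is present.
idAcc <- annot[, c("ID", "GB_ACC"), drop = FALSE]
print(idAcc)
```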

getGPLNames parses the HTML file that lists GEO accession numbers together with descriptions of the arrays they identify.
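
One way to turn such a listing into a named vector is a regular expression applied line by line. The HTML fragment below is mock markup invented for this sketch, and the pattern is an assumption; the actual GEO page layout and the parsing done by getGPLNames may differ.

```r
# Mock HTML lines; the real GEO listing may be structured differently.
html <- c('<option value="GPL96">Affymetrix HG-U133A</option>',
          '<option value="GPL97">Affymetrix HG-U133B</option>',
          '<p>unrelated line</p>')

# Capture the accession number (group 1) and the description (group 2).
pattern <- '.*value="(GPL[0-9]+)">([^<]+)<.*'
hits <- grepl(pattern, html)

acc  <- sub(pattern, "\\1", html[hits])
desc <- sub(pattern, "\\2", html[hits])

# Array descriptions as values, accession numbers as names.
gplNames <- stats::setNames(desc, acc)
print(gplNames)
```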


Both readGEOAnn and readIDNAcc return a matrix.
getGPLNames returns a named vector of the names of commercial arrays. The names of the vector are the corresponding GEO accession numbers.
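
Because the accession numbers are the names of the vector, an array name can be mapped back to its accession number with ordinary subsetting. The vector below is built by hand as a stand-in for the result of getGPLNames; its entries are illustrative.

```r
# Hand-built stand-in for the value returned by getGPLNames.
geoAccNums <- c(GPL96 = "Affymetrix HG-U133A",
                GPL97 = "Affymetrix HG-U133B")

# Look up the accession number of a given array by matching on the values.
acc <- names(geoAccNums)[geoAccNums == "Affymetrix HG-U133A"]
print(acc)

# Look up the array description for a given accession number by name.
print(geoAccNums[["GPL97"]])
```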


These functions are part of the Bioconductor project at the Dana-Farber Cancer Institute, which provides bioinformatics functionality through R.


Jianhua Zhang



# Get array names and GEO accession numbers
#geoAccNums <- getGPLNames()
# Read the annotation data file for HG-U133A, which is GPL96 according to
# geoAccNums
#temp <- readGEOAnn(GEOAccNum = "GPL96")
#temp2 <- readIDNAcc(GEOAccNum = "GPL96")

[Package annotate version 1.8.0 Index]