mergeSAGE {SAGElyzer}    R Documentation

Functions to merge SAGE libraries based on unique SAGE tags

Description

These functions merge individual SAGE libraries on their unique SAGE tags and write the merged data to a file and to a database table. The unique SAGE tags form one column, and the counts from each library form the remaining columns.

Usage

mergeSAGE(libNames, isDir = TRUE, skip = 1, pattern = ".sage")
getLibInfo(fileNames)
calNormFact(normalize = c("min", "max"), libNNum)
getLibNNum(fileNames)
getUniqTags(fileNames, skip = 1, sep = "\t")
writeSAGE4Win(fileNames, uniqTags, infoData, pace = 1000)
mapFile2Tag(fileNames, tags, skip, n)
writeSAGECounts(fileNames, uniqTags, skip, sep = "\t")
writeSAGE2DB(dbArgs, colNames, keys, numCols, fileName,
             what = c("counts", "map", "info"), charNum = 20, type = "int4")
getColSQL(colNames, charNum, keys, numCols, type)
writeSAGE4Unix(countData, infoData)

Arguments

libNames a vector of character strings naming the SAGE libraries to be merged, or the name of the directory that contains them
isDir a logical that is TRUE if libNames names a directory containing the SAGE libraries to be merged
skip an integer giving the number of lines to skip when the libraries are read for merging
pattern a character string giving the pattern used to select SAGE data files when libNames is a directory; only files that match the pattern are merged
fileNames a vector of character strings naming the SAGE libraries to be written to the database or used for analysis
normalize a character string, "min" or "max", naming the function used for normalization
libNNum a matrix with columns for the SAGE library names and the maximum and minimum numbers of counts
uniqTags a vector of character strings for the unique SAGE tags
infoData a matrix containing SAGE library information
pace an integer giving the maximum number of SAGE tags processed per pass when writing SAGE library data to the database under Windows
tags a vector of character strings for SAGE tags
n an integer for the number of neighbors defined for KNN
sep a character string for the field separator
dbArgs a list containing the arguments for making database connections
colNames a vector of character strings for the column names of a matrix
keys a vector of character strings for the names of the key columns of a database table
numCols the number of columns; see ncol
fileName a character string for the name of a file used to populate a database
what a character string, one of "counts", "map", or "info", indicating which SAGE data to handle
charNum an integer giving the length of character columns in a database
type a character string for the data type of a database column
countData a matrix containing tag counts for SAGE libraries
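The arguments colNames, keys, charNum, and type together describe a database table definition of the kind writeSAGE2DB and getColSQL work with. As a rough sketch only: colDefSQL below is a hypothetical helper, not part of SAGElyzer, and the varchar key column plus PRIMARY KEY clause is an assumption about the SQL getColSQL actually emits.

```r
# Hypothetical helper (not SAGElyzer's getColSQL): build a column-definition
# fragment where key columns are varchar(charNum) and all others use `type`.
colDefSQL <- function(colNames, charNum = 20, keys, type = "int4") {
  defs <- ifelse(colNames %in% keys,
                 paste0(colNames, " varchar(", charNum, ")"),  # key columns
                 paste(colNames, type))                        # count columns
  paste0("(", paste(defs, collapse = ", "),
         ", PRIMARY KEY (", paste(keys, collapse = ", "), "))")
}

sql <- colDefSQL(c("tag", "lib1", "lib2"), charNum = 10, keys = "tag")
# sql is "(tag varchar(10), lib1 int4, lib2 int4, PRIMARY KEY (tag))"
```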

Details

Each SAGE library typically contains two columns: the first holds the SAGE tags and the second their counts. mergeSAGE merges the library files on the tags. A tag that is missing from a given library but present in another is assigned a count of 0 for that library.
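The tag-based merge can be illustrated in base R. This is a minimal sketch, not the package's implementation: mergeTagCounts is a hypothetical helper that takes each library as a data frame of tags and counts, performs a full outer join on the tag, and fills tags absent from a library with 0.

```r
# Hypothetical helper: full outer join of SAGE libraries on their tags.
mergeTagCounts <- function(libs) {
  # libs: a named list of data frames with columns "tag" and "count"
  merged <- NULL
  for (name in names(libs)) {
    lib <- libs[[name]]
    names(lib) <- c("tag", name)  # name the count column after the library
    merged <- if (is.null(merged)) lib else
      merge(merged, lib, by = "tag", all = TRUE)
  }
  # Tags missing from a library come back from merge() as NA; count them as 0
  merged[is.na(merged)] <- 0
  merged
}

lib1 <- data.frame(tag = paste0("tag", 1:10), count = 1:10,
                   stringsAsFactors = FALSE)
lib2 <- data.frame(tag = paste0("tag", 5:9), count = 15:19,
                   stringsAsFactors = FALSE)
merged <- mergeTagCounts(list(lib1 = lib1, lib2 = lib2))
```

Here merged has one row per unique tag and one count column per library; tag1 gets a lib2 count of 0 because it occurs only in lib1.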

mergeSAGE generates two files. One contains the merged data. The other has four columns: the column names of the database table that stores the SAGE counts, the original SAGE library names, the normalization factor used to normalize counts relative to the library with the smallest number of tags, and the factor relative to the library with the largest number of tags.
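A four-column information table of this shape can be sketched in base R. This is illustrative only: buildLibInfo is a hypothetical helper, the "lib1", "lib2", ... database column names are invented, and taking library size as the total tag count with factors min(size)/size and max(size)/size is an assumption, not the documented formula.

```r
# Hypothetical helper: assemble a four-column library information table.
buildLibInfo <- function(counts) {
  # counts: a named list of count vectors, one per SAGE library
  sizes <- unname(sapply(counts, sum))  # assumed library size: total counts
  data.frame(
    column  = paste0("lib", seq_along(sizes)),  # invented DB column names
    library = names(counts),                    # original library names
    minNorm = min(sizes) / sizes,  # rescale to the smallest library
    maxNorm = max(sizes) / sizes,  # rescale to the largest library
    stringsAsFactors = FALSE
  )
}

info <- buildLibInfo(list(lib1.sage = 1:10, lib2.sage = 15:19))
infoFile <- tempfile(fileext = ".txt")
write.table(info, infoFile, sep = "\t", quote = FALSE, row.names = FALSE)
```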

getLibInfo creates the file containing information about the merged data file.

calNormFact calculates the normalization factor.
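Applying such factors to a matrix of merged counts can be sketched with base R's sweep(). The sketch assumes the factor for library i is min(total)/total_i ("min") or max(total)/total_i ("max"), where total_i is the library's summed count; normalizeCounts is a hypothetical helper and calNormFact's actual formula may differ.

```r
# Hypothetical helper: scale each library (column) to a common total.
normalizeCounts <- function(counts, method = c("min", "max")) {
  method <- match.arg(method)
  totals <- colSums(counts)
  target <- if (method == "min") min(totals) else max(totals)
  # sweep() multiplies column j by the factor target / totals[j]
  sweep(counts, 2, target / totals, `*`)
}

counts <- cbind(lib1 = c(10, 20, 70), lib2 = c(50, 100, 50))
normMin <- normalizeCounts(counts, "min")
normMax <- normalizeCounts(counts, "max")
```

With method = "min", lib2 (200 counts) is scaled by 0.5 so both columns sum to 100; with "max", lib1 is doubled so both sum to 200.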

Value

mergeSAGE returns a list containing two file names:

data a character string for the name of the file containing the merged data
info a character string for the name of the file containing information about the merged data


getLibInfo returns a matrix with four columns.

Note

These functions are part of the Bioconductor project at the Dana-Farber Cancer Institute, which provides bioinformatics functionality through R.

Author(s)

Jianhua Zhang

References

http://www.ncbi.nlm.nih.gov/geo

See Also

SAGELyzer

Examples

library(SAGElyzer)
path <- tempdir()
# Create two small SAGE libraries: tags in column 1, counts in column 2
lib1 <- cbind(paste("tag", 1:10, sep = ""), 1:10)
lib2 <- cbind(paste("tag", 5:9, sep = ""), 15:19)
files <- c(file.path(path, "lib1.sage"), file.path(path, "lib2.sage"))
# Write plain tab-delimited files with no quotes, header, or row names
write.table(lib1, file = files[1], sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
write.table(lib2, file = files[2], sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
# Library names and their numbers of counts, then min-based factors
libNNum <- getLibNNum(files)
normFact <- calNormFact("min", libNNum)
# Unique tags across both libraries; skip = 0 because the files have no header
uniqTag <- getUniqTags(files, skip = 0)

[Package SAGElyzer version 1.4.2 Index]