A function to match a query sequence to the sequences of a set of probes.


Each query sequence, a character string (typically representing a transcript of interest), is scanned for exact matches to the sequences in the character vector records. The indices of the matching records are returned.


matchprobes(query, records, probepos=FALSE)


query A character vector. For example, each element may represent a gene (transcript) of interest.
records A character vector. For example, each element may represent the probes on a DNA array.
probepos A logical value. If TRUE, also return the start positions of the matches in the query sequence.


The matching is done using the C library function strstr, which searches for an exact occurrence of one string within another. Other matching strategies may be worth exploring.
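For intuition, the kind of exact substring search that strstr performs can be approximated in base R with regexpr(..., fixed = TRUE), which reports the start of the first occurrence, or -1 when there is none. The following is only an illustrative sketch: the helper name first_hit is hypothetical, positions are 1-based as is usual in R, and this is not the package's own implementation.

```r
# Hypothetical helper (not part of the package): 1-based start of the
# first exact occurrence of `probe` in `query`, or NA if there is none.
first_hit <- function(probe, query) {
  # fixed = TRUE requests a literal, case-sensitive substring search;
  # subsetting with [1] drops the attributes that regexpr() attaches.
  pos <- regexpr(probe, query, fixed = TRUE)[1]
  if (pos == -1L) NA_integer_ else as.integer(pos)
}

query <- "ACGTACGTTTGA"
first_hit("CGT", query)   # 2: only the first of two occurrences is reported
first_hit("TTG", query)   # 9
first_hit("GGG", query)   # NA: no exact match
first_hit("cgt", query)   # NA: the comparison is case-sensitive
```

Because only the first occurrence is reported, a probe that occurs several times in the same query yields a single position in this sketch.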


A list. Its first element is a list of the same length as query. Each element of that list is a numeric vector containing the indices (into records) of the probes that have a perfect match in the corresponding query sequence.
If probepos is TRUE, the returned list has a second element of the same shape, which gives the corresponding positions of the matches.
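The shape of the return value may be easier to see on a toy input. The sketch below is a base R stand-in that builds a list with the structure described above. The function name match_sketch is hypothetical, positions are assumed to be 1-based, and the code is not the package's implementation.

```r
# Illustrative stand-in for the described return value; NOT the package code.
match_sketch <- function(query, records, probepos = FALSE) {
  # For each query: start of each record's first exact occurrence
  # (-1 where the record does not occur in that query).
  starts <- lapply(query, function(q)
    vapply(records,
           function(r) as.integer(regexpr(r, q, fixed = TRUE)[1]),
           integer(1), USE.NAMES = FALSE))
  # First element: indices into `records` of the probes found in each query.
  res <- list(lapply(starts, function(s) which(s > 0L)))
  # Optional second element: start positions, aligned with those indices.
  if (probepos) res[[2]] <- lapply(starts, function(s) s[s > 0L])
  res
}

records <- c("ACGT", "TTTT", "GTAC")
query   <- c("ACGTACGT", "CCCC")
res <- match_sketch(query, records, probepos = TRUE)
res[[1]][[1]]   # 1 3: records 1 and 3 occur in the first query
res[[2]][[1]]   # 1 3: their start positions in that query
res[[1]][[2]]   # integer(0): nothing matches the second query
```

With probepos = FALSE the sketch returns a list with only the first element, mirroring the description above.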


R. Gentleman, Laurent Gautier, Wolfgang Huber


  ## This function is mainly intended for use with the probe
  ## tables from the "probe" data packages, e.g.:
  ## > library(hgu95av2probe)
  ## > data(probe)
  ## > seq <- probe$sequence
  ## Since we do not want to be dependent on the presence of this 
  ## data package, for the sake of example we simply simulate some
  ## probe sequences:

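  ## simulate 1000 random probe sequences, each 256 bases long
  bases <- c("A", "C", "G", "T")
  seq   <- sapply(1:1000, function(x) paste(bases[ceiling(4*runif(256))], collapse=""))

  ## queries: three of the probes themselves (w1), plus copies of them
  ## in which the base at position 13 has been complemented (w2)
  w1 <- seq[20:22]
  w2 <- complementSeq(w1, start=13, stop=13)
  w  <- c(w1, w2)

  ## indices of the matching probes, then the same with match positions
  matchprobes(w, seq)
  matchprobes(w, seq, probepos=TRUE)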

