matchDNAPattern {Biostrings}    R Documentation

Generic to find all matches of a pattern in a DNA string

Description

Generic that finds all matches of a pattern in a DNA string. Three algorithms are available through the algorithm argument. "boyer-moore" is an extension of the Boyer-Moore algorithm that allows some wildcards in addition to the symbols for the bases and the gap. "forward-search" is a simple forward search that examines, from beginning to end, every substring of the full string that has the same length as the pattern. "shift-or" is a bit-parallel algorithm limited to short patterns; see the algorithm argument for its restrictions and for the default.
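To make the forward search concrete, here is a minimal base R sketch of the idea. It is not the package's implementation: the function name forward_search is hypothetical, it works on plain character strings rather than DNAString objects, and it assumes that "N" is the only wildcard in the pattern.

```r
# Illustrative sketch only; forward_search is a hypothetical helper,
# not part of Biostrings.
forward_search <- function(pattern, text) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(text, "")[[1]]
  m <- length(p)
  n <- length(s)
  if (m == 0L || m > n) return(integer(0))
  starts <- integer(0)
  # Examine every substring of length m, from the beginning to the end.
  for (i in seq_len(n - m + 1L)) {
    window <- s[i:(i + m - 1L)]
    # A position agrees if the pattern letter is "N" or equals the text letter.
    if (all(p == "N" | p == window)) starts <- c(starts, i)
  }
  starts
}

forward_search("GCNNNAT", "AAGCGCGATATG")  # start positions 3 and 5
```

Every window is compared letter by letter, so the cost grows with the product of the pattern length and the string length. That is acceptable for very simple patterns, which is the case where the forward search is competitive.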

Usage

matchDNAPattern(pattern, x, algorithm, mismatch)

Arguments

pattern An object representing the pattern string. The string in pattern can use any of the standard DNA pattern letters. See DNAPatternAlphabet for all valid letters.
x An object representing a DNA string.
algorithm The algorithm to use. Currently the only valid values are "boyer-moore", "forward-search" and "shift-or". The forward search algorithm is often as fast as the more sophisticated Boyer-Moore algorithm when the patterns being matched are very simple. The shift-or algorithm is even faster, but it can only be used for patterns of length at most 32 or 64, depending on the number of bits in a machine word. The shift-or algorithm can also do inexact matching for a given number of mismatches. The default is "shift-or" where valid and "boyer-moore" otherwise.
mismatch An integer, the number of mismatches allowed. The default is 0. If mismatch is non-zero, an inexact matching algorithm is used.
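The pattern length limit and the mismatch option both follow from how bit-parallel matching works: the search state is kept in the bits of a machine word, one bit per pattern position. The base R sketch below illustrates this under stated assumptions. It uses the shift-and formulation (the bitwise complement of shift-or), the function name shift_and_search is hypothetical, only the letters A, C, G, T and the wildcard N are accepted in the pattern, and the pattern is capped at 30 letters because base R integers are 32-bit signed values. It is not the package's code.

```r
# Illustrative sketch only; shift_and_search is a hypothetical helper,
# not part of Biostrings.
shift_and_search <- function(pattern, text, max_mismatch = 0L) {
  bases <- c("A", "C", "G", "T")
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(text, "")[[1]]
  m <- length(p)
  stopifnot(m >= 1L, m <= 30L, max_mismatch >= 0L, all(p %in% c(bases, "N")))

  # mask[[b]] has bit i-1 set when pattern position i accepts base b.
  mask <- c(A = 0L, C = 0L, G = 0L, T = 0L)
  for (i in seq_len(m)) {
    bit <- bitwShiftL(1L, i - 1L)
    hit <- if (p[i] == "N") bases else p[i]
    for (h in hit) mask[[h]] <- bitwOr(mask[[h]], bit)
  }
  full  <- bitwShiftL(1L, m) - 1L    # the m low bits
  final <- bitwShiftL(1L, m - 1L)    # bit that signals a complete match

  # state[j] tracks pattern prefixes matched with at most j-1 mismatches.
  state  <- integer(max_mismatch + 1L)
  starts <- integer(0)
  for (pos in seq_along(s)) {
    b <- if (s[pos] %in% bases) mask[[s[pos]]] else 0L
    old <- state
    for (j in seq_along(state)) {
      # Extend every prefix by one letter that must agree with the text.
      state[j] <- bitwAnd(bitwOr(bitwShiftL(old[j], 1L), 1L), b)
      if (j > 1L) {
        # Or extend a prefix from the previous level by spending one mismatch.
        state[j] <- bitwOr(state[j], bitwOr(bitwShiftL(old[j - 1L], 1L), 1L))
      }
      state[j] <- bitwAnd(state[j], full)
    }
    if (bitwAnd(state[max_mismatch + 1L], final) != 0L) {
      starts <- c(starts, pos - m + 1L)
    }
  }
  starts
}

shift_and_search("GCNNNAT", "AAGCGCGATATG")          # start positions 3 and 5
shift_and_search("GAT", "GCTGAT", max_mismatch = 1L) # start positions 1 and 4
```

Each text letter costs a fixed number of word operations per mismatch level, regardless of pattern length, which is why this family of algorithms is fast but restricted to patterns that fit in one word.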

Value

An object of class "BioString" whose length equals the number of matches. Each element of the "BioString" object is a match. To obtain the start and end points of the matches, use as.matrix on the return value. See the documentation for the "BioString" class for more details.

Author(s)

Saikat DebRoy

References

Gusfield, D. (1997). Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press.

See Also

BioString-class for the type of the return value.

Examples

# Small example; "N" in the pattern matches any base
x <- DNAString("AAGCGCGATATG")
m1 <- matchDNAPattern("GCNNNAT", x)
m1
as.matrix(m1)  # start and end points of the matches
# Same search with the simple forward search
m2 <- matchDNAPattern("GCNNNAT", x, algorithm="forward-search")
m2
as.matrix(m2)
# Search yeast chromosome 1 for a restriction enzyme site
data('yeastSEQCHR1')
yeast1 <- DNAString(yeastSEQCHR1)
PpiI <- "GAACNNNNNCTC" # a restriction enzyme pattern
match1.PpiI <- matchDNAPattern(PpiI, yeast1)
match2.PpiI <- matchDNAPattern(PpiI, yeast1, algorithm="forward-search")
match1.PpiI
match2.PpiI
# Allow one mismatch (inexact matching)
match3.PpiI <- matchDNAPattern(PpiI, yeast1, mismatch=1)
match3.PpiI

[Package Biostrings version 1.4.0 Index]