matchDNAPattern {Biostrings} | R Documentation |

## Generic to find all matches of a pattern in a DNA string

### Description

Generic that finds all matches of a pattern in a DNA string. The
Boyer-Moore algorithm is implemented in an extended form that allows
some wildcards in addition to the symbols for the bases and gap. The
forward search algorithm is a simple scan that compares the pattern
against every substring of the same length, from the beginning of the
string to the end. A third algorithm, shift-or, is described under the
`algorithm` argument.
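
The forward search idea can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: `forward_search` is a hypothetical helper, it works on plain character strings rather than DNA string objects, and it assumes `N` is the only wildcard (standing for any letter).

```r
# Hypothetical helper, not part of Biostrings: naive forward search.
# Assumption: "N" in the pattern matches any letter; no other wildcards.
forward_search <- function(pattern, x) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(x, "")[[1]]
  m <- length(p)
  n <- length(s)
  if (m == 0 || m > n) return(integer(0))
  starts <- integer(0)
  # Slide a window of the pattern's length over the string, left to right.
  for (i in seq_len(n - m + 1)) {
    window <- s[i:(i + m - 1)]
    # A position is accepted if the pattern has a wildcard or the same letter.
    if (all(p == "N" | p == window)) starts <- c(starts, i)
  }
  starts
}

forward_search("GCNNNAT", "AAGCGCGATATG")  # start positions 3 and 5
```

Every window is compared in full, so the cost grows with the product of the string and pattern lengths. That is why the approach is competitive only for short, simple patterns.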

### Usage

matchDNAPattern(pattern, x, algorithm, mismatch)

### Arguments

`pattern` |
An object representing the pattern string. The string in
`pattern` can use any of the standard DNA pattern letters. See
`DNAPatternAlphabet` for all valid letters. |

`x` |
An object representing a DNA string. |

`algorithm` |
The valid values are `"boyer-moore"`, `"forward-search"`
and `"shift-or"`. The forward search algorithm is often as
fast as the more sophisticated Boyer-Moore algorithm when the
patterns being matched are very simple. The shift-or algorithm is
even faster, but it can only be used for patterns of length at
most 32 or 64, depending on the number of bits in a machine word. The
shift-or algorithm can also do inexact matching for a given number of
mismatches. The default is `"shift-or"` where valid and `"boyer-moore"`
otherwise. |

`mismatch` |
An integer, the number of mismatches allowed. The
default is 0. If the value is non-zero, an inexact matching algorithm
is used. |
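
The bit-parallel idea behind shift-or, including matching with a bounded number of mismatches, can be sketched in base R. This is a rough illustration under stated assumptions, not the package's code: `shift_and_search` is a hypothetical helper, it uses the shift-and formulation (the bitwise complement of shift-or), it keeps the state in a logical vector instead of a machine word, and it treats `N` as the only wildcard.

```r
# Hypothetical sketch, not the Biostrings implementation.
# Shift-and (complement of shift-or) with up to `mismatch` substitutions.
shift_and_search <- function(pattern, x, mismatch = 0) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(x, "")[[1]]
  m <- length(p)
  if (m == 0 || m > length(s)) return(integer(0))
  # One mask per text letter: TRUE where the pattern accepts that letter.
  # Assumption: "N" in the pattern accepts any letter.
  masks <- lapply(setNames(nm = unique(s)), function(ch) p == ch | p == "N")
  # Shift the state by one position; the empty prefix always matches.
  shift <- function(r) c(TRUE, r[-m])
  # state[[j + 1]][i] is TRUE when the first i pattern letters match the
  # text ending at the current position with at most j mismatches.
  state <- rep(list(rep(FALSE, m)), mismatch + 1)
  starts <- integer(0)
  for (pos in seq_along(s)) {
    mask <- masks[[s[pos]]]
    prev <- state
    state[[1]] <- shift(prev[[1]]) & mask
    for (j in seq_len(mismatch)) {
      # Either extend with a matching letter, or spend one mismatch.
      state[[j + 1]] <- (shift(prev[[j + 1]]) & mask) | shift(prev[[j]])
    }
    if (state[[mismatch + 1]][m]) starts <- c(starts, pos - m + 1L)
  }
  starts
}

shift_and_search("GCNNNAT", "AAGCGCGATATG")   # start positions 3 and 5
shift_and_search("GAT", "GCTGAT", mismatch = 1)  # start positions 1 and 4
```

In a word-based implementation each state vector is packed into a single machine word and updated with one shift and one bitwise operation per text letter. That packing is where the limit of 32 or 64 pattern letters comes from.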

### Value

An object of class "BioString" with the same length as the number of
matches. Each element in the "BioString" object is a match. To obtain
the start and end points of the matches, use `as.matrix` on the
return value. See documentation for the "BioString" class for more
details.

### Author(s)

Saikat DebRoy

### References

Dan Gusfield (1997). Algorithms on Strings, Trees, and Sequences. Cambridge University Press.

### See Also

`BioString-class` for the type of the return value.

### Examples

x <- DNAString("AAGCGCGATATG")
m1 <- matchDNAPattern("GCNNNAT", x)
m1
as.matrix(m1) # start and end points of the matches
m2 <- matchDNAPattern("GCNNNAT", x, algorithm="forward-search")
m2
as.matrix(m2)
data('yeastSEQCHR1')
yeast1 <- DNAString(yeastSEQCHR1)
PpiI <- "GAACNNNNNCTC" # a restriction enzyme pattern
match1.PpiI <- matchDNAPattern(PpiI, yeast1)
match2.PpiI <- matchDNAPattern(PpiI, yeast1, algorithm="forward-search")
match1.PpiI
match2.PpiI
match3.PpiI <- matchDNAPattern(PpiI, yeast1, mismatch=1) # allow one mismatch
match3.PpiI

[Package *Biostrings* version 1.4.0]