BioString-class {Biostrings}    R Documentation

Class "BioString" represents a biological sequence

Description

Class "BioString" contains an encoded string representing a biological sequence over a particular alphabet (RNA, DNA or amino acid). An object represents zero or more substrings of the full string.

Objects from the Class

Objects can be created by calls of the form new("BioString", alphabet, end, start, values, initialized, ...). However, users should not call this directly. For now, use the function NucleotideString to create objects of class "BioString" that use a nucleotide alphabet (RNA or DNA), and the function DNAString for objects that use a DNA alphabet.

Slots

alphabet:
Object of class "BioAlphabet", the alphabet used in the sequence.
initialized:
Object of class "logical", TRUE if the sequence has been initialized with values. Users should not modify this slot directly.
offsets:
Object of class "matrix" and storage mode "integer". It stores, in two columns, the start and end points of the substrings represented by the object. A row whose first value is 1 and whose second value is 0 represents an empty substring.
values:
Object of class "externalptr". It internally stores the actual encoded sequence as a vector. Because objects of class "externalptr" are passed by reference in R rather than copied, this avoids copying long sequences.
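
As a rough illustration of the offsets slot, the following base R sketch stores start and end points in a two-column integer matrix and recovers the substrings from them. It is not the package's implementation: a plain character string stands in for the encoded sequence, and the names full, offsets, subs, n_subs and n_chars are chosen for illustration only.

```r
# A plain character string standing in for the encoded sequence
# (assumption: the real object keeps an encoded vector behind an
# external pointer instead).
full <- "AAGCTANA"

# Two-column integer matrix: column 1 is the start, column 2 the end.
# The row (1, 0) represents an empty substring.
offsets <- cbind(start = c(1L, 2L, 1L), end = c(3L, 4L, 0L))

# Recover each substring from its start and end points.
subs <- substring(full, offsets[, "start"], offsets[, "end"])

# Number of substrings, and number of characters in each one.
n_subs  <- nrow(offsets)
n_chars <- offsets[, "end"] - offsets[, "start"] + 1L
```

Here n_subs plays the role of length(x) and n_chars the role of nchar(x) as described under Methods.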

Methods

initialize(.Object, alphabet, offsets=cbind(1, 0), values=BioStringNewValues(alphabet, end), initialized=!missing(values))
Construct an object of class "BioString". Usually not called directly by users.
length(x)
Return the number of substrings represented by x.
x[i]
Return the substrings in x corresponding to index i.
x[[i]]
Return the substring in x corresponding to the index i. The index i must be of length 1.
nchar(x, type)
Return the number of characters in each substring represented in x. The type argument is currently ignored.
show(object)
Display object of class "BioString".
as.character(x)
Convert a "BioString" object to a character vector using its native alphabet.
as.matrix(x)
Return a two-column matrix of integers, the first column representing the start index and the second column representing the end index of the substrings in the full string.
substr(x, start, stop)
Return another BioString object with value equivalent to substr(as.character(x), start, stop).
substring(text, first, last)
Return another BioString object with value equivalent to substring(as.character(text), first, last).
matchDNAPattern(pattern, x, algorithm, mismatch)
Match the DNA string x against pattern using algorithm. The pattern can use the letters A,C,G,T,- (the last being the gap character) and also the wildcards N (matching A,C,G,T), V (matching A,G,C), R (matching A,G) and Y (matching C,T).
allSameLetter(x, letter)
Return a logical vector indicating which of the elements of x are entirely made up of the letter letter.
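
The wildcard letters listed for matchDNAPattern can be illustrated with a small base R sketch that works on plain character strings. This is not how the package performs matching: the helpers wildcard_to_regex and match_starts are hypothetical, the sketch relies on regular expressions rather than any of the package's algorithms, and it ignores the algorithm and mismatch arguments.

```r
# Hypothetical helper, not part of Biostrings: translate a pattern that
# uses the wildcards N, V, R and Y into a regular expression.
wildcard_to_regex <- function(pattern) {
  classes <- c(N = "[ACGT]", V = "[ACG]", R = "[AG]", Y = "[CT]")
  chars <- strsplit(pattern, "")[[1]]
  is_wild <- chars %in% names(classes)
  chars[is_wild] <- unname(classes[chars[is_wild]])
  paste(chars, collapse = "")
}

# Hypothetical helper: start positions of all matches of pattern in the
# plain string x. The lookahead lets overlapping matches be reported.
match_starts <- function(pattern, x) {
  re <- paste0("(?=", wildcard_to_regex(pattern), ")")
  hits <- gregexpr(re, x, perl = TRUE)[[1]]
  as.integer(hits[hits > 0])
}

match_starts("GC", "AAGCTANA")  # exact letters only
match_starts("RY", "AAGCTANA")  # R matches A or G, Y matches C or T
```

In this sketch the wildcards are expanded only in the pattern; a letter such as N occurring in the searched string is treated as an ordinary character.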

The structure of the values slot

The values slot of the "BioString" class is of class "externalptr". It always contains an R vector object in its tag field. The other fields are not used at present. The vector in the tag field is either a CHARSXP or an INTSXP, depending on the number of letters in the alphabet: INTSXP is used if the alphabet has more letters than there are bits in a C char type, and CHARSXP is used otherwise.

We use the i-th bit in the char or int (depending on whether the vector is of type CHARSXP or INTSXP) to represent the i-th letter in the alphabet where i=0 represents the first bit. This effectively means that we can have at most 32 letters (including gap) in our alphabets for all standard computer architectures.
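
The one-bit-per-letter scheme can be sketched in base R with the bitw* functions. This is only an illustration under stated assumptions: the alphabet and its letter order are made up, encode is a hypothetical helper, and combining letters with a bitwise OR to form a wildcard code is one plausible use of such an encoding rather than something stated above.

```r
# Illustrative alphabet; the letters and their order are assumptions.
alphabet <- c("-", "A", "C", "G", "T")

# Letter i (counting from 0) is represented by bit i.
codes <- as.integer(bitwShiftL(1L, seq_along(alphabet) - 1L))
names(codes) <- alphabet

# Hypothetical helper: encode a string as one integer code per character.
encode <- function(s) unname(codes[strsplit(s, "")[[1]]])

# Assumption: a wildcard code is the bitwise OR of the letters it covers.
R_code <- bitwOr(codes[["A"]], codes[["G"]])

# A position matches when its code shares at least one bit with R_code.
matches_R <- bitwAnd(encode("AAGCTA"), R_code) != 0L
```

Because an R integer has 32 bits, a scheme like this runs out of distinct bits after 32 letters, which is consistent with the limit mentioned above.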

Author(s)

Saikat DebRoy

See Also

BioAlphabet-class and its subclasses for valid alphabet objects. DNAString for creating objects of class "BioString" representing DNA sequences. NucleotideString for creating objects of class "BioString" representing DNA or RNA sequences.

Examples

new("BioString", DNAAlphabet()) # creates an empty DNA string
x <- DNAString("AAGCTANA", gap="N")
x
as.character(x)
substr(x, 2, 4)
substring(x, 1, seq(length=nchar(x))) # all prefixes of x
substring(x, seq(length=nchar(x)), nchar(x)) # all suffixes of x
matchDNAPattern("GC", x)
x <- substring(x, 1:3, 3:5)
x[1:2]
x[-3] # same as x[1:2]
x[[3]]

[Package Biostrings version 1.4.0 Index]