encodeShortGenomeSeq
Syntax
encodeShortGenomeSeq(X)
Alias: encodeSGS
Arguments
X is a scalar or vector of STRING or CHAR type.
Details
encodeShortGenomeSeq encodes DNA sequences composed of the letters A, T, C, and G into integers. Each sequence of up to 28 letters is stored as a single LONG value, which reduces the storage space needed for DNA sequences and improves performance.
Note:
- When X is an empty string (""), the function returns 0.
- When X contains any character other than A, T, C, G (case-sensitive), the function returns NULL.
- When the length of X exceeds 28 characters, the function returns NULL.
Return Value: LONG or FAST LONG vector
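The rules above can be modeled outside DolphinDB with a short Python sketch. This is not the library's implementation: the 2-bit codes (A=0, T=1, C=2, G=3), the placement of the sequence length above the packed bases, and the helper name `encode_short_genome_seq_sketch` are all assumptions inferred from the three outputs documented on this page. The sketch reproduces those three values; its results for other lengths are an extrapolation and may not match the real function.

```python
# Illustrative model only; NOT DolphinDB's implementation.
# Assumption: 2-bit codes inferred from the documented example outputs.
CODES = {"A": 0, "T": 1, "C": 2, "G": 3}
MAX_LEN = 28  # documented limit; longer inputs yield NULL


def encode_short_genome_seq_sketch(seq):
    """Return an int for a valid sequence, or None where the docs say NULL."""
    if len(seq) > MAX_LEN:
        return None  # too long
    if any(ch not in CODES for ch in seq):
        return None  # anything other than uppercase A, T, C, G
    if not seq:
        return 0  # documented result for the empty string

    # Pack the bases two bits at a time, first base most significant.
    packed = 0
    for ch in seq:
        packed = (packed << 2) | CODES[ch]

    # Assumption: the length sits just above the packed bases, rounded up
    # to a whole byte (4 bases per byte).
    length_shift = 8 * ((len(seq) + 3) // 4)
    return (len(seq) << length_shift) | packed


print(encode_short_genome_seq_sketch("TCGATCG"))     # 465691
print(encode_short_genome_seq_sketch("TCGATCGCCC"))  # 168216298
print(encode_short_genome_seq_sketch("TCtG"))        # None
```

Under this model a 28-letter sequence uses 56 bits for the bases and the remaining high byte for the length, which would explain why 28 is the largest length that fits in a 64-bit LONG.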
Examples
a=encodeShortGenomeSeq("TCGATCG")
a;
// output
465691
typestr(a)
// output
LONG
b=encodeShortGenomeSeq("TCGATCG" "TCGATCGCCC")
b;
// output
[465691,168216298]
typestr(b)
// output
FAST LONG VECTOR
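// The first element is NULL because "TCGATCG" repeated 5 times is 35 characters, which exceeds the 28-character limit.
// "TCGAT" repeated 5 times is 25 characters, so the second element is encoded.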
encodeShortGenomeSeq(repeat("TCGATCG" "TCGAT", 5))
// output
[,1801916404867712433]
// A CHAR vector is encoded as a single sequence, so the result is a scalar.
y=toCharArray("TCGATCGCCC")
encodeShortGenomeSeq(y)
// output
168216298
// NULL is returned in the following cases because each input contains a character other than A, T, C, G (a space, a lowercase t, or N).
encodeShortGenomeSeq("TC G")
encodeShortGenomeSeq("TCtG")
encodeShortGenomeSeq("NNNNNNNNTCGGGGCAT")
encodeShortGenomeSeq("TCGGGGCATNGCCCG")
encodeShortGenomeSeq("GCCCGATNNNNN")
Related functions: decodeShortGenomeSeq, genShortGenomeSeq