encodeShortGenomeSeq
Syntax
encodeShortGenomeSeq(X)
Alias: encodeSGS
Arguments
X
is a scalar or vector of STRING or CHAR type. A CHAR vector, such as the output of toCharArray, is encoded as a single sequence.
Details
encodeShortGenomeSeq encodes a DNA sequence made up of the letters A, T, C, and G into a LONG integer. Storing sequences in this encoded form reduces the storage space they need and can improve performance.
Note:
- When X is an empty string (""), the function returns 0.
- When X contains any character other than A, T, C, G (case-sensitive), the function returns NULL.
- When the length of X exceeds 28 characters, the function returns NULL.
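The rules above, together with the outputs listed under Examples, are consistent with a simple bit-packing scheme. The Python sketch below is an inference from those outputs, not DolphinDB's implementation. It assumes 2 bits per base (A=0, T=1, C=2, G=3), the first base in the highest bits, and the sequence length stored in the byte just above the packed bases. The names `encode_short_genome_seq`, `CODES`, and `MAX_LEN` are hypothetical.

```python
from typing import Optional

# Assumed 2-bit codes, inferred from the documented outputs.
CODES = {"A": 0, "T": 1, "C": 2, "G": 3}
MAX_LEN = 28  # documented limit: longer inputs yield NULL


def encode_short_genome_seq(seq: str) -> Optional[int]:
    """Model of the documented behaviour; None stands in for NULL."""
    if len(seq) > MAX_LEN:
        return None
    packed = 0
    for base in seq:
        code = CODES.get(base)  # case-sensitive lookup
        if code is None:
            return None  # any character other than A, T, C, G
        packed = (packed << 2) | code
    # Four bases fit in one byte; the length goes in the next byte up.
    data_bytes = (len(seq) + 3) // 4
    return (len(seq) << (8 * data_bytes)) | packed


print(encode_short_genome_seq("TCGATCG"))      # 465691
print(encode_short_genome_seq("TCGATCGCCC"))   # 168216298
print(encode_short_genome_seq("TCtG"))         # None
```

Under this assumed layout, 28 bases occupy 7 bytes and the length takes the eighth, which would explain why longer inputs cannot be represented in a 64-bit LONG.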
Return Value
LONG or FAST LONG VECTOR
Examples
$ a=encodeShortGenomeSeq("TCGATCG")
$ a;
465691
$ typestr(a)
LONG
$ b=encodeShortGenomeSeq("TCGATCG" "TCGATCGCCC")
$ b;
[465691,168216298]
$ typestr(b)
FAST LONG VECTOR
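//The first element is NULL because "TCGATCG" repeated 5 times is 35 characters long, which exceeds the 28-character limit. "TCGAT" repeated 5 times is 25 characters long and is encoded normally.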
$ encodeShortGenomeSeq(repeat("TCGATCG" "TCGAT", 5))
[,1801916404867712433]
$ y=toCharArray("TCGATCGCCC")
$ encodeShortGenomeSeq(y)
168216298
//NULL is returned as the input contains a space.
$ encodeShortGenomeSeq("TC G")
00l
//NULL is returned as the input contains a lowercase letter "t".
$ encodeShortGenomeSeq("TCtG")
00l
//NULL is returned as the input contains letter "N".
$ encodeShortGenomeSeq("NNNNNNNNTCGGGGCAT")
00l
$ encodeShortGenomeSeq("TCGGGGCATNGCCCG")
00l
$ encodeShortGenomeSeq("GCCCGATNNNNN")
00l
Related functions: decodeShortGenomeSeq, genShortGenomeSeq
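To see that an encoded value keeps the whole sequence, the outputs shown in Examples can be unpacked by hand. The Python sketch below assumes a layout inferred from those outputs: the top non-zero byte holds the sequence length, and the bytes below it hold the bases at 2 bits each (A=0, T=1, C=2, G=3), first base in the highest bits. It is not the implementation of decodeShortGenomeSeq, and `decode_short_genome_seq` and `BASES` are hypothetical names.

```python
from typing import Optional

# Index in this string = assumed 2-bit code: A=0, T=1, C=2, G=3.
BASES = "ATCG"


def decode_short_genome_seq(value: int) -> Optional[str]:
    """Unpack a value under the assumed layout; None if it does not fit."""
    if value < 0:
        return None
    if value == 0:
        return ""  # the documented encoding of the empty string
    # The highest non-zero byte is taken to be the sequence length.
    data_bytes = (value.bit_length() - 1) // 8
    length = value >> (8 * data_bytes)
    if length > 28 or (length + 3) // 4 != data_bytes:
        return None  # length does not match the number of data bytes
    data = value & ((1 << (8 * data_bytes)) - 1)
    if data >> (2 * length):
        return None  # bits set above the packed bases
    # Read the bases from the highest 2-bit group down to the lowest.
    return "".join(
        BASES[(data >> (2 * (length - 1 - i))) & 3] for i in range(length)
    )


print(decode_short_genome_seq(465691))     # TCGATCG
print(decode_short_genome_seq(168216298))  # TCGATCGCCC
```

In DolphinDB itself, decodeShortGenomeSeq is the function to use for this; the sketch only illustrates why the LONG value is enough to recover the sequence.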