encodeShortGenomeSeq

Syntax

encodeShortGenomeSeq(X)

Alias: encodeSGS

Arguments

X is a scalar/vector of STRING/CHAR type.

Details

encodeShortGenomeSeq encodes a DNA sequence composed of the letters A, T, C, and G as a LONG integer. Storing sequences in this encoded form reduces the storage space they require and can improve performance.

Note:
  • When X is an empty string (""), the function returns 0.

  • When X contains any character other than A, T, C, G (case-sensitive), the function returns NULL.

  • When the length of X exceeds 28 characters, the function returns NULL.
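
The values in the Examples section are consistent with a simple bit layout. This layout is inferred from those outputs and is not stated by the DolphinDB documentation. Under it, each base takes 2 bits (A=00, T=01, C=10, G=11), with the first base in the most significant position. The sequence bits are padded to a whole number of bytes, and the sequence length is stored in the bits directly above them. The Python sketch below models that assumed layout together with the rules in the note. The name encode_sgs is hypothetical, and the real implementation may differ.

```python
from typing import Optional

# Assumed 2-bit code per base, inferred from the documented example outputs.
BASE_CODE = {"A": 0, "T": 1, "C": 2, "G": 3}
MAX_LEN = 28  # longer inputs yield NULL, per the note above


def encode_sgs(seq: str) -> Optional[int]:
    """Hypothetical model of encodeShortGenomeSeq for one STRING.

    Returns None where DolphinDB would return NULL.
    """
    if len(seq) > MAX_LEN:
        return None
    bits = 0
    for ch in seq:
        code = BASE_CODE.get(ch)  # case-sensitive: "t" is not a valid base
        if code is None:
            return None
        bits = (bits << 2) | code  # the first base ends up in the highest bits
    # Round 2*len bits up to whole bytes, then store the length above them.
    seq_bits = -(-2 * len(seq) // 8) * 8
    return (len(seq) << seq_bits) | bits


print(encode_sgs("TCGATCG"))     # 465691
print(encode_sgs("TCGATCGCCC"))  # 168216298
print([encode_sgs(s * 5) for s in ["TCGATCG", "TCGAT"]])  # [None, 1801916404867712433]
print(encode_sgs(""), encode_sgs("TCtG"))  # 0 None
```

Under this assumed layout, a 28-base sequence uses 56 sequence bits plus a 5-bit length field (bits 56 to 60). That still fits in a signed 64-bit LONG, which is consistent with the 28-character limit.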

Return Value

LONG or FAST LONG VECTOR

Examples


    $ a=encodeShortGenomeSeq("TCGATCG")
    $ a;
    465691
    $ typestr(a)
    LONG
    
    $ b=encodeShortGenomeSeq("TCGATCG" "TCGATCGCCC")
    $ b;
    [465691,168216298]
    $ typestr(b)
    FAST LONG VECTOR
    
    //NULL is returned for the first element: repeating "TCGATCG" 5 times yields 35 characters, which exceeds the 28-character limit.
    $ encodeShortGenomeSeq(repeat("TCGATCG" "TCGAT", 5))
    [,1801916404867712433]
    
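    //A CHAR vector is encoded as a single sequence, giving the same result as the STRING "TCGATCGCCC".
    $ y=toCharArray("TCGATCGCCC")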
    $ encodeShortGenomeSeq(y)
    168216298
    
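    //NULL is returned as the input contains a space.
    $ encodeShortGenomeSeq("TC G")
    00l
    //NULL is returned as the input contains a lowercase "t" (letters are case-sensitive).
    $ encodeShortGenomeSeq("TCtG")
    00l
    //NULL is returned as each input contains the letter "N".
    $ encodeShortGenomeSeq("NNNNNNNNTCGGGGCAT")
    00l
    $ encodeShortGenomeSeq("TCGGGGCATNGCCCG")
    00l
    $ encodeShortGenomeSeq("GCCCGATNNNNN")
    00l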

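The layout inferred from these outputs stores the sequence length next to the bases, so it can be inverted. The sketch below is a hypothetical Python model (decode_sgs is an illustrative name) that recovers the original sequences from the example outputs above. It assumes the same 2-bit code and byte-padded length field. It is not the documented behavior of decodeShortGenomeSeq.

```python
from typing import Optional

# Inverse of the assumed 2-bit code (A=00, T=01, C=10, G=11).
CODE_BASE = "ATCG"
MAX_LEN = 28


def decode_sgs(value: int) -> Optional[str]:
    """Hypothetical inverse of the assumed encodeShortGenomeSeq layout."""
    for length in range(MAX_LEN + 1):
        seq_bits = -(-2 * length // 8) * 8  # sequence bits padded to whole bytes
        if value >> seq_bits == length:  # the length field sits above the sequence bits
            bits = value & ((1 << seq_bits) - 1)
            bases = []
            for i in range(length):
                shift = 2 * (length - 1 - i)  # the first base is in the highest bits
                bases.append(CODE_BASE[(bits >> shift) & 3])
            return "".join(bases)
    return None  # no consistent length field found


print(decode_sgs(465691))     # TCGATCG
print(decode_sgs(168216298))  # TCGATCGCCC
```

Scanning lengths from 0 upward works because, under this layout, only the true length satisfies the check. A shorter candidate with a smaller shift leaves at least 8 extra bits in the high part. A longer candidate's shift drops the stored length below the candidate.
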
Related functions: decodeShortGenomeSeq, genShortGenomeSeq