Data Types
| Category | Data Type Name | ID | Examples | Symbol | Size (bytes) | Range |
|---|---|---|---|---|---|---|
| VOID | VOID | 0 | NULL | | 1 | |
| LOGICAL | BOOL | 1 | 1b, 0b, true, false | b | 1 | 0~1 |
| INTEGRAL | CHAR | 2 | 'a', 97c | c | 1 | -2^7+1 ~ 2^7-1 |
| | SHORT | 3 | 122h | h | 2 | -2^15+1 ~ 2^15-1 |
| | INT | 4 | 21 | i | 4 | -2^31+1 ~ 2^31-1 |
| | LONG | 5 | 22l | l | 8 | -2^63+1 ~ 2^63-1 |
| | COMPRESSED | 26 | | | 1 | -2^7+1 ~ 2^7-1 |
| TEMPORAL | DATE | 6 | 2013.06.13 | d | 4 | |
| | MONTH | 7 | 2012.06M | M | 4 | |
| | TIME | 8 | 13:30:10.008 | t | 4 | |
| | MINUTE | 9 | 13:30m | m | 4 | |
| | SECOND | 10 | 13:30:10 | s | 4 | |
| | DATETIME | 11 | 2012.06.13 13:30:10 or 2012.06.13T13:30:10 | D | 4 | [1901.12.13T20:45:53, 2038.01.19T03:14:07] |
| | TIMESTAMP | 12 | 2012.06.13 13:30:10.008 or 2012.06.13T13:30:10.008 | T | 8 | |
| | NANOTIME | 13 | 13:30:10.008007006 | n | 8 | |
| | NANOTIMESTAMP | 14 | 2012.06.13 13:30:10.008007006 or 2012.06.13T13:30:10.008007006 | N | 8 | [1677.09.21T00:12:43.145224193, 2262.04.11T23:47:16.854775807] |
| | DATEHOUR | 28 | 2012.06.13T13 | | 4 | |
| FLOATING | FLOAT | 15 | 2.1f | f | 4 | 6-9 significant digits |
| | DOUBLE | 16 | 2.1 | F | 8 | 15-17 significant digits |
| LITERAL | SYMBOL | 17 | | S | 4 | |
| | STRING | 18 | "Hello" or 'Hello' or `Hello | W | ≤ 65,535 | |
| | BLOB | 32 | | | | |
| BINARY | INT128 | 31 | e1671797c52e15f763380b45e841ec32 | | 16 | -2^127+1 ~ 2^127-1 |
| | UUID | 19 | 5d212a78-cc48-e3b1-4235-b4d91473ee87 | | 16 | |
| | IPADDR | 30 | 192.168.1.13 | | 16 | |
| | POINT | 35 | (117.60972, 24.118418) | | 16 | |
| SYSTEM | FUNCTIONDEF | 20 | def f1(a,b) {return a+b;} | | | |
| | HANDLE | 21 | file handle, socket handle, and db handle | | | |
| | CODE | 22 | <1+2> | | | |
| | DATASOURCE | 23 | | | | |
| | RESOURCE | 24 | | | | |
| | DURATION | 36 | 1s, 3M, 5y, 200ms | | 4 | |
| MIXED | ANY | 25 | (1,2,3) | | | |
| | ANY DICTIONARY | 27 | {a:1,b:2} | | | |
| OTHER | COMPLEX | 34 | 2.3+4.0i | | 16 | |
| DECIMAL | DECIMAL32(S) | 37 | 3.1415926$DECIMAL32(3) | | 4 | (-1*10^(9-S), 1*10^(9-S)) |
| | DECIMAL64(S) | 38 | 3.1415926$DECIMAL64(3), 3.141P | P | 8 | (-1*10^(18-S), 1*10^(18-S)) |
| | DECIMAL128(S) | 39 | 3.1415926$DECIMAL128(3) | | 16 | (-1*10^(38-S), 1*10^(38-S)) |
| ARRAY | A data type followed by "[]", e.g., INT[], DOUBLE[], DECIMAL32(3)[] | ID of the element type + 64 | array(INT[], 0, 10).append!([1 2 3, 4 5, 6 7 8, 9 10]) | | | |
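The integral ranges in the table are symmetric: each one stops at -2^n+1 instead of -2^n. The Python sketch below is a model, not DolphinDB code. It derives those bounds from the byte size under the assumption that the smallest value of each signed width is reserved for NULL. It also checks that the DATETIME range in the table agrees with an assumed storage format of a 4-byte count of seconds since 1970.01.01 00:00:00. The helpers `int_range`, `datetime_range` and `fmt` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int_range(size_bytes: int) -> tuple[int, int]:
    """Range of a signed integer of `size_bytes` bytes, assuming the minimum
    value -2**(bits-1) is reserved for NULL and therefore excluded."""
    bits = 8 * size_bytes
    return -(2 ** (bits - 1)) + 1, 2 ** (bits - 1) - 1


def datetime_range() -> tuple[datetime, datetime]:
    """Model DATETIME as a 4-byte count of seconds since 1970.01.01 00:00:00."""
    lo, hi = int_range(4)
    return EPOCH + timedelta(seconds=lo), EPOCH + timedelta(seconds=hi)


def fmt(dt: datetime) -> str:
    """Format a datetime in the literal style used in the table."""
    return dt.strftime("%Y.%m.%dT%H:%M:%S")


for name, size in [("CHAR", 1), ("SHORT", 2), ("INT", 4), ("LONG", 8)]:
    print(name, int_range(size))

lo, hi = datetime_range()
print("DATETIME", fmt(lo), fmt(hi))
```

Under these assumptions the model reproduces the DATETIME bounds listed in the table, 1901.12.13T20:45:53 and 2038.01.19T03:14:07.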
Note:
- The following data types are not supported in table columns: VOID, FUNCTIONDEF, HANDLE, CODE, DATASOURCE, RESOURCE, COMPRESSED, DURATION.
- DolphinDB uses the STRING, BLOB and SYMBOL data types to represent strings.
- SYMBOL is a special STRING type. When a column is of SYMBOL type, the number of unique values must be less than 2,097,152 (2^21); otherwise the error "One symbase's size can't exceed 2097152" is reported.
- Version 1.30.23/2.00.11 added size constraints for STRING, BLOB, and SYMBOL data written to distributed databases:
  - Data of STRING type must be smaller than 64 KB; otherwise it is truncated to 65,535 bytes (i.e., 64 KB - 1 byte).
  - Data of BLOB type must be smaller than 64 MB; otherwise it is truncated to 67,108,863 bytes (i.e., 64 MB - 1 byte).
  - Data of SYMBOL type must be smaller than 255 bytes; otherwise the system throws an exception.

  Note: Data exceeding these limits that has already been stored in the databases can still be accessed.
- ANY DICTIONARY is the DolphinDB data type for JSON.
- COMPRESSED values can only be generated with the function compress.
- Values of BLOB type do not participate in calculations.
- The DURATION type can be generated with the function duration or by combining an integer with a time unit (case sensitive): y, M, w, d, B, H, m, s, ms, us, ns. The range of a DURATION value is -2^31+1 ~ 2^31-1. If an overflow occurs, the value is treated as NULL.
- DolphinDB follows the IEEE 754 standard for the DOUBLE and FLOAT data types. If an overflow occurs, the value is treated as NULL.
- The "S" in DECIMAL32(S), DECIMAL64(S) and DECIMAL128(S) stands for scale, which determines the number of digits to the right of the decimal point. The value range of S is [0,9] for DECIMAL32(S), [0,18] for DECIMAL64(S), and [0,38] for DECIMAL128(S). DECIMAL32 is stored as the int32_t type and takes 4 bytes; DECIMAL64 is stored as the int64_t type and takes 8 bytes; DECIMAL128 is stored as the int128_t type and takes 16 bytes. DECIMAL32(0) can represent integers in the range [-999999999, 999999999], while a 4-byte integer (INT32) covers [-2147483648, 2147483647]. Therefore, an integer that exceeds the valid range of DECIMAL32 but lies within [-2147483648, 2147483647] can still be converted to DECIMAL32. However, converting a string whose value exceeds the valid range of DECIMAL32 raises an exception, as the following example shows:
decimal32(1000000000, 0);
// output
1000000000
decimal32(`1000000000, 0);
// output
Convert string to DECIMAL failed: Decimal math overflow
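The range rule for DECIMAL types can be modeled in a few lines of Python using the standard decimal module. This is a sketch, not DolphinDB's implementation. It assumes the total digit counts P = 9, 18 and 38 implied by the ranges (-1*10^(P-S), 1*10^(P-S)) in the table. The names `PRECISION`, `max_value` and `in_range` are hypothetical.

```python
from decimal import Decimal

# Total decimal digits assumed per type, read off the table's ranges
# (-1*10^(P-S), 1*10^(P-S)) with P = 9, 18 and 38.
PRECISION = {"DECIMAL32": 9, "DECIMAL64": 18, "DECIMAL128": 38}
INT32_MAX = 2**31 - 1


def max_value(type_name: str, scale: int) -> Decimal:
    """Largest value of type_name(scale) in this model: P nines, S of them after the point."""
    p = PRECISION[type_name]
    if not 0 <= scale <= p:
        raise ValueError(f"scale must be in [0, {p}] for {type_name}")
    # Built from a string so that no context rounding is applied to 38-digit values.
    return Decimal(f"{10**p - 1}E-{scale}")


def in_range(value: Decimal, type_name: str, scale: int) -> bool:
    """True if value lies strictly inside (-10^(P-S), 10^(P-S))."""
    bound = Decimal(f"1E{PRECISION[type_name] - scale}")
    return value.copy_abs() < bound  # copy_abs avoids context rounding


print(max_value("DECIMAL32", 3))                          # 999999.999
print(in_range(Decimal(1000000000), "DECIMAL32", 0))      # False
print(1000000000 <= INT32_MAX)                            # True
```

The last two lines show the gap described above: 1000000000 is outside the DECIMAL32(0) range even though it still fits in a 4-byte integer.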
Type check
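Data Range
The smallest value of each integral type is reserved for NULL, so the ranges in the table start at -2^n+1 rather than -2^n. For example, -128 lies outside the CHAR range, so -128c is stored as a NULL CHAR, which is displayed as 00c: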
x=-128c;
x;
// output
00c
typestr x;
// output
CHAR
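The following Python model shows why -128c reads back as NULL. It assumes that a CHAR is stored as one signed byte and that the bit pattern 0x80 (-128) serves as the NULL marker. `decode_char` and `encode_char` are hypothetical helpers, not DolphinDB functions.

```python
import struct

CHAR_NULL = -128  # assumed NULL sentinel: the smallest signed 8-bit value


def encode_char(value: int) -> bytes:
    """Pack an int into one signed byte ("b"); raises struct.error outside [-128, 127]."""
    return struct.pack("b", value)


def decode_char(raw: bytes):
    """Read one stored byte as a CHAR; the reserved sentinel decodes to None (NULL)."""
    (value,) = struct.unpack("b", raw)
    return None if value == CHAR_NULL else value


print(decode_char(encode_char(-128)))  # None, analogous to 00c
print(decode_char(encode_char(-127)))  # -127, the smallest valid CHAR
```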
Data Type Symbols
A data type symbol declares the data type of a constant. In the example below, the number 3 without a data type symbol is stored as an INT by default. To store it as a floating-point number, declare it as 3f (FLOAT) or 3F (DOUBLE).
typestr 3;
// output
INT
typestr 3f;
// output
FLOAT
typestr 3F;
// output
DOUBLE
typestr 3l;
// output
LONG
typestr 3h;
// output
SHORT
typestr 3c;
// output
CHAR
typestr 3b;
// output
BOOL
typestr 3P;
// output
DECIMAL64
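The suffix rules above amount to a lookup from the trailing symbol to a type. The Python sketch below mirrors the Symbol column of the table for numeric literals only. It is not DolphinDB's parser and ignores temporal literals such as 13:30m. `literal_type` is a hypothetical helper, and its defaults (INT for whole numbers, DOUBLE for numbers with a decimal point) follow the examples in this section and the table.

```python
# Suffix -> type name, mirroring the Symbol column for numeric literals.
SUFFIX_TYPES = {
    "b": "BOOL", "c": "CHAR", "h": "SHORT", "i": "INT", "l": "LONG",
    "f": "FLOAT", "F": "DOUBLE", "P": "DECIMAL64",
}


def literal_type(literal: str) -> str:
    """Return the type a numeric literal such as '3f' or '2.1' would get."""
    suffix = literal[-1]
    if suffix in SUFFIX_TYPES:
        return SUFFIX_TYPES[suffix]
    # No suffix: whole numbers default to INT, numbers with a decimal point to DOUBLE.
    return "DOUBLE" if "." in literal else "INT"


for lit in ["3", "3f", "3F", "3l", "3h", "3c", "3b", "3P", "2.1"]:
    print(lit, literal_type(lit))
```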
Symbol and String
In some circumstances it is better to store strings as the SYMBOL type. DolphinDB stores SYMBOL values internally as integers, which makes sorting and comparison more efficient, so SYMBOL can improve performance and save storage space. On the other hand, mapping strings to integers (hashing) takes time, and the hash table consumes memory.
The following rules could help you decide whether to use SYMBOL types or not:
- Avoid using SYMBOL types if the data will not be sorted, searched or compared.
- Avoid using SYMBOL types if there are few duplicate values.
- Stock tickers in a trades or quotes table should use SYMBOL types, because each stock usually has a large number of rows in these tables and stock tickers are frequently searched and compared.
- Descriptive fields should not use SYMBOL types because description seldom repeats and is rarely searched, sorted or compared.
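The trade-off described above can be illustrated with a small Python model of dictionary encoding. `SymbolVector` is a hypothetical class, not DolphinDB's implementation. It assigns each distinct string an integer code, evaluates a comparison once per distinct string, and sorts by counting codes. As a result, the per-element work touches only integers, which pays off when values repeat often.

```python
class SymbolVector:
    """Hypothetical model of a dictionary-encoded (SYMBOL-like) vector."""

    def __init__(self, values):
        self.base = []    # distinct strings, in first-seen order
        self.index = {}   # string -> integer code (the "hash table")
        self.codes = []   # one integer code per element
        for v in values:
            code = self.index.get(v)
            if code is None:
                code = len(self.base)
                self.index[v] = code
                self.base.append(v)
            self.codes.append(code)

    def greater_than(self, s):
        # Compare strings once per distinct value, then look up each element by code.
        hit = [b > s for b in self.base]
        return [hit[c] for c in self.codes]

    def sorted(self):
        # Rank the distinct strings once, then counting-sort the integer codes.
        order = sorted(range(len(self.base)), key=self.base.__getitem__)
        counts = [0] * len(self.base)
        for c in self.codes:
            counts[c] += 1
        return [self.base[c] for c in order for _ in range(counts[c])]


v = SymbolVector(["IBM", "C", "MS", "GOOG", "C"])
print(v.codes)             # [0, 1, 2, 3, 1]
print(v.sorted())          # ['C', 'C', 'GOOG', 'IBM', 'MS']
print(v.greater_than("C")) # [True, False, True, True, False]
```

If most values are distinct (for example, free-text descriptions), the dictionary grows as large as the data itself, and the encoding adds work without saving any.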
Example 1: Sorting a symbol vector with 3 million elements is about 40 times faster than sorting a string vector of the same size.
n=3000000
strs=array(STRING,0,n)
strs.append!(rand(`IBM`C`MS`GOOG, n))
timer sort strs;
// output
Time elapsed: 482.027 ms
n=3000000
syms=array(SYMBOL,0,n)
syms.append!(rand(`IBM`C`MS`GOOG, n))
timer sort syms;
// output
Time elapsed: 12.001 ms
Example 2: Comparing a symbol vector with 3 million elements is almost 15 times as fast as comparing a string vector of the same size.
timer(100){strs>`C};
// output
Time elapsed: 4661.26 ms
timer(100){syms>`C};
// output
Time elapsed: 322.655 ms
Symbol Vector Creation
(1) With function array
syms=array(SYMBOL, 0, 100);
// create an empty symbol array;
typestr syms;
// output
FAST SYMBOL VECTOR
syms.append!(`IBM`C`MS);
syms;
// output
["IBM","C","MS"]
(2) With type conversion
syms=`IBM`C`MS;
typestr syms;
// output
STRING VECTOR
// converting to a symbol vector;
sym=syms$SYMBOL;
typestr sym;
// output
FAST SYMBOL VECTOR
typestr syms;
// output
STRING VECTOR
(3) With function rand
syms=`IBM`C`MS;
symRand=rand(syms, 10);
//generate a random SYMBOL vector
symRand;
// output
["IBM","IBM","IBM","MS","C","C","MS","IBM","C","MS"]
typestr symRand;
// output
FAST SYMBOL VECTOR
Note: When given a string vector, the rand function returns a symbol vector. For all other input types, rand returns the same data type as its input. We intentionally make this exception because users who generate a random vector from a string vector usually want a symbol vector.
Integer Overflow
An overflow occurs when the result of an operation falls outside the valid range of its data type. In DolphinDB, an integer overflow returns NULL.
In the following example, the variable x is of INT type and is assigned the maximum INT value, 2^31-1. The result of x+1 exceeds the upper limit of the INT type, so it returns NULL.
x=(pow(2,31)-1)$INT;
x+1;
// output
null
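The rule stated above can be modeled in Python, whose integers never overflow on their own, so the range check has to be explicit. This is a sketch of the documented behavior only, not DolphinDB's implementation. `int_add` is a hypothetical helper, and None stands in for NULL.

```python
# Valid INT range from the table; -2**31 is assumed to be reserved for NULL.
INT_MIN_VALID = -(2**31) + 1
INT_MAX_VALID = 2**31 - 1


def int_add(a: int, b: int):
    """Add two INT values; a result outside the INT range becomes NULL (None)."""
    result = a + b  # exact in Python, so check the INT range explicitly
    if not INT_MIN_VALID <= result <= INT_MAX_VALID:
        return None
    return result


x = 2**31 - 1
print(int_add(x, 1))   # None, like x+1 above
print(int_add(x, -1))  # 2147483646
```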