PROTOCOL_DDB
PROTOCOL_DDB is DolphinDB's protocol for serializing and deserializing data. It is widely adopted by DolphinDB's APIs (Python, C++, Java, etc.) to transmit data. Among DolphinDB's data serialization protocols, PROTOCOL_DDB supports the greatest variety of DolphinDB data forms and data types.
Note
- DolphinDB data forms refer to data structures, such as scalar, vector, table, etc.
- DolphinDB data types refer to specific data types, such as INT, DOUBLE, DATETIME, etc.
- In the following sections, the Python libraries NumPy and pandas will be referred to as np and pd, respectively.
Enabling PROTOCOL_DDB
To use PROTOCOL_DDB, we need to enable it in the DolphinDB session and DBConnectionPool objects by setting the protocol parameter to PROTOCOL_DDB.
import dolphindb as ddb
import dolphindb.settings as keys
s = ddb.Session(protocol=keys.PROTOCOL_DDB)
s.connect("localhost", 8848, "admin", "123456")
pool = ddb.DBConnectionPool("localhost", 8848, "admin", "123456", 10, protocol=keys.PROTOCOL_DDB)
Supported Data Forms
Additional Parameter | Data Form | Serialization | Deserialization |
---|---|---|---|
pickleTableToList=False | Scalar | √ | √ |
pickleTableToList=False | Vector | √ | √ |
pickleTableToList=False | Pair | × | √ |
pickleTableToList=False | Matrix | √ | √ |
pickleTableToList=False | Set | √ | √ |
pickleTableToList=False | Dict | √ | √ |
pickleTableToList=False | Table | √ | √ |
pickleTableToList=True | Table | × | √ |
Serialization: From Python to DolphinDB
In this section, we will demonstrate how Python objects are mapped to DolphinDB data types when uploaded to a DolphinDB server with PROTOCOL_DDB enabled. The upload() method is used as an example.
Scalar
The following table shows the data type mapping when uploading scalars to a DolphinDB server ("---" indicates that there is no matching Python data type).
Python Data Type | Python Example | DolphinDB Data Type | DolphinDB Example |
---|---|---|---|
NoneType | None | VOID | NULL |
bool | True | BOOL | true |
np.int8 | np.int8(12) | CHAR | char(12), 12c |
np.int16 | np.int16(12) | SHORT | short(12), 12h |
np.int32 | np.int32(12) | INT | int(12), 12 |
np.int64 | np.int64(12) | LONG | long(12), 12l |
int | 12 | LONG | long(12), 12l |
np.datetime64[D] | np.datetime64("2012-01-02", "D") | DATE | 2012.01.02 |
np.datetime64[M] | np.datetime64("2012-01", "M") | MONTH | 2012.01M |
--- | --- | TIME | --- |
--- | --- | MINUTE | --- |
--- | --- | SECOND | --- |
np.datetime64[s] | np.datetime64("2012-01-02T01:02:03", "s") | DATETIME | datetime(2012.01.02T01:02:03) |
np.datetime64[ms] | np.datetime64("2012-01-02T01:02:03.123", "ms") | TIMESTAMP | timestamp(2012.01.02T01:02:03.123) |
--- | --- | NANOTIME | --- |
np.datetime64[ns] | np.datetime64("2012-01-02T01:02:03.123456789", "ns") | NANOTIMESTAMP | nanotimestamp(2012.01.02T01:02:03.123456789) |
np.datetime64 | np.datetime64("") | NANOTIMESTAMP | nanotimestamp(NULL) |
pd.Timestamp | pd.Timestamp("2012-01-02T01:02:03.123456789") | NANOTIMESTAMP | nanotimestamp(2012.01.02T01:02:03.123456789) |
pd.NaTType | pd.NaT | NANOTIMESTAMP | nanotimestamp(NULL) |
np.float32 | np.float32(1.1) | FLOAT | float(1.1), 1.1f |
np.float64 | np.float64(1.2) | DOUBLE | double(1.2) |
float | 1.2 | DOUBLE | double(1.2) |
float | np.nan | DOUBLE | double(NULL) |
str | "abc" | STRING | "abc" |
str | "" | STRING | "" |
--- | --- | SYMBOL | --- |
--- | --- | UUID | --- |
np.datetime64[h] | np.datetime64("2012-01-02T01", "h") | DATEHOUR | 2012.01.02T01 |
--- | --- | IPADDR | --- |
bytes | bytes("abc", encoding="UTF-8") | BLOB | "abc" |
--- | --- | DECIMAL32 | --- |
Decimal | decimal.Decimal("-10.21") | DECIMAL64 | decimal64(-10.21, 2) |
Decimal | decimal.Decimal("NaN") | DECIMAL64 | decimal64(NULL, 2) |
Decimal | decimal.Decimal("-10.00000000000000000021") | DECIMAL128 | decimal128("-10.00000000000000000021", 20) |
Note:
- Starting from DolphinDB Python API version 1.30.22.6, the PROTOCOL_DDB supports uploading Decimal128 data. Scalar values with a scale less than or equal to 17 will be uploaded as Decimal64 data. Scalar values with a scale greater than 17 will be uploaded as Decimal128 data.
- Starting from version 2.0.11.0, the length of the uploaded BLOB data is no longer limited and the length of the uploaded SYMBOL/STRING data is limited to less than 256 KB.
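The scale rule in the first note can be checked locally before uploading. The sketch below is only an illustration: it assumes the scale is the number of digits after the decimal point as reported by `decimal.Decimal.as_tuple()`, and the helper name `expected_decimal_type` is hypothetical, not part of the DolphinDB Python API.

```python
import decimal

def expected_decimal_type(value: decimal.Decimal) -> str:
    """Hypothetical helper: predict the upload type from the scale rule above."""
    exponent = value.as_tuple().exponent
    # Special values such as NaN carry a non-integer exponent; treating their
    # scale as 0 is an assumption made only for this sketch.
    scale = -exponent if isinstance(exponent, int) and exponent < 0 else 0
    return "DECIMAL64" if scale <= 17 else "DECIMAL128"

small = expected_decimal_type(decimal.Decimal("-10.21"))
large = expected_decimal_type(decimal.Decimal("-10.00000000000000000021"))
print(small, large)
```

A check like this may help when a column is expected to stay within DECIMAL64 and an unusually precise value would otherwise change the uploaded type.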
Vector
When a vector-like data structure is uploaded to DolphinDB, its data type in DolphinDB is determined as follows:
(1) Check if the uploaded Python object is a vector-like data structure (e.g., tuple, list, np.ndarray, pd.Series, etc.);
(2) If the object is type-specific (e.g. np.ndarray) and its dtype is not 'object', perform a direct data type conversion. This is the most efficient case.
(3) If the object is non-typed (e.g. list) or its dtype is 'object' (np.ndarray), iterate through the data: (a) If the object contains null values (None, np.nan, pd.NaT), use the type of the first null value as the object’s type; (b) If two or more data types (see Notes below) are detected, or the object contains embedded vector-like data structures, the object is uploaded as a DolphinDB ANY vector and each element will be converted independently. This requires another iteration and reduces performance.
(4) Data types of vector-like data structures are determined by mostly the same conversion rules as scalars. As with scalars, Python objects cannot be directly uploaded as DolphinDB TIME, UUID or other unsupported vector types.
(5) If an np.ndarray's elements are vector-like data structures of the same length, upload it as a DolphinDB matrix.
(6) If the uploaded object is not an np.ndarray, but its elements are vector-like data structures of the same length, upload it as a DolphinDB ANY vector.
(7) If an object's elements are all nulls of different types, convert them as follows:
Types of Null Values in the Array / List | Python Data Type | DolphinDB Column Type |
---|---|---|
None | object | STRING |
np.nan and None | float64 | DOUBLE |
pd.NaT and None | datetime64 | NANOTIMESTAMP |
np.nan and pd.NaT | datetime64 | NANOTIMESTAMP |
None, np.nan and pd.NaT | datetime64 | NANOTIMESTAMP |
None / pd.NaT / np.nan and non-null values | - | the data type of the non-null values |
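The decision rules for untyped containers can be condensed into a small model. The sketch below is a simplified, hypothetical illustration for plain Python lists only (no NumPy or pandas objects). It is not the API's actual implementation, and the function names `is_null` and `predict_list_upload` are made up for this example.

```python
import math

# Scalar mapping for plain Python values, taken from the scalar table above.
PY_TO_DDB = {bool: "BOOL", int: "LONG", float: "DOUBLE", str: "STRING"}

def is_null(value) -> bool:
    """None and float NaN stand in for the null values discussed above."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def predict_list_upload(values) -> str:
    """Hypothetical model of how a plain Python list is classified."""
    # Nested vector-like elements make the whole list an ANY vector.
    if any(isinstance(v, (list, tuple)) for v in values):
        return "ANY"
    kinds = {PY_TO_DDB.get(type(v), type(v).__name__)
             for v in values if not is_null(v)}
    if len(kinds) > 1:       # two or more DolphinDB data types
        return "ANY"
    if len(kinds) == 1:      # nulls adopt the type of the non-null values
        return kinds.pop()
    # Only nulls: None alone maps to STRING, None mixed with NaN to DOUBLE.
    return "DOUBLE" if any(isinstance(v, float) for v in values) else "STRING"

print(predict_list_upload([True, None, False]))   # BOOL
print(predict_list_upload([1.2, "abc"]))          # ANY
print(predict_list_upload([[1, 2], [3]]))         # ANY
```

The model deliberately ignores pd.NaT and typed arrays. Its purpose is only to make the order of the checks easier to follow.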
Notes
(1) "Two or more data types" refers to DolphinDB data types. For example, np.array([np.float64(12),13]) contains only one data type.
(2) The DolphinDB Python API does not support pd.array vectors in the current version.
(3) Null values from a list or np.ndarray with dtype='object' (such as None, np.nan and pd.NaT, but not decimal.Decimal("NaN")) are treated as the same data type.
(4) Uploading objects as DolphinDB array vectors is not supported. Data such as np.array([[1], [2, 3]]) will be uploaded as DolphinDB ANY vectors.
(5) For DolphinDB Python API versions lower than 3.0.1.0, ensure that all values in a DECIMAL column have the same number of digits after the decimal point. Otherwise, the column's scale defaults to that of the first non-null value. The following script demonstrates how to align the scale of DECIMAL data:
>>> import decimal
>>> b = decimal.Decimal("1.23")
>>> b
Decimal('1.23')
>>> b = b.quantize(decimal.Decimal("0.000"))
>>> b
Decimal('1.230')
For versions 3.0.1.0 and above, there's no need to ensure that all data in DECIMAL type columns have the same number of decimal places. The system will automatically adjust the precision based on the first non-null DECIMAL value encountered.
(6) Uploading Python bytes values as DolphinDB BLOB values is not currently supported.
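For note (5), the same quantize() call can be applied to a whole column before uploading. The sketch below is a minimal illustration using only the standard library. The helper name `align_scale` is hypothetical, and None is used here as the null marker.

```python
import decimal

def align_scale(values, scale):
    """Hypothetical helper: quantize every non-null Decimal to the same scale."""
    quantum = decimal.Decimal(1).scaleb(-scale)   # scale=3 -> Decimal('0.001')
    return [v if v is None else v.quantize(quantum) for v in values]

column = [decimal.Decimal("1.23"), None, decimal.Decimal("4.5")]
aligned = align_scale(column, 3)
print(aligned)   # [Decimal('1.230'), None, Decimal('4.500')]
```

Note that quantize() rounds when the target scale is smaller than a value's current scale, so choosing the largest scale present in the column avoids losing digits.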
Below are common examples of uploading vector-like data structures to DolphinDB.
Example 1. Upload BOOL, INT, DOUBLE, STRING and DATE vectors without null values.
>>> s.upload({'bool_v': np.array([True, False, False], dtype="bool")})
>>> s.upload({'int_v': np.array([1, 2, 4], dtype="int32")})
>>> s.upload({'double_v': [1.2, 2.456]})
>>> s.upload({'string_v': np.array(["abc", "123"], dtype="object")})
>>> s.upload({'date_v': np.array(["2012-01-02"], dtype="datetime64[D]")})
To check an uploaded object's data type in DolphinDB, call the server function typestr().
>>> s.run("typestr(bool_v)")
FAST BOOL VECTOR
>>> s.run("typestr(int_v)")
FAST INT VECTOR
>>> s.run("typestr(double_v)")
FAST DOUBLE VECTOR
>>> s.run("typestr(string_v)")
STRING VECTOR
>>> s.run("typestr(date_v)")
FAST DATE VECTOR
Example 2. Upload BOOL, INT, DOUBLE, STRING and DATE vectors containing Null values.
>>> s.upload({'bool_v': [True, None, False]})
>>> s.upload({'int_v': np.array([None, np.int32(2), np.int32(12)], dtype="object")})
>>> s.upload({'double_v': np.array([1.1, np.nan, 3.456])})
>>> s.upload({'string_v': ["", "abc", "123"]})
>>> s.upload({'date_v': [pd.NaT, None, np.nan, np.datetime64("2012-01-03", "D")]})
Check the uploaded object's data type in DolphinDB:
>>> s.run("typestr(bool_v)")
FAST BOOL VECTOR
>>> s.run("typestr(int_v)")
FAST INT VECTOR
>>> s.run("typestr(double_v)")
FAST DOUBLE VECTOR
>>> s.run("typestr(string_v)")
STRING VECTOR
>>> s.run("typestr(date_v)")
FAST DATE VECTOR
Example 3. Upload ANY vectors
There are two options to upload data as ANY vectors:
- Specify dtype=object when constructing np.ndarray.
- Upload data in a list or tuple object.
Note that both options require the uploaded data to contain two or more data types, or to contain nested vector-like data structures.
>>> s.upload({'list_v': [1.2, "abc"]})
>>> s.upload({'array_v': np.array([1, 1.2], dtype="object")})
>>> s.upload({'list_av': [[1, 2], [3]]})
>>> s.upload({'array_av': np.array([[1], [2, 3]], dtype="object")})
Check the uploaded object's data type in DolphinDB:
>>> s.run("typestr(list_v)")
ANY VECTOR
>>> s.run("typestr(array_v)")
ANY VECTOR
>>> s.run("typestr(list_av)")
ANY VECTOR
>>> s.run("typestr(array_av)")
ANY VECTOR
Pair
Uploading Python objects as DolphinDB pairs is currently not supported with PROTOCOL_DDB.
Matrix
If an np.ndarray's elements are vector-like data structures of the same length, it is uploaded as a DolphinDB matrix. If the uploaded object is not an np.ndarray, but its elements are vector-like data structures of the same length, it is uploaded as a DolphinDB ANY vector.
Example
>>> s.upload({'int_m': np.array([[1, 2], [2, 3], [3, 4]])})
>>> s.run("typestr(int_m)")
FAST LONG MATRIX
>>> s.upload({'any_vec': [[1, 2], [2, 3], [3, 4]]})
>>> s.run("typestr(any_vec)")
ANY VECTOR
Note: Two-dimensional ndarrays with only one row will be uploaded as a vector instead of a matrix.
Set
When uploading a Python set object, DolphinDB iterates through its elements for type conversion, and then uploads the object as a DolphinDB set. The data type of the resulting DolphinDB set is determined by the converted type of its elements.
Example
>>> s.upload({'long_set': {1, 2}})
>>> s.run("typestr(long_set)")
LONG SET
>>> s.upload({'double_set': {1.2, np.double(5.5), pd.NaT}})
>>> s.run("typestr(double_set)")
DOUBLE SET
Note
(1) DolphinDB sets do not support elements of multiple data types, or vector elements.
(2) During type conversion, null values are not treated as any data type. Also, a set comprised only of null values is not allowed.
Dictionary
Similar to sets, when uploading a Python dict object, DolphinDB iterates through the dict's elements and converts their types. It then uploads the object as a DolphinDB dictionary.
Example
>>> s.upload({'long_dict': {'a': None, 'b': 1}})
>>> s.run("typestr(long_dict)")
STRING->LONG DICTIONARY
>>> s.upload({'any_dict1': {'a': 1, 'b': [1.1, 2.4], 'c': np.array([1, "a"], dtype="object")}})
>>> s.run("typestr(any_dict1)")
STRING->ANY DICTIONARY
>>> s.upload({'any_dict2': {1: [[1], [2, 3]], 2: [[1.1, np.nan], [3.3]]}})
>>> s.run("typestr(any_dict2)")
LONG->ANY DICTIONARY
Note
(1) DolphinDB dictionaries do not allow keys of multiple data types, but the values can be of different data types.
(2) When converting a dictionary’s elements, vector-like elements with embedded vector-like elements are converted as DolphinDB ANY vectors, not array vectors.
(3) During type conversion, null values are not treated as any data type. Also, a dictionary comprised only of null values is not allowed.
Table
Python uses pd.DataFrame to represent tables. When converting a pd.DataFrame, DolphinDB converts the object column-by-column as vector-like data structures. Unlike when converting Python vector-like data structures, the vector-like objects in pd.DataFrame columns are treated as DolphinDB array vectors; columns cannot be of the ANY vector type.
Null values in a table are converted by the same logic as in vector-like objects. Direct upload of data as certain DolphinDB data types, such as BLOB or INT128, is not supported. To upload these data types, specify the __DolphinDB_Type__ attribute when uploading tables to explicitly cast columns into particular DolphinDB data types (see Explicit Type Conversion).
Note: pd.DataFrame supports only datetime64[ns] as a temporal data type. If time values are uploaded to DolphinDB directly, they will all be converted to DolphinDB's NANOTIMESTAMP type.
Example 1
>>> df1 = pd.DataFrame({
... 'int_v': [1, 2, 3],
... 'long_v': np.array([None, 3, np.int64(3)], dtype="object"),
... 'float_v': np.array([np.nan, 1.2, 3.3], dtype="float32")
... })
>>> s.upload({'df1': df1})
>>> s.run("schema(df1)")['colDefs']
name typeString typeInt extra comment
0 int_v LONG 5 NaN
1 long_v LONG 5 NaN
2 float_v FLOAT 15 NaN
When an uploaded object's dtype is "object", DolphinDB checks the data type of its elements and takes the type of the first non-null element as the object's type. Null values (None, pd.NaT, np.nan) are not assigned a data type.
In the example above, the int values in the "int_v" column (without nulls) are converted to int64 by pandas, so they are uploaded as DolphinDB's LONG type. The "long_v" column's dtype is "object", so its type becomes that of its first non-null value, the Python int 3, which maps to DolphinDB's LONG type. The "float_v" column's dtype is "float32", so it is uploaded as DolphinDB's FLOAT type regardless of any null values.
Note: In NumPy and Pandas, null values are not allowed when an object's dtype is "bool", "int8", "int16", "int32" or "int64". Therefore, to upload table columns containing null values to DolphinDB, specify the dtype as "object" when constructing a pd.DataFrame.
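The restriction in this note can be reproduced with NumPy alone. The sketch below is a minimal illustration and needs no server connection. According to the rule described above, a column built from the resulting object array would take the type of its first non-null element.

```python
import numpy as np

# A fixed-width integer dtype cannot hold None, so NumPy rejects the input.
try:
    np.array([1, None, 3], dtype="int32")
    rejected = False
except (TypeError, ValueError):
    rejected = True

# With dtype="object" the null survives next to typed integers.
nullable = np.array([np.int32(1), None, np.int32(3)], dtype="object")
print(rejected, nullable.dtype)
```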
Example 2
>>> df2 = pd.DataFrame({
... 'day_v': np.array(["2012-01-02", "2022-02-05"], dtype="datetime64[D]"),
... 'month_v': np.array([np.datetime64("2012-01", "M"), None], dtype="datetime64[M]"),
... })
>>> s.upload({'df2': df2})
>>> s.run("schema(df2)")['colDefs']
name typeString typeInt extra comment
0 day_v NANOTIMESTAMP 14 NaN
1 month_v NANOTIMESTAMP 14 NaN
Pandas supports only datetime64[ns] as temporal type, so data cannot be directly uploaded as some DolphinDB data types, such as DATE and MONTH. You can specify the __DolphinDB_Type__ attribute to explicitly cast columns into particular DolphinDB data types.
>>> import dolphindb.settings as keys
>>> df2.__DolphinDB_Type__ = {
... "day_v": keys.DT_DATE,
... "month_v": keys.DT_MONTH,
... }
...
>>> s.upload({'df2': df2})
>>> s.run("schema(df2)")['colDefs']
name typeString typeInt extra comment
0 day_v DATE 6 NaN
1 month_v MONTH 7 NaN
Example 3
>>> df3 = pd.DataFrame({
... 'long_av': [[1, None], [3]],
... 'double_av': np.array([[1.1], [np.nan, 3.3]], dtype="object")
... })
>>> s.upload({'df3': df3})
>>> s.run("schema(df3)")['colDefs']
name typeString typeInt extra comment
0 long_av LONG[] 69 NaN
1 double_av DOUBLE[] 80 NaN
In this example, when a column from a pd.DataFrame contains vector-like data structures, this column is converted as a DolphinDB array vector, not an ANY vector.
Example 4
Since version 3.0.0.0, Python API supports uploading the following pandas ExtensionDtype: Boolean/Int8/Int16/Int32/Int64/Float32/Float64/String.
>>> import pandas as pd
>>> df4 = pd.DataFrame({
... 'bool': pd.Series([True, False, None], dtype=pd.BooleanDtype()),
... 'int64': pd.Series([1, -100, None], dtype=pd.Int64Dtype()),
... 'float64': pd.Series([1.1, -0.23, None], dtype=pd.Float64Dtype()),
... 'string': pd.Series(["abc", "def", None], dtype=pd.StringDtype()),
... })
...
>>> df4.dtypes
bool boolean
int64 Int64
float64 Float64
string string[python]
dtype: object
>>> s.upload({'df4': df4})
>>> s.run("schema(df4)")['colDefs']
name typeString typeInt extra comment
0 bool BOOL 1 NaN
1 int64 LONG 5 NaN
2 float64 DOUBLE 16 NaN
3 string STRING 18 NaN
The following table shows the data type mapping when uploading pandas ExtensionDtype to DolphinDB server. For detailed information on explicit type conversion, refer to Explicit Type Conversion.
pandas ExtensionDtype | DolphinDB Data Type | Note |
---|---|---|
BooleanDtype | BOOL | |
Int8Dtype | CHAR | |
Int16Dtype | SHORT | |
Int32Dtype | INT | |
Int64Dtype | LONG | |
Float32Dtype | FLOAT | |
Float64Dtype | DOUBLE | |
StringDtype | SYMBOL | explicit type conversion required |
StringDtype | STRING | |
StringDtype | UUID | explicit type conversion required |
StringDtype | IPADDR | explicit type conversion required |
StringDtype | INT128 | explicit type conversion required |
StringDtype | BLOB | explicit type conversion required |
Example 5
From DolphinDB Python API 1.30.22.4, you can upload Pandas 2.0 DataFrames that use PyArrow as the storage backend.
>>> import pandas as pd # pandas version >= 2.0.0
>>> df5 = pd.DataFrame({
... 'int64': pd.Series([1, 2, None, 4], dtype="int64[pyarrow]"),
... 'float64': pd.Series([1.1, 2.2, None, 4.4], dtype="float64[pyarrow]"),
... 'string': pd.Series(["aa", "bb", None, "cc"], dtype="string[pyarrow]"),
... })
...
>>> df5.dtypes
int64 int64[pyarrow]
float64 double[pyarrow]
string string[pyarrow]
dtype: object
>>> s.upload({'df5': df5})
>>> s.run("schema(df5)")['colDefs']
name typeString typeInt extra comment
0 int64 LONG 5 NaN
1 float64 DOUBLE 16 NaN
2 string STRING 18 NaN
The table below shows the data type mappings between DataFrame/Series, PyArrow and DolphinDB. For details on explicit type conversion, see Explicit Type Conversion.
DataFrame/Series | PyArrow | DolphinDB | Note |
---|---|---|---|
bool[pyarrow] | pa.bool_() | BOOL | |
int8[pyarrow] | pa.int8() | CHAR | |
int16[pyarrow] | pa.int16() | SHORT | |
int32[pyarrow] | pa.int32() | INT | |
int64[pyarrow] | pa.int64() | LONG | |
date32[day][pyarrow] | pa.date32() | DATE | |
date32[day][pyarrow] | pa.date32() | MONTH | explicit type conversion required |
time32[ms][pyarrow] | pa.time32("ms") | TIME | |
time32[s][pyarrow] | pa.time32("s") | MINUTE | explicit type conversion required |
time32[s][pyarrow] | pa.time32("s") | SECOND | |
timestamp[s][pyarrow] | pa.timestamp("s") | DATETIME | |
timestamp[ms][pyarrow] | pa.timestamp("ms") | TIMESTAMP | |
time64[ns][pyarrow] | pa.time64("ns") | NANOTIME | |
timestamp[ns][pyarrow] | pa.timestamp("ns") | NANOTIMESTAMP | |
float[pyarrow] | pa.float32() | FLOAT | |
double[pyarrow] | pa.float64() | DOUBLE | |
dictionary<values=string, indices=int32, ordered=0>[pyarrow] | pa.dictionary(pa.int32(), pa.utf8()) | SYMBOL | |
string[pyarrow] | pa.utf8() | STRING | |
fixed_size_binary[16][pyarrow] | pa.binary(16) | UUID | explicit type conversion required |
timestamp[s][pyarrow] | pa.timestamp("s") | DATEHOUR | explicit type conversion required |
string[pyarrow] | pa.utf8() | IPADDR | explicit type conversion required |
fixed_size_binary[16][pyarrow] | pa.binary(16) | INT128 | |
large_binary[pyarrow] | pa.large_binary() | BLOB | |
decimal128(38, S)[pyarrow] | pa.decimal128(38, S) | DECIMAL32(S) | explicit type conversion required |
decimal128(38, S)[pyarrow] | pa.decimal128(38, S) | DECIMAL64(S) | explicit type conversion required |
decimal128(38, S)[pyarrow] | pa.decimal128(38,S) | DECIMAL128(S) | |
list<item: T>[pyarrow], e.g., list<item: int32>[pyarrow] | pa.list_(T), e.g., pa.list_(pa.int32()) | ARRAYVECTOR, e.g., INT ARRAYVECTOR | List arrays created by pa.list_ map to DolphinDB array vectors |
Note: Starting from DolphinDB Python API version 1.30.22.6, the PROTOCOL_DDB supports uploading Decimal128 data. In earlier versions, pyarrow.decimal128 values were uploaded as Decimal64. Now pyarrow.decimal128 values are uploaded as Decimal128 by default. Alternatively, you can choose to upload pyarrow.decimal128 data as Decimal32 or Decimal64 by explicitly specifying those types.
Deserialization: From DolphinDB to Python (When pickleTableToList=False)
In this section, we will demonstrate how various DolphinDB data types and values are downloaded as Python objects with PROTOCOL_DDB enabled. We will use the run() method as an example.
Note: In the following tables, "np.datetime64[D]" in the "Python Data Type" column indicates that the object’s type is np.datetime64 and dtype=datetime64[D].
Scalar
DolphinDB Data Type | DolphinDB Example | Python Data Type | Python Example |
---|---|---|---|
VOID | NULL | NoneType | None |
INT | int(NULL) | NoneType | None |
STRING | string(NULL) | NoneType | None |
BOOL | true | bool | True |
CHAR | 'a' | int | 97 |
SHORT | 224h | int | 224 |
INT | 16 | int | 16 |
LONG | 3000l | int | 3000 |
DATE | 2013.06.13 | np.datetime64[D] | 2013-06-13 |
MONTH | 2012.06M | np.datetime64[M] | 2012-06 |
TIME | 13:30:10.008 | np.datetime64[ms] | 1970-01-01T13:30:10.008 |
MINUTE | 13:30m | np.datetime64[m] | 1970-01-01T13:30 |
SECOND | 13:30:10 | np.datetime64[s] | 1970-01-01T13:30:10 |
DATETIME | 2012.06.13T13:30:10 | np.datetime64[s] | 2012-06-13T13:30:10 |
TIMESTAMP | 2012.06.13T13:30:10.008 | np.datetime64[ms] | 2012-06-13T13:30:10.008 |
NANOTIME | 13:30:10.008007006 | np.datetime64[ns] | 1970-01-01T13:30:10.008007006 |
NANOTIMESTAMP | 2012.06.13T13:30:10.008007006 | np.datetime64[ns] | 2012-06-13T13:30:10.008007006 |
FLOAT | 2.1f | float | 2.0999999046325684 |
DOUBLE | 2.1 | float | 2.1 |
SYMBOL | --- | --- | --- |
STRING | "Hello" | str | "Hello" |
UUID | uuid("5d212a78-cc48-e3b1-4235-b4d91473ee87") | str | "5d212a78-cc48-e3b1-4235-b4d91473ee87" |
DATEHOUR | datehour(2012.06.13T13:30:10) | np.datetime64[h] | 2012-06-13T13 |
IPADDR | ipaddr("192.168.1.13") | str | "192.168.1.13" |
INT128 | int128("e1671797c52e15f763380b45e841ec32") | str | "e1671797c52e15f763380b45e841ec32" |
BLOB | blob("xxxyyyzzz") | bytes | b'xxxyyyzzz' |
DECIMAL32 | decimal32(1.111, 4) | decimal.Decimal | 1.1110 |
DECIMAL64 | decimal64(1.123456789, 5) | decimal.Decimal | 1.12345 |
Note 1: In DolphinDB, null values can have data types other than VOID, such as INT and STRING. When downloaded to Python, DolphinDB's null scalar values will be converted to Python's None.
Note 2: DolphinDB does not have scalar SYMBOL values.
Note 3: Starting from DolphinDB Python API version 3.0.2.1, the Python data type corresponding to the DolphinDB BLOB type has been changed from str to bytes.
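Several scalar types arrive as plain int, str or bytes values, so richer Python objects have to be rebuilt by the caller if they are needed. The sketch below is a minimal illustration using only the standard library. The input values are copied from the table above as stand-ins for results of run(), and the variable names are made up for this example.

```python
import ipaddress
import struct
import uuid

# Stand-ins for values returned by s.run(...), copied from the table above.
char_value = 97
uuid_value = "5d212a78-cc48-e3b1-4235-b4d91473ee87"
ip_value = "192.168.1.13"
int128_value = "e1671797c52e15f763380b45e841ec32"
blob_value = b"xxxyyyzzz"

char_text = chr(char_value)                    # CHAR arrives as its integer code
parsed_uuid = uuid.UUID(uuid_value)            # UUID arrives as str
parsed_ip = ipaddress.ip_address(ip_value)     # IPADDR arrives as str
int128_as_int = int(int128_value, 16)          # INT128 arrives as a hex string
blob_text = blob_value.decode("utf-8")         # BLOB arrives as bytes

# A FLOAT is a 32-bit value; widening it to a Python float exposes the
# rounding that explains the 2.0999999046325684 shown for 2.1f.
float32_roundtrip = struct.unpack("<f", struct.pack("<f", 2.1))[0]
print(char_text, parsed_ip, blob_text, float32_roundtrip)
```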
Vector
DolphinDB vectors are usually mapped to NumPy's numpy.ndarray in Python. However, DolphinDB's ANY vectors are mapped to Python's lists.
The following table shows how different types of vectors in DolphinDB are mapped to the dtype of numpy.ndarray in Python:
DolphinDB Data Type | np.dtype |
---|---|
BOOL (without Nulls) | bool |
CHAR (without Nulls) | int8 |
SHORT (without Nulls) | int16 |
INT (without Nulls) | int32 |
LONG (without Nulls) | int64 |
DATE | datetime64[D] |
MONTH | datetime64[M] |
TIME, TIMESTAMP | datetime64[ms] |
MINUTE | datetime64[m] |
SECOND, DATETIME | datetime64[s] |
NANOTIME, NANOTIMESTAMP | datetime64[ns] |
FLOAT | float32 |
DOUBLE, CHAR (with Nulls), SHORT (with Nulls), INT (with Nulls), LONG (with Nulls) | float64 |
DATEHOUR | datetime64[h] |
BOOL (with Nulls), SYMBOL, STRING, UUID, IPADDR, INT128, BLOB, DECIMAL32, DECIMAL64, Array Vector | object |
Note 1: NumPy's np.ndarray does not support INT null values. If a DolphinDB INT vector contains null values, the vector will be converted to float64 and the null values to np.nan.
Note 2: If a BOOL vector contains null values, the vector will be converted to np.ndarray with dtype=object rather than dtype=bool.
Note 3: DolphinDB's array vectors will be downloaded as Python objects with dtype=object. Each element of the Python object is an np.ndarray converted from one of the original array vector's elements.
Note 4: For DolphinDB Python API 1.30.17.2 or later, DolphinDB ANY vectors are downloaded as Python lists. For earlier versions, DolphinDB ANY vectors are downloaded as np.ndarray in Python.
Note 5: Versions starting from DolphinDB Python API 1.30.22.3 support downloading Decimal32 array vectors and Decimal64 array vectors.
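Because TIME, MINUTE, SECOND and NANOTIME share datetime64 dtypes with the date-bearing types, a downloaded time value carries a date part. The sketch below shows one way to strip it. It assumes that vector elements use the same 1970-01-01 date that the scalar table shows, and the array is a hand-built stand-in for a downloaded TIME vector.

```python
import numpy as np

# Stand-in for a downloaded TIME vector: datetime64[ms] values on the epoch date.
times = np.array(["1970-01-01T13:30:10.008", "1970-01-01T09:00:00.000"],
                 dtype="datetime64[ms]")

# Subtracting the date part leaves the time of day as timedelta64[ms].
time_of_day = times - times.astype("datetime64[D]")
millis = time_of_day.astype("int64")   # milliseconds since midnight
print(time_of_day.dtype, millis)
```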
Example
>>> re = s.run("[true, false]")
>>> re
[ True False]
>>> type(re)
<class 'numpy.ndarray'>
>>> re.dtype
bool
>>> re = s.run("[true, NULL]")
>>> re
[True None]
>>> re.dtype
object
In this example, we first download a BOOL vector without null values from DolphinDB. The result is a NumPy np.ndarray object with dtype=bool in Python. Then we download a BOOL vector containing a null value, and the downloaded object's dtype becomes "object".
In contrast, when downloading INT vectors containing null values, the downloaded object’s dtype is float64:
>>> re = s.run("[1, 2, 3, NULL]")
>>> re
[ 1. 2. 3. nan]
>>> re.dtype
float64
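If integer values are needed after such a download, the nulls have to be separated from the data first. The sketch below is a minimal illustration on a hand-built stand-in array, and the placeholder value 0 is an arbitrary choice for this example.

```python
import numpy as np

# Stand-in for the result of s.run("[1, 2, 3, NULL]").
downloaded = np.array([1.0, 2.0, 3.0, np.nan])

is_null = np.isnan(downloaded)                              # True where the value was NULL
values = np.where(is_null, 0, downloaded).astype("int64")   # 0 is only a placeholder
restored = [None if null else int(v) for v, null in zip(values, is_null)]
print(restored)   # [1, 2, 3, None]
```

Keep in mind that float64 represents integers exactly only up to 2**53, so very large LONG values in a vector containing nulls may not survive this round trip unchanged.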
When downloading a DolphinDB array vector, DolphinDB iterates through the elements of the array vector and converts the data type of each element as if it were a standard vector. In the following example, one element of the INT array vector to be downloaded contains a NULL. As a result, the other array vector elements are downloaded with dtype=int32, whereas the element with a NULL is downloaded with dtype=float64.
>>> s.run("arrayVector(2 3 4, [1, 2, 3, NULL])")
[array([1, 2], dtype=int32) array([3], dtype=int32) array([nan])]
When downloading an ANY vector, DolphinDB iterates through the elements of the ANY vector and converts each element based on the associated rules. In the following example, we download a DolphinDB ANY vector containing another ANY vector, and both are converted to a Python list.
>>> re = s.run('''(1, 2, [12, "aaa"])''')
>>> re
[1, 2, [12, 'aaa']]
>>> type(re)
<class 'list'>
>>> type(re[2])
<class 'list'>
For DolphinDB Python API 1.30.17.1 or earlier, ANY vectors are converted to np.ndarray with dtype=object in Python.
Pair
DolphinDB pairs are mapped to Python lists where each element is converted based on the rules for converting DolphinDB scalars.
Example
>>> s.run("100:0")
[100, 0]
Matrix
DolphinDB matrices are mapped to Python np.ndarray. The following table shows the mappings:
DolphinDB Data Type | np.dtype |
---|---|
BOOL (without nulls) | bool |
CHAR (without nulls) | int8 |
SHORT (without nulls) | int16 |
INT (without nulls) | int32 |
LONG (without nulls) | int64 |
DATE | datetime64[D] |
MONTH | datetime64[M] |
TIME, TIMESTAMP | datetime64[ms] |
MINUTE | datetime64[m] |
SECOND, DATETIME | datetime64[s] |
NANOTIME, NANOTIMESTAMP | datetime64[ns] |
FLOAT | float32 |
DOUBLE, CHAR (with nulls), SHORT (with nulls), INT (with nulls), LONG (with nulls) | float64 |
DATEHOUR | datetime64[h] |
BOOL (with nulls) | object |
Example
>>> s.run("""
... mtx = 1..12$4:3;
... mtx.rename!(1 2 3 4, `c1`c2`c3);
... mtx
... """)
[array([[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11],
[ 4, 8, 12]], dtype=int32), array([1, 2, 3, 4], dtype=int32), array(['c1', 'c2', 'c3'], dtype=object)]
DolphinDB matrices are downloaded to Python lists with column and row names retained. If the matrix doesn’t contain row or column names, the corresponding element in the Python list is filled with a None.
Note: With PROTOCOL_DDB, matrices of temporal data types follow conversion rules similar to those for vectors. With PROTOCOL_PICKLE, the time values are all mapped to datetime64[ns]. For more information, see PROTOCOL_PICKLE.
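The three-element list returned for a matrix can be unpacked into the data and its labels. The sketch below is a minimal illustration on a hand-built stand-in for the result of the example above. The helper `positions` is hypothetical and falls back to positional indices when a label entry is None.

```python
import numpy as np

# Stand-in for the list returned above: [data, row labels, column labels].
result = [
    np.array([[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]], dtype=np.int32),
    np.array([1, 2, 3, 4], dtype=np.int32),
    np.array(["c1", "c2", "c3"], dtype=object),
]

data, row_labels, col_labels = result

def positions(labels, size):
    """Map each label to its index; use 0..size-1 when the labels are None."""
    keys = range(size) if labels is None else labels.tolist()
    return {key: i for i, key in enumerate(keys)}

rows = positions(row_labels, data.shape[0])
cols = positions(col_labels, data.shape[1])
print(data[rows[2], cols["c3"]])   # 10
```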
Set
DolphinDB sets are mapped to sets in Python. Each element in the set is converted based on the rules for converting scalars (see section "Scalar").
Note: Only sets of the CHAR, SHORT, INT, LONG, FLOAT, DOUBLE, STRING or SYMBOL type can be downloaded to Python.
Example
>>> re = s.run("set(1..5)")
>>> re
{1, 2, 3, 4, 5}
>>> type(re)
<class 'set'>
Dict
DolphinDB dictionaries are mapped to dicts in Python. During the conversion, DolphinDB iterates through the key-value pairs in the dictionary. Each key is converted based on the rules for converting scalars. Each value is converted based on its data form and type, as described in section "Deserialization: From DolphinDB to Python (When pickleTableToList=False)".
Example
>>> re = s.run('''{"a": 123, "b": [1.1, 2.2]}''')
>>> re
{'b': array([1.1, 2.2]), 'a': 123}
>>> type(re)
<class 'dict'>
Table
DolphinDB tables are mapped to pandas.DataFrame in Python. During conversion, each table column is converted based on the rules for converting vectors.
However, unlike when converting vectors, time values in the table are all converted to the datetime64[ns] type, which is the only time type supported by Python pandas.
Note: Table columns of the array vector type can be downloaded. Downloading table columns of the ANY vector type is not supported.
Deserialization: From DolphinDB to Python (When pickleTableToList=True)
When the additional parameter pickleTableToList is enabled, if the return value of the executed script is a table, it will be downloaded as a Python list instead of a pd.DataFrame, where each element of the list (np.ndarray) represents a column of the table.
Table
Unlike converting vector-like data structures, when converting a DolphinDB table, each column is not treated as a separate vector but as part of the table, so temporal types are converted to datetime64[ns].
Note: When downloading a table containing a column of array vectors, ensure that each element in the array vector has the same size.
Example
>>> re = s.run("table([1, NULL] as a, [2012.01.02, NULL] as b)", pickleTableToList=True)
>>> re
[array([ 1., nan]), array(['2012-01-02T00:00:00.000000000', 'NaT'],
dtype='datetime64[ns]')]
>>> type(re)
<class 'list'>
>>> re[0].dtype
float64
>>> re[1].dtype
datetime64[ns]
>>> s.run("table(arrayVector(1 2 3, [1, 2, 3]) as a)", pickleTableToList=True)
[array([[1],
[2],
[3]], dtype=int32)]
In the example above, with pickleTableToList enabled, the table is downloaded to Python as a list with each element being an np.ndarray in Python. INT columns containing null values are downloaded with dtype=float64. Columns of time values are downloaded with dtype=datetime64[ns].
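The returned list shown above contains only the column arrays, so the caller has to pair them with column names. The sketch below is a minimal illustration on a hand-built stand-in for the first result. It assumes the caller already knows the column order of the query, and it narrows the date column back to day resolution.

```python
import numpy as np

# Stand-in for the list returned by the first call in the example above.
columns = [
    np.array([1.0, np.nan]),
    np.array(["2012-01-02", "NaT"], dtype="datetime64[ns]"),
]
names = ["a", "b"]  # assumption: names follow the column order of the query

table = dict(zip(names, columns))
dates = table["b"].astype("datetime64[D]")   # drop the unused time-of-day part
print(dates.dtype, dates[0])
```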