PROTOCOL_PICKLE
The Python pickle module implements binary protocols for serializing and deserializing Python object structures, that is, for converting between Python objects and byte streams. DolphinDB provides PROTOCOL_PICKLE, a deserialization protocol based on the Python pickle module with DolphinDB-specific customizations. PROTOCOL_PICKLE is available only in the DolphinDB Python API and supports a limited set of DolphinDB data forms and data types.
Note
- DolphinDB data forms refer to data structures, such as scalar, vector, table, etc. (See DolphinDB User Manual - Data Forms)
- DolphinDB data types refer to specific data types, such as INT, DOUBLE, DATETIME, etc. (See DolphinDB User Manual - Data Types)
- In the following sections, the Python libraries NumPy and pandas will be referred to as np and pd, respectively.
Enabling PROTOCOL_PICKLE
To use PROTOCOL_PICKLE, set the protocol parameter to PROTOCOL_PICKLE when creating a session or DBConnectionPool object. In the current Python API version, PROTOCOL_PICKLE is the default protocol and is equivalent to PROTOCOL_DEFAULT.
import dolphindb as ddb
import dolphindb.settings as keys
s = ddb.session(protocol=keys.PROTOCOL_PICKLE)
s.connect("localhost", 8848, "admin", "123456")
pool = ddb.DBConnectionPool("localhost", 8848, "admin", "123456", 10, protocol=keys.PROTOCOL_PICKLE)
Supported Data Forms
Additional Parameter | Data Form | Serialization | Deserialization |
---|---|---|---|
pickleTableToList=False | Matrix | × | √ |
pickleTableToList=False | Table | × | √ |
pickleTableToList=True | Table | × | √ |
Deserialization: From DolphinDB to Python (When pickleTableToList=False)
Matrix
DolphinDB matrices map to Python np.ndarrays. The following table shows the data type mappings:
DolphinDB Data Type | np.dtype |
---|---|
BOOL (without nulls) | bool |
CHAR (without nulls) | int8 |
SHORT (without nulls) | int16 |
INT (without nulls) | int32 |
LONG (without nulls) | int64 |
DATE, MONTH, TIME, TIMESTAMP, MINUTE, SECOND, DATETIME, NANOTIME, NANOTIMESTAMP, DATEHOUR | datetime64[ns] |
FLOAT | float32 |
DOUBLE, CHAR (with nulls), SHORT (with nulls), INT (with nulls), LONG (with nulls) | float64 |
BOOL (with nulls) | object |
With PROTOCOL_PICKLE, a DolphinDB matrix is downloaded as a list of three elements, as with PROTOCOL_DDB. The first element is an np.ndarray containing the matrix data. The second and third elements hold the row names and column names, respectively; each is None if the matrix has no corresponding labels.
Example
>>> s.run("date([2012.01.02, 2012.02.03])$1:2")
[array([['2012-01-02T00:00:00.000000000', '2012-02-03T00:00:00.000000000']],
dtype='datetime64[ns]'), None, None]
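The three-element list can be unpacked directly into data, row names and column names. The following is a minimal sketch that uses a hand-built stand-in for the value returned by s.run, so it runs without a server; the variable names and the fallback to positional labels are illustrative assumptions, not part of the API.

```python
import numpy as np

# Stand-in for the list returned by s.run(...) for an unlabeled 1x2 DATE matrix.
result = [
    np.array([["2012-01-02", "2012-02-03"]], dtype="datetime64[ns]"),
    None,  # row names: None because the matrix has no row labels
    None,  # column names: None because the matrix has no column labels
]

data, row_names, col_names = result

# Assumption for illustration: fall back to positional labels when none exist.
if row_names is None:
    row_names = np.arange(data.shape[0])
if col_names is None:
    col_names = np.arange(data.shape[1])

print(data.shape, data.dtype)
```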
Note: When a DolphinDB matrix of temporal values is downloaded with PROTOCOL_DDB, the dtype of the np.ndarray matches the time granularity of the matrix, e.g., datetime64[D], datetime64[ms], datetime64[M], etc. With PROTOCOL_PICKLE, the dtype of the np.ndarray is always datetime64[ns].
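If downstream code expects a coarser granularity, the datetime64[ns] array can be cast on the Python side with NumPy. The sketch below is one possible approach, shown on a stand-in array rather than a live query result; choosing the target unit to match the original DolphinDB type is left to the caller.

```python
import numpy as np

# Stand-in for matrix data downloaded with PROTOCOL_PICKLE (always datetime64[ns]).
ns_values = np.array([["2012-01-02", "2012-02-03"]], dtype="datetime64[ns]")

# Cast down to a coarser unit; finer-grained components are truncated.
as_dates = ns_values.astype("datetime64[D]")   # day granularity, e.g. for DATE
as_months = ns_values.astype("datetime64[M]")  # month granularity, e.g. for MONTH

print(as_dates.dtype, as_months.dtype)
```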
Table
The following table shows the data type mappings for table columns:
DolphinDB Data Type | np.dtype |
---|---|
BOOL (without nulls) | bool |
CHAR (without nulls) | int8 |
SHORT (without nulls) | int16 |
INT (without nulls) | int32 |
LONG (without nulls) | int64 |
DATE, MONTH, TIME, TIMESTAMP, MINUTE, SECOND, DATETIME, NANOTIME, NANOTIMESTAMP, DATEHOUR | datetime64[ns] |
FLOAT | float32 |
DOUBLE, CHAR (with nulls), SHORT (with nulls), INT (with nulls), LONG (with nulls) | float64 |
BOOL (with nulls), SYMBOL, STRING, UUID, IPADDR, INT128, Array Vector | object |
Note
- BLOB, DECIMAL32 and DECIMAL64 columns are currently not supported by PROTOCOL_PICKLE.
- Array vectors of UUID, IPADDR and INT128 types are currently not supported by PROTOCOL_PICKLE.
Example
>>> re = s.run("table([1, NULL] as a, [2012.01.02, 2012.01.05] as b)")
>>> re
     a          b
0  1.0 2012-01-02
1  NaN 2012-01-05
>>> re['a'].dtype
float64
>>> re['b'].dtype
datetime64[ns]
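Because an INT column containing nulls arrives as float64 with NaN in place of each null, integer semantics can be restored on the pandas side if needed. The sketch below rebuilds the DataFrame from the example above by hand (so no server is required) and converts column a to the pandas nullable Int64 dtype; this is a suggested post-processing step, not something PROTOCOL_PICKLE does itself.

```python
import numpy as np
import pandas as pd

# Stand-in for the DataFrame returned by s.run(...) in the example above.
df = pd.DataFrame({
    "a": np.array([1.0, np.nan], dtype="float64"),
    "b": np.array(["2012-01-02", "2012-01-05"], dtype="datetime64[ns]"),
})

# NaN marks the positions that were NULL in DolphinDB.
null_mask = df["a"].isna()

# Convert to the pandas nullable integer dtype; NaN becomes <NA>.
a_int = df["a"].astype("Int64")

print(null_mask.tolist())
print(a_int.dtype)
```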
Deserialization: From DolphinDB to Python (When pickleTableToList=True)
Table
When PROTOCOL_PICKLE is specified and the additional parameter pickleTableToList is set to True in the call to the run method, a DolphinDB table is downloaded as a list of np.ndarrays, each representing one table column. Array vector columns are downloaded as two-dimensional np.ndarrays, so when downloading a table that contains an array vector column, ensure that every element of the array vector has the same length.
The type conversion rules are the same as those described for PROTOCOL_DDB.
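A list of column arrays is often easier to work with once each array is paired with a column name. The following sketch uses a hand-built stand-in for the list that a call such as s.run(script, pickleTableToList=True) would return; the column names id and quotes are hypothetical and supplied by hand here, and the dtypes are chosen only for illustration.

```python
import numpy as np

# Stand-in for the downloaded list: one np.ndarray per table column.
columns = [
    np.array([1, 2, 3], dtype="int32"),
    # Array vector column: three elements of equal length, hence a 2-D array.
    np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),
]

# Hypothetical column names, listed in the same order as the table columns.
names = ["id", "quotes"]

# Pair each name with its column array.
table = dict(zip(names, columns))

# Every column should have one entry per table row.
row_counts = {len(col) for col in columns}

print(table["quotes"].shape)
print(row_counts)
```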