PROTOCOL_PICKLE

The pickle module implements binary protocols for serializing and deserializing a Python object structure, that is, for converting between Python objects and byte streams. DolphinDB provides PROTOCOL_PICKLE, a deserialization protocol based on the Python pickle module with DolphinDB-specific customizations. PROTOCOL_PICKLE is used only in the DolphinDB Python API and supports a limited set of DolphinDB data forms and data types.
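For readers unfamiliar with the standard library module, the round trip below shows what serializing and deserializing mean in plain Python. It uses only the built-in pickle module and does not show DolphinDB's customized variant, whose byte format is not described here; the record contents are made up for illustration.

```python
import pickle

# An arbitrary nested Python object structure (illustrative values)
record = {"sym": "AAPL", "prices": [189.5, 190.25], "volume": 1200}

# Serialize: Python object -> byte stream
payload = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize: byte stream -> an equal, but distinct, Python object
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == record)      # True
print(restored is record)      # False
```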

Note

  • DolphinDB data forms refer to data structures, such as scalar, vector, table, etc. (See DolphinDB User Manual - Data Forms)
  • DolphinDB data types refer to specific data types, such as INT, DOUBLE, DATETIME, etc. (See DolphinDB User Manual - Data Types)
  • In the following sections, the Python libraries NumPy and pandas will be referred to as np and pd, respectively.

Enabling PROTOCOL_PICKLE

To use PROTOCOL_PICKLE, enable it in the DolphinDB session or DBConnectionPool object by setting the protocol parameter to PROTOCOL_PICKLE. In the current version of the Python API, PROTOCOL_PICKLE is used by default and is equivalent to PROTOCOL_DEFAULT.

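import dolphindb as ddb
import dolphindb.settings as keys

# Create a session that uses PROTOCOL_PICKLE, then connect to the server
s = ddb.session(protocol=keys.PROTOCOL_PICKLE)
s.connect("localhost", 8848, "admin", "123456")

# Create a connection pool that uses PROTOCOL_PICKLE
# Positional arguments: host, port, threadNum, userid, password
pool = ddb.DBConnectionPool("localhost", 8848, 10, "admin", "123456", protocol=keys.PROTOCOL_PICKLE)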

Supported Data Forms

Additional Parameter    | Data Form | Serialization | Deserialization
pickleTableToList=False | Matrix    | ×             | ✓
pickleTableToList=False | Table     | ×             | ✓
pickleTableToList=True  | Table     | ×             | ✓

Deserialization: From DolphinDB to Python (When pickleTableToList=False)

Matrix

DolphinDB matrices map to Python np.ndarrays. The following table shows the data type mappings:

DolphinDB Data Type | np.dtype
BOOL (without nulls) | bool
CHAR (without nulls) | int8
SHORT (without nulls) | int16
INT (without nulls) | int32
LONG (without nulls) | int64
DATE, MONTH, TIME, TIMESTAMP, MINUTE, SECOND, DATETIME, NANOTIME, NANOTIMESTAMP, DATEHOUR | datetime64[ns]
FLOAT | float32
DOUBLE, CHAR (with nulls), SHORT (with nulls), INT (with nulls), LONG (with nulls) | float64
BOOL (with nulls) | object
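The null-dependent rows in the table follow from how NumPy stores data: integer and bool dtypes have no slot for a missing value. The snippet below is a plain NumPy illustration of that constraint, not the API's conversion code; the sample values are arbitrary, and None is used only as an illustrative missing marker (which marker the API itself uses for BOOL nulls is not stated here).

```python
import numpy as np

# INT data without nulls fits in int32 unchanged
ints = np.array([1, 2, 3], dtype=np.int32)

# NumPy integer arrays cannot hold NaN, so INT data with a null is
# represented as float64, with NaN standing in for the null
ints_with_null = np.array([1, np.nan, 3], dtype=np.float64)

# bool arrays cannot hold a missing value either, so BOOL data with a null
# falls back to an object array (None is an illustrative missing marker here)
bools_with_null = np.array([True, None, False], dtype=object)

print(ints.dtype, ints_with_null.dtype, bools_with_null.dtype)
# int32 float64 object
print(np.isnan(ints_with_null).tolist())  # [False, True, False]
```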

With PROTOCOL_PICKLE, as with PROTOCOL_DDB, a DolphinDB matrix is downloaded as a list of three elements. The first element is an np.ndarray containing the matrix data; the second and third elements are the row names and column names, respectively, or None if the matrix has no row or column names.

Example

>>> s.run("date([2012.01.02, 2012.02.03])$1:2")
[array([['2012-01-02T00:00:00.000000000', '2012-02-03T00:00:00.000000000']],
      dtype='datetime64[ns]'), None, None]
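Because the matrix arrives as a [data, row names, column names] list, a common next step is to unpack it into a labeled pandas DataFrame. The helper matrix_result_to_frame below is hypothetical, the result lists are built by hand (one mirroring the output of the example above, one with assumed label arrays), so no server is needed to try it.

```python
import numpy as np
import pandas as pd

def matrix_result_to_frame(result):
    """Hypothetical helper: unpack a [data, row_names, col_names] list into a DataFrame.

    None for either label list makes pandas fall back to a default RangeIndex.
    """
    data, row_names, col_names = result
    return pd.DataFrame(data, index=row_names, columns=col_names)

# Hand-built stand-in for s.run("date([2012.01.02, 2012.02.03])$1:2")
result = [np.array([["2012-01-02", "2012-02-03"]], dtype="datetime64[ns]"), None, None]
df = matrix_result_to_frame(result)
print(df.shape)  # (1, 2)

# Hand-built matrix with row and column names (label arrays are assumed)
labeled = [np.array([[1, 2], [3, 4]], dtype=np.int32), np.array(["r1", "r2"]), np.array(["c1", "c2"])]
print(matrix_result_to_frame(labeled).loc["r2", "c1"])  # 3
```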

Note: With PROTOCOL_DDB, the dtype of the np.ndarray for a matrix of temporal values matches the time granularity of the matrix, e.g., datetime64[D], datetime64[ms] or datetime64[M]. With PROTOCOL_PICKLE, the dtype is always datetime64[ns].
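If downstream code expects a coarser dtype, the nanosecond array can be cast with NumPy's astype. This is a sketch assuming a DATE matrix, for which day granularity (datetime64[D]) is the natural target; the array is hand-built rather than downloaded.

```python
import numpy as np

# Stand-in for a DATE matrix downloaded with PROTOCOL_PICKLE (always datetime64[ns])
pickled = np.array([["2012-01-02", "2012-02-03"]], dtype="datetime64[ns]")

# Cast to day granularity
as_days = pickled.astype("datetime64[D]")

print(as_days.dtype)  # datetime64[D]
print(as_days[0, 1])  # 2012-02-03
```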

Table

The following table shows the data type mappings for table columns:

DolphinDB Data Type | np.dtype
BOOL (without nulls) | bool
CHAR (without nulls) | int8
SHORT (without nulls) | int16
INT (without nulls) | int32
LONG (without nulls) | int64
DATE, MONTH, TIME, TIMESTAMP, MINUTE, SECOND, DATETIME, NANOTIME, NANOTIMESTAMP, DATEHOUR | datetime64[ns]
FLOAT | float32
DOUBLE, CHAR (with nulls), SHORT (with nulls), INT (with nulls), LONG (with nulls) | float64
BOOL (with nulls), SYMBOL, STRING, UUID, IPADDR, INT128, Array Vector | object

Note

(1) BLOB, DECIMAL32 and DECIMAL64 columns are currently not supported by PROTOCOL_PICKLE.
(2) Array vectors of UUID, IPADDR and INT128 types are currently not supported by PROTOCOL_PICKLE.

Example

>>> re = s.run("table([1, NULL] as a, [2012.01.02, 2012.01.05] as b)")
>>> re
     a          b
0  1.0 2012-01-02
1  NaN 2012-01-05
>>> re['a'].dtype
dtype('float64')
>>> re['b'].dtype
dtype('<M8[ns]')
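Since an INT column with nulls arrives as float64 (column a above), callers who need integers back can convert to one of pandas' nullable extension dtypes. A minimal sketch, assuming pandas 1.0 or later; the DataFrame is rebuilt by hand to mirror re, and the BOOL column c is a hypothetical addition to show the object-to-boolean case.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the table returned above, plus a hypothetical BOOL column
table = pd.DataFrame({
    "a": [1.0, np.nan],                                 # INT with nulls -> float64
    "b": pd.to_datetime(["2012-01-02", "2012-01-05"]),  # DATE -> datetime64[ns]
    "c": pd.Series([True, None], dtype=object),         # BOOL with nulls -> object
})

# Nullable extension dtypes keep the missing value as pd.NA instead of NaN/None
restored = table.astype({"a": "Int32", "c": "boolean"})

print(restored.dtypes)
```

The float-to-Int32 cast only succeeds because every non-null value is integral, which holds for data that started out as a DolphinDB INT column.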

Deserialization: From DolphinDB to Python (When pickleTableToList=True)

Table

When PROTOCOL_PICKLE is specified and the additional parameter pickleTableToList is set to True in the run method, a DolphinDB table is downloaded as a list of np.ndarrays, each representing one table column. Array vector columns are downloaded as two-dimensional np.ndarrays, so when a table contains an array vector column, make sure that every row of that column contains the same number of elements.
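The equal-length requirement mirrors a NumPy constraint: rows can only be stacked into one two-dimensional array when they all have the same length. The plain NumPy sketch below illustrates this with hand-made rows; it does not reproduce the API's own conversion, and has_uniform_length is a hypothetical pre-check, not part of the API.

```python
import numpy as np

def has_uniform_length(rows):
    """Hypothetical pre-check: True if every row has the same number of elements."""
    return len({len(row) for row in rows}) <= 1

# Rows of an array vector column where every row has 3 elements
equal_rows = [np.array([1, 2, 3]), np.array([4, 5, 6])]
stacked = np.vstack(equal_rows)
print(stacked.shape)  # (2, 3)

# Rows of different lengths cannot form a regular 2-D array
ragged_rows = [np.array([1, 2, 3]), np.array([4, 5])]
try:
    np.vstack(ragged_rows)
    ragged_ok = True
except ValueError:
    ragged_ok = False
print(ragged_ok)  # False
```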

The type conversion rules are the same as those described for PROTOCOL_DDB.
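Because the pickleTableToList=True result, as described above, is a list of column arrays, the column names have to be supplied separately if a DataFrame is wanted later. A minimal sketch, assuming the caller already knows the column order; columns_to_frame is a hypothetical helper, and the names a and b and the values are modeled on the earlier table example rather than downloaded.

```python
import numpy as np
import pandas as pd

def columns_to_frame(columns, names):
    """Hypothetical helper: rebuild a DataFrame from a list of 1-D column arrays."""
    if len(columns) != len(names):
        raise ValueError("expected one name per column")
    return pd.DataFrame(dict(zip(names, columns)))

# Hand-built stand-in for s.run(..., pickleTableToList=True)
columns = [
    np.array([1.0, np.nan]),
    np.array(["2012-01-02", "2012-01-05"], dtype="datetime64[ns]"),
]
frame = columns_to_frame(columns, ["a", "b"])
print(frame.shape)  # (2, 2)
```

This sketch assumes 1-D columns; a two-dimensional array vector column would need reshaping before it could become a DataFrame column.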