PROTOCOL_ARROW

Apache Arrow is a protocol for serializing and deserializing large datasets. It is ideal when you need to efficiently transmit data across platforms or languages. By installing the DolphinDB Arrow plugin, you can use the Arrow format to communicate between the DolphinDB server and DolphinDB Python API.

Note: Please make sure the DolphinDB Arrow plugin has been installed before you enable PROTOCOL_ARROW in DolphinDB Python API. Otherwise, the API will still use the default PROTOCOL_DDB for communication and download DolphinDB tables as DataFrames.

Supported Data Form

PROTOCOL_ARROW currently only supports deserializing DolphinDB tables without compression.

Additional ParameterData FormSerializationDeserialization
NoneTable×

Enabling PROTOCOL_ARROW

Prerequisites

To use PROTOCOL_ARROW, we need to enable it in the DolphinDB session and DBConnectionPool objects by setting the protocol parameter to PROTOCOL_ARROW.

import dolphindb as ddb
import dolphindb.settings as keys

s = ddb.Session(protocol=keys.PROTOCOL_ARROW)
s.connect("localhost", 8848, "admin", "123456")

pool = ddb.DBConnectionPool("localhost", 8848, "admin", "123456", 10, protocol=keys.PROTOCOL_ARROW)

Deserialization: From DolphinDB to Python

Table

DolphinDB tables map tp Python pyarrow.Tables. The following table shows the data type mappings:

DolphinDB Data TypeArrow Data Type
BOOLboolean
CHARint8
SHORTint16
INTint32
LONGint64
DATEdate32
MONTHdate32
TIMEtime32(ms)
MINUTEtime32(s)
SECONDtime32(s)
DATETIMEtimestamp(s)
TIMESTAMPtimestamp(ms)
NANOTIMEtime64(ns)
NANOTIMESTAMPtimestamp(ns)
DATEHOURtimestamp(s)
FLOATfloat32
DOUBLEfloat64
SYMBOLdictionary(int32, utf8)
STRINGutf8
IPADDRutf8
UUIDfixed_size_binary(16)
INT128fixed_size_binary(16)
BLOBlarge_binary
DECIMAL32(X)decimal128(38, X)
DECIMAL64(X)decimal128(38, X)

Note

(1) PROTOCOL_ARROW also supports array vectors of the data types listed above (excluding DECIMAL32 and DECIMAL64). (2) As DolphinDB’s NANOTIME type maps to Arrow’s time64(ns), to convert a pyarrow.Table downloaded with PROTOCOL_ARROW to a pandas.DataFrame, the fractional values to be converted must be a multiple of 0.001. Otherwise, an error is reported: Value xxxxxxx has non-zero nanoseconds.

Example

>>> s.run("table(1..3 as a)")
pyarrow.Table
a: int32
----
a: [[1,2,3]]