PROTOCOL_ARROW
Apache Arrow is a protocol for serializing and deserializing large datasets. It is ideal when you need to efficiently transmit data across platforms or languages. By installing the DolphinDB Arrow plugin, you can use the Arrow format to communicate between the DolphinDB server and DolphinDB Python API.
Note: Please make sure the DolphinDB Arrow plugin has been installed before you enable PROTOCOL_ARROW in DolphinDB Python API. Otherwise, the API will still use the default PROTOCOL_DDB for communication and download DolphinDB tables as DataFrames.
Supported Data Form
PROTOCOL_ARROW currently only supports deserializing DolphinDB tables without compression.
Additional Parameter | Data Form | Serialization | Deserialization |
---|---|---|---|
None | Table | √ | × |
Enabling PROTOCOL_ARROW
Prerequisites
- Install pyarrow 9.0.0 or later in Python environment.
- Installing the DolphinDB Arrow plugin
To use PROTOCOL_ARROW, we need to enable it in the DolphinDB session and DBConnectionPool objects by setting the protocol parameter to PROTOCOL_ARROW.
import dolphindb as ddb
import dolphindb.settings as keys
s = ddb.Session(protocol=keys.PROTOCOL_ARROW)
s.connect("localhost", 8848, "admin", "123456")
pool = ddb.DBConnectionPool("localhost", 8848, "admin", "123456", 10, protocol=keys.PROTOCOL_ARROW)
Deserialization: From DolphinDB to Python
Table
DolphinDB tables map tp Python pyarrow.Tables. The following table shows the data type mappings:
DolphinDB Data Type | Arrow Data Type |
---|---|
BOOL | boolean |
CHAR | int8 |
SHORT | int16 |
INT | int32 |
LONG | int64 |
DATE | date32 |
MONTH | date32 |
TIME | time32(ms) |
MINUTE | time32(s) |
SECOND | time32(s) |
DATETIME | timestamp(s) |
TIMESTAMP | timestamp(ms) |
NANOTIME | time64(ns) |
NANOTIMESTAMP | timestamp(ns) |
DATEHOUR | timestamp(s) |
FLOAT | float32 |
DOUBLE | float64 |
SYMBOL | dictionary(int32, utf8) |
STRING | utf8 |
IPADDR | utf8 |
UUID | fixed_size_binary(16) |
INT128 | fixed_size_binary(16) |
BLOB | large_binary |
DECIMAL32(X) | decimal128(38, X) |
DECIMAL64(X) | decimal128(38, X) |
Note
(1) PROTOCOL_ARROW also supports array vectors of the data types listed above (excluding DECIMAL32 and DECIMAL64).
(2) As DolphinDB’s NANOTIME type maps to Arrow’s time64(ns), to convert a pyarrow.Table downloaded with PROTOCOL_ARROW to a pandas.DataFrame, the fractional values to be converted must be a multiple of 0.001. Otherwise, an error is reported: Value xxxxxxx has non-zero nanoseconds
.
Example
>>> s.run("table(1..3 as a)")
pyarrow.Table
a: int32
----
a: [[1,2,3]]