Arrow
Apache Arrow defines a columnar memory format that combines the benefits of columnar data structures with in-memory computing. With the DolphinDB Arrow plugin, you can use the Arrow format to interact with the DolphinDB server through the Python API, with automatic data type conversion.
Note:
- Starting from 2.00.11, the plugin name has been changed from "formatArrow" to "Arrow".
- Since version 2.00.12, the Arrow plugin can be directly downloaded from the plugin repository and loaded using the loadPlugin function. For versions 2.00.11 and earlier, the loadFormatPlugin function is required, which is used in the same way as loadPlugin but is specifically for loading data format plugins.
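The version rule in the note above can be sketched as a small client-side helper. This is only an illustration: `parse_version` and `loader_function` are hypothetical names, not part of DolphinDB or its Python API, and the comparison takes "2.00.11 and earlier" literally as a dotted numeric version.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as "2.00.12" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def loader_function(server_version: str) -> str:
    """Return the loader named in the note for a given server version.

    Assumption: versions compare numerically component by component,
    so "2.00.11.3" still counts as earlier than "2.00.12".
    """
    if parse_version(server_version) >= (2, 0, 12):
        return "loadPlugin"
    return "loadFormatPlugin"


print(loader_function("2.00.12"))
print(loader_function("2.00.11"))
```

A helper like this could be used to build the script sent to the server, so the same client code works across server versions.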
Installation (with installPlugin)
Required server version: DolphinDB 2.00.12 or higher
Supported OS: Windows x86-64 and Linux x86-64.
Installation Steps:
(1) Use listRemotePlugins to check plugin information in the plugin repository.
Note: For plugins not included in the returned list, you can install them from precompiled binaries or compile them from source. These files can be accessed from our GitHub repository by switching to the appropriate version branch.
login("admin", "123456")
listRemotePlugins("arrow")
(2) Invoke installPlugin for plugin installation.
installPlugin("arrow")
(3) Use loadPlugin to load the plugin before using the plugin methods.
loadPlugin("arrow")
Method References
The Arrow plugin provides no user-callable interfaces.
The interfaces returned by the loadPlugin function are only for internal use within DolphinDB and cannot be called by users through scripts.
Data Type Mappings
The Arrow plugin only supports one-way data transfer from DolphinDB to APIs and does not support receiving Arrow-formatted data from APIs.
Currently, only the Python API can download Arrow-formatted data using the PROTOCOL_ARROW protocol.
DolphinDB to Arrow
The plugin currently only supports serializing and transferring DolphinDB tables as Arrow tables. The data type mappings between DolphinDB and Arrow are as follows:
DolphinDB | Arrow |
---|---|
BOOL | boolean |
CHAR | int8 |
SHORT | int16 |
INT | int32 |
LONG | int64 |
DATE | date32 |
MONTH | date32 |
TIME | time32(ms) |
MINUTE | time32(s) |
SECOND | time32(s) |
DATETIME | timestamp(s) |
TIMESTAMP | timestamp(ms) |
NANOTIME | time64(ns) |
NANOTIMESTAMP | timestamp(ns) |
DATEHOUR | timestamp(s) |
FLOAT | float32 |
DOUBLE | float64 |
SYMBOL | dictionary(int32, utf8) |
STRING | utf8 |
IPADDR | utf8 |
UUID | fixed_size_binary(16) |
INT128 | fixed_size_binary(16) |
BLOB | large_binary |
DECIMAL32(X) | decimal128(38, X) |
DECIMAL64(X) | decimal128(38, X) |
Note:
- Array vectors of the types listed above (excluding the Decimal types) are also supported.
- Starting from version 2.00.11, the byte order of downloaded UUID/INT128 data matches the upload order, instead of reversing it.
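When checking the schema of a downloaded Arrow table, it can help to have the mapping table above available as data. The following is a minimal sketch that transcribes the table into a Python dictionary. `DDB_TO_ARROW` and `arrow_type_for` are hypothetical names introduced here for illustration, and the Arrow types are plain strings copied from the table rather than pyarrow type objects. Array vectors are not covered.

```python
import re

# Transcription of the DolphinDB-to-Arrow mapping table (non-decimal types).
DDB_TO_ARROW = {
    "BOOL": "boolean",
    "CHAR": "int8",
    "SHORT": "int16",
    "INT": "int32",
    "LONG": "int64",
    "DATE": "date32",
    "MONTH": "date32",
    "TIME": "time32(ms)",
    "MINUTE": "time32(s)",
    "SECOND": "time32(s)",
    "DATETIME": "timestamp(s)",
    "TIMESTAMP": "timestamp(ms)",
    "NANOTIME": "time64(ns)",
    "NANOTIMESTAMP": "timestamp(ns)",
    "DATEHOUR": "timestamp(s)",
    "FLOAT": "float32",
    "DOUBLE": "float64",
    "SYMBOL": "dictionary(int32, utf8)",
    "STRING": "utf8",
    "IPADDR": "utf8",
    "UUID": "fixed_size_binary(16)",
    "INT128": "fixed_size_binary(16)",
    "BLOB": "large_binary",
}

# DECIMAL32(X) and DECIMAL64(X) both map to decimal128(38, X).
_DECIMAL = re.compile(r"^DECIMAL(32|64)\((\d+)\)$")


def arrow_type_for(ddb_type: str) -> str:
    """Return the Arrow type name listed in the table for a DolphinDB type."""
    name = ddb_type.strip().upper()
    match = _DECIMAL.match(name)
    if match:
        return f"decimal128(38, {match.group(2)})"
    return DDB_TO_ARROW[name]  # raises KeyError for types not in the table


print(arrow_type_for("SYMBOL"))
print(arrow_type_for("DECIMAL64(4)"))
```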
Usage Example
DolphinDB server
login("admin", "123456");
loadPlugin("arrow");
// For versions 2.00.11 and earlier, use loadFormatPlugin instead of loadPlugin:
// loadFormatPlugin("path/to/Arrow/PluginArrow.txt")
Python API
import dolphindb as ddb
import dolphindb.settings as keys
s = ddb.session("192.168.1.113", 8848, "admin", "123456", protocol=keys.PROTOCOL_ARROW)
pat = s.run("table(1..10 as a)")
print(pat)
-------------------------------------------
pyarrow.Table
a: int32
----
a: [[1,2,3,4,5,6,7,8,9,10]]
Note: Currently, the DolphinDB server does not support enabling compression when the Arrow protocol is used.
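Because the result of `run` under PROTOCOL_ARROW is a `pyarrow.Table`, it can be passed on to other Arrow-aware tools. The sketch below wraps the example above in a function and converts the result to a pandas DataFrame with `pyarrow.Table.to_pandas()`. The function name `fetch_as_pandas` is hypothetical, the connection details are placeholders, and the code assumes that the `dolphindb`, `pyarrow`, and `pandas` packages are installed and that the Arrow plugin has already been loaded on the server.

```python
def fetch_as_pandas(host, port, user, password, script):
    """Run a script over the Arrow protocol and return a pandas DataFrame.

    Assumption: the script returns a DolphinDB table, so the result
    arrives as a pyarrow.Table.
    """
    # Imported lazily so this module can be loaded without dolphindb installed.
    import dolphindb as ddb
    import dolphindb.settings as keys

    s = ddb.session(host, port, user, password, protocol=keys.PROTOCOL_ARROW)
    try:
        arrow_table = s.run(script)
        return arrow_table.to_pandas()
    finally:
        s.close()
```

For example, `fetch_as_pandas("192.168.1.113", 8848, "admin", "123456", "table(1..10 as a)")` would be expected to return a DataFrame with a single column `a`, given a reachable server.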