Feather
Feather stores data in the Apache Arrow columnar memory format, which is organized for efficient analytic operations. The DolphinDB feather plugin supports efficient import and export of Feather files with automatic data type conversion. The plugin is built on the Feather read/write interface of the open-source Arrow library.
Installation (with installPlugin)
Required server version: DolphinDB 2.00.10 or higher
Supported OS: Windows x86-64 and Linux x86-64
Installation Steps:
(1) Use listRemotePlugins to check plugin information in the plugin repository.
Note: For plugins not included in the provided list, you can install through precompiled binaries or compile from source. These files can be accessed from our GitHub repository by switching to the appropriate version branch.
login("admin", "123456")
listRemotePlugins()
(2) Invoke installPlugin to install the plugin.
installPlugin("feather")
(3) Use loadPlugin to load the plugin before using the plugin methods.
loadPlugin("feather")
Method References
extractSchema
Syntax
extractSchema(filePath)
Details
Get the schema of a Feather file and return a table containing the following three columns:
- name: Column name
- type: Arrow data type
- DolphinDBType: DolphinDB data type
Note: A value of VOID in the DolphinDBType column indicates that the corresponding Arrow data type cannot be converted.
Parameters
- filePath: A STRING scalar indicating the Feather file path.
Examples
feather::extractSchema("path/to/data.feather");
feather::extractSchema("path/to/data.compressed.feather");
load
Syntax
load(filePath, [columns])
Details
Load a Feather file into a DolphinDB in-memory table. Regarding data type conversion, see "Data Type Mappings".
Note:
- Because DolphinDB reserves the minimum value of each integer type to represent NULL, the minimum values of Arrow int8, int16, int32, and int64 cannot be imported into DolphinDB.
- Infinities and NaN (not a number) values of floating-point types are converted to NULL values in DolphinDB.
Parameters
- filePath: A STRING scalar indicating the Feather file path.
- columns (optional): A STRING vector indicating the names of the columns to be loaded.
Examples
table = feather::load("path/to/data.feather");
table_part = feather::load("path/to/data.feather", [ "col1_name","col2_name"]);
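The two notes on load can be checked before a file is imported. The following is a minimal Python sketch, not part of the plugin: the names INT_MIN, is_lossy, and find_lossy_values are hypothetical, and the check simply restates the notes above (minimum integer values and non-finite floats do not survive import as regular values).

```python
import math

# Minimum values of the signed Arrow integer types. Per the notes above,
# DolphinDB reserves these values to represent NULL.
INT_MIN = {
    "int8": -2**7,
    "int16": -2**15,
    "int32": -2**31,
    "int64": -2**63,
}

def is_lossy(value, arrow_type):
    """Return True if `value` is affected by one of the two notes above."""
    if arrow_type in INT_MIN:
        return value == INT_MIN[arrow_type]
    if arrow_type in ("float", "double"):
        return math.isnan(value) or math.isinf(value)
    return False

def find_lossy_values(column, arrow_type):
    """Return the indices of non-null values in `column` that are affected."""
    return [
        i for i, v in enumerate(column)
        if v is not None and is_lossy(v, arrow_type)
    ]
```

For example, find_lossy_values([1, -32768, None, 5], "int16") flags index 1, because -32768 is the int16 minimum.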
save
Syntax
save(table, filePath, [compressMethod], [compressionLevel])
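Details
Save a DolphinDB table to a file in Feather format. Regarding data type conversion, see "Data Type Mappings".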
Parameters
- table: The table to be exported.
- filePath: A STRING scalar indicating the Feather file path.
- compressMethod (optional): A STRING scalar specifying the compression method. It can be "uncompressed", "lz4", or "zstd" (case insensitive). The default is "lz4".
- compressionLevel (optional): An integer specifying the compression level. It takes effect only when compressMethod is set to "zstd".
Examples
feather::save(table, "path/to/save/data.feather");
feather::save(table, "path/to/save/data.feather", "lz4");
feather::save(table, "path/to/save/data.feather", "zstd", 2);
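The rules for the two optional arguments of save can be summarized in a short Python sketch. This is an illustration of the documented rules only, not the plugin's code: the function normalize_compression is hypothetical, and raising an error for an unknown method is an assumption, since the plugin's behavior in that case is not described here.

```python
VALID_METHODS = ("uncompressed", "lz4", "zstd")

def normalize_compression(method="lz4", level=None):
    """Apply the documented rules to the two optional save arguments."""
    method = method.lower()  # the method name is case insensitive
    if method not in VALID_METHODS:
        # Assumption: the plugin's actual reaction to an unknown method
        # is not documented in this section.
        raise ValueError(f"unsupported compression method: {method!r}")
    # The level only takes effect for zstd, so it is dropped otherwise.
    if method != "zstd":
        level = None
    return method, level
```

With no arguments the sketch returns ("lz4", None), matching the documented default.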
Data Type Mappings
Import
The following table lists the data type mappings used when a Feather file is imported to DolphinDB:
Arrow | DolphinDB |
---|---|
bool | BOOL |
int8 | CHAR |
uint8 | SHORT |
int16 | SHORT |
uint16 | INT |
int32 | INT |
uint32 | LONG |
int64 | LONG |
uint64 | LONG |
float | FLOAT |
double | DOUBLE |
string | STRING |
date32 | DATE |
date64 | TIMESTAMP |
timestamp(ms) | TIMESTAMP |
timestamp(ns) | NANOTIMESTAMP |
time32(s) | SECOND |
time32(ms) | TIME |
time64(ns) | NANOTIME |
The following Arrow types are not supported for conversion: binary, fixed_size_binary, half_float, timestamp(us), time64(us), interval_months, interval_day_time, decimal128, decimal, decimal256, list, struct, sparse_union, dense_union, dictionary, map, extension, fixed_size_list, large_string, large_binary, large_list, interval_month_day_nano, max_id.
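The import table can be expressed as a lookup, which may help when checking a file's schema outside DolphinDB. The Python sketch below only restates the table above. The names IMPORT_MAP and dolphindb_type are hypothetical, and returning "VOID" for unsupported types mirrors how extractSchema is described as reporting them. Note that each unsigned Arrow type up to uint32 maps to the next wider DolphinDB integer type, while uint64 maps to LONG.

```python
# Arrow type name -> DolphinDB type, copied from the import table above.
IMPORT_MAP = {
    "bool": "BOOL",
    "int8": "CHAR",
    "uint8": "SHORT",
    "int16": "SHORT",
    "uint16": "INT",
    "int32": "INT",
    "uint32": "LONG",
    "int64": "LONG",
    "uint64": "LONG",
    "float": "FLOAT",
    "double": "DOUBLE",
    "string": "STRING",
    "date32": "DATE",
    "date64": "TIMESTAMP",
    "timestamp(ms)": "TIMESTAMP",
    "timestamp(ns)": "NANOTIMESTAMP",
    "time32(s)": "SECOND",
    "time32(ms)": "TIME",
    "time64(ns)": "NANOTIME",
}

def dolphindb_type(arrow_type):
    """Look up the DolphinDB type; unsupported Arrow types yield "VOID"."""
    return IMPORT_MAP.get(arrow_type, "VOID")
```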
Export
The following table lists the data type mappings used when exporting data from DolphinDB to a Feather file:
DolphinDB | Arrow |
---|---|
BOOL | bool |
CHAR | int8 |
SHORT | int16 |
INT | int32 |
LONG | int64 |
DATE | date32 |
TIME | time32(ms) |
SECOND | time32(s) |
TIMESTAMP | timestamp(ms) |
NANOTIME | time64(ns) |
NANOTIMESTAMP | timestamp(ns) |
FLOAT | float |
DOUBLE | double |
STRING | string |
SYMBOL | string |
The following DolphinDB data types are not supported for conversion: MINUTE, MONTH, DATETIME, UUID, FUNCTIONDEF, HANDLE, CODE, DATASOURCE, RESOURCE, ANY, COMPRESS, ANY DICTIONARY, DATEHOUR, IPADDR, INT128, BLOB, COMPLEX, POINT, DURATION.
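Before calling save, it can be useful to know which columns of a table have types outside the export table. The following Python sketch is one way to run that check on a schema described as a mapping from column name to DolphinDB type name. The names EXPORT_MAP and unsupported_columns are hypothetical, and the sketch does not call DolphinDB.

```python
# DolphinDB type -> Arrow type, copied from the export table above.
EXPORT_MAP = {
    "BOOL": "bool",
    "CHAR": "int8",
    "SHORT": "int16",
    "INT": "int32",
    "LONG": "int64",
    "DATE": "date32",
    "TIME": "time32(ms)",
    "SECOND": "time32(s)",
    "TIMESTAMP": "timestamp(ms)",
    "NANOTIME": "time64(ns)",
    "NANOTIMESTAMP": "timestamp(ns)",
    "FLOAT": "float",
    "DOUBLE": "double",
    "STRING": "string",
    "SYMBOL": "string",
}

def unsupported_columns(schema):
    """Return the names of columns whose type has no Arrow mapping."""
    return [
        name for name, ddb_type in schema.items()
        if ddb_type.upper() not in EXPORT_MAP
    ]
```

For example, a schema of {"id": "INT", "ts": "DATETIME", "sym": "SYMBOL"} reports only "ts", since DATETIME is in the unsupported list.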
Note:
You may encounter some problems when reading Feather files using Python.
Scenario 1: The error Value XXXXXXXXXXXXX has non-zero nanoseconds is raised when pyarrow.feather.read_feather() reads a Feather file that contains data of type time64(ns). When a table is converted to a DataFrame, the time64(ns) type is converted to the datetime.time type, which does not support nanosecond precision.
Solution: Read the file with pyarrow.feather.read_table() instead.
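The cause of the Scenario 1 error can be reproduced with the standard library alone: datetime.time stores at most microseconds, so a value with a nanosecond remainder has no exact representation. The sketch below is illustrative and is not pyarrow's implementation. The function ns_to_time is hypothetical, and its error message only imitates the one quoted above.

```python
from datetime import time

def ns_to_time(ns_since_midnight):
    """Convert nanoseconds since midnight to datetime.time.

    datetime.time has microsecond resolution, so any value that is not a
    whole number of microseconds cannot be converted without loss.
    """
    micros, leftover_ns = divmod(ns_since_midnight, 1_000)
    if leftover_ns:
        raise ValueError(f"Value {ns_since_midnight} has non-zero nanoseconds")
    seconds, micros = divmod(micros, 1_000_000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return time(hours, minutes, seconds, micros)
```

A value that is a whole number of microseconds converts cleanly, while a value such as 1 nanosecond past midnight raises the error.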
Scenario 2: Reading a Feather file with pyarrow.feather.read_feather() converts integer columns that contain null values to floating-point types.
Solution: Read the Feather file into a pyarrow table, then convert it to a DataFrame while specifying the target data types with types_mapper.
import pandas as pd, pyarrow as pa
from pyarrow import feather
pa_table = feather.read_table("path/to/feather_file")
df = pa_table.to_pandas(types_mapper={pa.int64(): pd.Int64Dtype()}.get)