Explicit Type Conversion
When uploading a pandas.DataFrame to DolphinDB using the upload method, some DolphinDB data types cannot be mapped directly from Python: types such as UUID, IPADDR, and SECOND have no exact Python equivalent.
Starting from DolphinDB Python API version 1.30.22.1, explicit type conversion is supported. You can set a __DolphinDB_Type__ attribute on the pandas.DataFrame to specify how its columns should be converted. The __DolphinDB_Type__ attribute is a dictionary: the keys are column names and the values are the DolphinDB data types to convert those columns to.
Example (without explicit type conversion)
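import dolphindb as ddb
import pandas as pd
import numpy as np

# Connect to a DolphinDB server (adjust host, port, and credentials to your deployment)
s = ddb.Session()
s.connect("localhost", 8848, "admin", "123456")

df = pd.DataFrame({
    'cint': [1, 2, 3],
    'csymbol': ["aaa", "bbb", "aaa"],
    'cblob': ["a1", "a2", "a3"],
})
# Upload without a __DolphinDB_Type__ attribute, so the default type mapping applies
s.upload({"df_wrong": df})
print(s.run("schema(df_wrong)")['colDefs'])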
Output:
      name typeString  typeInt  extra comment
0     cint       LONG        5    NaN
1  csymbol     STRING       18    NaN
2    cblob     STRING       18    NaN
As explained in PROTOCOL_DDB, if df is uploaded without explicit type conversion, the "cint" column (dtype int64) is converted to LONG in DolphinDB, and the "csymbol" and "cblob" columns are both converted to STRING.
To convert the columns explicitly, import dolphindb.settings and set the __DolphinDB_Type__ attribute on the pandas.DataFrame to a dictionary. The dictionary keys are column names and the dictionary values are the target column types. Values given as int are supported in all versions, while values given as str are supported starting from version 3.0.2.3.
import dolphindb.settings as keys

df.__DolphinDB_Type__ = {
    'cint': keys.DT_INT,
    'csymbol': keys.DT_SYMBOL,
    'cblob': keys.DT_BLOB,
}
s.upload({"df_true": df})
print(s.run("schema(df_true)")['colDefs'])
For version 3.0.2.3 and later, the dictionary values in the above script can be specified as str:
# Type names given as str; dolphindb.settings is not needed for this form
df.__DolphinDB_Type__ = {
    'cint': "INT",
    'csymbol': "SYMBOL",
    'cblob': "BLOB",
}
s.upload({"df_true": df})
print(s.run("schema(df_true)")['colDefs'])
Output:
      name typeString  typeInt  extra comment
0     cint        INT        4    NaN
1  csymbol     SYMBOL       17    NaN
2    cblob       BLOB       32    NaN
Now each column of the pandas.DataFrame is converted to its specified data type.
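Because the mapping is an ordinary dictionary, a misspelled column name is easy to miss. The following is a minimal sketch of a pre-upload check, not part of the DolphinDB API: check_type_map is a hypothetical helper, and the shapes it accepts (an int, a str, or a [type, scale] pair) are only those shown in this section. Whether the API itself rejects the entries flagged here is not stated in this section. The integer codes in the usage lines are the typeInt values from the output above.

```python
def check_type_map(columns, type_map):
    """Return a list of problems found in a __DolphinDB_Type__-style dict.

    Hypothetical helper for illustration only; it is not part of dolphindb.
    """
    problems = []
    known = set(columns)
    for name, spec in type_map.items():
        if name not in known:
            problems.append(f"unknown column: {name!r}")
            continue
        if isinstance(spec, (list, tuple)):
            # A [type, scale] pair, the form used for Decimal columns.
            if len(spec) != 2:
                problems.append(f"{name!r}: expected [type, scale]")
                continue
            dtype, scale = spec
            bad_scale = (
                scale is not None
                and (isinstance(scale, bool) or not isinstance(scale, int) or scale < 0)
            )
            if bad_scale:
                problems.append(f"{name!r}: scale must be a non-negative int or None")
        else:
            dtype = spec
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(dtype, bool) or not isinstance(dtype, (int, str)):
            problems.append(f"{name!r}: type must be int or str")
    return problems


columns = ["cint", "csymbol", "cblob"]  # in practice: list(df.columns)
print(check_type_map(columns, {"cint": 4, "csymbol": "SYMBOL", "cblob": 32}))
print(check_type_map(columns, {"cnt": 4}))
```

An empty list means every key names an existing column and every value has one of the expected shapes.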
Starting from DolphinDB Python API version 1.30.22.4, explicit type conversion to Decimal32 and Decimal64 supports specifying a scale, by giving the dictionary value as a list of [type, scale]. For example:
from decimal import Decimal

df = pd.DataFrame({
    'decimal32': [Decimal("NaN"), Decimal("1.22")],
    'decimal64': [Decimal("1.33355"), Decimal("NaN")],
})
# Each value is a [type, scale] pair
df.__DolphinDB_Type__ = {
    'decimal32': [keys.DT_DECIMAL32, 2],
    'decimal64': [keys.DT_DECIMAL64, 5],
}
s.upload({'df': df})
print(s.run("schema(df)")['colDefs'])
print('-' * 30)
print(s.run("df"))
Output:
        name    typeString  typeInt  extra comment
0  decimal32  DECIMAL32(2)       37      2
1  decimal64  DECIMAL64(5)       38      5
------------------------------
  decimal32 decimal64
0       NaN   1.33355
1      1.22       NaN
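The scale is the number of digits kept after the decimal point. The sketch below shows this with Python's standard decimal module only; it does not call DolphinDB. The helper at_scale is hypothetical, and its ROUND_HALF_UP rounding is an assumption made for the illustration: this section does not say how DolphinDB treats values with more fractional digits than the scale.

```python
from decimal import Decimal, ROUND_HALF_UP


def at_scale(value, scale):
    """Return `value` with exactly `scale` fractional digits.

    Hypothetical helper; the rounding mode is an assumption, not
    documented DolphinDB behaviour. NaN is passed through unchanged.
    """
    if value.is_nan():
        return value
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal("0.01")
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


print(at_scale(Decimal("1.22"), 2))      # already has two fractional digits
print(at_scale(Decimal("1.33355"), 5))   # already has five fractional digits
print(at_scale(Decimal("1.5"), 3))       # padded with trailing zeros
print(at_scale(Decimal("NaN"), 2))       # NaN stays NaN
```

Checking values this way before an upload can help in choosing a scale that matches the data, as the values in the example above match scales 2 and 5.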
Starting from version 3.0.2.3, the get_types_from_schema interface is available. It converts table schema information into a dictionary of column types, which can be assigned to the __DolphinDB_Type__ attribute of a pandas.DataFrame before uploading it.
Syntax
get_types_from_schema(df_schema_info)
- df_schema_info: A required DataFrame containing the following columns:
  - 'name': required, str, column name.
  - 'typeInt': required, int, column type.
  - 'extra': optional, must be convertible to int, scale of Decimal data.
Example
The following example first creates a table t and retrieves its schema. It then uses get_types_from_schema to convert the schema information into a dictionary, type_dict. Finally, it assigns type_dict to the __DolphinDB_Type__ attribute of the pandas.DataFrame and uploads the DataFrame.
from decimal import Decimal

# Create an empty table on the server and read back its column definitions
s.run('t = table(1:0, ["col1", "col2"], [INT, DECIMAL32(3)])')
schema_info = s.run("schema(t)")['colDefs']
type_dict = ddb.utils.get_types_from_schema(schema_info)
print(type_dict)
print('-' * 45)

df = pd.DataFrame({
    'col1': [1],
    'col2': [Decimal("3.141")],
})
df.__DolphinDB_Type__ = type_dict
s.upload({'df': df})
print(s.run("schema(df)")['colDefs'])
Output:
{'col1': ['INT', None], 'col2': ['DECIMAL32', 3]}
---------------------------------------------
   name    typeString  typeInt  extra comment
0  col1           INT        4    NaN
1  col2  DECIMAL32(3)       37    3.0
Each column in the pandas.DataFrame is correctly converted to the specified type.
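To make the shape of type_dict concrete, the sketch below rebuilds the dictionary printed above from plain Python rows. It is an approximation for illustration, not the library's implementation: types_from_rows and TYPE_NAMES are hypothetical names, the rows stand in for the 'name', 'typeInt', and 'extra' columns of the schema DataFrame, and the lookup table contains only the typeInt codes that appear in the outputs of this section.

```python
import math

# typeInt codes and names as they appear in the schema outputs in this section.
TYPE_NAMES = {
    4: "INT", 5: "LONG", 17: "SYMBOL", 18: "STRING",
    32: "BLOB", 37: "DECIMAL32", 38: "DECIMAL64",
}


def types_from_rows(rows):
    """Map schema rows to {column name: [type name, scale or None]}.

    Hypothetical stand-in that mimics the printed result of
    get_types_from_schema; it is not the library function.
    """
    result = {}
    for row in rows:
        extra = row.get("extra")
        # A missing or NaN 'extra' means the column has no scale.
        if extra is None or (isinstance(extra, float) and math.isnan(extra)):
            scale = None
        else:
            scale = int(extra)
        result[row["name"]] = [TYPE_NAMES[row["typeInt"]], scale]
    return result


rows = [
    {"name": "col1", "typeInt": 4, "extra": float("nan")},
    {"name": "col2", "typeInt": 37, "extra": 3.0},
]
print(types_from_rows(rows))
```

The 'extra' value 3.0 becomes the integer scale 3, which is why that column only needs to be convertible to int.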