Explicit Type Conversion

When you upload a pandas.DataFrame to DolphinDB with the upload method, some DolphinDB data types cannot be inferred from the Python data. Types such as UUID, IPADDR, and SECOND have no exact Python equivalent.

Starting from DolphinDB Python API version 1.30.22.1, explicit type conversion is supported. You can set a __DolphinDB_Type__ attribute on the pandas.DataFrame to specify how its columns are converted. The __DolphinDB_Type__ attribute is a dictionary whose keys are column names and whose values are the DolphinDB data types those columns should be converted to.

Example (without explicit type conversion)

import dolphindb as ddb
import pandas as pd
import numpy as np

s = ddb.Session()
s.connect("localhost", 8848, "admin", "123456")
df = pd.DataFrame({
    'cint': [1, 2, 3],
    'csymbol': ["aaa", "bbb", "aaa"],
    'cblob': ["a1", "a2", "a3"],
})

s.upload({"df_wrong": df})
print(s.run("schema(df_wrong)")['colDefs'])

Output:

      name typeString  typeInt  extra comment
0     cint       LONG        5    NaN        
1  csymbol     STRING       18    NaN        
2    cblob     STRING       18    NaN  

As explained in PROTOCOL_DDB, if df is uploaded without explicit type conversion, the "cint" column (dtype int64) will be converted to LONG in DolphinDB. The columns "csymbol" and "cblob" will be converted to STRING type in DolphinDB.
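In practice, you only need __DolphinDB_Type__ entries for columns where the default inference gives the wrong type. The following is a minimal sketch that compares the inferred types (copied from the output above) with a desired schema and keeps only the differences. It assumes that columns omitted from the dictionary keep their default conversion. `explicit_overrides` is a hypothetical helper, not part of the API. The str-valued result also assumes API version 3.0.2.3 or later, because older versions accept only the int constants from dolphindb.settings.

```python
# Types DolphinDB inferred by default, copied from the schema output above.
inferred = {"cint": "LONG", "csymbol": "STRING", "cblob": "STRING"}
# Types we actually want on the server (a hypothetical target schema).
desired = {"cint": "INT", "csymbol": "SYMBOL", "cblob": "BLOB"}


def explicit_overrides(inferred, desired):
    """Hypothetical helper: return a __DolphinDB_Type__-style dict that
    contains only the columns whose inferred type differs from the
    desired type. Columns that already match are left out."""
    return {
        col: ddb_type
        for col, ddb_type in desired.items()
        if inferred.get(col) != ddb_type
    }


overrides = explicit_overrides(inferred, desired)
print(overrides)  # {'cint': 'INT', 'csymbol': 'SYMBOL', 'cblob': 'BLOB'}
```

If a column's inferred type already matches the desired type (for example, a LONG column that should stay LONG), it is dropped from the result, which keeps the dictionary short for wide tables.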

To control the conversion, import dolphindb.settings and set the __DolphinDB_Type__ attribute of the pandas.DataFrame to a dictionary. The dictionary keys are column names, and the values are column types. Int type constants from dolphindb.settings (such as keys.DT_INT) are supported in all versions. Str type names (such as "INT") are supported starting from version 3.0.2.3.

import dolphindb.settings as keys

df.__DolphinDB_Type__ = {
    'cint': keys.DT_INT,
    'csymbol': keys.DT_SYMBOL,
    'cblob': keys.DT_BLOB,
}

s.upload({"df_true": df})
print(s.run("schema(df_true)")['colDefs'])

For version 3.0.2.3 and later, the dictionary values in the above script can be specified as str:

import dolphindb.settings as keys

df.__DolphinDB_Type__ = {
    'cint': "INT",
    'csymbol': "SYMBOL",
    'cblob': "BLOB",
}

s.upload({"df_true": df})
print(s.run("schema(df_true)")['colDefs'])

Output:

      name typeString  typeInt  extra comment
0     cint        INT        4    NaN        
1  csymbol     SYMBOL       17    NaN        
2    cblob       BLOB       32    NaN       

Now all columns of the pandas.DataFrame are converted to the specified data type.
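Code that has to run against both older and newer API versions can keep the readable str form and convert it to int codes when str values are not supported (before 3.0.2.3). The following is a minimal sketch of that approach. It assumes the int constants in dolphindb.settings equal the typeInt codes shown in the schema outputs on this page (LONG 5, STRING 18, INT 4, SYMBOL 17, BLOB 32). In real code, prefer the constants themselves (keys.DT_INT and so on). `TYPE_CODES` and `to_int_types` are hypothetical names.

```python
# typeInt codes as reported by schema() in the outputs on this page.
# Assumption: these equal the matching dolphindb.settings constants
# (DT_INT, DT_LONG, DT_SYMBOL, DT_STRING, DT_BLOB).
TYPE_CODES = {
    "INT": 4,
    "LONG": 5,
    "SYMBOL": 17,
    "STRING": 18,
    "BLOB": 32,
}


def to_int_types(type_map):
    """Hypothetical helper: convert str-valued __DolphinDB_Type__ entries
    to int codes for API versions that accept only int values.
    An unknown type name raises KeyError instead of being passed through."""
    return {col: TYPE_CODES[name] for col, name in type_map.items()}


str_types = {"cint": "INT", "csymbol": "SYMBOL", "cblob": "BLOB"}
print(to_int_types(str_types))  # {'cint': 4, 'csymbol': 17, 'cblob': 32}
```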

Starting from DolphinDB Python API version 1.30.22.4, explicit type conversion to DECIMAL32 and DECIMAL64 supports specifying a scale. To specify one, give the dictionary value as a list [type, scale]. For example:

from decimal import Decimal
df = pd.DataFrame({
    'decimal32': [Decimal("NaN"), Decimal("1.22")],
    'decimal64': [Decimal("1.33355"), Decimal("NaN")],
})
df.__DolphinDB_Type__ = {
    'decimal32': [keys.DT_DECIMAL32, 2],
    'decimal64': [keys.DT_DECIMAL64, 5],
}

s.upload({'df': df})
print(s.run("schema(df)")['colDefs'])
print('-' * 30)
print(s.run("df"))

Output:

        name    typeString  typeInt  extra comment
0  decimal32  DECIMAL32(2)       37      2        
1  decimal64  DECIMAL64(5)       38      5        
------------------------------
   decimal32  decimal64
0        NaN    1.33355
1       1.22        NaN
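When the scale is not known in advance, one option is to derive it from the data. The following is a minimal sketch using only the standard decimal module. It picks the largest number of fractional digits among the finite values, so that every digit present in the data is kept, and it skips NaN entries. `infer_scale` and `decimal_type_entry` are hypothetical helpers. The str type names assume API version 3.0.2.3 or later (use keys.DT_DECIMAL32 / keys.DT_DECIMAL64 on older versions).

```python
from decimal import Decimal


def infer_scale(values):
    """Hypothetical helper: return the largest number of fractional
    digits among the finite Decimal values (NaN/Infinity are skipped).
    Returns 0 if there are no finite values."""
    scale = 0
    for v in values:
        if not v.is_finite():
            continue
        exponent = v.as_tuple().exponent
        # A negative exponent -k means k digits after the decimal point.
        scale = max(scale, -exponent)
    return scale


def decimal_type_entry(type_value, values):
    """Build a [type, scale] entry for __DolphinDB_Type__."""
    return [type_value, infer_scale(values)]


col32 = [Decimal("NaN"), Decimal("1.22")]
col64 = [Decimal("1.33355"), Decimal("NaN")]
type_map = {
    "decimal32": decimal_type_entry("DECIMAL32", col32),
    "decimal64": decimal_type_entry("DECIMAL64", col64),
}
print(type_map)  # {'decimal32': ['DECIMAL32', 2], 'decimal64': ['DECIMAL64', 5]}
```

For the data in the example above, this reproduces the scales 2 and 5 that were given by hand.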

Starting from version 3.0.2.3, the get_types_from_schema function converts table schema information into a type dictionary. When you upload a pandas.DataFrame, you can use its return value to set the __DolphinDB_Type__ attribute.

Syntax

get_types_from_schema(df_schema_info)
  • df_schema_info: required, a DataFrame containing the following columns:
    • 'name': required, str, the column name.
    • 'typeInt': required, int, the column type.
    • 'extra': optional, must be convertible to int, the scale of DECIMAL data.
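To show the expected input and output, the following stdlib-only sketch mimics the conversion on plain (name, typeInt, extra) tuples. It is an illustrative re-implementation, not the library's source. It assumes typeInt codes map to the type names shown on this page and that a missing or NaN extra becomes None. In real code, call ddb.utils.get_types_from_schema on the schema DataFrame instead. `TYPE_NAMES` and `types_from_schema_rows` are hypothetical names.

```python
import math

# Reverse lookup from typeInt codes to type names, limited to the codes
# that appear in the schema outputs on this page (an assumption, not the
# full DolphinDB type list).
TYPE_NAMES = {4: "INT", 5: "LONG", 17: "SYMBOL", 18: "STRING",
              32: "BLOB", 37: "DECIMAL32", 38: "DECIMAL64"}


def types_from_schema_rows(rows):
    """Hypothetical sketch. rows: iterable of (name, typeInt, extra) tuples
    mirroring the columns of schema(t)['colDefs']. Returns
    {name: [type_name, scale]}, with scale None when extra is missing/NaN."""
    result = {}
    for name, type_int, extra in rows:
        if extra is None or (isinstance(extra, float) and math.isnan(extra)):
            scale = None
        else:
            scale = int(extra)  # e.g. 3.0 -> 3
        result[name] = [TYPE_NAMES[type_int], scale]
    return result


rows = [("col1", 4, float("nan")), ("col2", 37, 3.0)]
print(types_from_schema_rows(rows))
# {'col1': ['INT', None], 'col2': ['DECIMAL32', 3]}
```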

Example

The following example first creates a table t and retrieves its schema. It then calls get_types_from_schema to convert the schema information into a dictionary, type_dict. Finally, it assigns type_dict to the __DolphinDB_Type__ attribute of the pandas.DataFrame.

from decimal import Decimal

s.run('t = table(1:0, ["col1", "col2"], [INT, DECIMAL32(3)])')
schema_info = s.run("schema(t)")['colDefs']

type_dict = ddb.utils.get_types_from_schema(schema_info)
print(type_dict)

df = pd.DataFrame({
    'col1': [1],
    'col2': [Decimal("3.141")],
})
df.__DolphinDB_Type__ = type_dict

s.upload({'df': df})
print('-' * 45)
print(s.run("schema(df)")['colDefs'])

Output:

{'col1': ['INT', None], 'col2': ['DECIMAL32', 3]}
---------------------------------------------
   name    typeString  typeInt  extra comment
0  col1           INT        4    NaN        
1  col2  DECIMAL32(3)       37    3.0      

Each column in the pandas.DataFrame is correctly converted to the specified type.
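To check the result programmatically instead of by eye, you can compare the typeString columns of the two schemas. The following is a minimal sketch. It assumes each schema has first been reduced to a {name: typeString} dict, for example with dict(zip(colDefs['name'], colDefs['typeString'])) where colDefs = s.run("schema(df)")['colDefs']. `schema_mismatches` is a hypothetical helper. Here it is fed the typeString values copied from the outputs on this page.

```python
def schema_mismatches(expected, actual):
    """Hypothetical helper. expected/actual: {column name: typeString}.
    Returns {column: (expected_type, actual_type)} for every column whose
    type differs or that exists on only one side (missing side is None)."""
    mismatches = {}
    for col in sorted(expected.keys() | actual.keys()):
        exp_type, act_type = expected.get(col), actual.get(col)
        if exp_type != act_type:
            mismatches[col] = (exp_type, act_type)
    return mismatches


# Source table t vs. the uploaded df from the example above.
source_t = {"col1": "INT", "col2": "DECIMAL32(3)"}
uploaded_df = {"col1": "INT", "col2": "DECIMAL32(3)"}
print(schema_mismatches(source_t, uploaded_df))  # {}

# The first example on this page, uploaded without explicit conversion.
wanted = {"cint": "INT", "csymbol": "SYMBOL", "cblob": "BLOB"}
df_wrong = {"cint": "LONG", "csymbol": "STRING", "cblob": "STRING"}
print(schema_mismatches(wanted, df_wrong))
```

An empty result means every column landed with the intended type. Any entry points to a column that still needs a __DolphinDB_Type__ entry.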