Explicit Type Conversion

When you upload a pandas.DataFrame to DolphinDB with the upload method, some DolphinDB data types cannot be inferred from the Python data because they have no exact Python equivalent. Examples are UUID, IPADDR, and SECOND.

Starting with DolphinDB Python API version 1.30.22.1, explicit type conversion is supported. You can set a __DolphinDB_Type__ attribute on the pandas.DataFrame to specify how its columns should be converted. The __DolphinDB_Type__ attribute is a dictionary: the keys are column names and the values are the DolphinDB data types to convert those columns to.

Example (without explicit type conversion)

import dolphindb as ddb
import pandas as pd
import numpy as np

s = ddb.Session()
s.connect("localhost", 8848, "admin", "123456")
df = pd.DataFrame({
    'cint': [1, 2, 3],
    'csymbol': ["aaa", "bbb", "aaa"],
    'cblob': ["a1", "a2", "a3"],
})

s.upload({"df_wrong": df})
print(s.run("schema(df_wrong)")['colDefs'])

Output:

      name typeString  typeInt  extra comment
0     cint       LONG        5    NaN        
1  csymbol     STRING       18    NaN        
2    cblob     STRING       18    NaN  

As explained in PROTOCOL_DDB, if df is uploaded without explicit type conversion, the "cint" column (dtype int64) is converted to LONG in DolphinDB, and the "csymbol" and "cblob" columns are both converted to STRING.

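Example (with explicit type conversion)

To convert the columns explicitly, import dolphindb.settings and set the __DolphinDB_Type__ attribute of the pandas.DataFrame to a dictionary. The keys are the column names and the values are the target DolphinDB data types.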

import dolphindb.settings as keys

df.__DolphinDB_Type__ = {
    'cint': keys.DT_INT,
    'csymbol': keys.DT_SYMBOL,
    'cblob': keys.DT_BLOB,
}

s.upload({"df_true": df})
print(s.run("schema(df_true)")['colDefs'])

Output:

      name typeString  typeInt  extra comment
0     cint        INT        4    NaN        
1  csymbol     SYMBOL       17    NaN        
2    cblob       BLOB       32    NaN       

Now each column of the pandas.DataFrame is converted to the data type specified for it.
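Because the dictionary is keyed by column name, a misspelled key is easy to miss. The source does not say how the API treats a key that matches no column, so the following is only a precautionary sketch to run before the upload. The helper name check_type_hints is hypothetical and not part of the DolphinDB API. The sketch assumes that the type constants in dolphindb.settings are plain integers (as the typeInt column suggests), and it uses stand-in integers so that it runs without a server.

```python
def check_type_hints(columns, type_hints):
    """Return a list of problems found in a __DolphinDB_Type__-style dictionary.

    columns    -- the DataFrame's column names, e.g. list(df.columns)
    type_hints -- the dictionary you plan to assign to df.__DolphinDB_Type__
    """
    problems = []
    for name, spec in type_hints.items():
        if name not in columns:
            problems.append(f"{name!r} is not a column")
        elif isinstance(spec, int):
            pass  # a plain type constant (assumed to be an integer)
        elif (isinstance(spec, (list, tuple)) and len(spec) == 2
              and all(isinstance(part, int) for part in spec)):
            pass  # [type constant, scale], the form used for decimal columns
        else:
            problems.append(f"{name!r} has an unrecognized type spec: {spec!r}")
    return problems


# Stand-in integers copied from the typeInt column above (INT = 4, SYMBOL = 17, BLOB = 32).
columns = ["cint", "csymbol", "cblob"]
hints = {"cint": 4, "csymbol": 17, "cblob": 32}
print(check_type_hints(columns, hints))        # []
print(check_type_hints(columns, {"cnit": 4}))  # ["'cnit' is not a column"]
```

With a real DataFrame you would pass list(df.columns) and the dictionary built from the keys.DT_* constants, and assign the attribute only if the returned list is empty.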

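Starting with DolphinDB Python API version 1.30.22.4, explicit type conversion to Decimal32 and Decimal64 supports specifying a scale. In that case the dictionary value is a two-element list: the type constant followed by the scale. For example: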

from decimal import Decimal
df = pd.DataFrame({
    'decimal32': [Decimal("NaN"), Decimal("1.22")],
    'decimal64': [Decimal("1.33355"), Decimal("NaN")],
})
df.__DolphinDB_Type__ = {
    'decimal32': [keys.DT_DECIMAL32, 2],
    'decimal64': [keys.DT_DECIMAL64, 5],
}

s.upload({'df': df})
print(s.run("schema(df)")['colDefs'])
print('-' * 30)
print(s.run("df"))

Output:

        name    typeString  typeInt  extra comment
0  decimal32  DECIMAL32(2)       37      2        
1  decimal64  DECIMAL64(5)       38      5        
------------------------------
   decimal32  decimal64
0        NaN    1.33355
1       1.22        NaN
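The scale is the number of digits kept after the decimal point, so it helps to know how many digits a column actually needs before choosing it. The sketch below uses only Python's standard decimal module. The helper name required_scale is hypothetical and not part of the DolphinDB API, and the sketch makes no claim about how DolphinDB handles values that have more fractional digits than the chosen scale.

```python
from decimal import Decimal


def required_scale(values):
    """Return the smallest scale (digits after the decimal point)
    that represents every finite value exactly."""
    scale = 0
    for value in values:
        if not value.is_finite():  # skip NaN and Infinity
            continue
        # A Decimal such as 1.22 is stored as digits 122 with exponent -2.
        exponent = value.as_tuple().exponent
        scale = max(scale, -exponent)
    return scale


print(required_scale([Decimal("NaN"), Decimal("1.22")]))     # 2
print(required_scale([Decimal("1.33355"), Decimal("NaN")]))  # 5
```

The two results match the scales 2 and 5 used in the example above.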