Import Data
Data can be imported into DolphinDB in the following ways:
- From text files
- With HDF5 plugin
- With ODBC plugin
- With MySQL plugin
From Text Files
DolphinDB provides the following 3 functions to import data from text files:
- loadText: load text files into memory.
- ploadText: load text files into memory in parallel. It is faster than loadText.
- loadTextEx: load text files into a DolphinDB database.
loadText and ploadText must be used together with function append! or tableInsert to save the loaded data to a DolphinDB database:
t = ploadText("/stockData/trades.csv")
db=database("dfs://stock",VALUE,2019.08.01..2019.08.10)
pt=db.createPartitionedTable(t,`pt,`timestamp)
pt.append!(t)
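Alternatively, the same in-memory table can be saved with tableInsert. A minimal sketch, reusing the table t and the database created above:
// load the partitioned table and insert the in-memory table into it
pt = loadTable("dfs://stock", `pt)
tableInsert(pt, t)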
loadText and ploadText load an entire text file into memory before saving the data to the database on disk. Therefore they cannot load a text file larger than the available memory on the local server. In comparison, loadTextEx loads a text file into memory in many small batches and appends each batch to the database, so it can import a text file much larger than the available memory:
db=database("dfs://stock",VALUE,2019.08.01..2019.08.10)
loadTextEx(db,`pt,`timestamp,"/stockData/trades.csv")
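After the import finishes, the data can be loaded back for queries. A quick check, assuming the database above:
pt = loadTable("dfs://stock", `pt)
select count(*) from pt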
In the following example, the stock quotes of each day since May 2018 are saved in a CSV file under the folder /stockData. Create a database and import the CSV files with the following steps:
Create a distributed database (dfs://stockDB) with a composite domain: a value domain based on date and a range domain based on sym. The function cutPoints is used to split sym into 128 buckets containing roughly equal numbers of rows. Considering that more data will be added to the database in the future, we extend the partitioning scheme of date well beyond the current date range.
// load one day of data to determine the sym partitioning scheme
t = ploadText("/stockData/TAQ20180501.csv")
tmp = select count(*) as count from t group by sym order by sym
// split sym into 128 buckets with roughly equal numbers of rows
buckets = cutPoints(tmp.sym, 128, tmp.count)
dateDomain = database("", VALUE, 2018.05.01..2030.07.01)
symDomain = database("", RANGE, buckets)
stockDB = database("dfs://stockDB", COMPO, [dateDomain, symDomain])
stockDB.createPartitionedTable(t, "quotes", `date`sym)
Define a function loadCsv to import the CSV files into the database. Each file is loaded by a background job submitted with submitJob, so the files are imported in parallel:
def loadCsv(){
    fileDir = '/stockData'
    filenames = exec filename from files(fileDir)
    db = database("dfs://stockDB")
    for(fname in filenames){
        // use the file name without the extension as the job ID
        jobId = fname.strReplace(".csv", "")
        submitJob(jobId, , loadTextEx{db, "quotes", `date`sym, fileDir+'/'+fname})
    }
}
loadCsv()
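As submitJob runs each import as a background job, the progress and any errors can be checked with the built-in function getRecentJobs. A sketch:
select jobId, startTime, endTime, errorMsg from getRecentJobs()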
With HDF5 Plugin
The HDF5 plugin provides the following methods:
- hdf5::ls: list all groups and datasets in an HDF5 file.
- hdf5::lsTable: list all datasets in an HDF5 file.
- hdf5::hdf5DS: list the metadata of a dataset in an HDF5 file.
- hdf5::loadHdf5: import an HDF5 file into memory.
- hdf5::loadHdf5Ex: import an HDF5 file into a distributed database.
- hdf5::extractHdf5Schema: get the schema of an HDF5 file.
Download the HDF5 plugin and save it under the directory server/plugins. To load the plugin:
loadPlugin("plugins/hdf5/PluginHdf5.txt")
To call a plugin method, we can put its namespace before the method, such as hdf5::loadHdf5. Another way is to use the keyword "use":
use hdf5
loadHdf5(filePath,tableName)
To import the HDF5 file candle_201801.h5, which contains a dataset candle_201801:
dataFilePath = "/home/data/candle_201801.h5"
datasetName = "candle_201801"
tmpTB = hdf5::loadHdf5(dataFilePath,datasetName)
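To check which datasets a file contains before importing it, use the hdf5::lsTable method from the list above:
hdf5::lsTable(dataFilePath)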
If we need to specify the schema for the imported table, we can use hdf5::extractHdf5Schema:
dataFilePath = "/home/data/candle_201801.h5"
datasetName = "candle_201801"
schema=hdf5::extractHdf5Schema(dataFilePath,datasetName)
update schema set type=`LONG where name=`volume
tt=hdf5::loadHdf5(dataFilePath,datasetName,schema)
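In this example, the volume column is loaded as LONG instead of the type inferred from the HDF5 file.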
To import an HDF5 file that is larger than the available memory, use hdf5::loadHdf5Ex:
dataFilePath = "/home/data/candle_201801.h5"
datasetName = "candle_201801"
dfsPath = "dfs://dataImportHDF5DB"
db=database(dfsPath,VALUE,2018.01.01..2018.01.31)
hdf5::loadHdf5Ex(db, "cycle", "tradingDay", dataFilePath,datasetName)
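As with text files, the imported table can then be loaded back for queries. A quick check, assuming the database created above:
pt = loadTable("dfs://dataImportHDF5DB", "cycle")
select count(*) from pt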
With ODBC Plugin
DolphinDB supports the ODBC interface to connect to third-party databases. Using this plugin, you can easily migrate data from ODBC-supported databases to DolphinDB.
The ODBC plugin provides the following methods:
- odbc::connect: create a connection.
- odbc::close: close a connection.
- odbc::query: execute the given query and return the result as an in-memory table in DolphinDB.
- odbc::execute: execute the given query without returning the result.
Before using the ODBC plugin, we need to install the appropriate ODBC driver.
In the following example, we connect to the following SQL Server with ODBC plugin:
- IP address: 172.18.0.15
- Username: sa
- Password: 123456
- Database: SZ_TAQ
Download the plugin and copy all files under the directory plugins/odbc to the directory server/plugins/odbc. Then load the plugin and connect to the SQL Server:
loadPlugin("plugins/odbc/odbc.cfg")
conn=odbc::connect("Driver=ODBC Driver 17 for SQL Server;Server=172.18.0.15;Database=SZ_TAQ;Uid=sa;Pwd=123456;")
Create the DolphinDB database dfs://dataImportODBC, and use the schema of the SQL Server table candle_201801 to create the table cycle in it:
// fetch a single row to obtain the table schema
tb = odbc::query(conn,"select top 1 * from candle_201801")
db = database("dfs://dataImportODBC",VALUE,2018.01.01..2018.01.31)
db.createPartitionedTable(tb, "cycle", "tradingDay")
Import the data from SQL Server and save it to the table cycle:
tb = database("dfs://dataImportODBC").loadTable("cycle")
data = odbc::query(conn,"select * from candle_201801")
tb.append!(data);
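When the migration is complete, release the connection with the odbc::close method listed above:
odbc::close(conn)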
With MySQL Plugin
To import data from MySQL databases, the DolphinDB MySQL plugin is faster than ODBC. It is also more convenient, as no data source needs to be configured.
The MySQL plugin provides the following methods:
- mysql::connect: create a connection.
- mysql::showTables: list all tables in a MySQL database.
- mysql::extractSchema: get the schema of a MySQL table.
- mysql::load: import data from MySQL into memory.
- mysql::loadEx: import data from MySQL into a distributed database.
Download the plugin and copy all files under the directory plugins/mysql to the directory server/plugins/mysql. Then load the plugin:
loadPlugin("plugins/mysql/PluginMySQL.txt")
Connect to the local MySQL database employees:
conn=mysql::connect("127.0.0.1",3306,"root","123456","employees")
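To verify the connection, the tables in the employees database can be listed with the mysql::showTables method from the list above:
mysql::showTables(conn)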
Create a distributed database in DolphinDB, value-partitioned on the gender column, to save the MySQL data:
db=database("dfs://mysql",VALUE,`F`M)
Import data:
pt=mysql::loadEx(conn,db,"pt","gender","employees")
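As a quick check that the import succeeded, the table can be loaded back and counted. A minimal sketch:
pt = loadTable("dfs://mysql", "pt")
select count(*) from pt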