Table
The DolphinDB Python API provides native methods to create and use DolphinDB databases and tables. This section describes how to create and query tables.
Constructing Table Object with Session.table
The Table class wraps a DolphinDB table object handle, providing an interface for interacting with DolphinDB tables from Python. Table objects are usually constructed with the Session.table or Session.loadTable method; they can also be obtained through Session.loadText, Session.loadTextEx, etc.
Syntax of Session.table:
Session.table(dbPath=None, data=None, tableAliasName=None, inMem=False, partitions=None)
- dbPath: database path. Leave this parameter unspecified when creating an in-memory table or a stream table.
- data: table data. It can be a dict, pandas.DataFrame, or a DolphinDB table name.
- tableAliasName: the table alias.
- inMem: whether to load table data into the DolphinDB server memory.
- partitions: partitions to be loaded into the server memory.
data: Uploading Data as a Temporary Table
When the data parameter is a dictionary or a Pandas DataFrame, the data is uploaded to DolphinDB server as a temporary table. In this case, dbPath, inMem, and partitions do not need to be specified.
Example:
data1 = pd.DataFrame({
'a': [1, 2, 3],
'b': [4, 5, 6],
})
t1 = s.table(data=data1)
print(t1, t1.tableName())
data2 = {
'a': ['a', 'b', 'c'],
'b': [1, 2, 3],
}
t2 = s.table(data=data2)
print(t2, t2.tableName())
Output:
<dolphindb.table.Table object at 0x7fbd5f02bd60> TMP_TBL_3cc57246
<dolphindb.table.Table object at 0x7fbd3205fc70> TMP_TBL_dbae4978
"data1" and "data2" are uploaded to the server as temporary tables "TMP_TBL_3cc57246" and "TMP_TBL_dbae4978", respectively.
data: Obtaining Handle to a DolphinDB Table
When the data parameter is a string indicating a DolphinDB table name, Session.table obtains a handle to the table.
(1) When dbPath and data are both specified, calling Session.table loads the table with the name specified by data from the database specified by dbPath.
dbPath = "dfs://testTable"
if s.existsDatabase(dbPath):
s.dropDatabase(dbPath)
db = s.database(partitionType=keys.VALUE, partitions=[1, 2, 3], dbPath=dbPath, engine="TSDB")
s.run("schema_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
schema_t = s.table(data="schema_t")
db.createTable(schema_t, "pt", ["csymbol"])
pt = s.table(dbPath=dbPath, data="pt")
print(pt, pt.tableName())
print(pt.toDF())
Output:
<dolphindb.table.Table object at 0x7f5036bcd040> pt_TMP_TBL_5229a3cc
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
(2) When only data is specified, Session.table obtains a handle to the in-memory table with the name specified by data.
s.run("test_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
t = s.table(data="test_t")
print(t, t.tableName())
print(t.toDF())
Output:
<dolphindb.table.Table object at 0x7f11ffb3c070> test_t
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
tableAliasName
If tableAliasName is specified, Session.table uses it as the name of the table handle instead of assigning a random string.
(1) Specify tableAliasName when uploading local data to the server
data1 = pd.DataFrame({
'a': [1, 2, 3],
'b': [4, 5, 6],
})
t1 = s.table(data=data1, tableAliasName="data1")
print(t1, t1.tableName())
data2 = {
'a': ['a', 'b', 'c'],
'b': [1, 2, 3],
}
t2 = s.table(data=data2, tableAliasName="data2")
print(t2, t2.tableName())
Output:
<dolphindb.table.Table object at 0x7f167ecb69d0> data1
<dolphindb.table.Table object at 0x7f1651d0bc40> data2
(2) Specify tableAliasName when loading a table from a database to obtain its handle
dbPath = "dfs://testTable"
if s.existsDatabase(dbPath):
s.dropDatabase(dbPath)
db = s.database(partitionType=keys.VALUE, partitions=[1, 2, 3], dbPath=dbPath, engine="TSDB")
s.run("schema_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
schema_t = s.table(data="schema_t")
db.createTable(schema_t, "pt", ["csymbol"])
pt = s.table(dbPath=dbPath, data="pt", tableAliasName="tmp_pt")
print(pt, pt.tableName())
print(pt.toDF())
Output:
<dolphindb.table.Table object at 0x7f3350edc040> tmp_pt
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
(3) tableAliasName does not take effect when obtaining the handle to a DolphinDB in-memory table:
s.run("test_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
t = s.table(data="test_t", tableAliasName="test_t2")
print(t, t.tableName())
print(t.toDF())
Output:
<dolphindb.table.Table object at 0x7f9fb55b4070> test_t
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
In summary, tableAliasName can specify an alias when loading a partitioned table or uploading local data as a DolphinDB table, which avoids working with a randomly generated temporary table name. It does not take effect when obtaining the handle to an existing in-memory table by name.
inMem, partitions
These two parameters only take effect for local databases on disk. For details, see the DolphinDB User Manual - loadTable.
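For reference, a minimal sketch of loading only selected partitions into memory; the on-disk database path "/home/dolphindb/mydb" and its table "trade" (VALUE-partitioned on ticker) are illustrative assumptions:
# Assumes a local on-disk (non-DFS) database at this path containing a table
# "trade" partitioned on ticker; only the AMZN and NFLX partitions are loaded
# into server memory.
t = s.table(dbPath="/home/dolphindb/mydb", data="trade",
            inMem=True, partitions=["AMZN", "NFLX"])
print(t.rows)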
Session.loadTable
Session.loadTable returns a Table object and is similar to Session.table. The difference is that Session.loadTable does not upload local data; it can only be used to obtain handles to tables on the DolphinDB server.
Session.loadTable(tableName, dbPath=None, partitions=None, memoryMode=False)
- tableName: the name of an in-memory table or a database table
- dbPath: database path
- partitions: partitions to be loaded into the server memory
- memoryMode: whether to load table data into the server memory
Loading an In-Memory Table
Example:
s.run("test_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
t = s.loadTable("test_t")
print(t, t.tableName())
print(t.toDF())
Output:
<dolphindb.table.Table object at 0x7fd1c90a4c10> test_t
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
Loading a Database Table
Example:
dbPath = "dfs://testTable"
if s.existsDatabase(dbPath):
s.dropDatabase(dbPath)
db = s.database(partitionType=keys.VALUE, partitions=[1, 2, 3], dbPath=dbPath, engine="TSDB")
s.run("schema_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
schema_t = s.table(data="schema_t")
db.createTable(schema_t, "pt", ["csymbol"])
pt = s.loadTable("pt", dbPath=dbPath)
print(pt, pt.tableName())
print(pt.toDF())
Output:
<dolphindb.table.Table object at 0x7fdaf7885eb0> pt_TMP_TBL_0dfdc80a
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
Life Cycle of Uploaded Tables
The table and loadTable methods both return a local Python variable. When local data is uploaded to the DolphinDB server without specifying an alias, a random name is assigned as the handle of the uploaded table.
The following example uploads the local object "data" to the DolphinDB server as a table. "t" is the local Python variable referencing the uploaded table. To get the name of the uploaded table on the server, call Table.tableName.
data = pd.DataFrame({
'a': [1, 2, 3],
'b': [4, 5, 6],
})
t = s.table(data=data)
print(t.tableName())
Output:
TMP_TBL_e03723c9
The prefix "TMP_TBL_" indicates that it is a handle to a temporary table. Once the Python Table object is destructed, the corresponding temporary table on the server will be destructed as well.
There are three options to release a table variable on the DolphinDB server:
- Call undef:
s.undef(t.tableName(), "VAR")
- Assign NULL to the variable on the DolphinDB server:
s.run(f"{t.tableName()}=NULL")
- Destruct the local variable in Python to remove its reference to the server variable:
del t
Note:
If an alias is specified when obtaining a handle, or the obtained handle already exists before the call, the destruction of the Python object will not affect the corresponding data on server.
If the data is used only once on DolphinDB server, it is recommended to include it as a parameter in a function call instead of uploading it as a table variable. A function call does not cache data. After the function call is executed, all variables are released. Additionally, a function call is faster to execute as the network transmission only takes place once.
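For example, a minimal sketch of this pattern; the server-side function sumA is defined here purely for illustration:
data = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 5, 6],
})
# Define a function on the server, then pass the DataFrame directly as an
# argument to Session.run; no table variable is created or cached on the server.
s.run("def sumA(t) { return sum(t.a) }")
print(s.run("sumA", data))
Output:
6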
Operating on Tables
The DolphinDB Python API provides a variety of methods for operating on DolphinDB tables from Python, including conditional queries, updates, and deletes.
Note: The objects returned by the update and delete methods are not Table objects, but TableUpdate and TableDelete objects. Although this is transparent to users, after calling update or delete you must call the execute method to synchronize the changes to the server. For details, see the descriptions of these methods below.
This section uses the example.csv file in examples.
Obtaining Table Properties
Use rows and cols to get the number of rows and columns of a table.
>>> s.run("t = table(1..5 as a, 2..6 as b)")
>>> t = s.table(data="t")
>>> t.rows
5
>>> t.cols
2
Use colNames to get the column names of the table.
>>> t.colNames
['a', 'b']
schema returns a pandas DataFrame describing the schema of the table. It has the same columns as the output of the server function schema().colDefs.
>>> t.schema
name typeString typeInt extra comment
0 a INT 4 NaN
1 b INT 4 NaN
select
Similar to the SQL "select" keyword, the select method selects columns from a table. The following examples use the toDF method to convert the result to a pandas DataFrame.
(1) List of column names as input
>>> trade=s.loadText(WORK_DIR+"/example.csv")
>>> trade.select(["ticker", "date"]).toDF()
ticker date
0 AMZN 1997-05-15
1 AMZN 1997-05-16
2 AMZN 1997-05-19
3 AMZN 1997-05-20
4 AMZN 1997-05-21
...
(2) String as input
>>> trade.select("ticker, date, bid").toDF()
ticker date bid
0 AMZN 1997-05-15 23.50000
1 AMZN 1997-05-16 20.50000
2 AMZN 1997-05-19 20.50000
3 AMZN 1997-05-20 19.62500
4 AMZN 1997-05-21 17.12500
...
update
Use update to update a table. The cols parameter indicates the columns to update and the vals parameter indicates the corresponding values.
As shown in the following examples, to synchronize the updates to the server, execute must be called.
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> t1 = trade.update(["VOL"],["999999"]).where("TICKER=`AMZN").where(["date=2015.12.16"])
>>> t2 = trade.where("ticker=`AMZN").where("date=2015.12.16")
>>> t2.toDF()
TICKER date VOL PRC BID ASK
0 AMZN 2015-12-16 3964470 675.77002 675.76001 675.83002
>>> t1 = trade.update(["VOL"],["999999"]).where("TICKER=`AMZN").where(["date=2015.12.16"]).execute()
>>> t2.toDF()
TICKER date VOL PRC BID ASK
0 AMZN 2015-12-16 999999 675.77002 675.76001 675.83002
delete
Use delete to delete records from a table. To synchronize the changes to the server, execute must be called.
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.rows
13136
>>> t = trade.delete().where('date<2013.01.01')
>>> trade.rows
13136
>>> t = trade.delete().where('date<2013.01.01').execute()
>>> trade.rows
3024
groupby
groupby groups records based on the specified rules; it is equivalent to the DolphinDB group by clause.
groupby accepts a string list or a single string as its argument, indicating the column(s) to group by. If the query also includes a having condition, the condition must contain an aggregate function; a record is generated for each group whose aggregate value satisfies the having condition.
Note: groupby must be followed by an aggregate function such as count, sum, avg, std, etc.
Prepare the data:
>>> dbPath = "dfs://valuedb"
>>> if s.existsDatabase(dbPath):
... s.dropDatabase(dbPath)
>>> s.database(dbName='mydb', partitionType=keys.VALUE, partitions=["AMZN","NFLX","NVDA"], dbPath=dbPath)
>>> trade = s.loadTextEx(dbPath=dbPath, partitionColumns=["TICKER"], tableName='trade', remoteFilePath=WORK_DIR+"/example.csv")
Calculate the sum of column "vol" and the sum of column "prc" in each "ticker" group:
>>> trade.select(['sum(vol)','sum(prc)']).groupby(['ticker']).toDF()
ticker sum_vol sum_prc
0 AMZN 33706396492 772503.81377
1 NFLX 14928048887 421568.81674
2 NVDA 46879603806 127139.51092
Use groupby and having in the same query:
>>> trade.select('count(ask)').groupby(['vol']).having('count(ask)>1').toDF()
vol count_ask
0 579392 2
1 3683504 2
2 5732076 2
3 6299736 2
4 6438038 2
5 6946976 2
6 8160197 2
7 8924303 2
contextby
contextby is a unique feature in DolphinDB, an extension to standard SQL for convenient time-series data manipulation.
contextby is similar to groupby except that for each group, groupby returns a scalar whereas contextby returns a vector of the same length as the group. For details, see the DolphinDB User Manual - context by.
If having follows contextby and uses only aggregate functions, the result contains all rows of the groups whose aggregate values satisfy the specified conditions. If having follows contextby and uses a non-aggregate function, the result contains the individual rows that satisfy the specified conditions.
>>> trade.contextby('ticker').top(3).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997-05-15 6029815 23.5000 23.5000 23.6250
1 AMZN 1997-05-16 1232226 20.7500 20.5000 21.0000
2 AMZN 1997-05-19 512070 20.5000 20.5000 20.6250
3 NFLX 2002-05-23 7507079 16.7500 16.7500 16.8500
4 NFLX 2002-05-24 797783 16.9400 16.9400 16.9500
5 NFLX 2002-05-28 474866 16.2000 16.2000 16.3700
6 NVDA 1999-01-22 5702636 19.6875 19.6250 19.6875
7 NVDA 1999-01-25 1074571 21.7500 21.7500 21.8750
8 NVDA 1999-01-26 719199 20.0625 20.0625 20.1250
>>> trade.select("TICKER, month(date) as month, cumsum(VOL)").contextby("TICKER,month(date)").toDF()
TICKER month cumsum_VOL
0 AMZN 1997-05-01 6029815
1 AMZN 1997-05-01 7262041
2 AMZN 1997-05-01 7774111
3 AMZN 1997-05-01 8230468
4 AMZN 1997-05-01 9807882
...
Use contextby and having in the same query:
>>> trade.contextby('ticker').having("sum(VOL)>40000000000").toDF()
TICKER date VOL PRC BID ASK
0 NVDA 1999-01-22 5702636 19.6875 19.6250 19.6875
1 NVDA 1999-01-25 1074571 21.7500 21.7500 21.8750
2 NVDA 1999-01-26 719199 20.0625 20.0625 20.1250
3 NVDA 1999-01-27 510637 20.0000 19.8750 20.0000
4 NVDA 1999-01-28 476094 19.9375 19.8750 20.0000
...
pivotby
pivotby is a unique feature in DolphinDB, an extension to standard SQL for convenient time-series data manipulation. It rearranges one or more columns of a table (with or without a data transformation function) along two dimensions. For details, see the DolphinDB User Manual - pivot by.
When used with select, pivotby returns a table.
>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = trade.select("VOL").pivotby("TICKER", "date")
>>> t1.toDF()
TICKER 1997.05.15 1997.05.16 ... 2016.12.28 2016.12.29 2016.12.30
0 AMZN 6029815.0 1232226.0 ... 3301025 3158299 4139451
1 NFLX NaN NaN ... 4388956 3444729 4455012
2 NVDA NaN NaN ... 57384116 54384676 30323259
When used with exec, pivotby returns a DolphinDB matrix.
>>> trade.exec("VOL").pivotby("TICKER", "date").toDF()
[array([[ 6029815., 1232226., 512070., ..., 3301025., 3158299.,
4139451.],
[ nan, nan, nan, ..., 4388956., 3444729.,
4455012.],
[ nan, nan, nan, ..., 57384116., 54384676.,
30323259.]]), array(['AMZN', 'NFLX', 'NVDA'], dtype=object), array(['1997-05-15T00:00:00.000000000', '1997-05-16T00:00:00.000000000',
'1997-05-19T00:00:00.000000000', ...,
'2016-12-28T00:00:00.000000000', '2016-12-29T00:00:00.000000000',
'2016-12-30T00:00:00.000000000'], dtype='datetime64[ns]')]
sort, csort
sort and csort sort the data in ascending ("asc") or descending ("desc") order. sort is equivalent to the SQL "order by" clause, whereas csort only sorts the data within each group defined by contextby.
Use sort to sort the query result:
>>> trade = s.loadTable("trade", "dfs://valuedb")
>>> trade.sort('date').toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997-05-15 6029815 23.500 23.50000 23.62500
1 AMZN 1997-05-16 1232226 20.750 20.50000 21.00000
2 AMZN 1997-05-19 512070 20.500 20.50000 20.62500
3 AMZN 1997-05-20 456357 19.625 19.62500 19.75000
4 AMZN 1997-05-21 1577414 17.125 17.12500 17.25000
...
Use csort with a contextby clause:
>>> trade = s.loadTable("trade", "dfs://valuedb")
>>> trade.contextby('ticker').csort('date desc').toDF()
TICKER date VOL PRC BID ASK
0 AMZN 2016-12-30 4139451 749.87000 750.02002 750.40002
1 AMZN 2016-12-29 3158299 765.15002 764.66998 765.15997
2 AMZN 2016-12-28 3301025 772.13000 771.92999 772.15997
3 AMZN 2016-12-27 2638725 771.40002 771.40002 771.76001
4 AMZN 2016-12-23 1981616 760.59003 760.33002 760.59003
...
You can also specify the sorting order by passing in arguments:
sort(by, ascending=True)
csort(by, ascending=True)
The ascending parameter indicates whether to sort in ascending order; the default is True. To apply different sort orders to multiple columns, pass in a list of booleans.
>>> trade.select("*").contextby('ticker').csort(["TICKER", "VOL"], True).limit(5).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997-12-26 40721 54.2500 53.8750 54.625
1 AMZN 1997-08-12 47939 26.3750 26.3750 26.750
2 AMZN 1997-07-21 48325 26.1875 26.1250 26.250
3 AMZN 1997-08-13 49690 26.3750 26.0000 26.625
4 AMZN 1997-06-02 49764 18.1250 18.1250 18.375
5 NFLX 2002-09-05 20725 12.8500 12.8500 12.950
6 NFLX 2002-11-11 26824 8.4100 8.3000 8.400
7 NFLX 2002-09-04 27319 13.0000 12.8200 13.000
8 NFLX 2002-06-10 35421 16.1910 16.1900 16.300
9 NFLX 2002-09-06 54951 12.8000 12.7900 12.800
10 NVDA 1999-05-10 41250 17.5000 17.5000 17.750
11 NVDA 1999-05-07 52310 17.5000 17.3750 17.625
12 NVDA 1999-05-14 59807 18.0000 17.7500 18.000
13 NVDA 1999-04-01 63997 20.5000 20.1875 20.500
14 NVDA 1999-04-19 65940 19.0000 19.0000 19.125
>>> trade.select("*").contextby('ticker').csort(["TICKER", "VOL"], [True, False]).limit(5).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 2007-04-25 104463043 56.8100 56.80 56.8100
1 AMZN 1999-09-29 80380734 80.7500 80.75 80.8125
2 AMZN 2006-07-26 76996899 26.2600 26.17 26.1800
3 AMZN 2007-04-26 62451660 62.7810 62.77 62.8300
4 AMZN 2005-02-03 60580703 35.7500 35.74 35.7300
5 NFLX 2015-07-16 63461015 115.8100 115.85 115.8600
6 NFLX 2015-08-24 59952448 96.8800 96.85 96.8800
7 NFLX 2016-04-19 55728765 94.3400 94.30 94.3100
8 NFLX 2016-07-19 55685209 85.8400 85.81 85.8300
9 NFLX 2016-01-20 53009419 107.7400 107.73 107.7800
10 NVDA 2011-01-06 87693472 19.3300 19.33 19.3400
11 NVDA 2011-02-17 87117555 25.6800 25.68 25.7000
12 NVDA 2011-01-12 86197484 23.3525 23.34 23.3600
13 NVDA 2011-08-12 80488616 12.8800 12.86 12.8700
14 NVDA 2003-05-09 77604776 21.3700 21.39 21.3700
top, limit
top gets the first N records of a table. For details, see the DolphinDB User Manual - top.
>>> trade = s.table("dfs://valuedb", "trade")
>>> trade.top(5).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997-05-15 6029815 23.500 23.500 23.625
1 AMZN 1997-05-16 1232226 20.750 20.500 21.000
2 AMZN 1997-05-19 512070 20.500 20.500 20.625
3 AMZN 1997-05-20 456357 19.625 19.625 19.750
4 AMZN 1997-05-21 1577414 17.125 17.125 17.250
The limit clause is similar to the top clause with the following differences:
- The top clause cannot use negative integers. When used with the context by clause, the limit clause can use a negative integer to select a limited number of records from the end of each group. In all other cases, the limit clause can only use non-negative integers.
- The limit clause can select a limited number of rows starting from a specified row.
For details, see the DolphinDB User Manual - limit.
>>> trade.select("*").contextby('ticker').limit(-2).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 2016-12-29 3158299 765.15002 764.66998 765.15997
1 AMZN 2016-12-30 4139451 749.87000 750.02002 750.40002
2 NFLX 2016-12-29 3444729 125.33000 125.31000 125.33000
3 NFLX 2016-12-30 4455012 123.80000 123.80000 123.83000
4 NVDA 2016-12-29 54384676 111.43000 111.26000 111.42000
5 NVDA 2016-12-30 30323259 106.74000 106.73000 106.75000
>>> trade.select("*").limit([2, 5]).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997-05-19 512070 20.500 20.500 20.625
1 AMZN 1997-05-20 456357 19.625 19.625 19.750
2 AMZN 1997-05-21 1577414 17.125 17.125 17.250
3 AMZN 1997-05-22 983855 16.750 16.625 16.750
4 AMZN 1997-05-23 1330026 18.000 18.000 18.125
merge, merge_asof, merge_window, and merge_cross
Use merge for inner joins, left joins, left-semi joins, and outer joins; merge_asof for asof joins; and merge_window for window joins.
merge
Specify the join columns with the parameter on if the join column names are identical in both tables; use the parameters left_on and right_on when the join column names are different. The optional parameter how indicates the join type; the default is inner join.
- When join column names are identical in both tables:
>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = s.table(data={
... 'TICKER': ['AMZN', 'AMZN', 'AMZN'],
... 'date': np.array(['2015-12-31', '2015-12-30', '2015-12-29'], dtype='datetime64[D]'),
... 'open': [695, 685, 674],
... })
...
>>> t1 = t1.select("TICKER, date(date) as date, open")
>>> trade.merge(t1,on=["TICKER","date"]).toDF()
TICKER date VOL PRC BID ASK open
0 AMZN 2015-12-29 5734996 693.96997 693.96997 694.20001 674
1 AMZN 2015-12-30 3519303 689.07001 689.07001 689.09998 685
2 AMZN 2015-12-31 3749860 675.89001 675.85999 675.94000 695
- Specify parameters left_on and right_on when the join column names are different.
>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = s.table(data={
... 'TICKER': ['AMZN', 'AMZN', 'AMZN'],
... 'date': np.array(['2015-12-31', '2015-12-30', '2015-12-29'], dtype='datetime64[D]'),
... 'open': [695, 685, 674],
... })
...
>>> t1 = t1.select("TICKER as TICKER1, date(date) as date1, open")
>>> trade.merge(t1, left_on=["TICKER","date"], right_on=["TICKER1", "date1"]).toDF()
TICKER date VOL PRC BID ASK open
0 AMZN 2015-12-29 5734996 693.96997 693.96997 694.20001 674
1 AMZN 2015-12-30 3519303 689.07001 689.07001 689.09998 685
2 AMZN 2015-12-31 3749860 675.89001 675.85999 675.94000 695
- To conduct left join, set how = "left".
>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = s.table(data={
... 'TICKER': ['AMZN', 'AMZN', 'AMZN'],
... 'date': np.array(['2015-12-31', '2015-12-30', '2015-12-29'], dtype='datetime64[D]'),
... 'open': [695, 685, 674],
... })
...
>>> t1 = t1.select("TICKER, date(date) as date, open")
>>> trade.merge(t1,how="left", on=["TICKER","date"]).where('TICKER=`AMZN').where('2015.12.23<=date<=2015.12.31').toDF()
TICKER date VOL PRC BID ASK open
0 AMZN 2015-12-23 2722922 663.70001 663.48999 663.71002 NaN
1 AMZN 2015-12-24 1092980 662.78998 662.56000 662.79999 NaN
2 AMZN 2015-12-28 3783555 675.20001 675.00000 675.21002 NaN
3 AMZN 2015-12-29 5734996 693.96997 693.96997 694.20001 674.0
4 AMZN 2015-12-30 3519303 689.07001 689.07001 689.09998 685.0
5 AMZN 2015-12-31 3749860 675.89001 675.85999 675.94000 695.0
- To conduct outer join, set how = "outer".
>>> t1 = s.table(data={'TICKER': ['AMZN', 'AMZN', 'NFLX'], 'date': ['2015.12.29', '2015.12.30', '2015.12.31'], 'open': [674, 685, 942]})
>>> t2 = s.table(data={'TICKER': ['AMZN', 'NFLX', 'NFLX'], 'date': ['2015.12.29', '2015.12.30', '2015.12.31'], 'close': [690, 936, 951]})
>>> t1.merge(t2, how="outer", on=["TICKER","date"]).toDF()
  TICKER        date   open tmp_TICKER    tmp_date  close
0   AMZN  2015.12.29  674.0       AMZN  2015.12.29  690.0
1   AMZN  2015.12.30  685.0                           NaN
2   NFLX  2015.12.31  942.0       NFLX  2015.12.31  951.0
3                       NaN       NFLX  2015.12.30  936.0
Note: A partitioned table can only be outer joined with a partitioned table, and an in-memory table can only be outer joined with an in-memory table.
merge_asof
The merge_asof method is equivalent to the keyword asof join (aj) in DolphinDB (see the DolphinDB User Manual). asof join is similar to left join with the following differences:
- For asof join, the data type of the last matching column is usually temporal. For a row in the left table with time t, if the right table has no exact left-join match, the row in the right table corresponding to the most recent time before t is used, provided all other matching columns match; if there are multiple matching records in the right table, the last one is used.
- If there is only 1 join column, asof join assumes the right table is sorted on the join column. If there are multiple join columns, asof join assumes the right table is sorted on the last join column within each group defined by the other join columns. The right table does not need to be sorted by the other join columns. If these conditions are not met, we may see unexpected results. The left table does not need to be sorted.
For the examples in this and the next section, we use trades.csv and quotes.csv, which contain AAPL and FB trade and quote data for 10/24/2016 from the NYSE website.
>>> dbPath = "dfs://tickDB"
>>> if s.existsDatabase(dbPath):
... s.dropDatabase(dbPath)
...
>>> s.database(partitionType=keys.VALUE, partitions=["AAPL","FB"], dbPath=dbPath)
>>> trades = s.loadTextEx(dbPath, tableName='trades', partitionColumns=["Symbol"], remoteFilePath=WORK_DIR+"/trades.csv")
>>> quotes = s.loadTextEx(dbPath, tableName='quotes', partitionColumns=["Symbol"], remoteFilePath=WORK_DIR+"/quotes.csv")
>>> trades.top(5).toDF()
Time Exchange Symbol Trade_Volume Trade_Price
0 1970-01-01 08:00:00.022239 75 AAPL 300 27.00
1 1970-01-01 08:00:00.022287 75 AAPL 500 27.25
2 1970-01-01 08:00:00.022317 75 AAPL 335 27.26
3 1970-01-01 08:00:00.022341 75 AAPL 100 27.27
4 1970-01-01 08:00:00.022368 75 AAPL 31 27.40
>>> quotes.where("second(Time)>=09:29:59").top(5).toDF()
Time Exchange Symbol Bid_Price Bid_Size Offer_Price Offer_Size
0 1970-01-01 09:30:00.005868 90 AAPL 26.89 1 27.10 6
1 1970-01-01 09:30:00.011058 90 AAPL 26.89 11 27.10 6
2 1970-01-01 09:30:00.031523 90 AAPL 26.89 13 27.10 6
3 1970-01-01 09:30:00.284623 80 AAPL 26.89 8 26.98 8
4 1970-01-01 09:30:00.454066 80 AAPL 26.89 8 26.98 1
>>> trades.merge_asof(quotes,on=["Symbol","Time"]).select(["Symbol","Time","Trade_Volume","Trade_Price","Bid_Price", "Bid_Size","Offer_Price", "Offer_Size"]).top(5).toDF()
Symbol Time Trade_Volume Trade_Price Bid_Price Bid_Size \
0 AAPL 1970-01-01 08:00:00.022239 300 27.00 26.9 1
1 AAPL 1970-01-01 08:00:00.022287 500 27.25 26.9 1
2 AAPL 1970-01-01 08:00:00.022317 335 27.26 26.9 1
3 AAPL 1970-01-01 08:00:00.022341 100 27.27 26.9 1
4 AAPL 1970-01-01 08:00:00.022368 31 27.40 26.9 1
Offer_Price Offer_Size
0 27.49 10
1 27.49 10
2 27.49 10
3 27.49 10
4 27.49 10
merge_window
merge_window is equivalent to the keyword window join (wj) in DolphinDB (see the DolphinDB User Manual). It is a generalization of asof join. With a window defined by the parameters leftBound (w1) and rightBound (w2), for each row in the left table with the value of the last join column equal to t, it selects the rows in the right table whose last join column falls between (t+w1) and (t+w2), conditional on all other join columns matching, and then applies aggFunctions to the selected rows.
The only difference between window join and prevailing window join is that if the right table doesn't contain a matching value for t+w1 (the left boundary of the window), prevailing window join fills it with the last value before t+w1 (conditional on all other join columns matching) before applying aggFunctions. To use prevailing window join, set prevailing = True, as shown in the sketch after the following example.
>>> trades.merge_window(quotes, -5000000000, 0, aggFunctions=["avg(Bid_Price)","avg(Offer_Price)"], on=["Symbol","Time"]).where("Time>=07:59:59").top(10).toDF()
Time Exchange Symbol Trade_Volume Trade_Price avg_Bid_Price avg_Offer_Price
0 1970-01-01 08:00:00.022239 75 AAPL 300 27.00 26.90 27.49
1 1970-01-01 08:00:00.022287 75 AAPL 500 27.25 26.90 27.49
2 1970-01-01 08:00:00.022317 75 AAPL 335 27.26 26.90 27.49
3 1970-01-01 08:00:00.022341 75 AAPL 100 27.27 26.90 27.49
4 1970-01-01 08:00:00.022368 75 AAPL 31 27.40 26.90 27.49
5 1970-01-01 08:00:02.668076 68 AAPL 2434 27.42 26.75 27.36
6 1970-01-01 08:02:20.116025 68 AAPL 66 27.00 NaN NaN
7 1970-01-01 08:06:31.149930 75 AAPL 100 27.25 NaN NaN
8 1970-01-01 08:06:32.826399 75 AAPL 100 27.25 NaN NaN
9 1970-01-01 08:06:33.168833 75 AAPL 74 27.25 NaN NaN
[10 rows x 6 columns]
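The following sketch reruns the query with prevailing = True (a parameter of merge_window, per the description above); windows that previously contained no quotes may now be filled from the most recent earlier quote:
pwj = trades.merge_window(quotes, -5000000000, 0,
                          aggFunctions=["avg(Bid_Price)", "avg(Offer_Price)"],
                          on=["Symbol", "Time"], prevailing=True)
print(pwj.where("Time>=07:59:59").top(5).toDF())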
merge_cross
merge_cross is equivalent to the keyword cross join (cj) in DolphinDB (see the DolphinDB User Manual). It returns the Cartesian product of two tables: if the left table has n rows and the right table has m rows, cross join returns n*m rows.
>>> s.run("""
... t1 = table(2010 2011 2012 as year);
... t2 = table(`IBM`C`AAPL as Ticker);
... """)
...
>>> t1 = s.table(data="t1")
>>> t2 = s.table(data="t2")
>>> t1.merge_cross(t2).toDF()
year Ticker
0 2010 IBM
1 2010 C
2 2010 AAPL
3 2011 IBM
4 2011 C
5 2011 AAPL
6 2012 IBM
7 2012 C
8 2012 AAPL
rename
Use rename to set a new name for a table.
>>> t.tableName()
t
>>> t.rename("xx")
>>> t.tableName()
xx
Note: Renaming a temporary table can cause memory leaks: a renamed table is no longer destroyed automatically, so the memory it occupies may not be freed in a timely manner.
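As a precaution, a short sketch of explicitly releasing a renamed temporary table once it is no longer needed (using the alias "xx" set above):
# Release the server variable behind the renamed handle.
s.run("xx = NULL")   # or equivalently: s.undef("xx", "VAR")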
drop
Use drop to delete columns from a table.
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.colNames
['TICKER', 'date', 'VOL', 'PRC', 'BID', 'ASK']
>>> t1 = trade.drop(['ask', 'bid'])
>>> t1.colNames
['TICKER', 'date', 'VOL', 'PRC']
exec
The select clause always generates a table, even when only one column is selected. To generate a scalar or a vector, use the exec clause.
If only one column is selected, exec generates a DolphinDB vector. Downloading the object with toDF() in Python yields an np.ndarray object:
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.exec("ticker").toDF()
['AMZN' 'AMZN' 'AMZN' ... 'NFLX' 'NFLX' 'NFLX']
If multiple columns are selected, the exec clause, like select, generates a DolphinDB table. Downloading the object with toDF() in Python yields a pd.DataFrame object.
>>> trade.exec(["ticker", "date", "bid"]).toDF()
ticker date bid
0 AMZN 1997-05-15 23.50000
1 AMZN 1997-05-16 20.50000
2 AMZN 1997-05-19 20.50000
3 AMZN 1997-05-20 19.62500
4 AMZN 1997-05-21 17.12500
...
where
The where clause is used to extract only the records that satisfy the specified condition or conditions.
(1) Multiple where Conditions
When multiple filter conditions are specified with the where clause, the conditions are connected with "and" by default.
Note: To ensure the correct interpretation of operator precedence when chaining multiple "where" conditions with Python API versions earlier than 1.30.21.2, wrap each condition in an additional pair of parentheses. For instance, the SQL query select * from table where (a = false) and (b = true or c = true) should be expressed as t.where("(a = false)").where("(b = true or c = true)").
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> t1 = trade.select(['date','bid','ask','prc','vol']).where('TICKER=`AMZN').where('bid!=NULL').where('ask!=NULL').where('vol>10000000')
>>> t1.toDF()
date bid ask prc vol
0 1998-09-01 79.93750 80.25000 79.95313 11321844
1 1998-11-17 148.68750 149.00000 148.50000 10279448
2 1998-11-20 179.62500 179.75000 180.62500 11314228
3 1998-11-23 217.75000 217.81250 218.00000 11559042
4 1998-11-24 214.25000 214.62500 214.50000 13820992
...
>>> t1.rows
765
>>> t1.showSQL()
select date,bid,ask,prc,vol from TMP_TBL_2744917d where (TICKER=`AMZN) and (bid!=NULL) and (ask!=NULL) and (vol>10000000)
(2) String as Input
We can also pass multiple conditions to the where method as a single comma-separated string.
>>> t1 = trade.select("ticker, date, vol").where("bid!=NULL, ask!=NULL, vol>50000000")
>>> t1.toDF()
ticker date vol
0 AMZN 1999-09-29 80380734
1 AMZN 2000-06-23 52221978
2 AMZN 2001-11-26 51543686
3 AMZN 2002-01-22 57235489
4 AMZN 2005-02-03 60580703
...
>>> t1.rows
41
execute
(1) For the Table class, use execute to select column(s) as a scalar or vector, which is equivalent to using exec. Pass a string or a list of strings to execute to indicate the column(s) to select.
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.execute("TICKER")
['AMZN' 'AMZN' 'AMZN' ... 'NFLX' 'NFLX' 'NFLX']
>>> trade.execute(["VOL", "PRC"])
VOL PRC
0 6029815 23.50000
1 1232226 20.75000
2 512070 20.50000
3 456357 19.62500
4 1577414 17.12500
... ... ...
13131 2010120 125.59000
13132 5287520 128.35001
13133 4388956 125.89000
13134 3444729 125.33000
13135 4455012 123.80000
[13136 rows x 2 columns]
(2) After applying update or delete to a table object, call execute to make sure the operations are synchronized to the server. In this case, the update/delete statement is submitted to the server and a Table object is returned.
executeAs
executeAs saves a query result as a DolphinDB table whose name is specified by the parameter newTableName.
Note: Tables created by this method are independent of the Python script; their lifespan is tied to that of the session object.
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> t1 = trade.select(['date','bid','ask','prc','vol']).where('TICKER=`AMZN').where('bid!=NULL').where('ask!=NULL').where('vol>10000000').executeAs("AMZN")
>>> t1.tableName()
AMZN
showSQL
Use showSQL to convert the query to a standard DolphinDB SQL statement.
>>> trade=s.loadText(WORK_DIR+"/example.csv")
>>> trade.select(["ticker", "date"]).showSQL()
select ticker,date from TMP_TBL_fb11c541
toDF, toList
toDF and toList both execute the SQL statement cached in Python (which can be obtained with showSQL) and return the execution result. The differences are:
- toDF has the same behavior as session.run(sql), whereas toList has the same behavior as session.run(sql, pickleTableToList=True).
- When a session is constructed with protocol set to PROTOCOL_PICKLE or PROTOCOL_DDB, toDF returns a pd.DataFrame, and toList returns a list of np.ndarrays where each np.ndarray represents a table column.
For details, see PROTOCOL_DDB and PROTOCOL_PICKLE.
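A minimal sketch contrasting the two methods, assuming a session created with the default protocol (equivalent to PROTOCOL_PICKLE):
t = s.table(data={'a': [1, 2, 3], 'b': [4, 5, 6]})
print(t.select("a, b").toDF())    # pd.DataFrame with columns a and b
print(t.select("a, b").toList())  # [array([1, 2, 3]), array([4, 5, 6])]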