Join DFS Tables

Table joins on DFS tables are not supported outside SQL statements. The reason is that joining DFS tables without any filters such as select, where clauses usually results in unnecessarily large size tables. Table joins on DFS tables only support left join(lj / lsj), equi join(ej), asof join(aj), window join(wj), full join(fj).

Note:

  • When performing asof join or window join on DFS tables, you need to specify at least two joining columns.

  • When a DFS table joins with an in-memory table or a dimension table, the system will copy the in-memory or dimension table to all the nodes where the DFS table is located for joining. If the table is of significant size, it could be very time consuming to transfer the data. To improve performance, the system attempts to filter the in-memory table as much as possible with the where conditions before data transfer. In this case if the right table is too large, the execution could be slow. It is recommended to use a small right table for this case.

Examples

Create a DFS database:

dates=2019.01.01..2019.01.31
syms="A"+string(1..30)
sym_range=cutPoints(syms,3)
db1=database("",VALUE,dates)
db2=database("",RANGE,sym_range)
db=database("dfs://stock",COMPO,[db1,db2])
n=10000
datetimes=2019.01.01T00:00:00..2019.01.31T23:59:59
t=table(take(datetimes,n) as trade_time,take(syms,n) as sym,rand(1000,n) as qty,rand(500.0,n) as price)
trades=db.createPartitionedTable(t,`trades,`trade_time`sym).append!(t)

n=200
t2=table(take(datetimes,n) as trade_time,take(syms,n) as sym,rand(500.0,n) as bid,rand(500.0,n) as offer)
quotes=db.createPartitionedTable(t2,`quotes,`trade_time`sym).append!(t2)

t3=table(syms as sym,take(0 1,30) as type)
infos=db.createDimensionTable(t3,`infos).append!(t3)

Example 1. Join 2 DFS table trades and quotes:

// output
select * from ej(trades,quotes,`trade_time`sym);
trade_time sym qty price bid offer
2019.01.01T00:00:00 A1 39 7.366735 37.933525 446.917644
2019.01.01T00:00:09 A10 15 461.381014 405.092702 26.659516
2019.01.01T00:00:10 A11 987 429.981704 404.289413 347.64917
2019.01.01T00:00:11 A12 266 60.466206 420.426175 83.538043
2019.01.01T00:00:12 A13 909 362.057769 324.886047 162.502655
2019.01.01T00:00:13 A14 264 113.964472 497.598722 103.114702
2019.01.01T00:00:14 A15 460 347.518325 24.584629 357.854207
2019.01.01T00:00:15 A16 196 258.889177 49.467399 13.974672
2019.01.01T00:00:16 A17 198 403.564922 428.539984 208.410852
2019.01.01T00:00:17 A18 30 288.469046 41.905556 378.080141
... ... ... ... ... ...

Example 2. Join a DFS table and a dimension table.

// output
select * from lj(trades,infos,`sym);
trade_time sym qty price type
2019.01.01T00:00:00 A1 856 359.809918 0
2019.01.01T00:00:09 A10 368 305.801702 1
2019.01.01T00:00:10 A11 549 447.406744 0
2019.01.01T00:00:11 A12 817 115.613373 1
2019.01.01T00:00:12 A13 321 298.317481 0
2019.01.01T00:00:13 A14 3 2.289171 1
2019.01.01T00:00:14 A15 586 91.841629 0
2019.01.01T00:00:15 A16 745 43.256142 1
2019.01.01T00:00:16 A17 60 0.153205 0
... ... ... ... ...

Example 3. Join a DFS table and an in-memory table.

// output
tmp=table("A"+string(1..15) as sym,2019.01.11..2019.01.25 as date);
select * from ej(trades,tmp,`sym);
trade_time sym qty price date
2019.01.01T00:00:00 A1 856 359.809918 2019.01.11
2019.01.01T00:00:09 A10 368 305.801702 2019.01.20
2019.01.01T00:00:10 A11 549 447.406744 2019.01.21
2019.01.01T00:00:11 A12 817 115.613373 2019.01.22
2019.01.01T00:00:12 A13 321 298.317481 2019.01.23
2019.01.01T00:00:13 A14 3 2.289171 2019.01.24
2019.01.01T00:00:14 A15 586 91.841629 2019.01.25
2019.01.01T00:00:30 A1 390 325.407485 2019.01.11
... ... ... ... ...

Please refer to TableJoiners for more information.