Working with Partitioned In-memory Tables

In DolphinDB, all partition schemes for databases also apply to in-memory tables except composite domain. Working with partitioned in-memory tables can utilize parallel computing capacities of multi-core CPUs.

1. Load Data as In-memory Partitioned Tables

DolphinDB offers various ways to load a data set as a partitioned in-memory table. We start with creating a text file for the examples below.

n=30000000
workDir = "C:/DolphinDB/Data"
if(!exists(workDir)) mkdir(workDir)
trades=table(rand(`IBM`MSFT`GM`C`FB`GOOG`V`F`XOM`AMZN`TSLA`PG`S,n) as sym, 2000.01.01+rand(365,n) as date, 10.0+rand(2.0,n) as price1, 100.0+rand(20.0,n) as price2, 1000.0+rand(200.0,n) as price3, 10000.0+rand(2000.0,n) as price4, 10000.0+rand(3000.0,n) as price5, 10000.0+rand(4000.0,n) as price6, rand(10,n) as qty1, rand(100,n) as qty2, rand(1000,n) as qty3, rand(10000,n) as qty4, rand(10000,n) as qty5, rand(10000,n) as qty6)
trades.saveText(workDir + "/trades.txt")

1.1. Use Function `ploadText` to Generate an In-memory Table with Sequential Partition

This approach is the most simple way to create a partitioned in-memory table. However, it does not have certain benefits or flexibilities of other approaches. For examples, the input data file must be smaller than available memory; the partitioned in-memory table generated by this approach cannot use function sortBy! to perform meaningful within-partition sorting.

trades = ploadText(workDir + "/trades.txt");

1.2. Use Function `loadTextEx` to Generate an In-memory Table with Specified Partition Scheme

This approach is for the following use cases:

Frequently conduct within-partition sorting.
Frequently use the partitioning column as the "group by" or "context by" column.

The input data file must be smaller than available memory.

We need to use empty string ("") for parameter "tableName" of function loadTextEx and parameter "directory" of function database.

db = database("", VALUE, `IBM`MSFT`GM`C`FB`GOOG`V`F`XOM`AMZN`TSLA`PG`S)
trades = db.loadTextEx("", `sym, workDir + "/trades.txt");

Use function sortBy! for within-partition sorting.

trades.sortBy!(`qty1);

trades.sortBy!(`date`qty1, false true);

trades.sortBy!(<qty1 * price1>, false);

When the partitioning column is the grouping column, the grouping calculation has the optimal performance. The following table compares the performance of the following query on 3 tables: an unpatitioned table generated by trades=loadText(workDir+"/trades.txt"), the partitioned table with sequential partition in 1.1, and the partitioned table with value partition on "sym" in this section.

timer(10) select std(qty1) from trades group by sym;

Here "timer(10)" means the total time consumed when the query is executed 10 times.


Table	Generated by	Time Consumed
unpartitioned table	`loadText`	3.69 second
partitioned table with sequential domain	`ploadText`	2.51 second
partitioned table with value domain	`loadTextEx`	0.17 second

1.3. Use Function `loadTable` to Import Selected or All Partitions of a Table on Disk into Memory

This approach is for the following use cases:

To import a text file larger than available memory and only a subset of the data are needed each time.
The same data set is used repeatedly. It is much faster to load a database table than to import a text file.

Use function loadTextEx to create a table in a partitioned database on disk: (can also use function createPartitionedTable and append!)

db = database(workDir+"/tradeDB", RANGE, ["A","G","M","S","ZZZZ"])
db.loadTextEx(`trades, `sym, workDir + "/trades.txt");

To load only 2 partitions (["A","G") and ["M","S")) into memory:

db = database(workDir+"/tradeDB")
trades=loadTable(db, `trades, ["A", "M"], 1);

Please note that we need to set the optional parameter "memoryMode" of function loadTable to 1. Otherwise only the metadata of the table are loaded.

1.3.1. Use Function `loadTableBySQL` to Import Selected Columns and/or Rows of a Partitioned Table on Disk

This is the most flexible way to create a partitioned in-memory table. It needs to be used after function loadTable.

db = database(workDir+"/tradeDB")
trades=loadTable(db, `trades);

sample=loadTableBySQL(<select * from trades where date between 2000.03.01 : 2000.05.01>);

sample=loadTableBySQL(<select sym, date, price1, qty1 from trades where date between 2000.03.01 : 2000.05.01>);

dates = 2000.01.16 2000.02.14 2000.08.01
st = sql(<select sym, date, price1, qty1>, trades, expr(<date>, in, dates))
sample = loadTableBySQL(st);

colNames =`sym`date`qty2`price2
st= sql(sqlCol(colNames), trades)
sample = loadTableBySQL(st);

2. Data Manipulation with In-memory Partitioned Tables

2.1. Add New Columns

trades = ploadText(workDir + "/trades.txt");

SQL update statement

update trades set logPrice1= log(price1), newQty1= double(qty1);

function update!

trades.update!(`logPrice1`newQty1, <[log(price1), double(qty1)]>);

assignment statement

trades[`logPrice1`newQty1] = <[log(price1), double(qty1)]>;

2.2. Update Existing Columns

SQL update statement

update trades set qty1=qty1+10;

update trades set qty1=qty1+10 where sym = `IBM;

function update!

trades.update!(`qty1, <qty1+10>);

trades.update!(`qty1, <qty1+10>, <sym=`IBM>);

assginment statement

trades[`qty1] = <qty1+10>;

trades[`qty1, <sym=`IBM>] = <qty1+10>;

2.3. Delete Rows

SQL delete statement
```
delete from trades where qty3<20;
```
function erase!
```
trades.erase!(< qty3<30 >);
```

2.4. Drop Columns

trades.drop!("qty1");

2.5. Rename Columns

trades.rename!("qty2", "qty2New");