Working with Partitioned In-memory Tables
In DolphinDB, all partition schemes for databases also apply to in-memory tables except composite domain. Working with partitioned in-memory tables can utilize parallel computing capacities of multi-core CPUs.
1. Load Data as In-memory Partitioned Tables
DolphinDB offers various ways to load a data set as a partitioned in-memory table. We start with creating a text file for the examples below.
n=30000000
workDir = "C:/DolphinDB/Data"
if(!exists(workDir)) mkdir(workDir)
trades=table(rand(`IBM`MSFT`GM`C`FB`GOOG`V`F`XOM`AMZN`TSLA`PG`S,n) as sym, 2000.01.01+rand(365,n) as date, 10.0+rand(2.0,n) as price1, 100.0+rand(20.0,n) as price2, 1000.0+rand(200.0,n) as price3, 10000.0+rand(2000.0,n) as price4, 10000.0+rand(3000.0,n) as price5, 10000.0+rand(4000.0,n) as price6, rand(10,n) as qty1, rand(100,n) as qty2, rand(1000,n) as qty3, rand(10000,n) as qty4, rand(10000,n) as qty5, rand(10000,n) as qty6)
trades.saveText(workDir + "/trades.txt")
1.1. Use Function ploadText
to Generate an In-memory Table with Sequential Partition
This approach is the most simple way to create a partitioned in-memory table. However, it does not have certain benefits or flexibilities of other approaches. For examples, the input data file must be smaller than available memory; the partitioned in-memory table generated by this approach cannot use function sortBy!
to perform meaningful within-partition sorting.
trades = ploadText(workDir + "/trades.txt");
1.2. Use Function loadTextEx
to Generate an In-memory Table with Specified Partition Scheme
This approach is for the following use cases:
- Frequently conduct within-partition sorting.
- Frequently use the partitioning column as the "group by" or "context by" column.
The input data file must be smaller than available memory.
We need to use empty string ("") for parameter "tableName" of function loadTextEx
and parameter "directory" of function database
.
db = database("", VALUE, `IBM`MSFT`GM`C`FB`GOOG`V`F`XOM`AMZN`TSLA`PG`S)
trades = db.loadTextEx("", `sym, workDir + "/trades.txt");
Use function sortBy!
for within-partition sorting.
trades.sortBy!(`qty1);
trades.sortBy!(`date`qty1, false true);
trades.sortBy!(<qty1 * price1>, false);
When the partitioning column is the grouping column, the grouping calculation has the optimal performance. The following table compares the performance of the following query on 3 tables: an unpatitioned table generated by trades=loadText(workDir+"/trades.txt"), the partitioned table with sequential partition in 1.1, and the partitioned table with value partition on "sym" in this section.
timer(10) select std(qty1) from trades group by sym;
Here "timer(10)" means the total time consumed when the query is executed 10 times.
Table | Generated by | Time Consumed |
---|---|---|
unpartitioned table | loadText | 3.69 second |
partitioned table with sequential domain | ploadText | 2.51 second |
partitioned table with value domain | loadTextEx | 0.17 second |
1.3. Use Function loadTable
to Import Selected or All Partitions of a Table on Disk into Memory
This approach is for the following use cases:
- To import a text file larger than available memory and only a subset of the data are needed each time.
- The same data set is used repeatedly. It is much faster to load a database table than to import a text file.
Use function loadTextEx
to create a table in a partitioned database on disk: (can also use function createPartitionedTable
and append!
)
db = database(workDir+"/tradeDB", RANGE, ["A","G","M","S","ZZZZ"])
db.loadTextEx(`trades, `sym, workDir + "/trades.txt");
To load only 2 partitions (["A","G") and ["M","S")) into memory:
db = database(workDir+"/tradeDB")
trades=loadTable(db, `trades, ["A", "M"], 1);
Please note that we need to set the optional parameter "memoryMode" of function loadTable
to 1. Otherwise only the metadata of the table are loaded.
1.3.1. Use Function loadTableBySQL
to Import Selected Columns and/or Rows of a Partitioned Table on Disk
This is the most flexible way to create a partitioned in-memory table. It needs to be used after function loadTable
.
db = database(workDir+"/tradeDB")
trades=loadTable(db, `trades);
sample=loadTableBySQL(<select * from trades where date between 2000.03.01 : 2000.05.01>);
sample=loadTableBySQL(<select sym, date, price1, qty1 from trades where date between 2000.03.01 : 2000.05.01>);
dates = 2000.01.16 2000.02.14 2000.08.01
st = sql(<select sym, date, price1, qty1>, trades, expr(<date>, in, dates))
sample = loadTableBySQL(st);
colNames =`sym`date`qty2`price2
st= sql(sqlCol(colNames), trades)
sample = loadTableBySQL(st);
1.4. Syntax of Related Functions
- ploadText(fileName, [delimiter], [schema])
- loadTextEx(dbHandle, tableName, partitionColumns, filename, [delimiter=','], [schema])
- database(directory, [partitionType], [partitionScheme], [locations])
- loadTable(database, tableName, [partitions], [memoryMode=false])
- loadTableBySQL(meta code representing a SQL query)
- sortBy!(table, sortColumns, [sortDirections])
- update!(table, colNames, newValues, [filter])
- drop!(table, colNames)
- erase!(table, filter)
- rename!(table, ColName, newColName)
2. Data Manipulation with In-memory Partitioned Tables
2.1. Add New Columns
trades = ploadText(workDir + "/trades.txt");
- SQL update statement
update trades set logPrice1= log(price1), newQty1= double(qty1);
- function
update!
trades.update!(`logPrice1`newQty1, <[log(price1), double(qty1)]>);
- assignment statement
trades[`logPrice1`newQty1] = <[log(price1), double(qty1)]>;
2.2. Update Existing Columns
- SQL update statement
update trades set qty1=qty1+10; update trades set qty1=qty1+10 where sym = `IBM;
- function
update!
trades.update!(`qty1, <qty1+10>); trades.update!(`qty1, <qty1+10>, <sym=`IBM>);
- assginment statement
trades[`qty1] = <qty1+10>; trades[`qty1, <sym=`IBM>] = <qty1+10>;
2.3. Delete Rows
SQL delete statement
delete from trades where qty3<20;
function
erase!
trades.erase!(< qty3<30 >);
2.4. Drop Columns
trades.drop!("qty1");
2.5. Rename Columns
trades.rename!("qty2", "qty2New");