createDimensionTable
Syntax
createDimensionTable(dbHandle, table, tableName,
[compressMethods], [sortColumns], [keepDuplicates=ALL],
[softDelete=false])
Alias: createTable
Arguments
dbHandle is a DFS database handle returned by function database.
table is a table object. The table schema will be used to construct the new dimension table.
tableName is a string indicating the name of the dimension table.
compressMethods (optional) is a dictionary indicating which compression methods are used for specified columns. The keys are columns name and the values are compression methods ("lz4", "delta", "zstd" or "chimp"). If unspecified, use LZ4 compression method.
Note:
- The delta compression method can be used for DECIMAL, SHORT, INT, LONG or temporal data types.
- Save strings as SYMBOL type to enable compression of strings.
- The chimp compression method can be used for DOUBLE type data with decimal parts not exceeding three digits in length.
Details
This function creates an empty dimension (non-partitioned) table in a DFS database, used to store small datasets with infrequent updates. During query, all data in a dimension table will be loaded into the memory.
The system will regularly check the memory usage. When memory usage exceeds
warningMemSize, the system will discard the least recently used (LRU)
data from memory to clean up the cache. Users can also manually call command
clearCachedDatabase
to clear the cached data.
Like partitioned tables, a dimension table can have multiple replicas (determined by the configuration parameter dfsReplicationFactor).
To enable concurrent writes, updates or deletes on a dimension table, set the configuration parameter enableConcurrentDimensionalTableWrite to true.
Examples
Example 1
db=database("dfs://db1",VALUE,1 2 3)
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
sym = `C`MS`MS`MS`IBM`IBM`C`C`C
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
t = table(timestamp, sym, qty, price);
dt=db.createDimensionTable(t,`dt).append!(t);
select * from dt;
timestamp | sym | qty | price |
---|---|---|---|
09:34:07 | C | 2200 | 49.6 |
09:36:42 | MS | 1900 | 29.46 |
09:36:51 | MS | 2100 | 29.52 |
09:36:59 | MS | 3200 | 30.02 |
09:32:47 | IBM | 6800 | 174.97 |
09:35:26 | IBM | 5400 | 175.23 |
09:34:16 | C | 1300 | 50.76 |
09:34:26 | C | 2500 | 50.32 |
09:38:12 | C | 8800 | 51.29 |
Example 2
db = database("dfs://demodb", VALUE, 1..10)
t=table(take(1, 86400) as id, 2020.01.01T00:00:00 + 0..86399 as timestamp, rand(1..100, 86400) as val)
dt = db.createDimensionTable(t, "dt", {timestamp:"delta", val:"delta"})
dt.append!(t)
Related functions: createPartitionedTable