createDimensionTable

Syntax

createDimensionTable(dbHandle, table, tableName, [compressMethods], [sortColumns], [keepDuplicates=ALL], [softDelete=false])

Alias: createTable

Arguments

dbHandle is a DFS database handle returned by function database.

table is a table object. The table schema will be used to construct the new dimension table.

tableName is a string indicating the name of the dimension table.

compressMethods (optional) is a dictionary indicating which compression methods are used for specified columns. The keys are columns name and the values are compression methods ("lz4", "delta", "zstd" or "chimp"). If unspecified, use LZ4 compression method.

Note:

  • The delta compression method can be used for DECIMAL, SHORT, INT, LONG or temporal data types.
  • Save strings as SYMBOL type to enable compression of strings.
  • The chimp compression method can be used for DOUBLE type data with decimal parts not exceeding three digits in length.

Details

This function creates an empty dimension (non-partitioned) table in a DFS database, used to store small datasets with infrequent updates. During query, all data in a dimension table will be loaded into the memory.

The system will regularly check the memory usage. When memory usage exceeds warningMemSize, the system will discard the least recently used (LRU) data from memory to clean up the cache. Users can also manually call command clearCachedDatabase to clear the cached data.

Like partitioned tables, a dimension table can have multiple replicas (determined by the configuration parameter dfsReplicationFactor).

To enable concurrent writes, updates or deletes on a dimension table, set the configuration parameter enableConcurrentDimensionalTableWrite to true.

Examples

Example 1

db=database("dfs://db1",VALUE,1 2 3)
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
sym = `C`MS`MS`MS`IBM`IBM`C`C`C
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
t = table(timestamp, sym, qty, price);

dt=db.createDimensionTable(t,`dt).append!(t);
select * from dt;
timestamp sym qty price
09:34:07 C 2200 49.6
09:36:42 MS 1900 29.46
09:36:51 MS 2100 29.52
09:36:59 MS 3200 30.02
09:32:47 IBM 6800 174.97
09:35:26 IBM 5400 175.23
09:34:16 C 1300 50.76
09:34:26 C 2500 50.32
09:38:12 C 8800 51.29

Example 2

db = database("dfs://demodb", VALUE, 1..10)
t=table(take(1, 86400) as id, 2020.01.01T00:00:00 + 0..86399 as timestamp, rand(1..100, 86400) as val)
dt = db.createDimensionTable(t, "dt", {timestamp:"delta", val:"delta"})
dt.append!(t)

Related functions: createPartitionedTable