keyedStreamTable

Syntax

keyedStreamTable(keyColumn, X, [X1], [X2], ...)

or

keyedStreamTable(keyColumn, capacity:size, colNames, colTypes)

Arguments

keyColumn is a string scalar or vector indicating the name(s) of the primary key column(s).

For the first syntax: X, X1, ... are vectors.

For the second syntax:

capacity is the amount of memory (in terms of the number of rows) allocated to the table. When the number of rows exceeds capacity, the system first allocates a new block of memory 1.2 to 2 times the current capacity, copies the data to the new memory space, and then releases the original memory. For large tables, these steps may use a significant amount of memory.

size can be 0 or 1, indicating the initial size (in terms of the number of rows) of the table. If size = 0, an empty table is created. If size = 1, a table with one record is created, and the initialized values are:

  • false for Boolean type;

  • 0 for numeric, temporal, IPADDR, COMPLEX, and POINT types;

  • NULL for Literal and INT128 types.

colNames is a string vector of column names.

colTypes is a string vector of data types. As of version 2.00.11.2, non-key columns can be specified as array vectors.
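
As a minimal sketch of the second syntax, the snippet below creates one empty table and one table holding a single default-valued record. The variable names, column names, and the capacity of 1000 rows are illustrative assumptions, not values required by the function.

```dolphindb
// Hypothetical column names; a capacity of 1000 rows is chosen arbitrarily.
colNames = `id`flag`price
colTypes = [INT, BOOL, DOUBLE]

// size = 0: an empty keyed stream table keyed on id
tEmpty = keyedStreamTable(`id, 1000:0, colNames, colTypes)

// size = 1: one record filled with the default values listed above
tOne = keyedStreamTable(`id, 1000:1, colNames, colTypes)
select * from tOne
```

Choosing a capacity close to the expected row count avoids the reallocate-and-copy step described for capacity.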

Details

This function creates a keyed stream table with one or more columns as the primary key. Duplicate values are not allowed in the primary key columns.

In practical applications, the same streaming data may be written more than once because of high-availability writes or network issues. A keyed stream table makes writes idempotent: repeated writes with an identical key produce the same result as the first write, so duplicates are avoided. Note that primary key uniqueness is not global; it applies only to the data held in memory. If persistence is enabled for a stream table, the number of records retained in memory can be limited. Once the limit is exceeded, half of the data will be persisted to disk. A key in memory may therefore duplicate a key already on disk. Even so, a keyed stream table solves the problem of repeated writes caused by write retries or network delays.
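
The interaction with persistence can be sketched as follows. This is only an illustration under stated assumptions: it assumes the server has a persistence directory configured, and the shared table name keyedTrades, the capacity, and the cacheSize value are hypothetical.

```dolphindb
// Assumption: persistenceDir is set in the server configuration.
// The names and sizes below are hypothetical.
t = keyedStreamTable(`id, 10000:0, `id`x, [INT, INT])

// Share the table and enable persistence; cacheSize caps the number of
// records kept in memory.
enableTableShareAndPersistence(table=t, tableName=`keyedTrades, cacheSize=100000)
```

With such a setup, key checks apply only to the records still held in memory, so a key that has already left memory can be written again.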

When new records are being inserted into a keyed stream table, the primary key values will be checked:

  • If the key value of a new record is identical to that of an existing record in memory, the new record is discarded and the existing record is NOT updated.

  • If multiple records inserted in one batch share the same key value (and that key does not yet exist in the table), only the first of them is inserted.
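
The two rules above can be exercised together in a short sketch. The table and values are made up for illustration, and the expected result is derived from the rules rather than from a recorded run.

```dolphindb
t = keyedStreamTable(`id, 100:0, `id`x, [INT, INT])

// One batch containing two records with the same new key id=1:
// by the second rule, only the first of them (x=10) should be kept.
insert into t values(1 1 2, 10 11 20)

// A later batch in which id=2 already exists in memory:
// by the first rule, that record should be dropped while id=3 is added.
insert into t values(2 3, 21 30)

select * from t
```

If the rules hold as described, the table should end up with the rows (1, 10), (2, 20), and (3, 30).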

Examples

Example 1

id=`A`B`C`D`E
x=1 2 3 4 5
t1=keyedStreamTable(`id, id, x)
t1;
id x
A 1
B 2
C 3
D 4
E 5

Example 2

t2=keyedStreamTable(`id,100:0,`id`x, [INT,INT])
insert into t2 values(1 2 3,10 20 30);
t2;
id x
1 10
2 20
3 30

If a new row has the same primary key value as an existing row, the new row is not inserted:

insert into t2 values(3 4 5,35 45 55)
t2;
id x
1 10
2 20
3 30
4 45
5 55

The record with id=3 has not been overwritten, while the records with id=4 and id=5 have been inserted.

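The primary key can consist of multiple columns. In the example below, only (A, 5) in the second insert duplicates an existing key, so it is the only record discarded: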

t=keyedStreamTable(`sym`id,1:0,`sym`id`val,[SYMBOL,INT,DOUBLE])
insert into t values(`A`B`C`D`E,5 4 3 2 1,52.1 64.2 25.5 48.8 71.9);
insert into t values(`A`B`R`T`Y,5 8 3 2 1,152.3 164.6 125.5 148.8 171.6);
t;
sym id val
A 5 52.1
B 4 64.2
C 3 25.5
D 2 48.8
E 1 71.9
B 8 164.6
R 3 125.5
T 2 148.8
Y 1 171.6