# createWindowJoinEngine {#createwindowjoinengine}


## Syntax {#syntax}

`createWindowJoinEngine(name, leftTable, rightTable, outputTable, window, metrics, matchingColumn, [timeColumn], [useSystemTime=false], [garbageSize=5000], [maxDelayedTime], [nullFill], [outputElapsedMicroseconds=false], [sortByTime], [closed], [snapshotDir], [snapshotIntervalInMsgCount], [cachedTableCapacity=1024], [keyPurgeFreqInSec])`

## Details {#details}

Create a window join streaming engine. Return a table object that is the real-time [window join](../../Programming/SQLStatements/TableJoiners/windowjoin.md) result of a left table and a right table.

Data ingested into the engine is grouped by *matchingColumn*. Within each group, for each record in the left table, the engine calculates *metrics* over the corresponding window in the right table and outputs the results as additional columns.

**Standard windows \(i.e., _window_ = _a:b_\)**:

The windows over the right table are determined by the timestamp of the current left-table record and the parameter *window*. If the current left-table timestamp is *t* and *window* = *a:b*, the corresponding window in the right table consists of the records with timestamps in \[t+a, t+b\]. The engine returns the join result, with *metrics* calculated over the windowed data.
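
To make the interval concrete, here is a minimal plain-Python model (not DolphinDB script) of how a standard window selects right-table records within one *matchingColumn* group. The function name `standard_window` and the `(timestamp, value)` record layout are hypothetical; timestamps are millisecond offsets, and the data reproduces the AAPL group of the first example in the Examples section.

```python
def standard_window(t, right_records, a, b):
    """Return the values of right-table records whose timestamps lie in [t + a, t + b].

    right_records: list of (timestamp, value) pairs from one matchingColumn group.
    Both bounds are inclusive, as for window = a:b.
    """
    lower, upper = t + a, t + b
    return [value for ts, value in right_records if lower <= ts <= upper]


# AAPL group of the first example: right timestamps 0..9 ms, val = 1..10
aapl_right = [(ts, float(ts + 1)) for ts in range(10)]

for t in (0, 1):
    vals = standard_window(t, aapl_right, -2, 2)
    print(t, vals, sum(vals))
# 0 [1.0, 2.0, 3.0] 6.0
# 1 [1.0, 2.0, 3.0, 4.0] 10.0
```

These values match the *factor2* and *factor3* columns of the AAPL rows in that example's output.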

Window triggering rules:

-   A window is triggered when a timestamp \(with the same *matchingColumn* value\) past the end of that window arrives in the right table. The record itself does not participate in the calculation of that window.
-   If *maxDelayedTime* is specified, an uncalculated window whose right boundary is *t+b* is also triggered when a record with a timestamp greater than *t+b* + *maxDelayedTime* arrives in the right table, regardless of its *matchingColumn* value.
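
The two triggering rules can be modeled as a single predicate. This is a plain-Python sketch under simplifying assumptions, not the engine's implementation: `is_triggered` is a hypothetical name, `window_end` is the right boundary of the window (*t+b* for *window* = *a:b*), and `max_delayed_time=None` represents the case where *maxDelayedTime* is not specified.

```python
def is_triggered(window_end, arriving_ts, same_group, max_delayed_time=None):
    """Decide whether a right-table record arriving at arriving_ts triggers an
    uncalculated window whose right boundary is window_end (inclusive)."""
    # Rule 1: a record from the same group with a timestamp past the window end.
    # The triggering record itself is not part of the window.
    if same_group and arriving_ts > window_end:
        return True
    # Rule 2: with maxDelayedTime, any record (from any group) later than
    # window_end + max_delayed_time also triggers the window.
    if max_delayed_time is not None and arriving_ts > window_end + max_delayed_time:
        return True
    return False


# Left record at t = 0 ms with window = -2:2, so the window ends at 2 ms.
print(is_triggered(2, 3, same_group=True))                             # True
print(is_triggered(2, 3, same_group=False))                            # False
print(is_triggered(2, 3003, same_group=False, max_delayed_time=3000))  # True
```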

**Special windows \(i.e., _window_ = _0:0_\):**

The windows over the right table are determined by the current timestamp \(*t*\) in the left table and the previous left-table timestamp \(*t0*\). By default, the window is left-closed and right-open, meaning it includes right-table records with timestamps in the interval \[*t0, t*\). The window boundaries can be adjusted with the *closed* parameter. The window triggering rules are as follows:

-   When *useSystemTime* = false
    -   A window is triggered when a record in the right table \(same group\) arrives with a timestamp exceeding the window's right boundary.
    -   If a window remains uncomputed, the window is triggered when a new record in the right table \(regardless of group\) arrives with a timestamp exceeding the window's right boundary + *maxDelayedTime*.
-   When *useSystemTime* = true, a window is triggered immediately upon ingestion of each record in the left table \(same group\).
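
For *window* = 0:0, the following plain-Python sketch lists which right-table timestamps fall into each left record's window under both settings of *closed*. It models a single group and ignores triggering. `special_windows` is a hypothetical helper, and treating the first left record's lower bound as unbounded is an assumption that is consistent with the 0:0 examples in the Examples section.

```python
import math


def special_windows(left_ts, right_ts, closed="left"):
    """Return, for each left timestamp t, the right timestamps between the previous
    left timestamp t0 and t: [t0, t) if closed == "left", (t0, t] if closed == "right"."""
    windows = []
    t0 = -math.inf  # assumption: the first left record has no lower bound
    for t in left_ts:
        if closed == "left":
            windows.append([r for r in right_ts if t0 <= r < t])
        else:
            windows.append([r for r in right_ts if t0 < r <= t])
        t0 = t
    return windows


left = [1, 5, 10, 15]               # ms offsets used in the 0:0 examples
right = [1, 2, 3, 4, 5, 6, 9, 15]
print(special_windows(left, right, "left"))   # [[], [1, 2, 3, 4], [5, 6, 9], []]
print(special_windows(left, right, "right"))  # [[1], [2, 3, 4, 5], [6, 9], [15]]
```

The window sizes match the lengths of the *val* array vectors in those examples. With *closed* = 'right', the last window (10, 15\] is computed by this sketch, but per the triggering rules the engine outputs it only after a later right-table record arrives.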

Note: When *window* = *0:0*, if *metrics* contains a non-aggregate function which applies to a right table column, the corresponding output column must be specified as an array vector of the appropriate data type.

For more application scenarios of streaming engines, see [Streaming Engines](../../Streaming/streaming_engines.md).

## Arguments {#arguments}

**name** is a string indicating the name of the window join streaming engine. It is the unique identifier of the engine on a data/compute node. It can contain letters, numbers and underscores and must start with a letter.

**leftTable** and **rightTable** are table objects whose schema must be the same as the stream table to which the engine subscribes. Since version 2.00.10, array vectors are allowed for *leftTable*.

**outputTable** is a table to which the engine inserts the calculation results. It can be an in-memory table or a DFS table. Before calling the function, create an empty table with the specified column names.

The columns of *outputTable* are in the following order:

\(1\) The first column must be a temporal column.

-   if *useSystemTime* = true, the data type must be TIMESTAMP;

-   if *useSystemTime* = false, it has the same data type as *timeColumn*.


\(2\) It is followed by one or more columns on which the tables are joined, in the same order as specified in *matchingColumn*.

\(3\) These are followed by one or more columns holding the calculation results of *metrics*.

\(4\) If *outputElapsedMicroseconds* is set to true, add two more columns: a LONG column and an INT column.
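
The ordering rules above can be restated as a small plain-Python helper for checking an output schema by hand. It is a hypothetical sketch, not part of DolphinDB; types are written as DolphinDB type names in strings, and the caller supplies the metric types (for example `DOUBLE[]` for a non-aggregate metric on a right-table column when *window* = 0:0).

```python
def output_column_types(time_type, matching_types, metric_types,
                        use_system_time=False, output_elapsed_microseconds=False):
    """Return the expected outputTable column types, in order."""
    # (1) temporal column: TIMESTAMP under system time, else the timeColumn type
    types = ["TIMESTAMP" if use_system_time else time_type]
    # (2) matching columns, in matchingColumn order
    types += list(matching_types)
    # (3) one column per metrics result
    types += list(metric_types)
    # (4) extra LONG and INT columns when outputElapsedMicroseconds = true
    if output_elapsed_microseconds:
        types += ["LONG", "INT"]
    return types


# Output table of the first example: time, sym, factor1..factor3
print(output_column_types("TIMESTAMP", ["SYMBOL"], ["DOUBLE"] * 3))
# ['TIMESTAMP', 'SYMBOL', 'DOUBLE', 'DOUBLE', 'DOUBLE']
```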

**window** is a pair of integers or duration values, indicating the range of a sliding window, including both left and right bounds.

**metrics** is metacode \(which can be a tuple\) specifying the calculation formulas. For more information about metacode, please refer to [Metaprogramming](../../Programming/Metaprogramming/functional_meta.md).

-   *metrics* can use one or more expressions, built-in or user-defined functions \(both aggregate functions and non-aggregate functions are accepted\).

-   *metrics* can be functions that return multiple values, in which case the output-table columns that hold the return values must be specified. For example, `` <func(price) as `col1`col2> ``.

-   The column names specified in *metrics* are not case-sensitive and can be inconsistent with the column names of the input tables.


If you want to specify a column that exists in both the left and the right tables, use the format *tableName.colName*. By default, the column from the left table is used.

The following functions are optimized in the engine when they are applied only to the columns from the right table: `sum`, `sum2`, `avg`, `std`, `var`, `corr`, `covar`, `wavg`, `wsum`, `beta`, `max`, `min`, `last`, `first`, `med`, `percentile`.

**matchingColumn** is a STRING scalar/vector/tuple indicating the column\(s\) on which the tables are joined. It supports integral, temporal, or literal \(except UUID\) types.

-   When there is only one column to match: if the matching column has the same name in both tables, *matchingColumn* is a STRING scalar; otherwise, it is a tuple of two elements. For example, if the column is named "sym" in the left table and "sym1" in the right table, then *matchingColumn* = ``[[`sym],[`sym1]]``.
-   When there are multiple columns to match: if all the matching columns have the same names in both tables, *matchingColumn* is a STRING vector; otherwise, it is a tuple of two elements. For example, if the columns are named "timestamp" and "sym" in the left table, and "timestamp" and "sym1" in the right table, then *matchingColumn* = ``[[`timestamp,`sym],[`timestamp,`sym1]]``.
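
The three accepted forms can be sketched in plain Python, with a `str` standing for a STRING scalar, a `list` for a STRING vector, and a 2-tuple for a tuple of two elements. `split_matching_column` is a hypothetical helper; pairing the i-th left column with the i-th right column is an assumption that matches the examples above.

```python
def split_matching_column(spec):
    """Return (left_columns, right_columns) for a matchingColumn specification.

    str     -> STRING scalar: one column with the same name in both tables
    list    -> STRING vector: several columns with the same names in both tables
    2-tuple -> tuple of two elements: (left names, right names)
    """
    if isinstance(spec, str):
        return [spec], [spec]
    if isinstance(spec, tuple):
        left, right = spec
        left = [left] if isinstance(left, str) else list(left)
        right = [right] if isinstance(right, str) else list(right)
        if len(left) != len(right):
            raise ValueError("both elements must name the same number of columns")
        return left, right
    return list(spec), list(spec)


print(split_matching_column("sym"))
# (['sym'], ['sym'])
print(split_matching_column((["timestamp", "sym"], ["timestamp", "sym1"])))
# (['timestamp', 'sym'], ['timestamp', 'sym1'])
```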

**timeColumn** \(optional\) is required when *useSystemTime* = false. It specifies the name\(s\) of the time column in the left table and the right table, which must have the same data type. If the time column has the same name in both tables, *timeColumn* is a string; otherwise, it is a vector of two strings giving the time column names in the left and right tables, respectively.

**useSystemTime** \(optional\) indicates whether the left table and the right table are joined on the system time, instead of on the *timeColumn*.

-   *useSystemTime* = true: join records based on the system time \(timestamp with millisecond precision\) when they are ingested into the engine.

-   *useSystemTime* = false \(default\): join records based on the specified timeColumn from the left table and the right table.


**garbageSize** \(optional\) is a positive integer with the default value of 5,000 \(rows\). Data ingested from the subscriptions keeps occupying memory, so the engine cleans up its cache: within the left and right tables, records are grouped by *matchingColumn* values, and when the number of records in a group exceeds *garbageSize*, the records that have already been calculated are removed from memory.
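
A simplified plain-Python model of this cleanup for one group follows. It assumes each cached record carries an "already calculated" flag, which is a stand-in for however the engine actually tracks processed records; `collect_garbage` is a hypothetical name.

```python
def collect_garbage(group_cache, garbage_size=5000):
    """Simplified cleanup of one group's cache.

    group_cache: list of (timestamp, already_calculated) pairs, oldest first.
    Once the group holds more than garbage_size records, the records that have
    already been calculated are dropped; otherwise the cache is left unchanged.
    """
    if len(group_cache) <= garbage_size:
        return group_cache
    return [(ts, done) for ts, done in group_cache if not done]


cache = [(ts, ts < 4) for ts in range(6)]            # records at 0..3 ms already calculated
print(collect_garbage(cache, garbage_size=5))        # [(4, False), (5, False)]
print(len(collect_garbage(cache, garbage_size=10)))  # 6
```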

**maxDelayedTime** \(optional\) is a positive integer. It takes effect only when *timeColumn* is specified, and it must have the same time precision as *timeColumn*. Use *maxDelayedTime* to trigger windows that remain uncalculated long after their end. The default value is 3 seconds. For more information, see the window triggering rules in the Details section.

**nullFill** \(optional\) is a tuple of the same size as the number of output columns. It is used to fill in the null values in the output table. The data type of each element corresponds to each output column.
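
Positionally, *nullFill* works like the following plain-Python sketch, in which `None` stands in for a DolphinDB NULL and temporal values are plain strings. `apply_null_fill` is a hypothetical helper, and the data mirrors the first AAPL row of the first example in the Examples section, whose NULL price is output as 0.

```python
def apply_null_fill(row, null_fill):
    """Replace each null (None) in an output row with the nullFill element at the same position."""
    if len(row) != len(null_fill):
        raise ValueError("nullFill must have one element per output column")
    return [fill if value is None else value for value, fill in zip(row, null_fill)]


null_fill = ["2012.01.01T00:00:00.000", "NONE", 0.0, 0.0, 0.0]
row = ["2012.01.01T00:00:00.000", "AAPL", None, 1.0, 6.0]  # price is NULL
print(apply_null_fill(row, null_fill))
# ['2012.01.01T00:00:00.000', 'AAPL', 0.0, 1.0, 6.0]
```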

**outputElapsedMicroseconds** \(optional\) is a Boolean value indicating whether to output the elapsed time \(in microseconds\) of calculating each batch and the number of records in each batch. The default value is false. When *outputElapsedMicroseconds* = true, *outputTable* must contain two additional columns \(see the description of *outputTable*\).

**sortByTime** \(optional\) is a Boolean value that indicates whether the output data is globally sorted by time. The default value is false, meaning the output data is sorted only within groups. Note that if *sortByTime* is set to true, the data input to the left and right tables must be globally sorted, and the parameter *maxDelayedTime* cannot be specified \(i.e., no delayed triggering allowed\).

**closed** \(optional\) is a string indicating whether the left or the right boundary of the window is included. It takes effect only when *window* = 0:0.

-   *closed* = 'left' \(default\): left-closed, right-open.

-   *closed* = 'right': left-open, right-closed. In this case, *useSystemTime* must be set to false.


To enable snapshot in the streaming engines, specify parameters *snapshotDir* and *snapshotIntervalInMsgCount*.

**snapshotDir** \(optional\) is a string indicating the directory where the streaming engine snapshot is saved. The directory must already exist; otherwise, an exception is thrown. If *snapshotDir* is specified, the system checks whether a snapshot already exists in the directory when the engine is created. If it does, the snapshot is loaded to restore the engine state. Multiple streaming engines can share a directory, with each snapshot file named after its engine.

The file extension of a snapshot can be:

-   *&lt;engineName&gt;.tmp*: a temporary snapshot
-   *&lt;engineName&gt;.snapshot*: a snapshot that is generated and flushed to disk
-   *&lt;engineName&gt;.old*: if a snapshot with the same name already exists, the previous snapshot is renamed to *&lt;engineName&gt;.old*.

**snapshotIntervalInMsgCount** \(optional\) is a positive integer indicating the number of messages to receive before the next snapshot is saved.

**cachedTableCapacity** \(optional\) is a positive integer indicating the initial capacity \(in terms of the number of rows\) of the left and right cache tables for each group. The default value is 1024.

**keyPurgeFreqInSec** \(optional\) is a positive integer indicating the interval \(in seconds\) at which the system checks for groups that can be removed. This parameter does not take effect when *window* = 0:0. Removal rules:

-   A group can be removed if its left cache table has no unprocessed messages and either its right cache table is empty or the timestamp of the last record in its right cache table is earlier than \(latest right timestamp − *maxDelayedTime* − window size\).
-   The engine triggers removal when the number of purgeable groups reaches 10% of the total number of groups.
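
The removal rules can be sketched in plain Python as two predicates. The names `is_purgeable` and `should_remove` are hypothetical, and taking the window size of *window* = *a:b* as *b* − *a* is an assumption, since it is not defined above.

```python
def is_purgeable(has_unprocessed_left, right_last_ts, latest_right_ts,
                 max_delayed_time, window_size):
    """Removal rule for one group; right_last_ts is None when its right cache table is empty."""
    if has_unprocessed_left:
        return False
    if right_last_ts is None:
        return True
    return right_last_ts < latest_right_ts - max_delayed_time - window_size


def should_remove(purgeable_flags):
    """Removal is triggered once purgeable groups reach 10% of all groups."""
    total = len(purgeable_flags)
    return total > 0 and 10 * sum(purgeable_flags) >= total


# window = -2:2 (window size assumed to be b - a = 4 ms), maxDelayedTime = 3000 ms
print(is_purgeable(False, 5000, 10000, 3000, 4))  # True: 5000 < 6996
print(is_purgeable(False, 7000, 10000, 3000, 4))  # False
print(should_remove([True] + [False] * 9))        # True: 1 of 10 groups
```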

## Examples {#examples}

```
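// left and right stream tables, plus the output table: time, matching column, then one column per metric
share streamTable(1:0, `time`sym`price, [TIMESTAMP, SYMBOL, DOUBLE]) as leftTable
share streamTable(1:0, `time`sym`val, [TIMESTAMP, SYMBOL, DOUBLE]) as rightTable
share table(100:0, `time`sym`factor1`factor2`factor3, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE, DOUBLE]) as output

// one fill value per output column, used to replace nulls in the output
nullFill = [2012.01.01T00:00:00.000, `NONE, 0.0, 0.0, 0.0]
// for each left record at time t, compute price, val and sum(val) over the right records of the same sym in [t-2, t+2]
wjEngine=createWindowJoinEngine(name="test1", leftTable=leftTable, rightTable=rightTable, outputTable=output, window=-2:2, metrics=<[price,val,sum(val)]>, matchingColumn=`sym, timeColumn=`time, useSystemTime=false, nullFill=nullFill)

// route left-table messages (true) and right-table messages (false) into the engine
subscribeTable(tableName="leftTable", actionName="joinLeft", offset=0, handler=appendForJoin{wjEngine, true}, msgAsTable=true)
subscribeTable(tableName="rightTable", actionName="joinRight", offset=0, handler=appendForJoin{wjEngine, false}, msgAsTable=true)

// left data: 10 AAPL and 10 IBM records at millisecond offsets; the first price of each 10-record block is NULL
n=10
tp1=table(take(2012.01.01T00:00:00.000+0..10, 2*n) as time, take(`AAPL, n) join take(`IBM, n) as sym, take(NULL join rand(10.0, n-1),2*n) as price)
tp1.sortBy!(`time)
leftTable.append!(tp1)

// right data: the same timestamps and symbols, with val = 1..10 in each block
tp2=table(take(2012.01.01T00:00:00.000+0..10, 2*n) as time, take(`AAPL, n) join take(`IBM, n) as sym, take(double(1..n),2*n) as val)
tp2.sortBy!(`time)
rightTable.append!(tp2)

// wait for the subscriptions to process the data, then query the first two milliseconds
sleep(100)
select * from output where time between 2012.01.01T00:00:00.000:2012.01.01T00:00:00.001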
```

|time|sym|factor1|factor2|factor3|
|----|---|-------|-------|-------|
|2012.01.01T00:00:00.000|AAPL|0|1|6|
|2012.01.01T00:00:00.000|AAPL|0|2|6|
|2012.01.01T00:00:00.000|AAPL|0|3|6|
|2012.01.01T00:00:00.001|AAPL|5.2705|1|10|
|2012.01.01T00:00:00.001|AAPL|5.2705|2|10|
|2012.01.01T00:00:00.001|AAPL|5.2705|3|10|
|2012.01.01T00:00:00.001|AAPL|5.2705|4|10|
|2012.01.01T00:00:00.000|IBM|5.2705|2|9|
|2012.01.01T00:00:00.000|IBM|5.2705|3|9|
|2012.01.01T00:00:00.000|IBM|5.2705|4|9|
|2012.01.01T00:00:00.001|IBM|1.0179|2|14|
|2012.01.01T00:00:00.001|IBM|1.0179|3|14|
|2012.01.01T00:00:00.001|IBM|1.0179|4|14|
|2012.01.01T00:00:00.001|IBM|1.0179|5|14|

Example for *window* = 0:0:

```
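// remove the engine, subscriptions and shared tables created by the previous example
unsubscribeTable(tableName="leftTable", actionName="joinLeft")
unsubscribeTable(tableName="rightTable", actionName="joinRight")
undef(`leftTable,SHARED)
undef(`rightTable,SHARED)
dropAggregator(name="test1")
undef(`output,SHARED)

share streamTable(1:0, `time`sym`price, [TIMESTAMP, SYMBOL, DOUBLE]) as leftTable
share streamTable(1:0, `time`sym`val, [TIMESTAMP, SYMBOL, DOUBLE]) as rightTable

// left-table timestamps (ms offsets)
v = [1, 5, 10, 15]
tp1=table(2012.01.01T00:00:00.000+v as time, take(`AAPL, 4) as sym, rand(10.0,4) as price)

// right-table timestamps (ms offsets)
v = [1, 2, 3, 4, 5, 6, 9, 15]
tp2=table(2012.01.01T00:00:00.000+v as time, take(`AAPL, 8) as sym, rand(10.0,8) as val)

// val is a non-aggregate metric on a right-table column, so its output column is an array vector (DOUBLE[])
share table(100:0, `time`sym`price`val`sum_val, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE[], DOUBLE]) as output
wjEngine=createWindowJoinEngine(name="test1", leftTable=leftTable, rightTable=rightTable, outputTable=output,  window=0:0, metrics=<[price, val, sum(val)]>, matchingColumn=`sym, timeColumn=`time, useSystemTime=false)

subscribeTable(tableName="leftTable", actionName="joinLeft", offset=0, handler=appendForJoin{wjEngine, true}, msgAsTable=true)
subscribeTable(tableName="rightTable", actionName="joinRight", offset=0, handler=appendForJoin{wjEngine, false}, msgAsTable=true)

leftTable.append!(tp1)
rightTable.append!(tp2)

sleep(100)
select * from output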
```

|time|sym|price|val|sum\_val|
|----|---|-----|---|--------|
|2012.01.01T00:00:00.001|AAPL|8.8252|\[\]||
|2012.01.01T00:00:00.005|AAPL|7.1195|\[7.495792,9.417891,1.419681,…\]|21.3741|
|2012.01.01T00:00:00.010|AAPL|5.2217|\[4.840462,8.086567,3.495306\]|16.4223|
|2012.01.01T00:00:00.015|AAPL|9.2517|\[\]||

The following variant applies only the aggregate function `sum` to the right-table column *val*, so the output table needs no array vector column.

``` {#codeblock_xkt_h52_hzb}
// remove the engine, subscriptions and shared tables created by the previous example
unsubscribeTable(tableName="leftTable", actionName="joinLeft")
unsubscribeTable(tableName="rightTable", actionName="joinRight")
undef(`leftTable,SHARED)
undef(`rightTable,SHARED)
dropAggregator(name="test1")
undef(`output,SHARED)

share streamTable(1:0, `time`sym`price, [TIMESTAMP, SYMBOL, DOUBLE]) as leftTable
share streamTable(1:0, `time`sym`val, [TIMESTAMP, SYMBOL, DOUBLE]) as rightTable

v = [1, 5, 10, 15]
tp1=table(2012.01.01T00:00:00.000+v as time, take(`A, 4) as sym, rand(10.0,4) as price)

v = [1, 2, 3, 4, 5, 6, 9, 15]
tp2=table(2012.01.01T00:00:00.000+v as time, take(`A, 8) as sym, rand(10.0,8) as val)

share table(100:0, `time`sym`price`sum_val, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE]) as output
wjEngine=createWindowJoinEngine(name="test1", leftTable=leftTable, rightTable=rightTable, outputTable=output,  window=0:0, metrics=<[price, sum(val)]>, matchingColumn=`sym, timeColumn=`time, useSystemTime=false)

subscribeTable(tableName="leftTable", actionName="joinLeft", offset=0, handler=appendForJoin{wjEngine, true}, msgAsTable=true)
subscribeTable(tableName="rightTable", actionName="joinRight", offset=0, handler=appendForJoin{wjEngine, false}, msgAsTable=true)

leftTable.append!(tp1)
rightTable.append!(tp2)

sleep(100)
select * from output
```

<table id="table_eky_352_hzb">
<thead>
<tr><th align="left">time</th><th align="left">sym</th><th align="left">price</th><th align="left">sum_val</th></tr>
</thead>
<tbody>
<tr><td align="left">2012.01.01T00:00:00.001</td><td align="left">A</td><td align="left">8.8252</td><td align="left"></td></tr>
<tr><td align="left">2012.01.01T00:00:00.005</td><td align="left">A</td><td align="left">7.1195</td><td align="left">21.3741</td></tr>
<tr><td align="left">2012.01.01T00:00:00.010</td><td align="left">A</td><td align="left">5.2217</td><td align="left">16.4223</td></tr>
<tr><td align="left">2012.01.01T00:00:00.015</td><td align="left">A</td><td align="left">9.2517</td><td align="left"></td></tr>
</tbody>
</table>

When *window* = 0:0, the window is left-closed and right-open by default. The following example uses a left-open, right-closed window by setting *closed* to 'right'.

``` {#codeblock_etf_lmj_b1c}
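// remove the engine, subscriptions and shared tables created by the previous example
unsubscribeTable(tableName="leftTable", actionName="joinLeft")
unsubscribeTable(tableName="rightTable", actionName="joinRight")
undef(`leftTable,SHARED)
undef(`rightTable,SHARED)
dropAggregator(name="test1")
undef(`output,SHARED)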

share streamTable(1:0, `time`sym`price, [TIMESTAMP, SYMBOL, DOUBLE]) as leftTable
share streamTable(1:0, `time`sym`val, [TIMESTAMP, SYMBOL, DOUBLE]) as rightTable

v1 = [1, 5, 10, 15]
tp1=table(2012.01.01T00:00:00.000+v1 as time, take(`A, 4) as sym, rand(10.0,4) as price)

v2 = [1, 2, 3, 4, 5, 6, 9, 15]
tp2=table(2012.01.01T00:00:00.000+v2 as time, take(`A, 8) as sym, rand(10.0,8) as val)

share table(100:0, `time`sym`price`val`sum_val, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE[], DOUBLE]) as output
wjEngine=createWindowJoinEngine(name="test1", leftTable=leftTable, rightTable=rightTable, outputTable=output,  window=0:0, metrics=<[price, val, sum(val)]>, matchingColumn="sym", timeColumn="time", useSystemTime=false, closed="right")

subscribeTable(tableName="leftTable", actionName="joinLeft", offset=0, handler=appendForJoin{wjEngine, true}, msgAsTable=true)
subscribeTable(tableName="rightTable", actionName="joinRight", offset=0, handler=appendForJoin{wjEngine, false}, msgAsTable=true)

leftTable.append!(tp1)
rightTable.append!(tp2)
sleep(100)
select * from output
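/* output
time                     sym  price   val                            sum_val
2012.01.01T00:00:00.001  A    9.7366  [7.8310]                       7.831
2012.01.01T00:00:00.005  A    2.6537  [1.8564,4.6238,8.2536,3.1028]  17.8368
2012.01.01T00:00:00.010  A    3.9586  [0.8413,8.0684]                8.9098
*/
// the record at 00:00:00.015 is not output: its window (10, 15] has not been triggered,
// because no right-table record later than 00:00:00.015 has arrived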
```

The following example shows that when *sortByTime* = true, the engine outputs data globally sorted by time.

```
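// remove the engine, subscriptions and shared tables created by the previous example
unsubscribeTable(tableName="leftTable", actionName="joinLeft")
unsubscribeTable(tableName="rightTable", actionName="joinRight")
undef(`leftTable,SHARED)
undef(`rightTable,SHARED)
dropAggregator(name="test1")
undef(`output,SHARED)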

//define a window join engine
share streamTable(1:0, `time`sym`price, [TIMESTAMP, SYMBOL, DOUBLE]) as leftTable
share streamTable(1:0, `time`sym`val, [TIMESTAMP, SYMBOL, DOUBLE]) as rightTable
share table(100:0, `time`sym`factor1`factor2`factor3, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE, DOUBLE]) as output
nullFill= [2012.01.01T00:00:00.000, `NONE, 0.0, 0.0, 0.0]
wjEngine=createWindowJoinEngine(name="test1", leftTable=leftTable, rightTable=rightTable, outputTable=output,  window=-2:2, metrics=<[price,val,sum(val)]>, matchingColumn=`sym, timeColumn=`time, useSystemTime=false,nullFill=nullFill, sortByTime=true)

//subscribe data
subscribeTable(tableName="leftTable", actionName="joinLeft", offset=0, handler=appendForJoin{wjEngine, true}, msgAsTable=true)
subscribeTable(tableName="rightTable", actionName="joinRight", offset=0, handler=appendForJoin{wjEngine, false}, msgAsTable=true)

n=10
tp1=table(take(2012.01.01T00:00:00.000+0..10, 2*n) as time, take(`A, n) join take(`B, n) as sym, take(NULL join rand(10.0, n-1),2*n) as price)
tp1.sortBy!(`time)
leftTable.append!(tp1)

tp2=table(take(2012.01.01T00:00:00.000+0..10, 2*n) as time, take(`A, n) join take(`B, n) as sym, take(double(1..n),2*n) as val)
tp2.sortBy!(`time)
rightTable.append!(tp2)

sleep(100)
select * from output where time between 2012.01.01T00:00:00.000:2012.01.01T00:00:00.001


/* output
time                     sym  factor1  factor2  factor3
2012.01.01T00:00:00.000  A    0        1        6
2012.01.01T00:00:00.000  A    0        2        6
2012.01.01T00:00:00.000  A    0        3        6
2012.01.01T00:00:00.000  B    3.9389   2        9
2012.01.01T00:00:00.000  B    3.9389   3        9
2012.01.01T00:00:00.000  B    3.9389   4        9
2012.01.01T00:00:00.001  A    3.9389   1        10
2012.01.01T00:00:00.001  A    3.9389   2        10
2012.01.01T00:00:00.001  A    3.9389   3        10
2012.01.01T00:00:00.001  A    3.9389   4        10
2012.01.01T00:00:00.001  B    4.9875   2        14
2012.01.01T00:00:00.001  B    4.9875   3        14
2012.01.01T00:00:00.001  B    4.9875   4        14
2012.01.01T00:00:00.001  B    4.9875   5        14
*/
```

