# createLeftSemiJoinEngine {#createleftsemijoinengine}

## Syntax {#syntax}

`createLeftSemiJoinEngine(name, leftTable, rightTable, outputTable, metrics, matchingColumn, [garbageSize=5000], [updateRightTable=false], [snapshotDir], [snapshotIntervalInMsgCount])`

## Details {#details}

Create a left semi join engine. For each record in the left table, the engine finds the matching record in the right table and returns the joined result. Left-table records without a match are not returned.

If an incoming right-table record has the same *matchingColumn* value as a record already held for the right table, the engine keeps only one of them: either the first or the latest, as determined by the parameter *updateRightTable*.

Note: The engine keeps only one right-table record for each distinct *matchingColumn* value, and data in the right table is never removed from memory. Avoid a large number of distinct *matchingColumn* values; otherwise an out-of-memory \(OOM\) error may occur.

For more details of streaming engines, refer to [streamingEngine](../../Streaming/streaming_engines.md).

## Arguments {#arguments}

**name** is a string indicating the name of the left semi join engine. It is the unique identifier of the engine on a data/compute node. It can contain letters, numbers and underscores and must start with a letter.

**leftTable** is a stream table object whose schema must be the same as the stream table to which the engine subscribes. It does not matter whether the table contains data or not. Since version 2.00.11, array vectors are allowed for *leftTable*.

**rightTable** is a stream table object whose schema must be the same as the stream table to which the engine subscribes. It does not matter whether the table contains data or not. Since version 2.00.11, array vectors are allowed for *rightTable*.

**outputTable** is a stream table object to hold the calculation results. Create an empty table and specify the column names and types before calling the function. Since version 2.00.11, array vectors are allowed for *outputTable*. When a result column is an array vector column, user-defined aggregate functions can be used to output the results in the form of an array vector.

The columns of *outputTable* are in the following order:

\(1\) The first column\(s\) are the column\(s\) on which the tables are joined, arranged in the same order as specified in *matchingColumn*.

\(2\) The remaining column\(s\) hold the calculation results of *metrics*. There can be one or multiple columns.

**metrics** is metacode \(can be a tuple\) specifying the calculation formulas. For more information about metacode, refer to [Metaprogramming](https://test.dolphindb.cn/en/Programming/Metaprogramming/MetacodeWithFunction.html).

-   *metrics* can use one or more expressions, built-in or user-defined functions \(but not aggregate functions\), or a constant scalar/vector. Note that the output column for a constant vector must be in array vector form.

-   *metrics* can be functions that return multiple values. In that case, the columns in the output table that hold the return values must be specified. For example, &lt;func\(price\) as \`col1\`col2&gt;.


To specify a column that exists in both the left and the right tables, use the format *tableName.colName*. By default, the column from the left table is used.

**Note:** The column names specified in *metrics* are not case-sensitive, so their case does not need to match that of the column names in the input tables.

**matchingColumn** is a STRING scalar/vector/tuple indicating the column\(s\) on which the tables are joined. The matching columns must be of integral, temporal, or literal \(except UUID\) types.

-   When there is only one column to match: if the column has the same name in both tables, *matchingColumn* is a STRING scalar; otherwise it is a tuple of two elements. For example, if the column is named "sym" in the left table and "sym1" in the right table, then *matchingColumn* = \[\[\`sym\],\[\`sym1\]\].
-   When there are multiple columns to match: if all the columns to match have the same names in both tables, *matchingColumn* is a STRING vector; otherwise it is a tuple of two elements. For example, if the columns are named "timestamp" and "sym" in the left table, whereas in the right table they are named "timestamp" and "sym1", then *matchingColumn* = \[\[\`timestamp, \`sym\], \[\`timestamp,\`sym1\]\].
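
The four shapes described above can be written out as follows. This is only an illustrative sketch: the variable names are hypothetical, and the column names are the ones used in the bullets.

```
// One matching column, same name in both tables: STRING scalar
mcSameSingle = `sym

// One matching column, different names: tuple of [left names], [right names]
mcDiffSingle = [[`sym], [`sym1]]

// Several matching columns, same names in both tables: STRING vector
mcSameMulti = `timestamp`sym

// Several matching columns, at least one name differs: tuple of two vectors
mcDiffMulti = [[`timestamp, `sym], [`timestamp, `sym1]]
```

In the tuple forms, the first element lists the left-table columns and the second lists the right-table columns, in the same order.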


**garbageSize** \(optional\) is a positive integer. The default value is 5,000. Unlike other join engines, the *garbageSize* parameter for left semi join engine is only used to remove the historical data from the left table. The system will clear the data from the left table when the number of joined records exceeds *garbageSize*.

**updateRightTable** \(optional\) is a Boolean value indicating whether to keep the first record \(*updateRightTable* = false\) or the latest record \(*updateRightTable* = true\) when more than one record in the right table has the same *matchingColumn* value. The default value is false.
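
A minimal sketch for observing the effect of *updateRightTable* is shown here. It feeds the engine directly instead of going through subscriptions, assuming that `appendForJoin` can be called as `appendForJoin(engine, isLeftTable, data)`, which is the argument order implied by the partial application `appendForJoin{lsjEngine, true}` in the Examples section. The names `lt`, `rt`, `res`, `demoEngine` and `lsjKeepDemo` are hypothetical.

```
// Hypothetical schema tables and output table for this sketch
lt = streamTable(1:0, `sym`price, [SYMBOL, DOUBLE])
rt = streamTable(1:0, `sym`vol, [SYMBOL, INT])
res = table(100:0, `sym`price`vol, [SYMBOL, DOUBLE, INT])
demoEngine = createLeftSemiJoinEngine(name="lsjKeepDemo", leftTable=lt, rightTable=rt, outputTable=res, metrics=<[price, vol]>, matchingColumn=`sym, updateRightTable=true)

// Two right-table records share sym = A, so the engine keeps only one of them
appendForJoin(demoEngine, false, table(symbol(`A`A) as sym, 10 20 as vol))
// One left-table record with the same key triggers the join
appendForJoin(demoEngine, true, table(symbol(take(`A, 1)) as sym, take(1.5, 1) as price))

// The vol value in the single result row shows which right-table record was kept
select * from res

// Drop the engine so that the sketch can be run again
dropStreamEngine("lsjKeepDemo")
```

Running the sketch once with `updateRightTable=true` and once with `updateRightTable=false` makes it easy to compare which *vol* value is joined.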

To enable snapshot in the streaming engines, specify parameters *snapshotDir* and *snapshotIntervalInMsgCount*.

**snapshotDir** \(optional\) is a string indicating the directory where the streaming engine snapshot is saved. The directory must already exist; otherwise an exception is thrown. If *snapshotDir* is specified, the system checks whether a snapshot already exists in the directory when creating a streaming engine. If it exists, the snapshot is loaded to restore the engine state. Multiple streaming engines can share a directory; each snapshot file is named after its engine.

The file extension of a snapshot can be:

-   *&lt;engineName&gt;.tmp*: a temporary snapshot
-   *&lt;engineName&gt;.snapshot*: a snapshot that is generated and flushed to disk
-   *&lt;engineName&gt;.old*: if a snapshot with the same name already exists, the previous snapshot is renamed to *&lt;engineName&gt;.old*.

**snapshotIntervalInMsgCount** \(optional\) is a positive integer indicating the number of messages to receive before the next snapshot is saved.

## Examples {#examples}

```
// Input stream tables: the join columns are time/sym on the left and time/sym1 on the right
share streamTable(1:0, `time`sym`price, [TIMESTAMP, SYMBOL, DOUBLE]) as leftTable
share streamTable(1:0, `time`sym1`vol, [TIMESTAMP, SYMBOL, INT]) as rightTable
// Output table: the matching columns come first, followed by one column per metric
share table(100:0, `time`sym`price`vol`total, [TIMESTAMP, SYMBOL, DOUBLE, INT, DOUBLE]) as output
lsjEngine=createLeftSemiJoinEngine(name="test1", leftTable=leftTable, rightTable=rightTable, outputTable=output,  metrics=<[price, vol,price*vol]>, matchingColumn=[[`time,`sym], [`time,`sym1]], updateRightTable=true)

// Route each stream table into the engine: true marks the left input, false the right input
subscribeTable(tableName="leftTable", actionName="joinLeft", offset=0, handler=appendForJoin{lsjEngine, true}, msgAsTable=true)
subscribeTable(tableName="rightTable", actionName="joinRight", offset=0, handler=appendForJoin{lsjEngine, false}, msgAsTable=true)

// Left-table records at 1, 5, 10 and 15 ms
v = [1, 5, 10, 15]
tp1=table(2012.01.01T00:00:00.000+v as time, take(`AAPL, 4) as sym, rand(100,4) as price)
leftTable.append!(tp1)

// Right-table records: the timestamps at 1 ms and 5 ms occur more than once, and there is no record at 10 ms
v = [1, 1, 3, 4, 5, 5, 5, 15]
tp2=table(2012.01.01T00:00:00.000+v as time, take(`AAPL, 8) as sym, rand(100,8) as vol)
rightTable.append!(tp2)

select * from output
```

Because *price* and *vol* are generated with `rand`, the values vary between runs. The left-table record at 10 ms has no match in the right table, so it does not appear in the output. A sample result:

|time|sym|price|vol|total|
|----|---|-----|---|-----|
|2012.01.01T00:00:00.001|AAPL|44|76|3344|
|2012.01.01T00:00:00.005|AAPL|15|64|960|
|2012.01.01T00:00:00.015|AAPL|24|75|1800|

To execute the above script again, delete the engine and unsubscribe:

```
dropStreamEngine("test1")
lsjEngine=NULL
unsubscribeTable(tableName="leftTable", actionName="joinLeft")
unsubscribeTable(tableName="rightTable", actionName="joinRight")
```

