registerSnapshotEngine

Syntax

registerSnapshotEngine(dbName, tableName, keyColumnName, [dedupColumnName])

Arguments

dbName is a string indicating the name of a distributed database.

tableName is a string indicating the name of a table.

keyColumnName is a string indicating the name of the column by which records are grouped. The snapshot engine keeps the most recent record for each value of this column.

dedupColumnName [optional] is a string indicating the column by which the data within each group are sorted in descending order; only the first record after sorting is retained. In cluster mode, this parameter must be specified to keep the query results correct and consistent across the cluster.
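
For example, a minimal registration keyed on a symbol column might look like the sketch below (the database and table names, dfs://exampleDB and trades, are placeholders):

// keep the latest record per sym; in cluster mode dedupColumnName (here date) must also be given
registerSnapshotEngine(dbName="dfs://exampleDB", tableName="trades", keyColumnName="sym", dedupColumnName="date")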

Details

Register a snapshot engine for a DFS table. After registering the snapshot engine, we can get the most recent row of each group specified by keyColumnName with the following SQL statement:

select [HINT_SNAPSHOT] * from loadTable(dbName,tableName).
Note:
  • This engine is currently not supported in clusters with compute nodes.
  • A DFS table can only be registered with one snapshot engine.
  • After a system restart, the snapshot engine must be re-registered. Likewise, in cluster mode, if any data node restarts, the snapshot engine must be re-registered: first call unregisterSnapshotEngine to clear the snapshot information on the other nodes, then call registerSnapshotEngine again (see the sketch after these notes).
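
A minimal sketch of the re-registration sequence, using the database and table from the Examples section below:

// clear snapshot information left on the other nodes after a data node restart
unregisterSnapshotEngine(dbName="dfs://compoDB", tableName="pt")
// register the snapshot engine on the table again
registerSnapshotEngine(dbName="dfs://compoDB", tableName="pt", keyColumnName="sym")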

Examples

Create a partitioned DFS table:

// value partition on date
db1=database("",VALUE,2018.09.01..2018.09.30)
// value partition on symbol
db2=database("",VALUE,`AAPL`MSFT`MS`C)
// composite partition (date, symbol)
db=database("dfs://compoDB",COMPO,[db1,db2])
t=table(1:0,`date`sym`val,[DATE,SYMBOL,DOUBLE])
pt=db.createPartitionedTable(t,`pt,`date`sym);

Register a snapshot engine to get the latest record for each symbol from the DFS table pt.

registerSnapshotEngine("dfs://compoDB","pt","sym");

Append data to pt, then query the latest records. Note that this example uses randomly generated data, so your results may differ from those shown below.

def writeData(batch){
    pt=loadTable("dfs://compoDB","pt")
    // build a batch of random records
    tmp=table(batch:0,`date`sym`val,[DATE,SYMBOL,DOUBLE])
    dates=take(2018.09.01..2018.09.30,batch)
    syms=take(`AAPL`MSFT`MS`C,batch)
    vals=rand(100.0,batch)
    insert into tmp values(dates,syms,vals)
    // append the batch to the DFS table
    pt.append!(tmp)
}

writeData(1000);
select [HINT_SNAPSHOT] * from loadTable("dfs://compoDB","pt");

date       sym  val
---------- ---- ---------
2018.09.29 AAPL 24.945753
2018.09.29 MS   14.034453
2018.09.30 C    3.89175
2018.09.30 MSFT 17.720025

writeData(1000);
select [HINT_SNAPSHOT] * from loadTable("dfs://compoDB","pt");

date       sym  val
---------- ---- ---------
2018.09.29 AAPL 86.296883
2018.09.29 MS   48.17885
2018.09.30 C    83.7821
2018.09.30 MSFT 44.415456

When the script above is executed in cluster mode, the query may return inconsistent results across executions. To avoid this, set the "date" column as dedupColumnName so that the data within each group are sorted by date in descending order and only the latest record is retained.

unregisterSnapshotEngine(dbName="dfs://compoDB", tableName="pt")
registerSnapshotEngine(dbName="dfs://compoDB", tableName="pt",keyColumnName="sym", dedupColumnName="date");
writeData(1000);

// same query result across the cluster
select [HINT_SNAPSHOT] * from loadTable("dfs://compoDB","pt");
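
As a rough cross-check, the snapshot result with dedupColumnName="date" should match a regular query that keeps the latest record per symbol after sorting by date, as in the sketch below (records sharing the same date may still be picked differently):

// latest record per sym, sorted by date within each group
select top 1 * from loadTable("dfs://compoDB","pt") context by sym csort date desc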