DStream::parallelize

Syntax

DStream::parallelize(columnName, count)

Arguments

columnName A string specifying the name of the column used for partitioning.

count An integer specifying the degree of parallelism, i.e., the number of parallel DStream objects to create.

Details

Partitions the stream data by hashing the values of the specified column and generates multiple parallel DStream objects for concurrent processing. Rows in which columnName is null are filtered out.

Note: DStream::parallelize and DStream::sync must be used as a pair: each parallelize call needs a matching sync call later in the same chain, as in the example below.
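
The partitioning behavior described above can be illustrated with a small conceptual sketch. It is written in Python rather than DolphinDB script and is not the engine's implementation: the function names hash_partition and merge_partitions, the CRC32 hash, and the sample rows are all assumptions made for illustration. It shows only the three properties stated in this section: rows with the same key land in the same partition, rows with a null key are dropped, and the partitions are merged back into one stream afterwards.

```python
from zlib import crc32


def hash_partition(rows, column_name, count):
    """Split rows into `count` partitions by hashing rows[column_name].

    Hypothetical stand-in for the partitioning step; CRC32 is an
    assumption, not the hash function DolphinDB actually uses.
    """
    partitions = [[] for _ in range(count)]
    for row in rows:
        key = row.get(column_name)
        if key is None:
            # Rows with a null partitioning key are filtered out.
            continue
        index = crc32(str(key).encode("utf-8")) % count
        partitions[index].append(row)
    return partitions


def merge_partitions(partitions):
    """Hypothetical stand-in for the merge step: concatenate partitions."""
    merged = []
    for part in partitions:
        merged.extend(part)
    return merged


# Made-up sample data for illustration only.
trades = [
    {"symbol": "AAPL", "price": 189.5, "volume": 100},
    {"symbol": "MSFT", "price": 411.2, "volume": 50},
    {"symbol": None, "price": 10.0, "volume": 1},
    {"symbol": "AAPL", "price": 189.7, "volume": 200},
]

parts = hash_partition(trades, "symbol", 4)
merged = merge_partitions(parts)
```

Because the partition index depends only on the key, all rows for a given symbol are processed by the same parallel branch, which is what allows per-symbol stateful calculations to run independently in each branch.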

Examples

Partition the stream data into four streams based on the symbol column for downstream calculations:

use catalog test

// name is assumed to be a string variable, defined beforehand, that holds the stream graph name
g = createStreamGraph(name)
g.source("trade", 1024:0, `symbol`datetime`price`volume, [SYMBOL, TIMESTAMP, DOUBLE, INT])
  .parallelize("symbol", 4)
  .timeSeriesEngine(60*1000, 60*1000, <[first(price),max(price),min(price),last(price),sum(volume)]>, "datetime", false, "symbol")
  .reactiveStateEngine(<[datetime, first_price, max_price, min_price, last_price, sum_volume, mmax(max_price, 5), mavg(sum_volume, 5)]>, `symbol)
  .sync()
  .sink("output")
g.submit()