contextby
Syntax
contextby(func, funcArgs, groupingCol, [sortingCol], [semanticFilter=1],
[containArgsOfHighOrderFunc=false])
or
funcArg func:X groupingCol
or
func:X(funcArgs, groupingCol, [sortingCol], [semanticFilter=1])
Details
Calculates func(funcArgs) for each groupingCol group. The
result is a vector of the same size as each of the input arguments other than
func. If func is an aggregate function, all elements within the
same group have identical result. We can use sortingCol to sort the
within-group data before the calculation.
Note: The keyword defg must be used to declare an aggregate
function.
Parameters
func is a function. For the second use case, func can only have one parameter (funcArg).
funcArgs are the parameters of func. It is a tuple if there are more than 1 parameter of func. When it is a tuple, its elements must be vectors, matrices, or tables of the same length.
groupingCol is the grouping variable(s). It can be one or multiple vectors.
sortingCol is an optional argument for within group sorting before applying func.
semanticFilter (optional) is a positive integer indicating which columns to include in calculations when funcArgs is a table. The possible values are:
- 0 - All columns
- 1 (default) - Columns of FLOATING, INTEGRAL, and DECIMAL categories, excluding the COMPRESSED data type.
- 2 - Columns of TEMPORAL category.
- 3 - Columns of LITERAL category, excluding the BLOB data type.
- 4 - Columns of FLOATING, INTEGRAL, DECIMAL, and TEMPORAL categories, excluding the COMPRESSED data type.
The vectors in groupingCol, sortingCol and each of the function argument in funcArgs all have the same size.
containArgsOfHighOrderFunc (optional) is a BOOL scalar. When funcArgs is a tuple and some of its elements are also tuples, if set to true, each element of the nested tuples in funcArgs will be processed separately; if set to false (default), the nested tuples in funcArgs will be processed as a whole (vector) without expanding the nested structure.
Examples
Example 1: Define four vectors sym, price, qty, and trade_date, representing stock
symbols, stock prices, trading volumes, and trading dates, respectively. Use
contextby to perform grouped computation.
sym=`IBM`IBM`IBM`MS`MS`MS
price=172.12 170.32 175.25 26.46 31.45 29.43
qty=5800 700 9000 6300 2100 5300
trade_date=2013.05.08 2013.05.06 2013.05.07 2013.05.08 2013.05.06 2013.05.07;
Group by stock symbol and compute the average stock price within each group.
contextby(avg, price, sym);
// Equivalent to: price avg :X sym;
// Output: [172.563,172.563,172.563,29.113,29.113,29.113]
Select all stock prices that are higher than the average price of their respective stock.
price at price>contextby(avg, price,sym);
// Output: [175.25,31.45,29.43]
Select the stock symbols whose prices are higher than the average price of their respective stock.
sym at price>contextby(avg, price,sym);
// Output: ["IBM","MS","MS"]
Group by stock symbol and compute the weighted average price within each group, using trading volume as the weight.
contextby(wavg, [price, qty], sym);
// Output: [173.856,173.856,173.856,28.374,28.374,28.374]
Group by stock symbol and compute the daily price change rate relative to the previous day within each group’s time series. In the example below, sortingCol=trade_date ensures that the data in each stock group is sorted in ascending order by trade_date before computing the price change rate.
contextby(ratios, price, sym, trade_date) - 1;
// Output: [-0.01786,,0.028946,-0.100917,,-0.064229]
Example 2: Specify groupingCol as multiple vectors to perform a two-level grouping by stock symbol and trading date. Compute the cumulative trading volume for each stock on each trading day.
sym=`IBM`IBM`IBM`IBM`IBM`IBM`MS`MS`MS`MS`MS`MS
date=2020.12.01 + 0 0 0 1 1 1 0 0 0 1 1 1
qty=5800 700 9000 1000 3500 3900 6300 2100 5300 7800 1200 4300
contextby(cumsum, qty, [sym,date]);
// Output: [5800,6500,15500,1000,4500,8400,6300,8400,13700,7800,9000,13300]
Example 3: When funcArgs is a table, semanticFilter controls which columns are included in the computation. This example demonstrates:
- semanticFilter=1: compute only on numeric columns;
- semanticFilter=2: compute only on time columns.
dailyOps = table(
[2024.06.01, 2024.06.01, 2024.06.02, 2024.06.02] as bizDate,
[2024.06.01T20:00:00.000, 2024.06.01T20:05:00.000, 2024.06.02T20:00:00.000, 2024.06.02T20:05:00.000] as snapshotTime,
`East`East`South`South as region,
["store_101","store_102","store_201","store_202"] as storeId,
[23500.5, 19880.0, 30210.3, 28760.4] as salesAmt,
[128, 103, 156, 149] as orderCount
)
dailyOps;
The original data of dailyOps is as follows::
| bizDate | snapshotTime | region | storeId | salesAmt | orderCount |
|---|---|---|---|---|---|
| 2024.06.01 | 2024.06.01 20:00:00.000 | East | store_101 | 23500.5 | 128 |
| 2024.06.01 | 2024.06.01 20:05:00.000 | East | store_102 | 19880 | 103 |
| 2024.06.02 | 2024.06.02 20:00:00.000 | South | store_201 | 30210.3 | 156 |
| 2024.06.02 | 2024.06.02 20:05:00.000 | South | store_202 | 28760.4 | 149 |
Set semanticFilter=1 to add 100 to the two numeric columns, salesAmt and orderCount, while keeping the other columns unchanged.
contextby(add{,100}, dailyOps, dailyOps.region,,1)
Output:
| bizDate | snapshotTime | region | storeId | salesAmt | orderCount |
|---|---|---|---|---|---|
| 2024.06.01 | 2024.06.01 20:00:00.000 | East | store_101 | 23600.5 | 228 |
| 2024.06.01 | 2024.06.01 20:05:00.000 | East | store_102 | 19980 | 203 |
| 2024.06.02 | 2024.06.02 20:00:00.000 | South | store_201 | 30310.3 | 256 |
| 2024.06.02 | 2024.06.02 20:05:00.000 | South | store_202 | 28860.4 | 249 |
Set semanticFilter=2 to add 1 day to the two time columns, bizDate and snapshotTime, while keeping the other columns unchanged.
contextby(temporalAdd{,1d}, dailyOps, dailyOps.region,,2)
Output:
| bizDate | snapshotTime | region | storeId | salesAmt | orderCount |
|---|---|---|---|---|---|
| 2024.06.02 | 2024.06.02 20:00:00.000 | East | store_101 | 23500.5 | 128 |
| 2024.06.02 | 2024.06.02 20:05:00.000 | East | store_102 | 19880.0 | 103 |
| 2024.06.03 | 2024.06.03 20:00:00.000 | South | store_201 | 30210.3 | 156 |
| 2024.06.03 | 2024.06.03 20:05:00.000 | South | store_202 | 28760.4 | 149 |
Example 4: Use contextby in SQL queries.
sym=`IBM`IBM`IBM`MS`MS`MS
price=172.12 170.32 175.25 26.46 31.45 29.43
qty=5800 700 9000 6300 2100 5300
trade_date=2013.05.08 2013.05.06 2013.05.07 2013.05.08 2013.05.06 2013.05.07;
t1=table(trade_date,sym,qty,price);
t1;
| trade_date | sym | qty | price |
|---|---|---|---|
| 2013.05.08 | IBM | 5800 | 172.12 |
| 2013.05.06 | IBM | 700 | 170.32 |
| 2013.05.07 | IBM | 9000 | 175.25 |
| 2013.05.08 | MS | 6300 | 26.46 |
| 2013.05.06 | MS | 2100 | 31.45 |
| 2013.05.07 | MS | 5300 | 29.43 |
Select the trading records where the stock price is higher than the group’s average price.
select trade_date, sym, qty, price from t1 where price > contextby(avg, price,sym);
| trade_date | sym | qty | price |
|---|---|---|---|
| 2013.05.07 | IBM | 9000 | 175.25 |
| 2013.05.06 | MS | 2100 | 31.45 |
| 2013.05.07 | MS | 5300 | 29.43 |
Add one day to all trading dates.
contextby(temporalAdd{,1d}, t1, t1.sym,,2)
| trade_date | sym | qty | price |
|---|---|---|---|
| 2013.05.09 | IBM | 5,800 | 172.12 |
| 2013.05.07 | IBM | 700 | 170.32 |
| 2013.05.08 | IBM | 9,000 | 175.25 |
| 2013.05.09 | MS | 6,300 | 26.46 |
| 2013.05.07 | MS | 2,100 | 31.45 |
| 2013.05.08 | MS | 5,300 | 29.43 |
Example 5: When funcArgs is a tuple:
sym=`IBM`IBM`IBM`MS`MS`MS
price=172.12 170.32 175.25 26.46 31.45 29.43
qty=5800 700 9000 6300 2100 5300
trade_date=2013.05.08 2013.05.06 2013.05.07 2013.05.08 2013.05.06 2013.05.07;
t = table(sym, price, qty, trade_date)
t = select * from t order by trade_date
defg avgUdf(price, qty){
return avg(price*qty)/sum(qty)
}
defg avgUdf1(price){
return avg(price)
}
// Normal
select contextby(func=tmoving{avgUdf1,,,3d}, funcArgs=(trade_date, price) , groupingCol=sym) from t;
// Normal
select contextby(func=tmoving{avgUdf,,,3d}, funcArgs=(trade_date,(price, qty) ), groupingCol=sym, containArgsOfHighOrderFunc=true) from t;
(trade_date, (price, qty)) is a tuple containing two elements:
- The first element is trade_date, representing the trading date column, which is a vector.
- The second element is (price, qty), which is itself a tuple containing two elements. If containArgsOfHighOrderFunc=false, (price, qty) is treated as a single unit (i.e., as one vector).
