aggrTopN
Syntax
aggrTopN(func, funcArgs, sortingCol, top, [ascending=true])
Parameters
func is an aggregate function.
funcArgs are the parameters of func. It can be a scalar or vector. It is a tuple if there are more than 1 parameter of func.
sortingCol is a numeric/temporal vector, based on which funcArgs are sorted.
top is an integer or floating-point number.
-
If it is an integer, select the first top rows of records for calculation.
-
If it is a floating-point number, the value should be less than 1.0 to indicate a percentage. The function will select top of the rows in funcArgs for calculation. If the result is less than 1, select the first row. If the result is not an integer, it is rounded down and at least one row is selected.
ascending is a Boolean value indicating whether to sort the sortingCol in ascending order. It is an optional parameter and the default value is true.
Details
After sorting funcArgs based on sortingCol, aggrTopN
applies func to the first top elements in funcArgs. Null values
in sortingCol are treated as the minimum value.
Examples
Example 1 Calculate the average of price corresponding to the top 3 records with the largest volume.
price = 10.0 20.0 30.0 40.0 50.0
volume = 5 2 5 1 4
aggrTopN(func=avg, funcArgs=price, sortingCol=volume, top=3, ascending=false)
// output: 30
Example 2 Calculate the correlation between price and volume for the top 3 records with the largest volume. In this case, func is a binary aggregation function, and funcArgs should be passed as a tuple.
volume = 2 3 4 5 3
price = 10.0 20.0 30.0 40.0 50.0
aggrTopN(func=corr, funcArgs=(volume, price), sortingCol=2 3 4 5 3, top=3, ascending=true)
// output: 0.69338
Example 3
Calculate the average of price corresponding to the top N records with the
largest volume. Here, N can be specified either directly as an integer
(top = N) or as a proportion (top = N /
size).
price = 10 20 30 40 50
volume = 1 2 3 4 5
// top=3 (integer): take the first 3 rows
aggrTopN(func=avg, funcArgs=price, sortingCol=volume, top=3)
// take [10, 20, 30], output: 20
// top=0.6 (float): take the first floor(5*0.6)=3 rows
aggrTopN(func=avg, funcArgs=price, sortingCol=volume, top=0.6)
// take [10, 20, 30], output: 20
// top=0.5 (float): take the first floor(5*0.5)=2 rows (rounded down)
aggrTopN(func=avg, funcArgs=price, sortingCol=volume, top=0.5)
// take [10, 20], output: 15
// top=0.01 (float): 5*0.01=0.05, floor to 0, but at least 1 row is selected
aggrTopN(func=sum, funcArgs=price, sortingCol=volume, top=0.01)
// take [10], output: 10
Example 4
For each stock on each day, calculate the average of the value corresponding to the smallest 40% of the factor.
trade_date=sort(take(2017.01.11..2017.01.12,20))
secu_code=take(`600570`600000,20)
value=1..20
tb=table(trade_date,secu_code,value)
| trade_date | secu_code | value |
|---|---|---|
| 2017.01.11 | 600570 | 1 |
| 2017.01.11 | 600000 | 2 |
| 2017.01.11 | 600570 | 3 |
| 2017.01.11 | 600000 | 4 |
| 2017.01.11 | 600570 | 5 |
| 2017.01.11 | 600000 | 6 |
| 2017.01.11 | 600570 | 7 |
| 2017.01.11 | 600000 | 8 |
| 2017.01.11 | 600570 | 9 |
| 2017.01.11 | 600000 | 10 |
| 2017.01.12 | 600570 | 11 |
| 2017.01.12 | 600000 | 12 |
| 2017.01.12 | 600570 | 13 |
| 2017.01.12 | 600000 | 14 |
| 2017.01.12 | 600570 | 15 |
| 2017.01.12 | 600000 | 16 |
| 2017.01.12 | 600570 | 17 |
| 2017.01.12 | 600000 | 18 |
| 2017.01.12 | 600570 | 19 |
| 2017.01.12 | 600000 | 20 |
select aggrTopN(avg, funcArgs=value, sortingCol=value, top=0.4, ascending=true) as factor_value from tb group by trade_date,secu_code
| trade_date | secu_code | factor_value |
|---|---|---|
| 2017.01.11 | 600000 | 3 |
| 2017.01.11 | 600570 | 2 |
| 2017.01.12 | 600000 | 13 |
| 2017.01.12 | 600570 | 12 |
