aggrTopN

Syntax

aggrTopN(func, funcArgs, sortingCol, top, [ascending=true])

Parameters

func is an aggregate function.

funcArgs are the parameters of func. It can be a scalar or vector. It is a tuple if there are more than 1 parameter of func.

sortingCol is a numeric/temporal vector, based on which funcArgs are sorted.

top is an integer or floating-point number.

  • If it is an integer, select the first top rows of records for calculation.

  • If it is a floating-point number, the value should be less than 1.0 to indicate a percentage. The function will select top of the rows in funcArgs for calculation. If the result is less than 1, select the first row. If the result is not an integer, it is rounded down and at least one row is selected.

ascending is a Boolean value indicating whether to sort the sortingCol in ascending order. It is an optional parameter and the default value is true.

Details

After sorting funcArgs based on sortingCol, aggrTopN applies func to the first top elements in funcArgs. Null values in sortingCol are treated as the minimum value.

Examples

Example 1 Calculate the average of price corresponding to the top 3 records with the largest volume.

price = 10.0 20.0 30.0 40.0 50.0
volume = 5 2 5 1 4
aggrTopN(func=avg, funcArgs=price, sortingCol=volume, top=3, ascending=false)
// output: 30

Example 2 Calculate the correlation between price and volume for the top 3 records with the largest volume. In this case, func is a binary aggregation function, and funcArgs should be passed as a tuple.

volume = 2 3 4 5 3
price = 10.0 20.0 30.0 40.0 50.0
aggrTopN(func=corr, funcArgs=(volume, price), sortingCol=2 3 4 5 3, top=3, ascending=true)
// output: 0.69338

Example 3

Calculate the average of price corresponding to the top N records with the largest volume. Here, N can be specified either directly as an integer (top = N) or as a proportion (top = N / size).

price = 10 20 30 40 50
volume = 1 2 3 4 5

// top=3 (integer): take the first 3 rows
aggrTopN(func=avg, funcArgs=price, sortingCol=volume, top=3)
// take [10, 20, 30], output: 20

// top=0.6 (float): take the first floor(5*0.6)=3 rows
aggrTopN(func=avg, funcArgs=price, sortingCol=volume, top=0.6)
// take [10, 20, 30], output: 20

// top=0.5 (float): take the first floor(5*0.5)=2 rows (rounded down)
aggrTopN(func=avg, funcArgs=price, sortingCol=volume, top=0.5)
// take [10, 20], output: 15

// top=0.01 (float): 5*0.01=0.05, floor to 0, but at least 1 row is selected
aggrTopN(func=sum, funcArgs=price, sortingCol=volume, top=0.01)
// take [10], output: 10

Example 4

For each stock on each day, calculate the average of the value corresponding to the smallest 40% of the factor.

trade_date=sort(take(2017.01.11..2017.01.12,20))
secu_code=take(`600570`600000,20)
value=1..20
tb=table(trade_date,secu_code,value)  
trade_date secu_code value
2017.01.11 600570 1
2017.01.11 600000 2
2017.01.11 600570 3
2017.01.11 600000 4
2017.01.11 600570 5
2017.01.11 600000 6
2017.01.11 600570 7
2017.01.11 600000 8
2017.01.11 600570 9
2017.01.11 600000 10
2017.01.12 600570 11
2017.01.12 600000 12
2017.01.12 600570 13
2017.01.12 600000 14
2017.01.12 600570 15
2017.01.12 600000 16
2017.01.12 600570 17
2017.01.12 600000 18
2017.01.12 600570 19
2017.01.12 600000 20
select aggrTopN(avg, funcArgs=value, sortingCol=value, top=0.4, ascending=true) as factor_value from tb group by trade_date,secu_code
trade_date secu_code factor_value
2017.01.11 600000 3
2017.01.11 600570 2
2017.01.12 600000 13
2017.01.12 600570 12