sym = `C`MS`MS`MS`IBM`IBM`C`C`C$SYMBOL
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
t1 = table(timestamp, sym, qty, price);
t1;
timestamp |
sym |
qty |
price |
09:34:07 |
C |
2200 |
49.6 |
09:36:42 |
MS |
1900 |
29.46 |
09:36:51 |
MS |
2100 |
29.52 |
09:36:59 |
MS |
3200 |
30.02 |
09:32:47 |
IBM |
6800 |
174.97 |
09:35:26 |
IBM |
5400 |
175.2 |
09:34:16 |
C |
1300 |
50.76 |
09:34:26 |
C |
2500 |
50.32 |
09:38:12 |
C |
8800 |
51.29 |
select count(sym) as counts from t1 group by sym;
sym |
counts |
C |
4 |
MS |
3 |
IBM |
2 |
select top 2:3 count(sym) as counts from t1 group by sym;
select max(price), sym from t1 group by sym, minute(timestamp)
minute_timestamp |
max_price |
sym |
09:34m |
50.76 |
C |
09:36m |
30.02 |
MS |
09:32m |
174.97 |
IBM |
09:35m |
175.23 |
IBM |
09:38m |
51.29 |
C |
select avg(qty) from t1 group by sym;
sym |
avg_qty |
C |
3700 |
MS |
2400 |
IBM |
6100 |
select wavg(price, qty) as vwap, sum(qty) from t1 group by sym;
sym |
vwap |
sum_qty |
C |
50.828378 |
14800 |
IBM |
175.085082 |
12200 |
MS |
29.726389 |
7200 |
select wsum(price, qty) as dollarVolume, sum(qty) from t1 group by minute(timestamp) as ts;
ts |
dollarVolume |
sum_qty |
09:32m |
1.189796e+006 |
6800 |
09:34m |
300908 |
6000 |
09:35m |
946242 |
5400 |
09:36m |
214030 |
7200 |
09:38m |
451352 |
8800 |
select sum(qty) from t1 group by sym, timestamp.minute() as minute;
sym |
minute |
sum_qty |
C |
09:34m |
6000 |
C |
09:38m |
8800 |
IBM |
09:32m |
6800 |
IBM |
09:35m |
5400 |
MS |
09:36m |
7200 |
When group by is used with order by, the order by column(s) can be one or more of
group by columns, or one or more of the new columns generated by group by
calculation.
select sum(qty) from t1 group by sym, timestamp.minute() as minute order by minute;
sym |
minute |
sum_qty |
C |
09:00m |
1100 |
IBM |
09:32m |
6800 |
C |
09:34m |
6000 |
IBM |
09:35m |
5400 |
MS |
09:36m |
7200 |
C |
09:38m |
8800 |
In the examples above, the result of the function for each group is a scalar. In
other cases, a function may output a vector or a dictionary rather than a scalar.
For examples, the function stat outputs a dictionary, and the function ols outputs a
coefficient vector or a dictionary that includes parameter estimates, t-stat, R
square, etc. To output the results in multiple columns, we need to convert the
vector/dictionary output to multiple scalars. In other words, we need to convert a
composite column into multiple columns. This can be accomplished with "as" keyword
and a constant string vector for column names.
y=1..15
factor1=3.2 1.2 5.9 6.9 11.1 9.6 1.4 7.3 2.0 0.1 6.1 2.9 6.3 8.4 5.6
factor2=1.7 1.3 4.2 6.8 9.2 1.3 1.4 7.8 7.9 9.9 9.3 4.6 7.8 2.4 8.7
id=take(1 2 3, 15).sort();
t=table(id, y, factor1, factor2);
t;
id |
y |
factor1 |
factor2 |
1 |
1 |
3.2 |
1.7 |
1 |
2 |
1.2 |
1.3 |
1 |
3 |
5.9 |
4.2 |
1 |
4 |
6.9 |
6.8 |
1 |
5 |
11.1 |
9.2 |
2 |
6 |
9.6 |
1.3 |
2 |
7 |
1.4 |
1.4 |
2 |
8 |
7.3 |
7.8 |
2 |
9 |
2 |
7.9 |
2 |
10 |
0.1 |
9.9 |
3 |
11 |
6.1 |
9.3 |
3 |
12 |
2.9 |
4.6 |
3 |
13 |
6.3 |
7.8 |
3 |
14 |
8.4 |
2.4 |
3 |
15 |
5.6 |
8.7 |
select ols(y,(factor1,factor2),true,0) as `int`factor1`factor2 from t group by id;
id |
int |
factor1 |
factor2 |
1 |
1.063991 |
-0.258685 |
0.732795 |
2 |
6.886877 |
-0.148325 |
0.303584 |
3 |
11.833867 |
0.272352 |
-0.065526 |
select ols(y,(factor1,factor2),true,2).Coefficient.tstat[1:] as `t1`t2 from t group by id;
id |
t1 |
t2 |
1 |
-0.891868 |
2.253451 |
2 |
-5.73315 |
11.433117 |
3 |
0.510866 |
-0.183903 |
To skip certain elements from the output of the function, we can leave the names of
the corresponding elements empty.
select ols(y,(factor1,factor2),true,2).Coefficient.beta as ``factor1`factor2 from t group by id;
id |
factor1 |
factor2 |
1 |
-0.258685 |
0.732795 |
2 |
-0.148325 |
0.303584 |
3 |
0.272352 |
-0.065526 |
To customize the output, we can write a simple wrapper. In the following example, the
output includes coefficient estimates and R square:
def myols(y,x) {
r=ols(y,x,true,2)
return r.Coefficient.beta join r.RegressionStat.statistics[0]
}
select myols(y,(factor1,factor2)) as `int`factor1`factor2`R2 from t group by id;
id |
int |
factor1 |
factor2 |
R2 |
1 |
1.063991 |
-0.258685 |
0.732795 |
0.946056 |
2 |
6.886877 |
-0.148325 |
0.303584 |
0.992413 |
3 |
11.833867 |
0.272352 |
-0.065526 |
0.144837 |