pivot by

pivot by is a unique feature in DolphinDB and an extension to the ANSI SQL. It rearranges a column (or multiple columns) of a table (with or without a data transformation function) on two dimensions. If the pivot by clause includes more than 2 columns, the last column determines the columns in the result and all other columns determine the rows in the result. It produces a table with select statement and a matrix with exec statement. When used in conjunction with the pivot by clause, the select clause can specify multiple columns. See pivot and unpivot .

The following versions contain enhancements for the pivot by statement:


Version	Enhancements
1.20.4	1. If the pivot by clause includes more than 2 columns, the last column determines the columns in the result and all other columns determine the rows in the result. 2. When used in conjunction with the pivot by clause, the select clause can specify multiple columns.

Examples

Example 1

sym = `C`MS`MS`MS`IBM`IBM`C`C`C
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
timestamp = [09:34:07,09:35:42,09:36:51,09:36:59,09:35:47,09:36:26,09:34:16,09:35:26,09:36:12]
t2 = table(timestamp, sym, qty, price);

t2;


timestamp	sym	qty	price
09:34:07	C	2200	49.6
09:35:42	MS	1900	29.46
09:36:51	MS	2100	29.52
09:36:59	MS	3200	30.02
09:35:47	IBM	6800	174.97
09:36:26	IBM	5400	175.23
09:34:16	C	1300	50.76
09:35:26	C	2500	50.32
09:36:12	C	8800	51.29

select price from t2 pivot by timestamp, sym;


timestamp	C	IBM	MS
09:34:07	49.6
09:34:16	50.76
09:35:26	50.32
09:35:42			29.46
09:35:47		174.97
09:36:12	51.29
09:36:26		175.23
09:36:51			29.52
09:36:59			30.02

a=select last(price) from t2 pivot by timestamp.minute(), sym;

a;


minute_timestamp	C	IBM	MS
09:34m	50.76
09:35m	50.32	174.97	29.46
09:36m	51.29	175.23	30.02

typestr a;
// output
TABLE

b=exec count(price) from t2 pivot by timestamp.minute(), sym;
b;


label	C	IBM	MS
09:34m	2
09:35m	1	1	1
09:36m	1	1	2

typestr b;
// output
FAST DOUBLE MATRIX

Example 2. Calculate the basket value of an ETF's constituents. For simplicity, assume an ETF has 2 constituents AAPL and FB.

symbol=take(`AAPL, 6) join take(`FB, 5)
time=2019.02.27T09:45:01.000000000+[146, 278, 412, 445, 496, 789, 212, 556, 598, 712, 989]
price=173.27 173.26 173.24 173.25 173.26 173.27 161.51 161.50 161.49 161.50 161.51
quotes=table(symbol, time, price)
weights=dict(`AAPL`FB, 0.6 0.4)
ETF = select symbol, time, price*weights[symbol] as price from quotes;

select price from ETF pivot by time, symbol;


Time	AAPL	FB
2019.02.27T09:45:01.000000146	103.962
2019.02.27T09:45:01.000000212		64.604
2019.02.27T09:45:01.000000278	103.956
2019.02.27T09:45:01.000000412	103.944
2019.02.27T09:45:01.000000445	103.95
2019.02.27T09:45:01.000000496	103.956
2019.02.27T09:45:01.000000556		64.6
2019.02.27T09:45:01.000000598		64.596
2019.02.27T09:45:01.000000712		64.6
2019.02.27T09:45:01.000000789	103.962
2019.02.27T09:45:01.000000989		64.604

To calculate the basket value of this ETF's constituents at each timestamp that any of its constituents had a trade, we need to forward fill each stock's price in the pivot table, and then sum each row's weighted price. They can be completed in just one SQL statement with "pivot by".

select rowSum(ffill(price)) from ETF pivot by time, symbol;


Time	rowSum
2019.02.27T09:45:01.000000146	103.962
2019.02.27T09:45:01.000000212	168.566
2019.02.27T09:45:01.000000278	168.56
2019.02.27T09:45:01.000000412	168.548
2019.02.27T09:45:01.000000445	168.554
2019.02.27T09:45:01.000000496	168.56
2019.02.27T09:45:01.000000556	168.556
2019.02.27T09:45:01.000000598	168.552
2019.02.27T09:45:01.000000712	168.556
2019.02.27T09:45:01.000000789	168.562
2019.02.27T09:45:01.000000989	168.566

Example 3. The names of the newly created columns from a pivot by clause will be automatically adjusted.

Before version 1.30.14, column names in DolphinDB can only use letters, digits or underscores (_), and must start with letters.

Since version 1.30.14, column names generated by pivot by, addColumn can contain special characters or start with digits.

For details, please refer to section Creating tables in Table.

date = take(2021.08.01  2021.08.02 2021.08.03, 12)
sym = take(["IBM N", "_MSFTN", "3_GOOGS", ""], 12).sort()
value = 1..12
t=table(date, sym, value)
re = select value from t pivot by date, sym


date	NULL	3_GOOGS	IBM N	_MSFTN
2021.08.01	1	4	7	10
2021.08.02	2	5	8	11
2021.08.03	3	6	9	12

select  _"NULL" from re


NULL
1
2
3

select  _"3_GOOGS" from re


3_GOOGS
4
5
6

In the following example, set removeSpecialCharInColumnName = true. The stock symbols "600300" and "600600" are adjusted to "c600300" and "c600600" as column names.

symbol=take(`600300, 6) join take(`600600, 5)
time=2019.02.27T09:45:01.000000000+[146, 278, 412, 445, 496, 789, 212, 556, 598, 712, 989]
price=173.27 173.26 173.24 173.25 173.26 173.27 161.51 161.50 161.49 161.50 161.51
quotes=table(symbol, time, price)
weights=dict(`600300`600600, 0.6 0.4)
ETF = select symbol, time, price*weights[symbol] as price from quotes;

select last(price) from ETF pivot by time, symbol;


time	C600300	C600600
2019.02.27T09:45:01.000000146	103.962
2019.02.27T09:45:01.000000212		64.604
2019.02.27T09:45:01.000000278	103.956
2019.02.27T09:45:01.000000412	103.944
2019.02.27T09:45:01.000000445	103.95
2019.02.27T09:45:01.000000496	103.956
2019.02.27T09:45:01.000000556		64.6
2019.02.27T09:45:01.000000598		64.596
2019.02.27T09:45:01.000000712		64.6
2019.02.27T09:45:01.000000789	103.962
2019.02.27T09:45:01.000000989		64.604

Example 4. The pivot by clause includes more than 2 columns.

date = 2020.09.21 + 0 0 0 0 1 1 1 1
sym = `MS`MS`GS`GS`MS`MS`GS`GS$SYMBOL
factorNum = 1 2 1 2 1 2 1 2
factorValue = 1.2 -3.4 -2.5 6.3 1.1 -3.2 -2.1 5.6
t = table(date, sym, factorNum, factorValue);

t;


date	sym	factorNum	factorValue
2020.09.21	MS	1	1.2
2020.09.21	MS	2	-3.4
2020.09.21	GS	1	-2.5
2020.09.21	GS	2	6.3
2020.09.22	MS	1	1.1
2020.09.22	MS	2	-3.2
2020.09.22	GS	1	-2.1
2020.09.22	GS	2	5.6

select factorValue from t pivot by date, sym, factorNum;


date	sym	1	2
2020.09.21	GS	-2.5	6.3
2020.09.21	MS	1.2	-3.4
2020.09.22	GS	-2.1	5.6
2020.09.22	MS	1.1	-3.2

Select multiple columns for each factorNum

select factorValue, factorValue>0 as factorSign from t pivot by date, sym, factorNum;


date	sym	factorValue_1	factorValue_2	factorSign_1	factorSign_2
2020.09.21	GS	-2.5	6.3	0	1
2020.09.21	MS	1.2	-3.4	1	0
2020.09.22	GS	-2.1	5.6	0	1
2020.09.22	MS	1.1	-3.2	1	0