pivot by

pivot by is a unique feature in DolphinDB and an extension to the ANSI SQL. It rearranges a column (or multiple columns) of a table (with or without a data transformation function) on two dimensions. If the pivot by clause includes more than 2 columns, the last column determines the columns in the result and all other columns determine the rows in the result. It produces a table with select statement and a matrix with exec statement. When used in conjunction with the pivot by clause, the select clause can specify multiple columns. See pivot and unpivot.

The following versions contain enhancements for the pivot by statement:


Version	Enhancements
1.20.4	1. If the pivot by clause includes more than 2 columns, the last column determines the columns in the result and all other columns determine the rows in the result. 2. When used in conjunction with the pivot by clause, the select clause can specify multiple columns.
2.00.9	As of version 1.30.21/2.00.9, query performance has been optimized for `pivot by` statements which satisfy both of the following conditions: None of the following functions are applied to the queried columns: aggregate functions (e.g., `sum`), order-sensitive functions (e.g., `last`), row-based functions (e.g., `rowMin`), filling functions (e.g., `ffill`). The last column specified in the `pivot by` clause is a partitioning column.
2.00.10.4	`pivot by` supports querying array vector columns in the `select` statement, and applying aggregate functions (except row-based columns) on array vector columns. It also supports applying aggregate functions that return scalars (such as `max`, `sum` etc.) on array vector columns in the `exec` clause (see Example 6 for details). When there exist multiple identical combinations of pivot-by column values, all data will be retained without deduplication after applying the `asis` function to at least one query column if all the following conditions are met: The queried table is a DFS table. The `from` clause does not contain table joins. The first n-1 pivot-by columns cannot be of data type DECIMAL128. The last pivot-by column is a partitioning column. The selected columns do not contain null values. Aggregate functions, order-sensitive functions, row-based functions, or fill functions are not used in the `select` clause. The table generated by the `pivot by` clause can be fully populated by original data, which means: The number of rows for each combination of values in the first n-1 pivot-by columns (which determine the rows in the output table) does not need to be exactly equal. For each combination of values in the first n-1 pivot-by columns, the number of values in the last pivot-by column (which determines the columns in the output table) must be the same.

Examples

Example 1

sym = `C`MS`MS`MS`IBM`IBM`C`C`C
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
timestamp = [09:34:07,09:35:42,09:36:51,09:36:59,09:35:47,09:36:26,09:34:16,09:35:26,09:36:12]
t2 = table(timestamp, sym, qty, price);

t2;


timestamp	sym	qty	price
09:34:07	C	2200	49.6
09:35:42	MS	1900	29.46
09:36:51	MS	2100	29.52
09:36:59	MS	3200	30.02
09:35:47	IBM	6800	174.97
09:36:26	IBM	5400	175.23
09:34:16	C	1300	50.76
09:35:26	C	2500	50.32
09:36:12	C	8800	51.29

select price from t2 pivot by timestamp, sym;


timestamp	C	IBM	MS
09:34:07	49.6
09:34:16	50.76
09:35:26	50.32
09:35:42			29.46
09:35:47		174.97
09:36:12	51.29
09:36:26		175.23
09:36:51			29.52
09:36:59			30.02

a=select last(price) from t2 pivot by timestamp.minute(), sym;

a;


minute_timestamp	C	IBM	MS
09:34m	50.76
09:35m	50.32	174.97	29.46
09:36m	51.29	175.23	30.02

typestr a;
// output: TABLE

b=exec count(price) from t2 pivot by timestamp.minute(), sym;
b;


label	C	IBM	MS
09:34m	2
09:35m	1	1	1
09:36m	1	1	2

typestr b;
// output: FAST DOUBLE MATRIX

Example 2. Calculate the basket value of an ETF's constituents. For simplicity, assume an ETF has 2 constituents AAPL and FB.

symbol=take(`AAPL, 6) join take(`FB, 5)
time=2019.02.27T09:45:01.000000000+[146, 278, 412, 445, 496, 789, 212, 556, 598, 712, 989]
price=173.27 173.26 173.24 173.25 173.26 173.27 161.51 161.50 161.49 161.50 161.51
quotes=table(symbol, time, price)
weights=dict(`AAPL`FB, 0.6 0.4)
ETF = select symbol, time, price*weights[symbol] as price from quotes;

select price from ETF pivot by time, symbol;


Time	AAPL	FB
2019.02.27T09:45:01.000000146	103.962
2019.02.27T09:45:01.000000212		64.604
2019.02.27T09:45:01.000000278	103.956
2019.02.27T09:45:01.000000412	103.944
2019.02.27T09:45:01.000000445	103.95
2019.02.27T09:45:01.000000496	103.956
2019.02.27T09:45:01.000000556		64.6
2019.02.27T09:45:01.000000598		64.596
2019.02.27T09:45:01.000000712		64.6
2019.02.27T09:45:01.000000789	103.962
2019.02.27T09:45:01.000000989		64.604

To calculate the basket value of this ETF's constituents at each timestamp that any of its constituents had a trade, we need to forward fill each stock's price in the pivot table, and then sum each row's weighted price. They can be completed in just one SQL statement with "pivot by".

select rowSum(ffill(price)) from ETF pivot by time, symbol;


Time	rowSum
2019.02.27T09:45:01.000000146	103.962
2019.02.27T09:45:01.000000212	168.566
2019.02.27T09:45:01.000000278	168.56
2019.02.27T09:45:01.000000412	168.548
2019.02.27T09:45:01.000000445	168.554
2019.02.27T09:45:01.000000496	168.56
2019.02.27T09:45:01.000000556	168.556
2019.02.27T09:45:01.000000598	168.552
2019.02.27T09:45:01.000000712	168.556
2019.02.27T09:45:01.000000789	168.562
2019.02.27T09:45:01.000000989	168.566

Example 3. The names of the newly created columns from a pivot by clause will be automatically adjusted.

Before version 2.00.2, column names in DolphinDB can only use letters, digits or underscores (_), and must start with letters.

Since version 2.00.2, column names generated by pivot by, addColumn can contain special characters or start with digits.

For details, please refer to section Creating tables in Table.

date = take(2021.08.01  2021.08.02 2021.08.03, 12)
sym = take(["IBM N", "_MSFTN", "3_GOOGS", ""], 12).sort()
value = 1..12
t=table(date, sym, value)
re = select value from t pivot by date, sym


date	NULL	3_GOOGS	IBM N	_MSFTN
2021.08.01	1	4	7	10
2021.08.02	2	5	8	11
2021.08.03	3	6	9	12

select  _"NULL" from re


NULL
1
2
3

select  _"3_GOOGS" from re


3_GOOGS
4
5
6

In the following example, set removeSpecialCharInColumnName = true. The stock symbols "600300" and "600600" are adjusted to "c600300" and "c600600" as column names.

symbol=take(`600300, 6) join take(`600600, 5)
time=2019.02.27T09:45:01.000000000+[146, 278, 412, 445, 496, 789, 212, 556, 598, 712, 989]
price=173.27 173.26 173.24 173.25 173.26 173.27 161.51 161.50 161.49 161.50 161.51
quotes=table(symbol, time, price)
weights=dict(`600300`600600, 0.6 0.4)
ETF = select symbol, time, price*weights[symbol] as price from quotes;

select last(price) from ETF pivot by time, symbol;


time	C600300	C600600
2019.02.27T09:45:01.000000146	103.962
2019.02.27T09:45:01.000000212		64.604
2019.02.27T09:45:01.000000278	103.956
2019.02.27T09:45:01.000000412	103.944
2019.02.27T09:45:01.000000445	103.95
2019.02.27T09:45:01.000000496	103.956
2019.02.27T09:45:01.000000556		64.6
2019.02.27T09:45:01.000000598		64.596
2019.02.27T09:45:01.000000712		64.6
2019.02.27T09:45:01.000000789	103.962
2019.02.27T09:45:01.000000989		64.604

Example 4. The pivot by clause includes more than 2 columns.

date = 2020.09.21 + 0 0 0 0 1 1 1 1
sym = `MS`MS`GS`GS`MS`MS`GS`GS$SYMBOL
factorNum = 1 2 1 2 1 2 1 2
factorValue = 1.2 -3.4 -2.5 6.3 1.1 -3.2 -2.1 5.6
t = table(date, sym, factorNum, factorValue);

t;


date	sym	factorNum	factorValue
2020.09.21	MS	1	1.2
2020.09.21	MS	2	-3.4
2020.09.21	GS	1	-2.5
2020.09.21	GS	2	6.3
2020.09.22	MS	1	1.1
2020.09.22	MS	2	-3.2
2020.09.22	GS	1	-2.1
2020.09.22	GS	2	5.6

select factorValue from t pivot by date, sym, factorNum;


date	sym	1	2
2020.09.21	GS	-2.5	6.3
2020.09.21	MS	1.2	-3.4
2020.09.22	GS	-2.1	5.6
2020.09.22	MS	1.1	-3.2

Select multiple columns for each factorNum

select factorValue, factorValue>0 as factorSign from t pivot by date, sym, factorNum;


date	sym	factorValue_1	factorValue_2	factorSign_1	factorSign_2
2020.09.21	GS	-2.5	6.3	0	1
2020.09.21	MS	1.2	-3.4	1	0
2020.09.22	GS	-2.1	5.6	0	1
2020.09.22	MS	1.1	-3.2	1	0

Example 5. Retain duplicate values after applying function asis

In this example, the following conditions are met:

table pt is a DFS table
The from clause does not contain table joins.
The first n-1 pivot-by column (i.e., the time column) does not contain DECIMAL128 values.
The last pivot-by column (i.e., the sym column) is a partitioning column.
The selected column value does not contain null values. Aggregate functions, order-sensitive functions, row-based functions, or fill functions are not used in the select clause.
The table generated by the pivot by clause can be fully populated. In this example, for each distinct value in the time column, the number of values in the last pivot-by column sym is always equal. When time=10:20:44, the sym A, B, and C all have one record; When time=10:20:45, the sym A, B, and C all have one record; When time=10:20:46, the sym A, B, and C all have two records.

Apply function asis to the value column, all data can be retained:

time = [10:20:44,10:20:44,10:20:44,10:20:45,10:20:45,10:20:45,10:20:46,10:20:46,10:20:46,10:20:46,10:20:46,10:20:46]
sym = ["A","B","C","A","B","C","A","B","C","A","B","C"]
value = [510,434,999,837,402,615,495,885,745,968,975,165]
t = table(time,sym,value)

db = database("dfs://test_pivot",VALUE,`A`B`C)
pt = db.createPartitionedTable(t,`pt,`sym)
pt.append!(t)

select value from pt pivot by time,sym


time	A	B	C
10:20:44	510	434	999
10:20:45	837	402	615
10:20:46	968	975	165

select asis(value) from pt pivot by time,sym


time	A	B	C
10:20:44	510	434	999
10:20:45	837	402	615
10:20:46	495	885	745
10:20:46	968	975	165

Example 6. Use pivot by with select or exec clause to query and aggregate array vector columns.

Create an in-memory table where the column val1 contains array vectors:

id = `A`B`B`B`C`C`D`D`D
val1 = array(DOUBLE[], 0, 10).append!([1.0, 2.1 2.2 2.4, 2.1 2.2 2.4, 3, 5, 1.6 7, 8.9 100, 2.3, 1.1 4 5.1 4]);
val2 = 2200 1900 2100 3200 6800 5400 1300 2500 8800
timestamp = [09:34:07,09:35:42,09:36:51,09:36:59,09:35:47,09:36:26,09:34:16,09:35:26,09:36:12]
t = table(timestamp, id, val1, val2);

// Query the val1 value for each id every minute
select val1 from t pivot by timestamp.minute(), id;


minute_timestamp	A	B	C	D
09:34m	[1.0000]	[]	[]	[8.9000,100.0000]
09:35m	[]	[2.1000,2.2000,2.4000]	[5.0000]	[2.3000]
09:36m	[]	[3.0000]	[1.6000,7.0000]	[1.1000,4.0000,5.1000,4.0000]

Apply aggregation on val1 with select and return a table:

select max(val1) from t pivot by timestamp.minute(), id;


minute_timestamp	A	B	C	D
09:34m	1.0000			100.0000
09:35m		2.4000	5.0000	2.3000
09:36m		3.0000	7.0000	5.1000

Apply aggregation on val1 with exec and return a matrix:

exec max(val1) from t pivot by timestamp.minute(), id;


label	A	B	C	D
09:34m	1.0000			100.0000
09:35m		2.4000	5.0000	2.3000
09:36m		3.0000	7.0000	5.1000