# context by {#context-by}

`context by` is a feature unique to DolphinDB. It extends ANSI SQL to make time-series data manipulation convenient.

Traditional relational databases do not support time-series data processing. In an RDBMS, a table is a set of rows, and the order of rows is not modeled. We can apply aggregate functions such as `min`, `max`, `avg`, and `stdev` to a group of rows, but we cannot apply order-sensitive aggregate functions such as `first` and `last`, or order-sensitive vector functions such as `cumsum`, `cummax`, `ratios`, and `deltas`, to groups of rows. DolphinDB supports time-series data processing, and the `context by` clause makes it very convenient to perform such calculations within each group.

Both `context by` and `group by` perform grouping. However, with `group by`, each group returns a scalar value; with `context by`, each group returns a vector with as many elements as the group has records. The `group by` clause can only be used with aggregate functions, whereas the `context by` clause can be used with aggregate functions, moving window functions, cumulative functions, etc. The `context by` clause is often used with the `update` statement. It can also be used together with the `having` clause; please refer to the section about `having`.

`context by` is often used together with time-series functions such as [cumsum](../../Functions/c/cumsum.md) and [mavg](../../Functions/m/mavg.md). The results of these functions depend on the order of rows within each `context by` group. The keyword `csort` can follow `context by` to sort the rows within each group before the calculations in the `select` statement are performed. `csort` can use multiple columns, including calculated columns, and can sort in either ascending order (`asc`, the default) or descending order (`desc`). `csort` can also be used with the `top` clause to get the most recent observations within each group.
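
The effect of `csort` on an order-sensitive function can be illustrated outside DolphinDB. The plain-Python sketch below is not DolphinDB's implementation: the helper `context_by`, its `csort`/`reverse` parameters, and the toy rows are hypothetical. It groups rows by a key, optionally sorts each group, and applies a cumulative sum, showing that the result changes with the row order inside a group.

```python
from itertools import accumulate


def context_by(rows, key, func, csort=None, reverse=False):
    """Apply the vector function `func` to each group of rows sharing `key`.

    Groups are visited in sorted key order. If `csort` is given, the rows of
    each group are sorted by that column first (descending if reverse=True).
    `func` must return one value per row, so the output has len(rows) entries.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    out = []
    for k in sorted(groups):
        grp = groups[k]
        if csort is not None:
            grp = sorted(grp, key=lambda r: r[csort], reverse=reverse)
        out.extend(zip(grp, func(grp)))
    return out


def cumsum_qty(grp):
    # Order-sensitive: the running total depends on the row order in grp
    return list(accumulate(r["qty"] for r in grp))


# Hypothetical toy data; group A is not stored in time order
rows = [
    {"sym": "A", "t": 3, "qty": 10},
    {"sym": "B", "t": 1, "qty": 5},
    {"sym": "A", "t": 1, "qty": 20},
    {"sym": "A", "t": 2, "qty": 30},
]

no_csort = [(r["sym"], r["t"], v) for r, v in context_by(rows, "sym", cumsum_qty)]
with_csort = [(r["sym"], r["t"], v)
              for r, v in context_by(rows, "sym", cumsum_qty, csort="t")]
print(no_csort)    # [('A', 3, 10), ('A', 1, 30), ('A', 2, 60), ('B', 1, 5)]
print(with_csort)  # [('A', 1, 20), ('A', 2, 50), ('A', 3, 60), ('B', 1, 5)]
```

Because `cumsum` returns one value per input row, each group produces a vector as long as the group, which is the `context by` behavior described above.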

The example below illustrates the difference between group by and context by.

```
sym = `C`MS`MS`MS`IBM`IBM`C`C`C$SYMBOL
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
t1 = table(timestamp, sym, qty, price);

t1;
```

|timestamp|sym|qty|price|
|---------|---|---|-----|
|09:34:07|C|2200|49.6|
|09:36:42|MS|1900|29.46|
|09:36:51|MS|2100|29.52|
|09:36:59|MS|3200|30.02|
|09:32:47|IBM|6800|174.97|
|09:35:26|IBM|5400|175.23|
|09:34:16|C|1300|50.76|
|09:34:26|C|2500|50.32|
|09:38:12|C|8800|51.29|

```
select wavg(price,qty) as wvap, sum(qty) as totalqty from t1 group by sym;
```

|sym|wvap|totalqty|
|---|----|--------|
|C|50.828378|14800|
|IBM|175.085082|12200|
|MS|29.726389|7200|

```
select sym, price, qty, wavg(price,qty) as wvap, sum(qty) as totalqty from t1 context by sym;
```

|sym|price|qty|wvap|totalqty|
|---|-----|---|----|--------|
|C|49.6|2200|50.828378|14800|
|C|50.76|1300|50.828378|14800|
|C|50.32|2500|50.828378|14800|
|C|51.29|8800|50.828378|14800|
|IBM|174.97|6800|175.085082|12200|
|IBM|175.23|5400|175.085082|12200|
|MS|29.46|1900|29.726389|7200|
|MS|29.52|2100|29.726389|7200|
|MS|30.02|3200|29.726389|7200|
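
As a cross-check of the numbers above, the following plain-Python sketch (an illustration, not how DolphinDB computes them) reproduces the quantity-weighted average price and total quantity per symbol. The `group by` form yields one row per group, while the `context by` form repeats each group's values on every row of the group. Groups are listed in sorted key order to match the tables above.

```python
from collections import defaultdict

# Columns of t1 from the example above
sym = ["C", "MS", "MS", "MS", "IBM", "IBM", "C", "C", "C"]
price = [49.6, 29.46, 29.52, 30.02, 174.97, 175.23, 50.76, 50.32, 51.29]
qty = [2200, 1900, 2100, 3200, 6800, 5400, 1300, 2500, 8800]

# Row indices of each symbol, in original row order
groups = defaultdict(list)
for i, s in enumerate(sym):
    groups[s].append(i)


def wavg(idx):
    # qty-weighted average price over the rows in idx
    return sum(price[i] * qty[i] for i in idx) / sum(qty[i] for i in idx)


# group by: one output row per group
grouped = [(s, round(wavg(idx), 6), sum(qty[i] for i in idx))
           for s, idx in sorted(groups.items())]

# context by: each group's aggregates are repeated for every row of the group
contexted = [(s, price[i], qty[i], round(wavg(idx), 6), sum(qty[j] for j in idx))
             for s, idx in sorted(groups.items()) for i in idx]

print(grouped)         # [('C', 50.828378, 14800), ('IBM', 175.085082, 12200), ('MS', 29.726389, 7200)]
print(len(contexted))  # 9
```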

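To calculate stock returns for each firm, we cannot use `group by`, because it returns only one value per group; we use `context by` instead. The records must be sorted appropriately (here, by time) within each group before the calculation. In `t1` they already are; otherwise, `csort timestamp` can be added after `context by`.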

```
select sym, timestamp, price, eachPre(\,price)-1.0 as ret from t1 context by sym;
```

|sym|timestamp|price|ret|
|---|---------|-----|---|
|C|09:34:07|49.6||
|C|09:34:16|50.76|0.023387|
|C|09:34:26|50.32|-0.008668|
|C|09:38:12|51.29|0.019277|
|IBM|09:32:47|174.97||
|IBM|09:35:26|175.23|0.001486|
|MS|09:36:42|29.46||
|MS|09:36:51|29.52|0.002037|
|MS|09:36:59|30.02|0.016938|
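
The per-group return calculation can be sketched in plain Python, assuming each group is first sorted by `timestamp`. The function `returns` is a hypothetical stand-in for `eachPre(\,price)-1.0`; it yields `None` where DolphinDB yields NULL.

```python
# t1 rows as (timestamp, sym, price); HH:MM:SS strings sort chronologically
t1 = [
    ("09:34:07", "C", 49.6), ("09:36:42", "MS", 29.46), ("09:36:51", "MS", 29.52),
    ("09:36:59", "MS", 30.02), ("09:32:47", "IBM", 174.97), ("09:35:26", "IBM", 175.23),
    ("09:34:16", "C", 50.76), ("09:34:26", "C", 50.32), ("09:38:12", "C", 51.29),
]


def returns(prices):
    # The first row of a group has no predecessor, so its return is None (NULL)
    return [None] + [cur / prev - 1.0 for prev, cur in zip(prices, prices[1:])]


result = []
for s in sorted({row[1] for row in t1}):
    # Sort each group by timestamp before applying the order-sensitive function
    grp = sorted((row for row in t1 if row[1] == s), key=lambda row: row[0])
    for (ts, _, p), r in zip(grp, returns([row[2] for row in grp])):
        result.append((s, ts, p, None if r is None else round(r, 6)))

for row in result:
    print(row)
```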

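We can use the template function [contextby](../../Functions/Templates/contextby.md) for a similar calculation, but the result is a vector instead of a table, and it contains the ratios `price[i]/price[i-1]` rather than the returns (the ratios minus 1).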

```
contextby(eachPre{ratio}, t1.price, t1.sym);
// output
[,,1.002037,1.016938,,1.001486,1.023387,0.991332,1.019277]
```

Here we use the partial application `eachPre{ratio}`. Please refer to [Partial Application](../FunctionalProgramming/PartialApplication.md) for details.
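
The vector returned by `contextby` above follows the original row order of `t1`: each group's results are written back to the rows they came from. A minimal plain-Python analogue under that reading of the output (the names `contextby` and `each_pre_ratio` here are hypothetical Python helpers, not the DolphinDB functions themselves):

```python
def contextby(func, values, keys):
    """Hypothetical Python analogue of DolphinDB's contextby template.

    Applies func to the values of each group (in original row order within
    the group) and writes the results back to the rows' original positions.
    """
    positions = {}
    for i, k in enumerate(keys):
        positions.setdefault(k, []).append(i)
    result = [None] * len(values)
    for idx in positions.values():
        for i, v in zip(idx, func([values[j] for j in idx])):
            result[i] = v
    return result


def each_pre_ratio(xs):
    # Analogue of eachPre{ratio}: x[i] / x[i-1], with None for the first element
    return [None] + [cur / prev for prev, cur in zip(xs, xs[1:])]


price = [49.6, 29.46, 29.52, 30.02, 174.97, 175.23, 50.76, 50.32, 51.29]
sym = ["C", "MS", "MS", "MS", "IBM", "IBM", "C", "C", "C"]
out = contextby(each_pre_ratio, price, sym)
print([None if v is None else round(v, 6) for v in out])
# [None, None, 1.002037, 1.016938, None, 1.001486, 1.023387, 0.991332, 1.019277]
```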

Calculate the cumulative sum of trading volume for each stock within each minute:

```
select *, cumsum(qty) from t1 context by sym, timestamp.minute();
```

|timestamp|sym|qty|price|cumsum\_qty|
|---------|---|---|-----|-----------|
|09:34:07|C|2200|49.6|2200|
|09:34:16|C|1300|50.76|3500|
|09:34:26|C|2500|50.32|6000|
|09:38:12|C|8800|51.29|8800|
|09:32:47|IBM|6800|174.97|6800|
|09:35:26|IBM|5400|175.23|5400|
|09:36:42|MS|1900|29.46|1900|
|09:36:51|MS|2100|29.52|4000|
|09:36:59|MS|3200|30.02|7200|
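
Grouping by two keys works the same way: the group key is the pair (sym, minute). The plain-Python sketch below is an illustration only. It takes the minute as the `HH:MM` prefix of the timestamp string, an assumption that holds for the `HH:MM:SS` strings used here, and reproduces the `cumsum_qty` column above.

```python
from collections import defaultdict
from itertools import accumulate

timestamp = ["09:34:07", "09:36:42", "09:36:51", "09:36:59", "09:32:47",
             "09:35:26", "09:34:16", "09:34:26", "09:38:12"]
sym = ["C", "MS", "MS", "MS", "IBM", "IBM", "C", "C", "C"]
qty = [2200, 1900, 2100, 3200, 6800, 5400, 1300, 2500, 8800]

# Group rows by (sym, minute); "HH:MM" is the first 5 characters of the timestamp
groups = defaultdict(list)
for i, (s, ts) in enumerate(zip(sym, timestamp)):
    groups[(s, ts[:5])].append(i)

# Running total of qty inside each (sym, minute) group, keyed by row index
cumsum_qty = {}
for idx in groups.values():
    for i, total in zip(idx, accumulate(qty[j] for j in idx)):
        cumsum_qty[i] = total

print([(timestamp[i], sym[i], cumsum_qty[i]) for i in sorted(cumsum_qty)])
```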

Use the `top` clause with the `context by` clause to get the first 2 records of each stock:

```
select top 2 * from t1 context by sym;
```

|timestamp|sym|qty|price|
|---------|---|---|-----|
|09:34:07|C|2200|49.6|
|09:34:16|C|1300|50.76|
|09:32:47|IBM|6800|174.97|
|09:35:26|IBM|5400|175.23|
|09:36:42|MS|1900|29.46|
|09:36:51|MS|2100|29.52|

Please note that we cannot specify a range for the top clause when it is used with the context by clause:

```
select top 2:3 * from t1 context by sym;
//Syntax Error: [line #2] When top clause uses together with context clause in SQL query, can't specify a range in top clause
```

Use the `top` and `csort` clauses together with the `context by` clause to get the 2 most recent records for each stock:

```
select top 2 * from t1 context by sym csort timestamp desc;
```

|timestamp|sym|qty|price|
|---------|---|---|-----|
|09:38:12|C|8800|51.29|
|09:34:26|C|2500|50.32|
|09:35:26|IBM|5400|175.23|
|09:32:47|IBM|6800|174.97|
|09:36:59|MS|3200|30.02|
|09:36:51|MS|2100|29.52|
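
The `top` + `csort` combination can be sketched in plain Python as "sort each group, then keep its first n rows". The helper `top_per_group` and its parameters are hypothetical and only mirror the outputs above; groups are emitted in sorted key order as in those tables.

```python
# t1 rows as (timestamp, sym, qty, price)
t1 = [
    ("09:34:07", "C", 2200, 49.6), ("09:36:42", "MS", 1900, 29.46),
    ("09:36:51", "MS", 2100, 29.52), ("09:36:59", "MS", 3200, 30.02),
    ("09:32:47", "IBM", 6800, 174.97), ("09:35:26", "IBM", 5400, 175.23),
    ("09:34:16", "C", 1300, 50.76), ("09:34:26", "C", 2500, 50.32),
    ("09:38:12", "C", 8800, 51.29),
]


def top_per_group(rows, n, sort_col=None, desc=False):
    """Keep the first n rows of each sym group, optionally after sorting the group."""
    out = []
    for s in sorted({r[1] for r in rows}):
        grp = [r for r in rows if r[1] == s]
        if sort_col is not None:
            grp.sort(key=lambda r: r[sort_col], reverse=desc)
        out.extend(grp[:n])
    return out


latest2 = top_per_group(t1, 2, sort_col=0, desc=True)  # like csort timestamp desc
for row in latest2:
    print(row)
```

Calling `top_per_group(t1, 2)` without a sort column keeps the first 2 rows of each group in their stored order, matching the plain `top 2` output.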

Use the `context by` clause together with the `limit` clause to get the first or last few records for each stock. A positive `limit` returns the first records of each group:

```
select * from t1 context by sym limit 2;
```

|timestamp|sym|qty|price|
|---------|---|---|-----|
|09:34:07|C|2200|49.6|
|09:36:42|MS|1900|29.46|
|09:36:51|MS|2100|29.52|
|09:32:47|IBM|6800|174.97|
|09:35:26|IBM|5400|175.23|
|09:34:16|C|1300|50.76|

A negative `limit` returns the last records of each group. Get the last 2 records for each stock:

```
select * from t1 context by sym limit -2;
```

|timestamp|sym|qty|price|
|---------|---|---|-----|
|09:36:51|MS|2100|29.52|
|09:36:59|MS|3200|30.02|
|09:32:47|IBM|6800|174.97|
|09:35:26|IBM|5400|175.23|
|09:34:26|C|2500|50.32|
|09:38:12|C|8800|51.29|
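
In the two `limit` outputs above, the selected rows keep their original order in `t1` instead of being grouped by `sym`. The plain-Python sketch below reproduces that observed behavior by returning row indices; the helper `limit_per_group` is hypothetical and not part of DolphinDB.

```python
sym = ["C", "MS", "MS", "MS", "IBM", "IBM", "C", "C", "C"]


def limit_per_group(keys, n):
    """Row indices kept by a per-group limit, returned in original row order.

    n > 0 keeps the first n rows of each group; n < 0 keeps the last -n rows.
    """
    positions = {}
    for i, k in enumerate(keys):
        positions.setdefault(k, []).append(i)
    kept = set()
    for idx in positions.values():
        kept.update(idx[:n] if n > 0 else idx[n:])
    return sorted(kept)


print(limit_per_group(sym, 2))   # [0, 1, 2, 4, 5, 6]
print(limit_per_group(sym, -2))  # [2, 3, 4, 5, 7, 8]
```

Indices refer to rows of `t1`; for example, `[0, 1, 2, 4, 5, 6]` corresponds to the six rows of the `limit 2` output.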

Use context by clause together with csort and limit clause to get the last 2 records for each stock after sorting by qty:

```
select * from t1 context by sym csort qty limit -2;
```

|timestamp|sym|qty|price|
|---------|---|---|-----|
|09:34:26|C|2500|50.32|
|09:38:12|C|8800|51.29|
|09:35:26|IBM|5400|175.23|
|09:32:47|IBM|6800|174.97|
|09:36:51|MS|2100|29.52|
|09:36:59|MS|3200|30.02|
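
With `csort qty`, each group is sorted by `qty` in ascending order first, and `limit -2` then keeps the last 2 rows, i.e. the 2 largest quantities in each group. A plain-Python illustration of that reading (not DolphinDB code; groups in sorted key order as in the table above):

```python
# (timestamp, sym, qty) for each row of t1
t1 = [
    ("09:34:07", "C", 2200), ("09:36:42", "MS", 1900), ("09:36:51", "MS", 2100),
    ("09:36:59", "MS", 3200), ("09:32:47", "IBM", 6800), ("09:35:26", "IBM", 5400),
    ("09:34:16", "C", 1300), ("09:34:26", "C", 2500), ("09:38:12", "C", 8800),
]

result = []
for s in sorted({r[1] for r in t1}):
    # csort qty: sort the group by qty ascending, then limit -2 keeps the last 2 rows
    grp = sorted((r for r in t1 if r[1] == s), key=lambda r: r[2])
    result.extend(grp[-2:])

for row in result:
    print(row)
```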

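Calculate the fitted values of price from a regression of price on qty for each stock. In the expression below, `ols(price, qty)[0]` is the intercept and `ols(price, qty)[1]` is the slope: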

```
select *, ols(price, qty)[0]+ols(price, qty)[1]*qty as fittedPrice from t1 context by sym;
```

|timestamp|sym|qty|price|fittedPrice|
|---------|---|---|-----|-----------|
|09:34:07|C|2200|49.6|50.282221|
|09:34:16|C|1300|50.76|50.156053|
|09:34:26|C|2500|50.32|50.324277|
|09:38:12|C|8800|51.29|51.207449|
|09:32:47|IBM|6800|174.97|174.97|
|09:35:26|IBM|5400|175.23|175.23|
|09:36:42|MS|1900|29.46|29.447279|
|09:36:51|MS|2100|29.52|29.535034|
|09:36:59|MS|3200|30.02|30.017687|
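
Each group gets its own regression line, and that line is evaluated at every row of the group. The plain-Python sketch below assumes that `ols` with its default intercept fits `price = a + b*qty` by ordinary least squares; the helper `ols_line` is hypothetical. With only two IBM rows, the fitted line passes through both points, which is why the IBM fitted prices equal the actual prices.

```python
from collections import defaultdict

sym = ["C", "MS", "MS", "MS", "IBM", "IBM", "C", "C", "C"]
price = [49.6, 29.46, 29.52, 30.02, 174.97, 175.23, 50.76, 50.32, 51.29]
qty = [2200, 1900, 2100, 3200, 6800, 5400, 1300, 2500, 8800]


def ols_line(y, x):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


groups = defaultdict(list)
for i, s in enumerate(sym):
    groups[s].append(i)

# Fit one line per symbol, then evaluate it at every row of that symbol
fitted = [None] * len(sym)
for idx in groups.values():
    a, b = ols_line([price[i] for i in idx], [qty[i] for i in idx])
    for i in idx:
        fitted[i] = a + b * qty[i]

print([round(f, 6) for f in fitted])
```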

The context by clause can be used with the order by clause. The order by columns must be among the output columns.

```
select *, ols(price, qty)[0]+ols(price, qty)[1]*qty as fittedPrice from t1 context by sym order by timestamp;
```

|timestamp|sym|qty|price|fittedPrice|
|---------|---|---|-----|-----------|
|09:32:47|IBM|6800|174.97|174.97|
|09:34:07|C|2200|49.6|50.282221|
|09:34:16|C|1300|50.76|50.156053|
|09:34:26|C|2500|50.32|50.324277|
|09:35:26|IBM|5400|175.23|175.23|
|09:36:42|MS|1900|29.46|29.447279|
|09:36:51|MS|2100|29.52|29.535034|
|09:36:59|MS|3200|30.02|30.017687|
|09:38:12|C|8800|51.29|51.207449|

`context by` differs from the *contextby* function in 3 aspects:

1.  *contextby* generates a vector, whereas `context by` is used in a `select` statement to produce a table.

2.  *contextby* is limited to one grouping column, whereas a `context by` clause can use multiple columns.

3.  *contextby* computes one expression per call, whereas a `context by` clause can compute multiple expressions at once.

## Performance Tip {#performance-tip}

Before using `context by`, sort the table by the same column or columns used in the `context by` clause. This can speed up the `context by` clause; in the example below, the same query runs faster on the sorted table `tb` than on the unsorted table `ta`.

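```
n=1000000
ID=rand(100, n)                     // random group IDs
x=rand(10.0, n)                     // random values
ta=table(ID, x)                     // unsorted table
tb=select * from ta order by ID;    // same data, sorted by the context by column
```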

```
timer select (NULL \:P x)-1 as ret from ta context by ID;
// Time elapsed: 4.018 ms

timer select (NULL \:P x)-1 as ret from tb context by ID;
// Time elapsed: 2.991 ms
```

DolphinDB optimizes the performance of `context by` clause under certain conditions.

To query the latest records of given groups in a partitioned table, use the `context by` clause with the keywords `csort` and `limit`. The performance of such a "context by + csort + limit" statement is optimized if the following conditions are satisfied:

1.  The `context by` column is filtered by the `where` clause.

2.  The `csort` column is a partitioning column, and the partition type is VALUE or RANGE.

3.  `csort` and `context by` each specify only one column.

4.  The `context by` column is specified in the `select` clause.


To check whether the statement is optimized, add the keyword [\[HINT\_EXPLAIN\]](hint/hint_explain.md) to obtain its execution plan.

