Order of Execution
The order of execution of a SQL query in DolphinDB is similar to most other systems.
Please pay special attention to the order of execution of csort keyword
and limit / top clause when context
by clause is used.
The order of execution of a SQL query in DolphinDB:
(1) from clause
The FROM clause, and subsequent table joins are first executed to determine the total working set of data that is being queried. This includes subqueries in this clause.
(2) on clause
The on clause is used to specify the join conditions, i.e., define how
the records should be matched. When using outer joins, the unmatched records are
populated with null values in the temporary table generated by the on
clause.
(3) where conditions
Rows that do not satisfy the where conditions are discarded.
-
The
whereconditions can only access columns directly from the tables specified in thefromclause. Columns in theselectclause that are not specified in thefromclause cannot be referenced, such as columns renamed withasor computed columns. -
In the
whereclause, if the conditions are separated by a comma (","), the data is filtered by the first condition, then the next condition is applied to the filtered data, and so on; if the conditions are separated by "and", the data is filtered separately on each condition and the intersection of the filtering results is returned.
(4) group by / context by / pivot
by clause
The remaining rows after the where conditions are applied are then
grouped based on common values in the column specified in the group by / context by /
pivot by clause
(5) csort keyword (only used in conjunction with the context
by clause)
The context by clause is often used together with time-series functions
such as cumsum, mavg, and deltas, etc. The results of these functions are affected by
the order of rows within each context by group. The csort keyword (that
is used only with the context by clause) sorts the rows within each
context by group.
(6) having conditions (only used in conjunction with the
group by / context by clause)
If the query has a group by / context by clause, then
the having conditions are applied to the grouped rows, discard the grouped rows that
don't satisfy the conditions. Like the where conditions, aliases are
also not accessible from this step.
(7) select clause
Any expressions in the select clause are finally computed.
(8) distinct clause
The distinct clause eliminates all the duplicate records and return
distinct values.
(9) limit / top clause (if the context by
clause is used)
If the query uses the context by clause, the limit /
top clause applies to each group. If there are n
context by groups and the query uses "limit m" or "top
m", then return at most n*m rows.
The limit / top clause in this scenario is executed
before the order by clause; in all other scenarios the
limit / top clause is executed after the
order by clause.
(10) order by clause
If an order is specified by the order by clause, the rows are then
sorted by the specified data in either ascending or descending order. Since all the
expressions in the select clause of the query have been computed, you
can reference aliases in this clause.
(11) limit / top clause (if there is no
context by clause)
Finally, if the context by clause is not used, the rows that fall
outside the range specified by the limit / top clause
are discarded, leaving the final set of rows to be returned from the query.
Please note that if the SQL statement uses the cgroup by clause, the
execution sequence is as follows: first apply the filtering conditions (if any), and
then conduct the calculations in the select clause within the groups
determined by the cgroup by and group by (if any)
columns, then sort the grouping calculation results according to the required order by
columns (must belong to the cgroup by and group by (if
any) columns), and finally calculate the cumulative value of the grouping calculation
results. If group by is used, the cumulative value is calculated within
each group by group.
Examples
t = table(1 1 1 1 1 2 2 2 2 2 as id, 09:30:00+1 3 2 5 4 5 2 4 3 1 as time, 1 2 3 4 5 6 5 4 3 2 as x);
t;
| id | time | x |
|---|---|---|
| 1 | 09:30:01 | 1 |
| 1 | 09:30:03 | 2 |
| 1 | 09:30:02 | 3 |
| 1 | 09:30:05 | 4 |
| 1 | 09:30:04 | 5 |
| 2 | 09:30:05 | 6 |
| 2 | 09:30:02 | 5 |
| 2 | 09:30:04 | 4 |
| 2 | 09:30:03 | 3 |
| 2 | 09:30:01 | 2 |
select *, deltas(x) from t context by id;
| id | time | x | deltas_x |
|---|---|---|---|
| 1 | 09:30:01 | 1 | |
| 1 | 09:30:03 | 2 | 1 |
| 1 | 09:30:02 | 3 | 1 |
| 1 | 09:30:05 | 4 | 1 |
| 1 | 09:30:04 | 5 | 1 |
| 2 | 09:30:05 | 6 | |
| 2 | 09:30:02 | 5 | -1 |
| 2 | 09:30:04 | 4 | -1 |
| 2 | 09:30:03 | 3 | -1 |
| 2 | 09:30:01 | 2 | -1 |
select *, deltas(x) from t context by id csort time;
| id | time | x | deltas_x |
|---|---|---|---|
| 1 | 09:30:01 | 1 | |
| 1 | 09:30:02 | 3 | 2 |
| 1 | 09:30:03 | 2 | -1 |
| 1 | 09:30:04 | 5 | 3 |
| 1 | 09:30:05 | 4 | -1 |
| 2 | 09:30:01 | 2 | |
| 2 | 09:30:02 | 5 | 3 |
| 2 | 09:30:03 | 3 | -2 |
| 2 | 09:30:04 | 4 | 1 |
| 2 | 09:30:05 | 6 | 2 |
The example above shows that with "csort time", the calculation of deltas(x) is conducted after sorting the rows along column "time" within each group.
select *, deltas(x) from t context by id csort time limit 3;
| id | time | x | deltas_x |
|---|---|---|---|
| 1 | 09:30:01 | 1 | |
| 1 | 09:30:02 | 3 | 2 |
| 1 | 09:30:03 | 2 | -1 |
| 2 | 09:30:01 | 2 | |
| 2 | 09:30:02 | 5 | 3 |
| 2 | 09:30:03 | 3 | -2 |
The example above shows that if the limit clause is used together with the
context by clause, the limit clause is applied
to each group instead of the entire final result.
select *, deltas(x) from t context by id csort time order by id, deltas_x desc;
| id | time | x | deltas_x |
|---|---|---|---|
| 1 | 09:30:04 | 5 | 3 |
| 1 | 09:30:02 | 3 | 2 |
| 1 | 09:30:03 | 2 | -1 |
| 1 | 09:30:05 | 4 | -1 |
| 1 | 09:30:01 | 1 | |
| 2 | 09:30:02 | 5 | 3 |
| 2 | 09:30:05 | 6 | 2 |
| 2 | 09:30:04 | 4 | 1 |
| 2 | 09:30:03 | 3 | -2 |
| 2 | 09:30:01 | 2 |
The example above shows that as the order by clause is executed
after the select clause, the order by clause can use the
calculation result "deltas_x" in the select clause.
select *, deltas(x) from t context by id csort time order by id, deltas_x desc limit 3;
| id | time | x | deltas_x |
|---|---|---|---|
| 1 | 09:30:02 | 3 | 2 |
| 1 | 09:30:03 | 2 | -1 |
| 1 | 09:30:01 | 1 | |
| 2 | 09:30:02 | 5 | 3 |
| 2 | 09:30:03 | 3 | -2 |
| 2 | 09:30:01 | 2 |
The example above shows that if the query has a context by clause,
the limit clause restricts the number of rows within each group and
is executed before the order by clause.
select * from t order by id, x desc limit 3;
| id | time | x |
|---|---|---|
| 1 | 09:30:04 | 5 |
| 1 | 09:30:05 | 4 |
| 1 | 09:30:02 | 3 |
The example above shows that if the query does not contain a context
by clause, the limit clause restricts the number of
rows in the final result and is executed after the order by
clause.
select *, deltas(x) from t where x>=3 context by id;
| id | time | x | deltas_x |
|---|---|---|---|
| 1 | 09:30:02 | 3 | |
| 1 | 09:30:05 | 4 | 1 |
| 1 | 09:30:04 | 5 | 1 |
| 2 | 09:30:05 | 6 | |
| 2 | 09:30:02 | 5 | -1 |
| 2 | 09:30:04 | 4 | -1 |
| 2 | 09:30:03 | 3 | -1 |
The example above shows that the where conditions are executed
before the select clause. The calculation of deltas(x) is conducted
after executing "where x>=3". Therefore, column "delta_x" is empty for the first
row of each group.
select *, deltas(x) from t where x>=3 context by id having sum(x)<=12;
| id | time | x | deltas_x |
|---|---|---|---|
| 1 | 09:30:02 | 3 | |
| 1 | 09:30:05 | 4 | 1 |
| 1 | 09:30:04 | 5 | 1 |
The example above shows that the having conditions are executed
after the where conditions.
