SQL Metaprogramming
SQL metaprogramming allows for the dynamic creation and manipulation of SQL statements, enabling queries to be defined, modified, or executed during runtime. DolphinDB includes two ways to dynamically generate SQL: with built-in functions or with macro variables.
Generating with Built-in Functions
In DolphinDB, SQL statements, particularly SELECT queries, can be broken down into several key components: select query body; table object; grouping fields; sorting fields; filtering conditions; the number of records to return.
To facilitate the dynamic generation of various SQL statements, DolphinDB provides
specialized functions: sql
(for SELECT statements),
sqlUpdate
(for UPDATE statements), sqlDelete
(for DELETE statements). Each parameter of these functions corresponds to a specific
clause in a SQL statement. Alternatively, you can use a combination of
parseExpr
and strings to generate metacode of SQL
statements.
DolphinDB provides a suite of functions to facilitate dynamic SQL generation.
sqlCol
: Generates metacode for selecting one or multiple columns with or without calculation. Aliases are supported. For example:// for a single column sqlCol("col") // output: <col> // for multiple columns sqlCol(["col0", "col1", ..., "colN"]) // output: [<col0>, <col1>, ..., <colN>] // specify the applied function sqlCol("col", func = sum, alias = "newCol") // output: <sum(col) as newCol> // specify aliases sqlCol(["col0", "col1"], func=sum, alias=["newCol0", "newCol1"]) // output: [<sum(col0) as newCol0>, <sum(col1) as newCol1>]
sqlColAlias
: Uses metacode and an optional alias name to define a column. It can be used withmakeCall
/makeUnifiedCall
to set alias for dynamic functions, i.e.,<func(cols.., args...)>
, or used withexpr
,unifiedExpr
, andbinaryExpr
for multivariant expressions, such as<a+b+c>
,<a1*b1+a2*b2+... +an*bn>
.sqlColAlias(sqlCol("col"), "newCol") // output: <col as newCol> sqlColAlias(makeCall(sum, sqlCol("col")), "newCol") // output: <sum(col) as newCol> sqlColAlias(makeCall(corr, sqlCol("col0"), sqlCol("col1")), "newCol") // output: <corr(col0, col1) as newCol>
parseExpr
: Converts strings into metacode. For example:parseExpr("select * from t") // output: <select * from t> // generate metacode for the where condition used in the sql function parseExpr("time between 09:00:00 and 11:00:00") // output: < time between pair(09:00:00, 11:00:00) >
sql
function to
generate metacode of SQL statement, with a comprehensive breakdown of defining each
component.// the sample table
securityID = take(`GE,3) join take(`MSFT,3) join take(`F,3)
time=09:00:00 09:30:00 10:00:00 10:28:00 11:00:00 12:00:00 13:00:00 14:00:00 14:29:00
price=31.82 31.69 31.92 63.12 62.58 63.12 12.46 12.59 13.24
volume=2300 3500 3700 1800 3800 6400 4200 5600 8900
t = table(securityID, time, price, volume);
// target SQL statement
select cumsum(price) as cumPrice from t
where time between 09:00:00 and 11:00:00
context by securityID csort time limit -1
selectCode = sqlColAlias(makeUnifiedCall(cumsum, sqlCol("price")), "cumPrice")
fromCode = "t"
whereCondition = parseExpr("time between 09:00:00 and 11:00:00")
contextByCol = sqlCol("securityID")
csortCol = sqlCol("time")
limitCount = -1
sql(select = selectCode, from = fromCode,
where = whereCondition, groupby = contextByCol,
groupFlag = 0, csort = csortCol, limit = limitCount)
< select cumsum(price) as cumPrice from objByName("t")
where time between pair(09:00:00, 11:00:00)
context by securityID csort time asc limit -1 >
sql
function:- The query field must be declared with
sqlCol
orsqlColAlias
. - Metacode for column calculations:
- Single-column: the
sqlCol
function specifies the func parameter. - Multi-column:
sqlColAlias
function withmakeCall
ormakeUnifiedCall
function.
- Single-column: the
- A table object can be a table variable name string, a table variable, or a
handle returned by
loadTable
.
sqlUpdate
and sqlDelete
functions can be employed to produce metacode for UPDATE and DELETE statements
respectively.// generate metacode of UPDATE statement
sqlUpdate(t, <price*2 as updatedPrice>)
// generate metacode of DELETE statement
sqlDelete(t, <time = 10:00:00>)
Generating with <select statement>
Using Macro Variables
Since verison 2.00.12/3.00.00, the use of macro variables is introduced for
metaprogramming, allowing to generate metacode of SQL statements in a more intuitive
manner through <select statement>
.
- For a scalar, use "_$" (for single-column macro variables). For example, a
variable is defined as
name="sym"
. The metacode can be<SELECT _$name FROM t>
, which will be parsed asSELECT sym FROM t
during execution. - For a vector, use "_$$" (for multi-column macro variables). For example, a
variable is defined as
names=["sym", "time"]
. The metacode can be<SELECT _$$name FROM t>
, which will be parsed asSELECT sym, time FROM t
during execution.
- The variable name must be a string and variables must conform to the Naming Rules.
- It can only be applied to columns, column aliases, parameters of functions, expressions, etc. Macro variables applied to functions and expressions are treated as a tuple, where each element corresponds to one column.
- Currently, only SELECT statement is supported. Macro variables cannot applied to UPDATE and DELETE statements.
- It can only be used with clauses WHERE, GROUP BY, PIVOT BY, CONTEXT BY, CSORT, and ORDER BY. Clauses such as CASE WHEN, OVER, and nested queries followed by FROM are not supported yet.
- When used with CSORT and ORDER BY clauses, only single-column macro variable (with "_$") is supported.
col = "price"
contextByCol = "securityID"
csortCol = "time"
a = 09:00:00
b = 11:00:00
<select cumsum(_$col) from t where _$csortCol between a and b
context by _$contextByCol csort _$csortCol limit -1>
x1 = 1 3 5 7 11 16 23
x2 = 2 8 11 34 56 54 100
x3 = 8 12 81 223 501 699 521
y = 0.1 4.2 5.6 8.8 22.1 35.6 77.2;
t = table(y, x1, x2, x3)
- Using the
sql
function.name = [`y,`x1] alias = [`y1, `x11] sql(select = sqlCol(name, sum, alias), from = t).eval()
-
Using macro variables.
name = [`y,`x1] alias = [`y1, `x11] <select sum:V(_$$name) as _$$alias from t>.eval()
Examples
Examples for single-column macro variables.
(1) Used to select a single column.
t = table(take(2023.01M+1 2 1 1 2,5) as month, take(`Rome`London`London`Rome,5) as city, take(200 500 100 300 300, 5) as sold1, take(2100 1500 1100 3100 2300,5) as sold2)
selectCity="city"
<select _$selectCity from t>.eval()
Output:
city |
---|
Rome |
London |
London |
Rome |
Rome |
(2) Used as the argument of a unary function to operate on a single column.
selectCity="city"
selectSold1="sold1"
alias="avg_sold1"
<select avg(_$selectSold1) as _$alias from t group by _$selectCity>.eval()
Output:
city | avg_sold1 |
---|---|
Rome | 266.6667 |
London | 300 |
Examples for multi-column macro variables.
(1) Used to select multiple columns.
colNames=["city", "sold1"]
<select _$$colNames from t>.eval()
Output:
city | sold1 |
---|---|
Rome | 200 |
London | 500 |
London | 100 |
Rome | 300 |
Rome | 300 |
(2) Each of the corresponding columns is passed as an argument to the function.
colNames=["sold1", "sold2"]
alias = "sum_sold"
<select rowSum(_$$colNames) as _$alias from t>.eval()
Output:
sum_sold |
---|
2,300 |
2,000 |
1,200 |
3,400 |
2,600 |
(3) Used with higher-order functions.
-
Apply a function to multiple columns by row. The following example calculates the sum for each row of column "sold1" and "sold2". It returns the same result as the example (2) above.
colNames=["sold1", "sold2"] alias = "sum_sold" <select sum:H(_$$colNames) as _$alias from t>.eval()
-
Apply a function to multiple columns by column. The following example calculates the sum for columns "sold1" and "sold2".
colNames=["sold1", "sold2"] alias=["sum_sold1", "sum_sold2"] <select sum:V(_$$colNames) as _$$alias from t>.eval()
Output:sum_sold1 sum_sold2 1,400 10,100