update

Update records in a table.

Details

The update statement can be used to update in-memory tables or DFS tables (including partitioned and dimension tables) where data isn't being written. The array vector columns in DFS tables can be updated with vectors or array vectors.

For a partitioned table, update will modify the entire partition where the target data is stored. For a dimension table, the entire table will be updated. Therefore, the update operation is only suitable for infrequent updates (e.g., updates occurring in minutes) rather than frequent updates (e.g. updates in milliseconds).

The update takes a multi-version approach and supports transactions. The system will create a new version to store the new data. Before committing the transaction, other SQL statements still can access the data in the old version. If data in multiple partitions needs to be updated, as long as the update on one partitions fails, the system will roll back all modifications on each partition.

Note:

The update statement cannot modify the data type of a column.
As of version 2.00.12, update with a context by clause supports csort and having keywords.
Since version 2.00.10, the partitioning columns of OLAP and TSDB tables (with keepDuplicated set to 'ALL') can be updated.
The update operation first loads the partitions where the data is stored, updates these partitions, and writes the partitions back to disk. Therefore, cross-partition updates are restricted in DolphinDB. Aggregate functions, order-sensitive functions, and user-defined functions cannot be used in the set clause for updates across partitions. However, these functions can be called when using update with context by specifying all partitioning columns, either as: (1) the original partitioning columns; (2) the VALUE or RANGE partitioning columns casted using time conversion functions, where the converted precision must match the partitioning scheme. See Example 2.

Syntax

update
    table_name
    set col1=X1, [col2=X2,…]
    [from table_joiner(table_names)]
    [where condition(s)]
    [context by col_name(s)] 
    [csort col_name(s) ASC|DESC]
    [having condition(s)]

Examples

Example 1. Update an in-memory table

sym = `C`MS`MS`MS`IBM`IBM`C`C`C$SYMBOL
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
t1 = table(timestamp, sym, qty, price);
t1;


timestamp	sym	qty	price
09:34:07	C	2200	49.6
09:36:42	MS	1900	29.46
09:36:51	MS	2100	29.52
09:36:59	MS	3200	30.02
09:32:47	IBM	6800	174.97
09:35:26	IBM	5400	175.23
09:34:16	C	1300	50.76
09:34:26	C	2500	50.32
09:38:12	C	8800	51.29

Add a column vol whose values are all NULL:

update t1 set vol=long();
t1;


timestamp	sym	qty	price
09:34:07	C	2200	49.6
09:36:42	MS	1900	29.46
09:36:51	MS	2100	29.52
09:36:59	MS	3200	30.02
09:32:47	IBM	6800	174.97
09:35:26	IBM	5400	175.23
09:34:16	C	1300	50.76
09:34:26	C	2500	50.32
09:38:12	C	8800	51.29

t1.drop!(`vol);

Update the table trades by adding 0.5 to the price column and subtracting 50 from the qty column for records with stock symbol C.

update t1 set price=price+0.5, qty=qty-50 where sym=`C;
t1;


timestamp	sym	qty	price
09:34:07	C	2150	50.1
09:36:42	MS	1900	29.46
09:36:51	MS	2100	29.52
09:36:59	MS	3200	30.02
09:32:47	IBM	6800	174.97
09:35:26	IBM	5400	175.23
09:34:16	C	1250	51.26
09:34:26	C	2450	50.82
09:38:12	C	8750	51.79

update t1 set price=price-0.5, qty=qty+50 where sym=`C;

context by can make group adjustments; whereas contextby can't. The example below first calculates the average price for each stock with context by, then deducts the average price from each record's original price.

update t1 set price=price-avg(price) context by sym;
t1;


timestamp	sym	qty	price
09:34:07	C	2150	-0.8925
09:36:42	MS	1900	-0.206667
09:36:51	MS	2100	-0.146667
09:36:59	MS	3200	0.353333
09:32:47	IBM	6800	-0.13
09:35:26	IBM	5400	0.13
09:34:16	C	1250	0.2675
09:34:26	C	2450	-0.1725
09:38:12	C	8750	0.7975

Updating a table with table joins:

item = table(1..10 as id, 10+rand(100,10) as qty, 1.0+rand(10.0,10) as price)
promotion = table(1..10 as id, rand(0b 1b, 10) as flag, 0.5+rand(0.4,10) as discount);
item;


id	qty	price
1	23	7.839664
2	44	7.635988
3	76	5.378054
4	91	8.078173
5	11	10.316152
6	58	9.510634
7	90	1.643082
8	68	5.787797
9	52	7.53352
10	62	6.222249

promotion;


id	flag	discount
1	0	0.650346
2	0	0.697081
3	0	0.774207
4	1	0.819562
5	0	0.710393
6	0	0.728223
7	1	0.602512
8	0	0.71226
9	1	0.606631
10	0	0.765697

update item set price = price*discount from ej(item, promotion, `id) where flag=1;
item;


id	qty	price
1	23	7.839664
2	44	7.635988
3	76	5.378054
4	91	6.620566
5	11	10.316152
6	58	9.510634
7	90	0.989976
8	68	5.787797
9	52	4.570069
10	62	6.222249

Example 2. Update a DFS table

login(`admin, `123456)
n=1000000
ID=rand(10, n)
x=rand(1.0, n)
t=table(ID, x)
db=database("dfs://rangedb123", RANGE,  0 5 10)
pt=db.createPartitionedTable(t, `pt, `ID)
pt.append!(t)

pt=loadTable("dfs://rangedb123", `pt)
select avg(x) from pt;
// output: 0.4999

update pt set x=x+1;

pt=loadTable("dfs://rangedb123", `pt)
select avg(x) from pt;
// output: 1.4999

// Order-sensitive functions are not allowed for cross-partition updates
update pt set x=prev(x) 

// update pt set x = prev(x) => Aggregate or order-sensitive functions are not allowed without context by clause when updating a partitioned table.

update pt set x=prev(x) context by ID
// The update operation can be executed when context by specifies all partitioning columns
select TOP 10* from pt order by ID

/*
ID	x
0	
0	1.4483
0	1.6277
0	1.7735
0	1.4349
*/

Example 3. Use csort and having in

context
                    by

clause with sample table pt

login("admin", "123456")
dbName = "dfs://time_comparison"
if(existsDatabase(dbName))
	dropDatabase(dbName)
	
db = database(dbName, VALUE, [2019.12M,2020.01M,2020.02M,2020.03M,2021.01M,2021.02M,2021.03M,2021.12M,2022.01M],engine="TSDB")
n = 100
t = table(n:n,[`time,`id,`value],[MONTH,INT,DOUBLE])
t[`time] = take([2019.12M,2020.01M,2020.02M,2020.03M,2021.01M,2021.02M,2021.03M,2021.12M,2022.01M],n)
t[`id] = take(1..10,n)
t[`value] = rand(100.0,n)
pt = db.createPartitionedTable(t, `pt, `time, sortColumns=`time).append!(t)
pnodeRun(flushIotCache)

update pt set value = 1
select * from pt where time = 2019.12M order by time, id

table pt:


time	id	value
2019.12M	1	1
2019.12M	1	1
2019.12M	2	1
2019.12M	3	1
2019.12M	4	1
2019.12M	5	1
2019.12M	6	1
2019.12M	7	1
2019.12M	8	1
2019.12M	9	1
2019.12M	10	1
2019.12M	10	1

3.1 update pt with csort and having:

update pt set value = 1
update pt set value = cumsum(value) context by time csort id having sum(id) > 60;
select * from pt where time = 2019.12M order by time, id


time	id	value
2019.12M	1	1
2019.12M	1	2
2019.12M	2	3
2019.12M	3	4
2019.12M	4	5
2019.12M	5	6
2019.12M	6	7
2019.12M	7	8
2019.12M	8	9
2019.12M	9	10
2019.12M	10	11
2019.12M	10	12

3.2 Use having with aggregate and non-aggregate functions:

update pt set value = 1
update pt set value = cumsum(value) context by time csort id having denseRank(id) > 3 and sum(id) > 10000
select * from pt where time = 2019.12M order by time, id


time	id	value
2019.12M	1	1
2019.12M	1	1
2019.12M	2	1
2019.12M	3	1
2019.12M	4	1
2019.12M	5	1
2019.12M	6	1
2019.12M	7	1
2019.12M	8	1
2019.12M	9	1
2019.12M	10	1
2019.12M	10	1

Example 4. Use the following script to create a DFS table with an array vector column:

dbName = "dfs://updateArrayVector"
db = database(dbName,VALUE,2024.01.01..2024.01.03, engine='TSDB')
a = arrayVector(2 4 6 8 10 12, [1, 1, 1, 2, 1, 3, 1, 1, 1, 2, 1, 3])
date = take(2024.01.03 2024.01.02 2024.01.01, 6)
sym = take(`a`b`c, 6)
t = table(a as a, sym as sym, date as date)
pt=createPartitionedTable(db,t,`pt,partitionColumns=`date, sortColumns=`date)
pt.append!(t)
select * from pt


a	sym	date
[1, 3]	c	2024.01.01
[1, 3]	c	2024.01.01
[1, 2]	b	2024.01.02
[1, 2]	b	2024.01.02
[1, 1]	a	2024.01.03
[1, 1]	a	2024.01.03

Then update column 'a':

update pt set a = [2,2,3] where date = 2024.01.01
select * from pt where date = 2024.01.01


a	sym	date
[2, 2, 3]	c	2024.01.01
[2, 2, 3]	c	2024.01.01

Example 5. Update array vector columns in DFS tables with array vectors.

Update column a of table 'pt' created in the above example. The length of the array vector must match the number of rows in the table.

update pt set a=array(INT[], 0).append!([1 2 3, 4 5]) where date = 2024.01.02
select * from pt where date = 2024.01.02


a	sym	date
[1, 2, 3]	b	2024.01.02
[4, 5]	b	2024.01.02