create
The create statement is used to create a database or a table. The syntax is as follows:
create DFS databases
The create statement only supports creating DFS databases.
create database directory partitioned by partitionType(partitionScheme),
[engine='OLAP'], [atomic='TRANS'], [chunkGranularity='TABLE']
Please refer to the related function database for details. The number of partitionType declarations determines the number of partition levels: you can specify one to three partitionType for a database, and specifying more than one creates a database with composite partitions.
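For instance, a two-level composite database can be declared by chaining two partition types. A minimal sketch (the path "dfs://demo" and the partition schemes are illustrative, not from this page):

```dolphindb
// Sketch: two partitionType declarations -> composite partitioning.
// Level 1: VALUE partition on dates; level 2: HASH partition on SYMBOL values.
create database "dfs://demo"
partitioned by VALUE(2024.01.01..2024.01.10), HASH([SYMBOL, 20]),
engine='TSDB'
```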
create tables
This statement only supports creating regular in-memory tables and DFS tables.
create table dbPath.tableName (
schema[columnDescription]
)
[partitioned by partitionColumns],
[sortColumns|primaryKey],
[keepDuplicates=ALL],
[sortKeyMappingFunction],
[softDelete=false],
[comment]
dbPath is a string indicating the path of a dfs database.
tableName is a string indicating the table name, or a variable indicating the table object.
schema indicates the table schema, including two columns: columnName and columnType.
columnDescription uses keywords to add a description to a column, including:
- comment adds a comment to a column;
- compress specifies the compression method: "lz4", "delta", "zstd", or "chimp".
- Since version 3.00.1, indexes can be set for tables of TSDB (keepDuplicates=ALL) or PKEY databases. For detailed instructions, refer to the createPartitionedTable function.
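For example, a column definition can carry several descriptions at once. A sketch (assuming a TSDB database "dfs://demo" with keepDuplicates=ALL already exists, VALUE-partitioned on a date column; the names are illustrative):

```dolphindb
// Sketch: column descriptions combining indexes, comment, and compress.
create table "dfs://demo"."pt"(
    deviceId SYMBOL [indexes="bloomfilter"],
    date DATE [comment="event date", compress="delta"],
    val DOUBLE
)
partitioned by date,
sortColumns=[`deviceId, `date],
keepDuplicates=ALL
```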
partitionColumns specifies the partitioning column(s). For a composite partition, this parameter is a STRING vector. Each element can be a column name, or a function call applied to a column (e.g., partitionFunc(id)). If a function call is specified, it must take exactly one column argument, and its other arguments must be constant scalars. In this case, the system partitions the table based on the result of the function call.
Parameters for TSDB storage engine only:
sortColumns is a required argument for the TSDB engine. It is a STRING scalar/vector indicating the column(s) based on which the table is sorted.
keepDuplicates specifies how to handle records with identical sortColumns values:
- ALL: keep all records (default)
- LAST: only keep the last record
- FIRST: only keep the first record
sortKeyMappingFunction is a vector of unary functions with the same length as the number of sort keys. The specified mapping functions are applied to each sort key (i.e., the sort columns excluding the temporal column) for dimensionality reduction. After the dimensionality reduction, records within each new sort key entry are sorted based on the last column of sortColumns (the temporal column).
Note:
- It cannot be specified when creating a dimension table.
- Dimensionality reduction is performed when writing to disk, so specifying this parameter may affect write performance.
- The functions specified in sortKeyMappingFunction correspond one-to-one to the sort keys. If a sort key does not require dimensionality reduction, leave the corresponding element of the vector empty.
- If a mapping function is hashBucket and the sort key it applies to is a HASH partitioning column, make sure the number of HASH partitions is not divisible by the buckets parameter of hashBucket (or vice versa); otherwise the column values from the same HASH partition would be mapped to the same hash bucket after dimensionality reduction.
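As a sketch (the database path and bucket count are illustrative), a hashBucket mapping can be attached to the deviceId sort key via partial application, leaving the temporal column unmapped:

```dolphindb
// Sketch: map the deviceId sort key into 500 buckets during writes.
// Assumes a TSDB database "dfs://demo" VALUE-partitioned on date.
create table "dfs://demo"."pt"(
    deviceId SYMBOL,
    date DATE,
    val DOUBLE
)
partitioned by date,
sortColumns=[`deviceId, `date],
sortKeyMappingFunction=[hashBucket{, 500}]
```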
softDelete determines whether to enable soft delete for TSDB databases. The default value is false. To use it, keepDuplicates must be set to 'LAST'. It is recommended to enable soft delete for databases where the row count is large and delete operations are infrequent.
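For example, soft delete can be enabled together with keepDuplicates=LAST. A sketch (the database path and column names are illustrative):

```dolphindb
// Sketch: softDelete requires keepDuplicates='LAST'.
// Assumes a TSDB database "dfs://demo" VALUE-partitioned on date.
create table "dfs://demo"."events"(
    id INT,
    date DATE,
    val DOUBLE
)
partitioned by date,
sortColumns=[`id, `date],
keepDuplicates=LAST,
softDelete=true
```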
comment is a STRING scalar for setting a comment for a DFS table.
Parameters for PKEY storage engine only:
primaryKey is a STRING scalar/vector that specifies the primary key column(s), uniquely identifying each record in a DFS table of the PKEY database. For records with the same primary key, only the latest one is retained. Note that:
- primaryKey must include all partitioning columns.
- The primary key columns must be of Logical, Integral (excluding COMPRESSED), Temporal, STRING, SYMBOL, or DECIMAL type.
- With more than one primary key column, a composite primary key is maintained. The composite primary key uses a Bloomfilter index by default (see the indexes parameter for details).
Please refer to the related function createPartitionedTable / createDimensionTable for details.
Create temporary in-memory tables
To create a temporary in-memory table, add the keywords local temporary (case-insensitive) to create:
create local temporary table tableName(
schema
) [on commit preserve rows]
where,
tableName is a string indicating the table name, or a variable of the table object
schema is the table schema which contains 2 columns: columnName and columnType.
on commit preserve rows (optional) specifies that the temporary table is session-specific. It is case-insensitive.
Note:
- In DolphinDB, the create local temporary table statement is equivalent to create table, as it creates a local temporary in-memory table that is only valid in the current session.
- Currently, global temporary tables and the keyword on commit delete rows are not supported.
Examples
Creating an In-memory Table
create table tb(
id SYMBOL,
val DOUBLE
)
go; // Parse and run the code above with the go statement first; otherwise an "unrecognized variable tb" error is reported.
tb.schema()
/* output
partitionColumnIndex->-1
chunkPath->
colDefs->
name typeString typeInt comment
---- ---------- ------- -------
id SYMBOL 17
val DOUBLE 16
*/
Creating an OLAP Database
if(existsDatabase("dfs://test")) dropDatabase("dfs://test")
create database "dfs://test" partitioned by VALUE(1..10), HASH([SYMBOL, 40]), engine='OLAP'
create table "dfs://test"."pt"(
id INT,
deviceId SYMBOL,
date DATE[comment="time_col", compress="delta"],
value DOUBLE,
isFin BOOL
)
partitioned by id, deviceId
pt = loadTable("dfs://test","pt")
pt.schema()
/* output
partitionSchema->([1,2,3,4,5,6,7,8,9,10],40)
partitionSites->
partitionColumnType->[4,17]
partitionTypeName->[VALUE,HASH]
chunkGranularity->TABLE
chunkPath->
partitionColumnIndex->[0,1]
colDefs->
name typeString typeInt comment
-------- ---------- ------- --------
id INT 4
deviceId SYMBOL 17
date DATE 6 time_col
value DOUBLE 16
isFin BOOL 1
partitionType->[1,5]
partitionColumnName->[id,deviceId]
*/
Creating a dimension table
create table "dfs://test"."pt1"(
id INT,
deviceId SYMBOL,
date DATE[comment="time_col", compress="delta"],
value DOUBLE,
isFin BOOL
)
pt1 = loadTable("dfs://test","pt1")
pt1.schema()
/* output
chunkPath->
partitionColumnIndex->-1
colDefs->
name typeString typeInt comment
-------- ---------- ------- --------
id INT 4
deviceId SYMBOL 17
date DATE 6 time_col
value DOUBLE 16
isFin BOOL 1
*/
Creating a TSDB Database
if(existsDatabase("dfs://test")) dropDatabase("dfs://test")
create database "dfs://test" partitioned by VALUE(1..10), HASH([SYMBOL, 40]), engine='TSDB'
Creating a partitioned table
create table "dfs://test"."pt"(
id INT,
deviceId SYMBOL,
date DATE[comment="time_col", compress="delta"],
value DOUBLE,
isFin BOOL
)
partitioned by id, deviceId,
sortColumns=[`deviceId, `date],
keepDuplicates=ALL
pt = loadTable("dfs://test","pt")
pt.schema()
/* output
engineType->TSDB
keepDuplicates->ALL
partitionColumnIndex->[0,1]
colDefs->
name typeString typeInt extra comment
-------- ---------- ------- ----- --------
id INT 4
deviceId SYMBOL 17
date DATE 6 time_col
value DOUBLE 16
isFin BOOL 1
partitionType->[1,5]
partitionColumnName->[id,deviceId]
partitionSchema->([1,2,3,4,5,6,7,8,9,10],40)
partitionSites->
partitionColumnType->[4,17]
partitionTypeName->[VALUE,HASH]
sortColumns->[deviceId,date]
softDelete->false
tableOwner->admin
chunkGranularity->TABLE
chunkPath->
*/
Creating a dimension table
create table "dfs://test"."pt1"(
id INT,
deviceId SYMBOL,
date DATE[comment="time_col", compress="delta"],
value DOUBLE,
isFin BOOL
)
sortColumns=[`deviceId, `date]
pt1 = loadTable("dfs://test","pt1")
pt1.schema()
/* output
sortColumns->[deviceId,date]
softDelete->false
tableOwner->admin
engineType->TSDB
keepDuplicates->ALL
chunkGranularity->TABLE
chunkPath->
partitionColumnIndex->-1
colDefs->
name typeString typeInt extra comment
-------- ---------- ------- ----- --------
id INT 4
deviceId SYMBOL 17
date DATE 6 time_col
value DOUBLE 16
isFin BOOL 1
*/
Creating a PKEY Database
if(existsDatabase("dfs://test")) dropDatabase("dfs://test")
create database "dfs://test" partitioned by VALUE(1..10), engine="PKEY"
Creating a partitioned table
create table "dfs://test"."pt"(
id INT,
deviceId SYMBOL [indexes="bloomfilter"],
date DATE [comment="time_col", compress="delta"],
value DOUBLE,
isFin BOOL
)
partitioned by id,
primaryKey=`id`deviceId
Creating a dimension table
create table "dfs://test"."dt"(
id INT,
deviceId SYMBOL [indexes="bloomfilter"],
date DATE [comment="time_col", compress="delta"],
value DOUBLE,
isFin BOOL
)
primaryKey=`id`deviceId
Creating a Temporary In-memory Table
create local temporary table "tb" (
id SYMBOL,
val DOUBLE
) on commit preserve rows
tb.schema()
/* output
partitionColumnIndex->-1
chunkPath->
colDefs->
name typeString typeInt extra comment
---- ---------- ------- ----- -------
id   SYMBOL     17
val  DOUBLE     16
*/
Creating a Partitioned Table with User-Defined Rules
For data with a column in the format id_date_id (e.g., ax1ve_20240101_e37f6), partition by date using a user-defined function:
// Define a function to extract the date information
def myPartitionFunc(str,a,b) {
return temporalParse(substr(str, a, b),"yyyyMMdd")
}
// Create a database
data = ["ax1ve_20240101_e37f6", "91f86_20240103_b781d", "475b4_20240101_6d9b2", "239xj_20240102_x983n","2940x_20240102_d9237"]
tb = table(data as id_date, 1..5 as value, `a`b`c`d`e as sym)
dbName = "dfs://testdb"
if(existsDatabase(dbName)){
dropDatabase(dbName)
}
create database "dfs://testdb" partitioned by VALUE(2024.02.01..2024.02.02), engine='TSDB'
create table "dfs://testdb"."pt"(
date STRING,
value INT,
sym SYMBOL
)
partitioned by myPartitionFunc(date, 6, 8),
sortColumns="sym"
// Use myPartitionFunc to process the data column
pt = loadTable(dbName,"pt")
pt.append!(tb)
flushTSDBCache()
select * from pt
The queried data are read and returned by partition. The query result shows that table pt is partitioned by the date information extracted from the id_date column.
id_date              | value | sym
---------------------|-------|----
ax1ve_20240101_e37f6 | 1     | a
475b4_20240101_6d9b2 | 3     | c
239xj_20240102_x983n | 4     | d
2940x_20240102_d9237 | 5     | e
91f86_20240103_b781d | 2     | b