# create {#create}

The create statement is used to create a database or a table. The syntax for each case is described in the sections below.

## create DFS databases {#create-dfs-databases}

The create statement only supports creating DFS databases.

```
create database directory partitioned by partitionType(partitionScheme),
[engine='OLAP'], [atomic='TRANS'], [chunkGranularity='TABLE']
```

Please refer to the related function [database](../../Functions/d/database.md) for details. The number of *partitionType* entries determines the number of partition levels. You can specify one to three *partitionType* entries for a database; specifying more than one creates a database with composite partitioning.
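As a minimal sketch of a composite database, the statement below combines two partition levels. The path `dfs://compoDemo` and the partition schemes are hypothetical values chosen only for illustration.

```
// Hypothetical database path, used only for this illustration
if(existsDatabase("dfs://compoDemo")) dropDatabase("dfs://compoDemo")

// Level 1: VALUE partitions on dates; level 2: HASH partitions on a SYMBOL column (10 buckets)
create database "dfs://compoDemo" partitioned by VALUE(2024.01.01..2024.01.03), HASH([SYMBOL, 10]), engine='TSDB'
```

A table created in this database would then list two partitioning columns after `partitioned by`, one per level.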

## create tables {#create-tables}

This statement only supports creating regular in-memory tables and DFS tables.

``` {#codeblock_fyk_nss_xbc}
create table dbPath.tableName (
    schema[columnDescription]
)
[partitioned by partitionColumns],
[sortColumns|primaryKey],
[keepDuplicates=ALL],
[sortKeyMappingFunction],
[softDelete=false],
[comment],
[encryptMode='plaintext']
```

**dbPath** is a string indicating the path of a DFS database.

**tableName** is a string indicating the table name, or a vector indicating the table object.

**schema** indicates the table schema, including two columns: *columnName* and *columnType*.

**columnDescription** uses keywords to add a description to a column, including:

-   comment adds a comment to a column;
-   compress specifies the compression method, which can be "lz4", "delta", "zstd", or "chimp";
-   indexes sets an index on a column. Since version 3.00.1, indexes can be set for tables of TSDB \(*keepDuplicates*=ALL\) or PKEY databases. For detailed instructions, refer to the [createPartitionedTable](../../Functions/c/createPartitionedTable.md) function.

**partitionColumns** specifies the partitioning column\(s\). For a composite partition, this parameter is a STRING vector. Each element can be a column name or a function call applied to a column \(e.g. `partitionFunc(id)`\). If a function call is specified, it must have exactly one column argument, and the other arguments must be constant scalars. In this case, the system partitions the table based on the result of the function call.

Parameters for TSDB storage engine only:

**sortColumns** is a required argument for TSDB engine. It is a string scalar/vector indicating the columns based on which the table is sorted.

**keepDuplicates** specifies how to deal with records with duplicate sortColumns values. The default value is ALL. It can have the following values:

-   ALL: keep all records
-   LAST: only keep the last record
-   FIRST: only keep the first record
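The following sketch illustrates *keepDuplicates* on a small TSDB table. The database path `dfs://dedupDemo`, the column names, and the sample rows are hypothetical; the statements follow the syntax shown above.

```
// Hypothetical database path and schema, used only for this illustration
if(existsDatabase("dfs://dedupDemo")) dropDatabase("dfs://dedupDemo")
create database "dfs://dedupDemo" partitioned by VALUE(1..3), engine='TSDB'

create table "dfs://dedupDemo"."pt"(
    id INT,
    ts TIMESTAMP,
    val DOUBLE
)
partitioned by id,
sortColumns=[`id, `ts],
keepDuplicates=LAST

// Two records share the same sortColumns values (id, ts) and differ only in val
t = table([1, 1] as id, [2024.01.01T00:00:00.000, 2024.01.01T00:00:00.000] as ts, [1.0, 2.0] as val)
pt = loadTable("dfs://dedupDemo", "pt")
pt.append!(t)

select * from pt
```

Per the description above, with LAST only the later of the two duplicate records should be kept; with ALL both would be retained.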

**sortKeyMappingFunction** is a vector of unary functions. It has the same length as the number of sort keys. The specified mapping functions are applied to each sort key \(i.e., the sort columns except for the last column\) for dimensionality reduction. After the dimensionality reduction, records that share a new sort key entry are sorted based on the last column of *sortColumns*.

**Note**:

-   It cannot be specified when creating a dimension table.
-   Dimensionality reduction is performed when writing to disk, so specifying this parameter may affect write performance.
-   The functions specified in `sortKeyMappingFunction` correspond to each and every sort key. If a sort key does not require dimensionality reduction, leave the corresponding element empty in the vector.
-   If a mapping function is `hashBucket` AND the sort key to which it applies is a HASH partitioning column, make sure the number of Hash partitions is not divisible by `hashBucket().buckets` \(or vice versa\), otherwise the column values from the same HASH partition would be mapped to the same hash bucket after dimensionality reduction.
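A minimal sketch of *sortKeyMappingFunction* follows. The database path `dfs://mapDemo`, the schema, and the bucket counts are hypothetical. The table has three sort columns, so the first two are the sort keys and each gets one mapping function.

```
// Hypothetical database path and schema, used only for this illustration
if(existsDatabase("dfs://mapDemo")) dropDatabase("dfs://mapDemo")
create database "dfs://mapDemo" partitioned by VALUE(2024.01.01..2024.01.03), engine='TSDB'

create table "dfs://mapDemo"."pt"(
    deviceId INT,
    site SYMBOL,
    ts TIMESTAMP,
    val DOUBLE
)
partitioned by ts,
sortColumns=[`deviceId, `site, `ts],
// One unary function per sort key: deviceId and site are each hashed into 50 buckets
sortKeyMappingFunction=[hashBucket{, 50}, hashBucket{, 50}]
```

Because neither sort key is a HASH partitioning column here, the divisibility caveat in the note above does not apply to this sketch.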

**softDelete** determines whether to enable soft delete for TSDB databases. The default value is false. To use it, *keepDuplicates* must be set to 'LAST'. It is recommended to enable soft delete for databases where the row count is large and delete operations are infrequent.
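As a sketch of the requirement above, the table below enables soft delete together with *keepDuplicates*=LAST. The database path `dfs://softDeleteDemo` and the schema are hypothetical.

```
// Hypothetical database path and schema, used only for this illustration
if(existsDatabase("dfs://softDeleteDemo")) dropDatabase("dfs://softDeleteDemo")
create database "dfs://softDeleteDemo" partitioned by VALUE(1..3), engine='TSDB'

create table "dfs://softDeleteDemo"."pt"(
    id INT,
    ts TIMESTAMP,
    val DOUBLE
)
partitioned by id,
sortColumns=[`id, `ts],
keepDuplicates=LAST,   // required when softDelete is enabled
softDelete=true

// Deletes are issued with the usual SQL syntax
delete from loadTable("dfs://softDeleteDemo", "pt") where id = 1
```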

**comment** is a STRING scalar for setting a comment for a DFS table.

Parameters for PKEY storage engine only:

**primaryKey** is a STRING scalar/vector that specifies the primary key column\(s\), uniquely identifying each record in a DFS table of the PKEY database. For records with the same primary key, only the latest one is retained. Note that:

-   *primaryKey* must include all partitioning columns.
-   The primary key columns must be of Logical, Integral \(excluding COMPRESSED\), Temporal, STRING, SYMBOL, or DECIMAL type.
-   With more than one primary key column, a composite primary key is maintained. The composite primary key uses a Bloomfilter index by default \(see the *indexes* parameter for details\).

**encryptMode** is a STRING scalar specifying the encryption mode for IMOLTP tables. The default is no encryption \(plaintext mode\). Supported values \(case-insensitive\) include: plaintext, aes\_128\_ctr, aes\_128\_cbc, aes\_128\_ecb, aes\_192\_ctr, aes\_192\_cbc, aes\_192\_ecb, aes\_256\_ctr, aes\_256\_cbc, aes\_256\_ecb, sm4\_128\_cbc, sm4\_128\_ecb.

Please refer to the related function [createPartitionedTable](../../Functions/c/createPartitionedTable.md) / [createDimensionTable](../../Functions/c/createDimensionTable.md) for details.

## Create temporary in-memory tables {#create-temporary-in-memory-tables}

To create a temporary in-memory table, add the keywords `local temporary` \(case-insensitive\) to `create`:

```
create local temporary table tableName(
    schema
) [on commit preserve rows]
```

where,

**tableName** is a string indicating the table name, or a variable of the table object.

**schema** is the table schema, which contains two columns: *columnName* and *columnType*.

**on commit preserve rows** \(optional\) specifies that the temporary table is session-specific. It is case-insensitive.

**Note**:

-   In DolphinDB, the `create local temporary table` statement is equivalent to `create table` as it creates a local temporary in-memory table that is only valid in the current session.
-   Currently, global temporary tables and the keyword `on commit delete rows` are not supported.

## Create a Measurement Point Table {#topic_pxc_qgb_3gc}

Refer to the IOTDB Engine for the concept of measurement point table. To create a measurement point table, users can create a table with an IOTANY column or enable the latest value cache by specifying *latestKeyCache*=true when using the CREATE statement. The storage engine must be IOTDB during database creation.

Parameters:

**latestKeyCache** \(optional\) is a Boolean value indicating whether to enable the latest value cache. The default value is false. Only partitioned tables are supported.

**compressHashSortKey** \(optional\) is a Boolean value indicating whether to enable compression for sort columns. The default value is true when the storage engine is IOTDB. Only partitioned tables are supported.

**Note**:

-   The storage engine must be IOTDB when creating a measurement point table.
-   The IOTDB database can only contain measurement point tables.
-   Users can create at most one IOTANY column in a table using the CREATE statement and cannot separately create an IOTANY vector.
-   It is required to specify multiple columns that uniquely identify a group of measurement points \(typically an ID column + multiple tag columns\) and a time column as the sort columns. If *latestKeyCache*=true, there must be at least two sort columns.
-   Users can enable the latest value cache without creating an IOTANY column. However, if the table contains an IOTANY column, the latest value cache must be enabled.
-   The database must use a composite partitioning scheme and there must be a time partition as the last dimension when creating a measurement point table.
-   When a measurement point table has more than two sort columns and *compressHashSortKey* is true, compression takes effect for the sort key columns. This reduces disk usage but may slow queries on tables with a particularly large number of sort columns. *compressHashSortKey* defaults to true if the storage engine is IOTDB.

## Examples {#examples}

### Creating an In-memory Table {#topic_vzp_kjn_1cc}

``` {#codeblock_jzf_njn_1cc}
create table tb(
    id SYMBOL,
    val DOUBLE
)
go;   // Parse and run the preceding code with the go statement first; otherwise tb is reported as an unrecognized variable.
tb.schema()

/* output
    partitionColumnIndex->-1
    chunkPath->
    colDefs->
    name typeString typeInt comment
    ---- ---------- ------- -------
    id   SYMBOL     17
    val  DOUBLE     16
*/
```

### Creating an OLAP Database {#topic_ibl_pln_1cc}

``` {#codeblock_iyq_qln_1cc}
if(existsDatabase("dfs://test")) dropDatabase("dfs://test")
create database "dfs://test" partitioned by VALUE(1..10), HASH([SYMBOL, 40]), engine='OLAP'
```

**Creating a partitioned table**

``` {#codeblock_cqj_sln_1cc}
create table "dfs://test"."pt"(
        id INT,
        deviceId SYMBOL,
        date DATE[comment="time_col", compress="delta"],
        value DOUBLE,
        isFin BOOL
    )
    partitioned by id, deviceId

pt = loadTable("dfs://test","pt")
pt.schema()

/* output 
partitionSchema->([1,2,3,4,5,6,7,8,9,10],40)
partitionSites->
partitionColumnType->[4,17]
partitionTypeName->[VALUE,HASH]
chunkGranularity->TABLE
chunkPath->
partitionColumnIndex->[0,1]
colDefs->
name     typeString typeInt comment 
-------- ---------- ------- --------
id       INT        4               
deviceId SYMBOL     17              
date     DATE       6       time_col
value    DOUBLE     16              
isFin    BOOL       1               

partitionType->[1,5]
partitionColumnName->[id,deviceId]
*/ 
```

**Creating a dimension table**

``` {#codeblock_gbp_tln_1cc}
create table "dfs://test"."pt1"(
        id INT,
        deviceId SYMBOL,
        date DATE[comment="time_col", compress="delta"],
        value DOUBLE,
        isFin BOOL
    )

 pt1 = loadTable("dfs://test","pt1")
 pt1.schema()

/* output
chunkPath->
partitionColumnIndex->-1
colDefs->
name     typeString typeInt comment 
-------- ---------- ------- --------
id       INT        4               
deviceId SYMBOL     17              
date     DATE       6       time_col
value    DOUBLE     16              
isFin    BOOL       1
*/  
```

### Creating a TSDB Database {#topic_ixj_4jn_1cc}

``` {#codeblock_hpg_tjn_1cc}
if(existsDatabase("dfs://test")) dropDatabase("dfs://test")
create database "dfs://test" partitioned by VALUE(1..10), HASH([SYMBOL, 40]), engine='TSDB'
```

**Creating a partitioned table**

``` {#codeblock_yht_5jn_1cc}
create table "dfs://test"."pt"(
        id INT,
        deviceId SYMBOL,
        date DATE[comment="time_col", compress="delta"],
        value DOUBLE,
        isFin BOOL
    )
    partitioned by ID, deviceID,
    sortColumns=[`deviceId, `date],
    keepDuplicates=ALL

 pt = loadTable("dfs://test","pt")
 pt.schema()

/* output
engineType->TSDB
keepDuplicates->ALL
partitionColumnIndex->[0,1]
colDefs->
name     typeString typeInt extra comment 
-------- ---------- ------- ----- --------
id       INT        4                     
deviceId SYMBOL     17                    
date     DATE       6             time_col
value    DOUBLE     16                    
isFin    BOOL       1                     

partitionType->[1,5]
partitionColumnName->[id,deviceId]
partitionSchema->([1,2,3,4,5,6,7,8,9,10],40)
partitionSites->
partitionColumnType->[4,17]
partitionTypeName->[VALUE,HASH]
sortColumns->[deviceId,date]
softDelete->false
tableOwner->admin
chunkGranularity->TABLE
chunkPath-> 
*/ 
```

**Creating a dimension table**

``` {#codeblock_qmh_yjn_1cc}
create table "dfs://test"."pt1"(
        id INT,
        deviceId SYMBOL,
        date DATE[comment="time_col", compress="delta"],
        value DOUBLE,
        isFin BOOL
    )
    sortColumns=[`deviceId, `date]

 pt1 = loadTable("dfs://test","pt1")
 pt1.schema()

/* output
sortColumns->[deviceId,date]
softDelete->false
tableOwner->admin
engineType->TSDB
keepDuplicates->ALL
chunkGranularity->TABLE
chunkPath->
partitionColumnIndex->-1
colDefs->
name     typeString typeInt extra comment 
-------- ---------- ------- ----- --------
id       INT        4                     
deviceId SYMBOL     17                    
date     DATE       6             time_col
value    DOUBLE     16                    
isFin    BOOL       1 
*/
```

### Creating a PKEY Database {#topic_ysm_zjn_1cc}

``` {#codeblock_rvw_1kn_1cc}
if(existsDatabase("dfs://test")) dropDatabase("dfs://test")
create database "dfs://test" partitioned by VALUE(1..10), engine="PKEY"
```

**Creating a partitioned table**

``` {#codeblock_k2k_dkn_1cc}
create table "dfs://test"."pt"(
     id INT,
     deviceId SYMBOL [indexes="bloomfilter"],
     date DATE [comment="time_col", compress="delta"],
     value DOUBLE,
     isFin BOOL
 )
 partitioned by ID,
 primaryKey=`ID`deviceID
```

**Creating a dimension table**

``` {#codeblock_qtq_dkn_1cc}
create table "dfs://test"."dt"(
     id INT,
     deviceId SYMBOL [indexes="bloomfilter"],
     date DATE [comment="time_col", compress="delta"],
     value DOUBLE,
     isFin BOOL
 )
 primaryKey=`ID`deviceID
```

### Creating a Temporary In-memory Table {#topic_yqv_zkn_1cc}

``` {#codeblock_ex2_hkn_1cc}
create local temporary table "tb" (
        id SYMBOL,
        val DOUBLE
    ) on commit preserve rows
    tb.schema()

/* output
    partitionColumnIndex->-1
    chunkPath->
    colDefs->
    name typeString typeInt extra comment
    ---- ---------- ------- ----- -------
    id   SYMBOL     17
    val  DOUBLE     16
*/
```

### Creating a Partitioned Table with User-Defined Rules {#topic_fkj_2ln_1cc}

For data with a column in the format `id_date_id` \(e.g., ax1ve\_20240101\_e37f6\), partition by date using a user-defined function:

``` {#codeblock_dnv_ssg_1cc}
// Define a function to extract the date information
def myPartitionFunc(str,a,b) {
    return temporalParse(substr(str, a, b),"yyyyMMdd")
}

// Prepare sample data whose id_date column embeds the date
data = ["ax1ve_20240101_e37f6", "91f86_20240103_b781d", "475b4_20240101_6d9b2", "239xj_20240102_x983n","2940x_20240102_d9237"]
tb = table(data as id_date, 1..5 as value, `a`b`c`d`e as sym)

// Create a database
dbName = "dfs://testdb"
if(existsDatabase(dbName)){
        dropDatabase(dbName)        
}

create database "dfs://testdb" partitioned by VALUE(2024.02.01..2024.02.02), engine='TSDB'

create table "dfs://testdb"."pt"(
	id_date STRING,
	value INT,
	sym SYMBOL
	)
	partitioned by myPartitionFunc(id_date, 6, 8),
	sortColumns="sym"

// Append the data; the table is partitioned by the result of myPartitionFunc on the id_date column
pt = loadTable(dbName,"pt")
pt.append!(tb)
flushTSDBCache()

select * from pt
```

The queried data are read and returned by partition. The query result shows that table pt is partitioned by the date information extracted from the id\_date column.

<table id="table_exp_tsg_1cc"><thead><tr><th align="left">

id\_date

</th><th align="left">

value

</th><th align="left">

sym

</th></tr></thead><tbody><tr><td align="left">

ax1ve\_20240101\_e37f6

</td><td align="left">

1

</td><td align="left">

a

</td></tr><tr><td align="left">

475b4\_20240101\_6d9b2

</td><td align="left">

3

</td><td align="left">

c

</td></tr><tr><td align="left">

239xj\_20240102\_x983n

</td><td align="left">

4

</td><td align="left">

d

</td></tr><tr><td align="left">

2940x\_20240102\_d9237

</td><td align="left">

5

</td><td align="left">

e

</td></tr><tr><td align="left">

91f86\_20240103\_b781d

</td><td align="left">

2

</td><td align="left">

b

</td></tr></tbody>
</table>

### Create a Measurement Point Table {#topic_zlt_blb_3gc}

In the following example, we use a composite strategy to partition data by “deviceId” and “timestamp”, creating a measurement point table named “pt”. This table uses “deviceId” and “location” as the two columns that uniquely identify a measurement point. The latest value cache is enabled and the column “value” is of IOTANY type, allowing it to store data of various types. Compression for sort key columns is also enabled, because *compressHashSortKey* defaults to true for the IOTDB engine.

``` {#codeblock_amt_blb_3gc}
dbName = "dfs://db"
if (existsDatabase(dbName)) {
    dropDatabase(dbName)
}
// create an IOTDB database
create database "dfs://db" partitioned by HASH([INT, 20]),VALUE(2017.08.07..2017.08.11), engine='IOTDB'
// create a measurement point table with deviceId & location as sort key columns and timestamp as the last column
create table "dfs://db"."pt" (
        deviceId INT,
        location SYMBOL,
        timestamp TIMESTAMP,
        value IOTANY
)
partitioned by deviceId, timestamp,
sortColumns = [`deviceId, `location, `timestamp],
sortKeyMappingFunction = [hashBucket{, 50}, hashBucket{, 50}],
latestKeyCache = true
```

