loadTextEx

Syntax

loadTextEx(dbHandle, tableName, partitionColumns, filename, [delimiter], [schema], [skipRows=0], [transform], [sortColumns], [atomic=false], [arrayDelimiter], [containHeader], [arrayMarker], [chunkSize=128])

Arguments

dbHandle is a handle to the database where the imported data will be saved. The database can reside in the distributed file system or be an in-memory database.

tableName is a string indicating the name of the table to hold the imported data.

partitionColumns is a string scalar/vector indicating the partitioning column(s). For a sequential partition, set partitionColumns to "" as no partitioning column is needed. For a composite partition, partitionColumns is a string vector.

filename is the name of the input text file, specified with its absolute path. Currently only .csv files are supported.

delimiter (optional) is a STRING scalar indicating the table column separator. It can consist of one or more characters, with the default being a comma (',').

schema (optional) is a table. It can have the following columns, among which the "name" and "type" columns are required.

Column   Data Type             Description
name     STRING scalar         column name
type     STRING scalar         data type
format   STRING scalar         the format of temporal columns
col      INT scalar or vector  the columns to be loaded
Note:

If "type" specifies a temporal data type, the format of the source data must match a DolphinDB temporal data type. If the format of the source data and the DolphinDB temporal data types are incompatible, you can specify the column type as STRING when loading the data and convert it to a DolphinDB temporal data type using the temporalParse function afterwards.

skipRows (optional) is an integer between 0 and 1024 indicating the number of rows at the beginning of the text file to be ignored. The default value is 0.

transform (optional) is a unary function whose parameter must be a table. It is applied to the imported data before the result is saved to the database.

sortColumns (optional) is a string scalar/vector indicating the columns based on which the table is sorted.

atomic (optional) is a Boolean value indicating whether to guarantee atomicity when loading a file with the cache engine enabled. If set to true, the entire loading process of a file is a single transaction; if set to false, the loading process is split into multiple transactions.
Note:

It is required to set atomic = false if the file to be loaded exceeds the cache engine capacity. Otherwise, a transaction may get stuck: it can neither be committed nor rolled back.
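
For example, when the file to be loaded exceeds the cache engine capacity (the path is hypothetical; db is a database handle as in the Examples section):
loadTextEx(dbHandle=db, tableName=`pt, partitionColumns=`ID, filename="/home/DolphinDB/Data/bigfile.csv", atomic=false)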

arrayDelimiter (optional) is a single character indicating the delimiter for columns holding array vectors in the file. Since array vectors cannot be recognized automatically, you must use the schema parameter to set the type column to the corresponding array vector data type (e.g., "DOUBLE[]") before import.

containHeader (optional) is a Boolean value indicating whether the file contains a header row. The default value is NULL. See loadText for the rules used to determine whether the first row is treated as a header.
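
For example, a minimal call that states explicitly that the first row holds column names (db is a database handle as in the Examples section):
loadTextEx(dbHandle=db, tableName=`pt, partitionColumns=`ID, filename="/home/DolphinDB/Data/t.txt", containHeader=true)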

arrayMarker (optional) is a string containing 2 characters or a CHAR pair. The two characters are the identifiers for the left and right boundaries of an array vector. The default identifiers are double quotes (").

  • It cannot contain spaces, tabs (\t), or newline characters (\r or \n).

  • It cannot contain digits or letters.

  • If one is a double quote ("), the other must also be a double quote.

  • If the identifier is ', ", or \, a backslash ( \ ) escape character should be used as appropriate. For example, arrayMarker="\"\"".

  • If delimiter specifies a single character, arrayMarker cannot contain the same character.

  • If delimiter specifies multiple characters, the left boundary of arrayMarker cannot be the same as the first character of delimiter.
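
For example, a sketch that loads bracket-delimited array vectors, assuming a hypothetical file t2.csv whose array fields are written as [1.4799,1.479,1.4787] (db and the columns mirror the array vector example in the Examples section):
schema = extractTextSchema("/home/DolphinDB/Data/t2.csv")
update schema set type="DOUBLE[]" where name="bid" or name="ask"
loadTextEx(dbHandle=db, tableName=`t, partitionColumns=`sid, filename="/home/DolphinDB/Data/t2.csv", schema=schema, arrayDelimiter=",", arrayMarker="[]")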

chunkSize (optional) is a positive integer with a default value of 128 (in MB). It specifies the maximum size of each chunk during parallel import. The upper limit is max(maxMemSize / workerNum, 128MB), i.e., the greater of the maximum available memory per worker and 128 MB.
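
For example, to allow chunks of up to 256 MB when importing a large file (the path is hypothetical; db is a database handle as in the Examples section):
loadTextEx(dbHandle=db, tableName=`pt, partitionColumns=`ID, filename="/home/DolphinDB/Data/bigfile.csv", chunkSize=256)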

Details

Loads a text file into a DolphinDB database.

If dbHandle is specified and is not an empty string "": load the text file into a distributed database. The result is a table object containing the metadata of the table.

If dbHandle is an empty string "" or unspecified: load the text file as a partitioned in-memory table. For this usage, when dbHandle is defined with function database, the directory parameter must also be an empty string "" or unspecified.

If the transform parameter is specified, createPartitionedTable must be executed before loading the data. The system applies the function specified by transform to the imported data and saves the result to the database.

Functions loadTextEx and loadText share many features, such as whether the first row is treated as the header, how the data types of columns are determined, and how the system adjusts illegal column names. For more details, refer to function loadText.

Examples

Use the following script to generate the text file to be used:
n=10000
ID=rand(100, n)
dates=2017.08.07..2017.08.11
date=rand(dates, n)
vol=rand(1..10 join int(), n)
t=table(ID, date, vol)
saveText(t, "/home/DolphinDB/Data/t.txt");
Example: Load t.txt into a DFS database with a range domain on ID.
db = database(directory="dfs://rangedb", partitionType=RANGE, partitionScheme=0 51 101)
pt = loadTextEx(dbHandle=db, tableName=`pt, partitionColumns=`ID, filename="/home/DolphinDB/Data/t.txt");
Example: Load t.txt into a DFS database created with the TSDB storage engine:
db = database(directory="dfs://rangedb", partitionType=RANGE, partitionScheme=0 51 101, engine='TSDB')
pt=loadTextEx(dbHandle=db, tableName=`pt, partitionColumns=`ID, filename="/home/DolphinDB/Data/t.txt", sortColumns=`ID`date);
Example: To load table pt from the database:
db = database("dfs://rangedb")
pt = loadTable(db, `pt);
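
A quick sanity check on the loaded table (the exact count depends on the randomly generated data):
select count(*) from pt;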

Example: Load t.txt into a DFS database with a composite domain.

dbDate = database(directory="", partitionType=VALUE, partitionScheme=2017.08.07..2017.08.11)
dbID=database(directory="", partitionType=RANGE, partitionScheme=0 51 101)
db = database(directory="dfs://compoDB", partitionType=COMPO, partitionScheme=[dbDate, dbID])
pt = loadTextEx(dbHandle=db, tableName=`pt, partitionColumns=`date`ID, filename="/home/DolphinDB/Data/t.txt");

Example: Load t.txt into a partitioned in-memory table with value domain on date.

db = database(directory="", partitionType=VALUE, partitionScheme=2017.08.07..2017.08.11)
pt = db.loadTextEx(tableName="", partitionColumns=`date, filename="/home/DolphinDB/Data/t.txt");

pt.sortBy!(`ID`vol);
Example: Convert all null values to 0 and then load the data into a DFS database with a composite domain.
dbDate = database(directory="", partitionType=VALUE, partitionScheme=2017.08.07..2017.08.11)
dbID=database(directory="", partitionType=RANGE, partitionScheme=0 51 101)
db = database(directory="dfs://compoDB", partitionType=COMPO, partitionScheme=[dbDate, dbID]);

pt=db.createPartitionedTable(table=t, tableName=`pt, partitionColumns=`date`ID)
pt=loadTextEx(dbHandle=db, tableName=`pt, partitionColumns=`date`ID, filename="/home/DolphinDB/Data/t.txt", transform=nullFill!{,0});
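
Here nullFill!{,0} uses DolphinDB's partial application syntax: fixing the second argument of nullFill! yields the unary function that transform expects. An equivalent spelled-out form (the name fillZero is illustrative):
// same effect as nullFill!{,0}: fill NULL cells of the loaded table with 0
fillZero = def(mutable t){ return nullFill!(t, 0) }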

Example: Load array vectors

Create a csv file:
bid = array(DOUBLE[], 0, 20).append!([1.4799 1.479 1.4787, 1.4796 1.479 1.4784, 1.4791 1.479 1.4784])
ask = array(DOUBLE[], 0, 20).append!([1.4821 1.4825 1.4828, 1.4818 1.482 1.4821, 1.4814 1.4818 1.482])
TradeDate = 2022.01.01 + 1..3
SecurityID = rand(`APPL`AMZN`IBM, 3)
t = table(SecurityID as `sid, TradeDate as `date, bid as `bid, ask as `ask)
t;
saveText(t,filename="/home/DolphinDB/Data/t.csv",delimiter=',',append=true)
Load the file with loadTextEx:
db = database(directory="dfs://testDB", partitionType=VALUE, partitionScheme=`APPL`AMZN`IBM, engine='TSDB')
path = "/home/DolphinDB/Data/t.csv"
schema = extractTextSchema(path);
update schema set type = "DOUBLE[]" where name="bid" or name ="ask"
pt = loadTextEx(dbHandle=db, tableName=`t, partitionColumns=`sid, filename=path, schema=schema, arrayDelimiter=",", sortColumns=`sid`date);
select * from pt;
sid date bid ask
AMZN 2022.01.04 [1.479100,1.479000,1.478400] [1.481400,1.481800,1.482000]
AMZN 2022.01.03 [1.479600,1.479000,1.478400] [1.481800,1.482000,1.482100]
AMZN 2022.01.04 [1.479100,1.479000,1.478400] [1.481400,1.481800,1.482000]
APPL 2022.01.02 [1.479900,1.479000,1.478700] [1.482100,1.482500,1.482800]
APPL 2022.01.03 [1.479600,1.479000,1.478400] [1.481800,1.482000,1.482100]
APPL 2022.01.04 [1.479100,1.479000,1.478400] [1.481400,1.481800,1.482000]
IBM 2022.01.02 [1.479900,1.479000,1.478700] [1.482100,1.482500,1.482800]
IBM 2022.01.03 [1.479600,1.479000,1.478400] [1.481800,1.482000,1.482100]
IBM 2022.01.02 [1.479900,1.479000,1.478700] [1.482100,1.482500,1.482800]

Example: Modify the data types of imported columns using the schema parameter, and add new columns using the transform parameter.

The import requirements are as follows:

  • Add a new column during import to store the symbol.
  • Reorder the columns in the data file to match the target table schema.

This example demonstrates importing a file for a single symbol.

The file sz000001.csv contains the following columns in order: tradetime, open, close, high, low, vol.

The target table expects the following column order: sym (to be added), tradetime, vol, open, close, high, low.

dir = "/home/data/sz000001.csv"
sym = dir.split('/').last().split('.')[0]  // extract the symbol "sz000001" from the file name

// Change the type of the tradetime column to DATETIME
schema = extractTextSchema(dir)
update schema set type="DATETIME" where name="tradetime"

if(existsDatabase("dfs://stock_data")) {
	dropDatabase("dfs://stock_data")
}

db=database(directory="dfs://stock_data", partitionType=VALUE, partitionScheme=2000.01M..2019.12M)

colNames=`sym`tradetime`vol`open`close`high`low
colTypes=[SYMBOL, DATETIME, INT, DOUBLE, DOUBLE, DOUBLE, DOUBLE]
t = table(1:0, colNames, colTypes)
pt = db.createPartitionedTable(t, `pt, `tradetime);
def mytrans(mutable t, sym, colNames){
        t.replaceColumn!(`tradetime, datetime(t.tradetime))
        // Add the first column sym
        t1 = select sym, * from t
        // Reorder the columns in the table using reorderColumns!
        t1.reorderColumns!(colNames)
        return t1
}

// Specify the schema and transform parameters to process the data
loadTextEx(db, `pt, `tradetime, dir, schema=schema, transform=mytrans{,sym, colNames})

select top 10 * from loadTable("dfs://stock_data", `pt)
sym tradetime vol open close high low
sz000001 2010.01.01T00:00:00 10732 35.9484 35.4385 36.3261 35.9569
sz000001 2010.01.04T00:00:00 97555 16.4653 13.6348 16.7255 14.9401
sz000001 2010.01.05T00:00:00 43992 51.3616 53.1199 52.2155 50.4782
sz000001 2010.01.06T00:00:00 85283 76.3469 80.0076 76.7017 74.9479
sz000001 2010.01.07T00:00:00 78837 38.9411 35.8283 39.5269 38.2178
sz000001 2010.01.08T00:00:00 13317 70.8803 67.9027 71.8693 70.8072
sz000001 2010.01.11T00:00:00 22958 96.3163 97.4031 96.5605 96.364
sz000001 2010.01.12T00:00:00 45621 17.2699 18.5016 17.7689 16.5028
sz000001 2010.01.13T00:00:00 54886 75.7999 77.0245 76.4588 75.7482
sz000001 2010.01.14T00:00:00 3132 49.8656 49.3416 50.7761 49.7972