loadText

Syntax

loadText(filename, [delimiter], [schema], [skipRows=0])

Arguments

filename the input text file name with its absolute path.

delimiter a STRING scalar indicating the table column separator. It can consist of one or more characters, with the default being a comma (‘,’).

schema a table. It can have the following columns, among which “name” and “type” columns are required.

column name	data type	meaning
name	STRING scalar	column name
type	STRING scalar	data type (BLOB not supported)
format	STRING scalar	the format of temporal columns
col	INT scalar or vector	the columns to be loaded

Note: If “type” specifies a temporal data type, the format of the source data must match a DolphinDB temporal data type. If the format of the source data and the DolphinDB temporal data types are incompatible, you can specify the column type as STRING when loading the data and convert it to a DolphinDB temporal data type using the temporalParse function afterwards.

skipRows is an integer between 0 and 1024 indicating the rows in the beginning of the text file to be ignored. The default value is 0.

Details

Load a text file into memory as a table. loadText loads data in single thread. To load data in multiple threads, use ploadText .

When loading a text file, the first row of data is read in string format, and the column names are parsed from that data. Please note that the upper limit for the first row is 256 KB. If none of the columns in the first row of the file starts with a number, the first row is treated as the header with column names of the text file. If at least one of the columns in the first row of the file starts with a number, the system uses col0, col1, … as the column names.
When loading a text file, the system determines the data type of each column based on a random sample of rows. This convenient feature may not always accurately determine the data type of all columns. We recommend users check the data type of each column with the extractTextSchema function after loading. If a column does not have the expected data type, then we need to enter the correct data type of the column in the schema table. Users can also specify data types for all columns. For a temporal column, if it does not have the expected data type, we also need to specify a format such as “MM/dd/yyyy” in the schema table. For details about temporal formats please refer to Parsing and Format of Temporal Variables.

As string in DolphinDB is encoded in UTF-8, we require input text files be encoded in UTF-8.

Column names in DolphinDB must only contain letters, numbers or underscores and must start with a letter. If a column name in the text file does not meet the requirements, the system automatically adjusts it:

If the column name contains characters other than letters, numbers or underscores, these characters are converted into underscores.

If the column name does not start with a letter, add “c” to the column name so that it starts with “c”.

A few examples:

Column name in data files	Adjusted column name
1_test	c1_test
test-a!	test_a_
[test]	c_test_

Note: From version 1.30.22 onwards, function loadText` supports loading a data file that contains a record with multiple newlines.

Examples

Use the following script to generate the data file to be used for the examples:

$ n=10
$ sym=rand(`AAPL`ORCL`MS`SUN,n)
$ permno=take(10001,n)
$ date=rand(2019.06.01..2019.06.10,n)
$ open=rand(100.0,n)
$ high=rand(200.0,n)
$ close=rand(200.0,n)
$ pre_close=rand(200.0,n)
$ change=rand(100.0,n)
$ vol=rand(10000,n)
$ amount=rand(100000.0,n)
$ t=table(sym,permno,date,open,high,close,pre_close,change,vol,amount)
$ saveText(t,"C:/DolphinDB/Data/stock.csv");

Ex 1. Use loadText without specifying any optional parameters:

$ tt=loadText("C:/DolphinDB/Data/stock.csv");

$ schema(tt).colDefs;

name	typeString	typeInt
sym	SYMBOL	17
permno	INT	4
date	DATE	6
open	DOUBLE	16
high	DOUBLE	16
close	DOUBLE	16
pre_close	DOUBLE	16
change	DOUBLE	16
vol	INT	4
amount	DOUBLE	16

Ex 2. Specify the data type of a column before loading the file.

We may want to change the data type of column “permno” to be SYMBOL. For this, we need to use function extractTextSchema to get the schema table, update it, then load the text file with the revised schema table.

$ schema=extractTextSchema("C:/DolphinDB/Data/stock.csv");
$ update schema set type=`SYMBOL where name=`permno;
$ tt=loadText("C:/DolphinDB/Data/stock.csv",,schema);
$ schema(tt).colDefs;

name	typeString	typeInt
sym	SYMBOL	17
permno	SYMBOL	17
date	DATE	6
open	DOUBLE	16
high	DOUBLE	16
close	DOUBLE	16
pre_close	DOUBLE	16
change	DOUBLE	16
vol	INT	4
amount	DOUBLE	16

We can also specify the data types of all columns:

$ schematable=table(`sym`permno`date`open`high`close`pre_close`change`vol`amount as name,`SYMBOL`SYMBOL`DATE`DOUBLE`DOUBLE`DOUBLE`DOUBLE`DOUBLE`INT`DOUBLE as type)
$ tt=loadText("C:/DolphinDB/Data/stock.csv",,schema)
$ schema(tt).colDefs;

name	typeString	typeInt
sym	SYMBOL	17
permno	SYMBOL	17
date	DATE	6
open	DOUBLE	16
high	DOUBLE	16
close	DOUBLE	16
pre_close	DOUBLE	16
change	DOUBLE	16
vol	INT	4
amount	DOUBLE	16

Ex 3. Load only a subset of columns.

For example, we may only need to load the following 7 columns: sym, date, open, high, close, vol, amount. Please note that we cannot change the order of columns when loading data. To change the order of columns in the loaded table, use function reorderColumns!.

$ schema=extractTextSchema("C:/DolphinDB/Data/stock.csv");
$ schema=select * from schema where name in `sym`date`open`high`close`vol`amount
$ schema[`col]=[0,2,3,4,5,8,9]

$ tt=loadText("C:/DolphinDB/Data/stock.csv",,schema);
$ tt;

sym	date	open	high	close	vol	amount
SUN	2019.06.10	18.675316	72.754005	136.463909	1376	31371.319038
AAPL	2019.06.05	42.098717	196.873587	41.513899	3632	9950.864129
ORCL	2019.06.05	62.223474	197.099027	123.785675	3069	38035.800937
SUN	2019.06.03	0.18163	50.669866	4.652098	6213	1842.198893
SUN	2019.06.06	32.54134	67.012502	130.312294	4891	55744.156823
SUN	2019.06.07	56.899091	81.709825	61.786176	1133	69057.849515
AAPL	2019.06.08	77.026838	38.504431	22.68496	3672	34420.187073
ORCL	2019.06.07	62.752656	39.33621	48.483091	4382	41601.601639
AAPL	2019.06.02	8.5487	17.623418	141.88325	8092	15449.159988
AAPL	2019.06.02	26.178685	197.320455	110.52407	5541	14616.820449

Ex 4. Skip the first 6 rows when loading.

Please note that as the header is the first line of the text file, it is also skipped.

$ re=loadText(filename="C:/DolphinDB/Data/stock.csv",skipRows=6);
$ re;

col0	col1	col2	col3	col4	col5	col6	col7	col8	col9
ORCL	10001	2019.06.06	96.132818	197.911394	157.364835	50.446378	54.036755	9614	55067.751277
ORCL	10001	2019.06.04	57.511043	89.232037	109.741833	104.814073	19.041139	3212	13821.150875
ORCL	10001	2019.06.03	75.718595	113.078262	154.271885	84.369967	95.376425	8438	1805.280661
ORCL	10001	2019.06.01	96.713015	83.344975	105.501576	126.438826	47.46568	2679	44953.587023
SUN	10001	2019.06.03	18.724487	8.685236	134.523515	20.608044	52.904951	6470	47650.962789

Ex 5. Specify the temporal format when loading the file.

Generate the text file to be used:

$ time=["20190623145457","20190623155423","20190623163025"]
$ sym=`AAPL`MS`IBM
$ qty=2200 5400 8670
$ price=54.78 59.64 65.23
$ t=table(time,sym,qty,price)
$ saveText(t,"C:/DolphinDB/Data/t2.csv");

$ schema=extractTextSchema("C:/DolphinDB/Data/t2.csv")
$ update schema set type = "DATETIME" where name = "time"
$ loadText("C:/DolphinDB/Data/t2.csv",,schema);

time	sym	qty	price

From the example above, if we load this text file without specifying the format of column “time”, column “time” is empty as the system cannot parse the raw data correctly. For this scenario we must specify the format of the column.

$ schema[`format]=["yyyyMMddHHmmss",,,];
$ loadText("C:/DolphinDB/Data/t2.csv",,schema);

time	sym	qty	price
2019.06.23T14:54:57	AAPL	2200	54.78
2019.06.23T15:54:23	MS	5400	59.64
2019.06.23T16:30:25	IBM	8670	65.23