loadText
Syntax
loadText(filename, [delimiter], [schema], [skipRows=0])
Arguments
filename the input text file name with its absolute path.
delimiter a STRING scalar indicating the table column separator. It can consist of one or more characters, with the default being a comma (‘,’).
schema a table. It can have the following columns, among which “name” and “type” columns are required.
column name |
data type |
meaning |
---|---|---|
name |
STRING scalar |
column name |
type |
STRING scalar |
data type (BLOB not supported) |
format |
STRING scalar |
the format of temporal columns |
col |
INT scalar or vector |
the columns to be loaded |
Note: If “type” specifies a temporal data type, the format of the source data must match a DolphinDB temporal data type. If the format of the source data and the DolphinDB temporal data types are incompatible, you can specify the column type as STRING when loading the data and convert it to a DolphinDB temporal data type using the temporalParse function afterwards.
skipRows is an integer between 0 and 1024 indicating the rows in the beginning of the text file to be ignored. The default value is 0.
Details
Load a text file into memory as a table. loadText loads data in single thread. To load data in multiple threads, use ploadText .
When loading a text file, the first row of data is read in string format, and the column names are parsed from that data. Please note that the upper limit for the first row is 256 KB. If none of the columns in the first row of the file starts with a number, the first row is treated as the header with column names of the text file. If at least one of the columns in the first row of the file starts with a number, the system uses col0, col1, … as the column names.
When loading a text file, the system determines the data type of each column based on a random sample of rows. This convenient feature may not always accurately determine the data type of all columns. We recommend users check the data type of each column with the extractTextSchema function after loading. If a column does not have the expected data type, then we need to enter the correct data type of the column in the schema table. Users can also specify data types for all columns. For a temporal column, if it does not have the expected data type, we also need to specify a format such as “MM/dd/yyyy” in the schema table. For details about temporal formats please refer to Parsing and Format of Temporal Variables.
As string in DolphinDB is encoded in UTF-8, we require input text files be encoded in UTF-8.
Column names in DolphinDB must only contain letters, numbers or underscores and must start with a letter. If a column name in the text file does not meet the requirements, the system automatically adjusts it:
If the column name contains characters other than letters, numbers or underscores, these characters are converted into underscores.
If the column name does not start with a letter, add “c” to the column name so that it starts with “c”.
A few examples:
Column name in data files |
Adjusted column name |
---|---|
1_test |
c1_test |
test-a! |
test_a_ |
[test] |
c_test_ |
Note: From version 1.30.22 onwards, function loadText` supports loading a data file that contains a record with multiple newlines.
Examples
Use the following script to generate the data file to be used for the examples:
$ n=10
$ sym=rand(`AAPL`ORCL`MS`SUN,n)
$ permno=take(10001,n)
$ date=rand(2019.06.01..2019.06.10,n)
$ open=rand(100.0,n)
$ high=rand(200.0,n)
$ close=rand(200.0,n)
$ pre_close=rand(200.0,n)
$ change=rand(100.0,n)
$ vol=rand(10000,n)
$ amount=rand(100000.0,n)
$ t=table(sym,permno,date,open,high,close,pre_close,change,vol,amount)
$ saveText(t,"C:/DolphinDB/Data/stock.csv");
Ex 1. Use loadText without specifying any optional parameters:
$ tt=loadText("C:/DolphinDB/Data/stock.csv");
$ schema(tt).colDefs;
name |
typeString |
typeInt |
comment |
---|---|---|---|
sym |
SYMBOL |
17 |
|
permno |
INT |
4 |
|
date |
DATE |
6 |
|
open |
DOUBLE |
16 |
|
high |
DOUBLE |
16 |
|
close |
DOUBLE |
16 |
|
pre_close |
DOUBLE |
16 |
|
change |
DOUBLE |
16 |
|
vol |
INT |
4 |
|
amount |
DOUBLE |
16 |
Ex 2. Specify the data type of a column before loading the file.
We may want to change the data type of column “permno” to be SYMBOL. For this, we need to use function extractTextSchema to get the schema table, update it, then load the text file with the revised schema table.
$ schema=extractTextSchema("C:/DolphinDB/Data/stock.csv");
$ update schema set type=`SYMBOL where name=`permno;
$ tt=loadText("C:/DolphinDB/Data/stock.csv",,schema);
$ schema(tt).colDefs;
name |
typeString |
typeInt |
comment |
---|---|---|---|
sym |
SYMBOL |
17 |
|
permno |
SYMBOL |
17 |
|
date |
DATE |
6 |
|
open |
DOUBLE |
16 |
|
high |
DOUBLE |
16 |
|
close |
DOUBLE |
16 |
|
pre_close |
DOUBLE |
16 |
|
change |
DOUBLE |
16 |
|
vol |
INT |
4 |
|
amount |
DOUBLE |
16 |
We can also specify the data types of all columns:
$ schematable=table(`sym`permno`date`open`high`close`pre_close`change`vol`amount as name,`SYMBOL`SYMBOL`DATE`DOUBLE`DOUBLE`DOUBLE`DOUBLE`DOUBLE`INT`DOUBLE as type)
$ tt=loadText("C:/DolphinDB/Data/stock.csv",,schema)
$ schema(tt).colDefs;
name |
typeString |
typeInt |
comment |
---|---|---|---|
sym |
SYMBOL |
17 |
|
permno |
SYMBOL |
17 |
|
date |
DATE |
6 |
|
open |
DOUBLE |
16 |
|
high |
DOUBLE |
16 |
|
close |
DOUBLE |
16 |
|
pre_close |
DOUBLE |
16 |
|
change |
DOUBLE |
16 |
|
vol |
INT |
4 |
|
amount |
DOUBLE |
16 |
Ex 3. Load only a subset of columns.
For example, we may only need to load the following 7 columns: sym, date, open, high, close, vol, amount. Please note that we cannot change the order of columns when loading data. To change the order of columns in the loaded table, use function reorderColumns!.
$ schema=extractTextSchema("C:/DolphinDB/Data/stock.csv");
$ schema=select * from schema where name in `sym`date`open`high`close`vol`amount
$ schema[`col]=[0,2,3,4,5,8,9]
$ tt=loadText("C:/DolphinDB/Data/stock.csv",,schema);
$ tt;
sym |
date |
open |
high |
close |
vol |
amount |
---|---|---|---|---|---|---|
SUN |
2019.06.10 |
18.675316 |
72.754005 |
136.463909 |
1376 |
31371.319038 |
AAPL |
2019.06.05 |
42.098717 |
196.873587 |
41.513899 |
3632 |
9950.864129 |
ORCL |
2019.06.05 |
62.223474 |
197.099027 |
123.785675 |
3069 |
38035.800937 |
SUN |
2019.06.03 |
0.18163 |
50.669866 |
4.652098 |
6213 |
1842.198893 |
SUN |
2019.06.06 |
32.54134 |
67.012502 |
130.312294 |
4891 |
55744.156823 |
SUN |
2019.06.07 |
56.899091 |
81.709825 |
61.786176 |
1133 |
69057.849515 |
AAPL |
2019.06.08 |
77.026838 |
38.504431 |
22.68496 |
3672 |
34420.187073 |
ORCL |
2019.06.07 |
62.752656 |
39.33621 |
48.483091 |
4382 |
41601.601639 |
AAPL |
2019.06.02 |
8.5487 |
17.623418 |
141.88325 |
8092 |
15449.159988 |
AAPL |
2019.06.02 |
26.178685 |
197.320455 |
110.52407 |
5541 |
14616.820449 |
Ex 4. Skip the first 6 rows when loading.
Please note that as the header is the first line of the text file, it is also skipped.
$ re=loadText(filename="C:/DolphinDB/Data/stock.csv",skipRows=6);
$ re;
col0 |
col1 |
col2 |
col3 |
col4 |
col5 |
col6 |
col7 |
col8 |
col9 |
---|---|---|---|---|---|---|---|---|---|
ORCL |
10001 |
2019.06.06 |
96.132818 |
197.911394 |
157.364835 |
50.446378 |
54.036755 |
9614 |
55067.751277 |
ORCL |
10001 |
2019.06.04 |
57.511043 |
89.232037 |
109.741833 |
104.814073 |
19.041139 |
3212 |
13821.150875 |
ORCL |
10001 |
2019.06.03 |
75.718595 |
113.078262 |
154.271885 |
84.369967 |
95.376425 |
8438 |
1805.280661 |
ORCL |
10001 |
2019.06.01 |
96.713015 |
83.344975 |
105.501576 |
126.438826 |
47.46568 |
2679 |
44953.587023 |
SUN |
10001 |
2019.06.03 |
18.724487 |
8.685236 |
134.523515 |
20.608044 |
52.904951 |
6470 |
47650.962789 |
Ex 5. Specify the temporal format when loading the file.
Generate the text file to be used:
$ time=["20190623145457","20190623155423","20190623163025"]
$ sym=`AAPL`MS`IBM
$ qty=2200 5400 8670
$ price=54.78 59.64 65.23
$ t=table(time,sym,qty,price)
$ saveText(t,"C:/DolphinDB/Data/t2.csv");
$ schema=extractTextSchema("C:/DolphinDB/Data/t2.csv")
$ update schema set type = "DATETIME" where name = "time"
$ loadText("C:/DolphinDB/Data/t2.csv",,schema);
time |
sym |
qty |
price |
---|---|---|---|
From the example above, if we load this text file without specifying the format of column “time”, column “time” is empty as the system cannot parse the raw data correctly. For this scenario we must specify the format of the column.
$ schema[`format]=["yyyyMMddHHmmss",,,];
$ loadText("C:/DolphinDB/Data/t2.csv",,schema);
time |
sym |
qty |
price |
---|---|---|---|
2019.06.23T14:54:57 |
AAPL |
2200 |
54.78 |
2019.06.23T15:54:23 |
MS |
5400 |
59.64 |
2019.06.23T16:30:25 |
IBM |
8670 |
65.23 |