extractTextSchema

Syntax

extractTextSchema(filename, [delimiter], [skipRows=0])

Arguments

filename is a string specifying the name of the input data file, including its absolute path. Currently only .csv files are supported.

delimiter (optional) is a string specifying the column separator. It can consist of one or more characters. The default is a comma (',').

skipRows (optional) is an integer between 0 and 1024 specifying the number of rows at the beginning of the text file to skip. The default value is 0.
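The following minimal sketch uses both optional arguments together. It assumes the directory C:/DolphinDB/Data/ exists and is writable. The file name t2.csv and its contents are hypothetical. The file is written line by line with file and writeLine so that it begins with two lines that are not part of the table.

```dolphindb
// Assumption: C:/DolphinDB/Data/ exists; t2.csv is a hypothetical file name
path = "C:/DolphinDB/Data/t2.csv"

// Write a pipe-delimited file whose first two lines are not table data
f = file(path, "w")
writeLine(f, "# exported by a hypothetical upstream job")
writeLine(f, "# do not edit")
writeLine(f, "sym|qty|price")
writeLine(f, "AAPL|100|182.5")
writeLine(f, "MSFT|200|410.25")
close(f)

// Use "|" as the column separator and skip the first 2 lines,
// so that "sym|qty|price" is read as the header row
schemaTB = extractTextSchema(path, "|", 2)
schemaTB
```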

Details

Generates the schema table of the input data file. The schema table has two columns, name and type, which hold the column names and their data types.
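The schema table is an ordinary in-memory table. A common pattern is therefore to edit it and pass it to loadText as that function's schema argument. The sketch below assumes the same writable directory used in the Examples section. The file name t3.csv and the choice to widen qty from INT to LONG are illustrative only.

```dolphindb
// Assumption: C:/DolphinDB/Data/ exists; t3.csv is a hypothetical file name
path = "C:/DolphinDB/Data/t3.csv"
id = 1..5
qty = 100 * id
price = 1.5 + 0..4
saveText(table(id, qty, price), path)

// Infer the schema: one row per file column, with columns name and type
schemaTB = extractTextSchema(path)

// Override the inferred type of qty with LONG
update schemaTB set type = "LONG" where name = "qty"

// Load the file with the edited schema; the delimiter argument is left empty
t = loadText(path, , schemaTB)
```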

When the input file contains date and time values:

  • Values containing delimiters (the date delimiters "-", "/", and ".", or the time delimiter ":") are parsed as the corresponding temporal type. For example, "12:34:56" is parsed as SECOND and "23.04.10" is parsed as DATE.
  • Values without delimiters are parsed as DATE by preference if they match one of the following formats: "yyMMdd" with 0<=yy<=99, 0<=MM<=12, and 1<=dd<=31; or "yyyyMMdd" with 1900<=yyyy<=2100, 0<=MM<=12, and 1<=dd<=31.
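The rules above can be checked with a small file that contains one column of each kind. This is a sketch under the same directory assumption as the other examples. The file name t4.csv and the column names are hypothetical, and the expected types in the comments come from the rules above rather than from a recorded run.

```dolphindb
// Assumption: C:/DolphinDB/Data/ exists; t4.csv is a hypothetical file name
path = "C:/DolphinDB/Data/t4.csv"
f = file(path, "w")
writeLine(f, "d8,d6,tm,dots")
writeLine(f, "20230410,230410,12:34:56,23.04.10")
writeLine(f, "20230411,230411,12:35:00,23.04.11")
close(f)

// Per the rules above: d8 (yyyyMMdd) and d6 (yyMMdd) should be DATE,
// tm (contains ":") should be SECOND, and dots (contains ".") should be DATE
schemaTB = extractTextSchema(path)
schemaTB
```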
Note:

Starting from version 1.30.22/2.00.10, extractTextSchema supports data files in which a single record spans multiple lines, that is, a record that contains line breaks.
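The sketch below illustrates such a file. It assumes that the multi-line record takes the common CSV form of a double-quoted field containing a line break. The text above does not specify the quoting rule, so treat this as an assumption. The directory and the file name t5.csv are also assumptions.

```dolphindb
// Assumption: C:/DolphinDB/Data/ exists; t5.csv is a hypothetical file name
path = "C:/DolphinDB/Data/t5.csv"
f = file(path, "w")
writeLine(f, "id,comment")
// Assumption: the quoted comment of record 1 spans two physical lines
writeLine(f, '1,"first line')
writeLine(f, 'second line"')
writeLine(f, "2,single line")
close(f)

// Requires version 1.30.22/2.00.10 or later
schemaTB = extractTextSchema(path)
schemaTB
```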

Examples

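// Generate one million rows of simulated trade data
n=1000000
// SECOND values from 09:30:00 to 14:29:59
timestamp=09:30:00+rand(18000,n)
ID=rand(100,n)
qty=100*(1+rand(100,n))
price=5.0+rand(100.0,n)
t1 = table(timestamp,ID,qty,price)
// Save the table as a comma-delimited text file, then infer its schema
saveText(t1, "C:/DolphinDB/Data/t1.txt")
schema=extractTextSchema("C:/DolphinDB/Data/t1.txt");
schema;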
name      type
--------- ------
timestamp SECOND
ID        INT
qty       INT
price     DOUBLE