csvToTable¶
Load data from a CSV file and return the table name.
csvToTable(
path,
query="",
delimiter=",",
encoding="utf-8",
dataTypes=dict(),
nullStrings=array()
)
Duplicate column names
If there are multiple columns with the same name, only the left-most column is kept. Column names are case-insensitive. For more info, see the page about column names.
Arguments¶
path-
A string representation of the path to the CSV file.
Both absolute and relative paths are supported. If the path is relative, it's considered relative to the model file (e.g. the model workbook). Environment variables (e.g.
%temp%or%homepath%) are also supported. query-
An optional query to be applied to the table. See SQL queries for details.
An empty argument or empty string here means that no query will be performed, i.e. all data from the CSV file will be used.
delimiter-
The character that separates columns within each line of text.
For CSV (comma-separated values) files, this is a comma, but there are other types of delimiter-separated values files, e.g. using
;instead.Note that the file extension does not indicate the delimeter, but rather the content of the file (for example
*.csvfiles may contain data separated by|, or;) encoding-
The character encoding of the text file.
This refers to how characters are represented in storage, i.e. "the ones and zeroes" that make up the text file. Modern text files use
UTF-8. Another common encoding isISO-8859-1, sometimes known aslatin-1. dataTypes-
A mapping of column names to data type specifiers. See data types for details.
Since CSV files are text files, they contain only strings. Numbers are autodetected based on the first non-empty value in each column (see below). Other data types are NOT autodetected, and need to be explicitly specified in this argument, otherwise they will remain strings.
nullStrings-
An array of strings to recognize as empty / missing values (in addition to the empty string
"").Empty / missing values are represented as
Nonein Python,nullin Arrow / Feather files, and as empty cells in Excel.
Where is the dateColumns argument?
In earlier versions of Autory, the dataTypes argument was called dateColumns, and it was a list of column names
that were expected to contain dates. This was changed to a dictionary of column names to data types to allow for
more flexibility.
Old syntax:
csvToTable(…, …, …, …, array("a","b"))
New syntax:
csvToTable(…, …, …, …, dict("a","date", "b","date"))
For backwards compatibility, the old syntax is still supported, but it is recommended to use the new syntax.
Numeric detection¶
Interpreting strings as numbers is not always straightforward, as explained here. However, by popular demand, Autory will attempt to autodetect numbers using the following assumptions:
- Numbers are in base 10 (the decimal system).
- The "thousands separator" is not used.
- The "decimal separator" is a period (
.).
To suppress this behavior, you should explicitly specify another data type for the column.
This has advantages and disadvantages:
- Pro: When reading a CSV file with many numeric columns where the names of the columns are not necessarily known in advance, you don't have to specify the data type of each column explicitly.
- Con: String columns that are incorrectly detected as numeric columns (e.g., RSA ID numbers or cellphone numbers) will
have to be set to the
stringdata type explicitly.
NULL detection¶
Deciding which strings represent empty values is not always straightforward, as explained here.
By default, Autory will consider empty strings (with or without quotes) as NULL, as explained
here.
To change this behavior, you can specify additional strings to be considered as NULL in the nullStrings argument.
This depends on the conventions used by the software or persons that generated the CSV file.
Examples¶
See the example workbook.