Csv@2
Csv@2 provider lets you run advanced SQL queries against CSV files, without any drivers installed.
On this page
This provider is in Preview! Any feedback is highly appreciated.
You can use the Feedback button in the bottom right corner of this site. It might speed up the preview phase. Thank you! 😻
Basic Usage
Data Sources:
- Name: 'FinancialData'
Provider: Csv@2
Connection string: 'DataToTest\Invoices.csv'
CAT loads the data from a CSV into an in-memory DuckDB database. The CSV is loaded once for each executing thread (by defulat once only). The data is removed from memory after all tests are evaluated.
You can then refer to the table’s data in your test queries - the data will be loaded into a table with name data source name
.file name without extension
. In the example above, the table name will be FinancialData.Invoices
.
Connection String
In Csv@2 provider, you can specify any of these:
-
absolute file to a CSV file
-
relative path to a CSV file (elative paths are resolved against the directory where your
.cat.yaml
project file is located) -
absolute or relative path with wildcard
*
(wildcards are allowed only in file name, recursion is not allowed) -
more of the above mentioned, separated by a comma
Data Sources:
- Name: My CSV data
Provider: Csv@2
Connection string: >
DataToTest\*.csv,
D:\Export\Financial\Invoices-2024.csv
CAT loads each file that matches the connection string into a separate table (they are NOT combined into one table). Let’s say DataToTest
directory contains files Summary-2023.csv
and Summary-2024.csv
. You can refer to them in your tests in a query like this (example how to combine them):
SELECT * FROM "My CSV data"."Summary-2023"
UNION ALL
SELECT * FROM "My CSV data"."Summary-2024"
Settings
There are various settings you can use to tweak the CSV parsing and load. Example:
Data Sources:
- Name: 'FinancialData'
Provider: Csv@2
Connection string: 'DataToTest\Invoices.csv'
Settings: delim = '|', header = true, compression = 'gzip'
CAT uses DuckDB under the hood - for both loading the CSV file into a memory and for querying it. See all possible options here:
https://duckdb.org/docs/data/csv/overview.html#parameters
Querying
Because the CSVs are loaded into an in-memory DuckDB tables, you can use advanced SQL syntax. See these topics in our documentaiton:
Or refer to DuckDB documentation: https://duckdb.org/docs/sql/introduction