Csv@2

Csv@2 provider lets you run advanced SQL queries against CSV files, without any drivers installed.

Basic Usage

Data Sources:
- Name: 'FinancialData'
  Provider: Csv@2
  Connection string: 'DataToTest\Invoices.csv'

CAT loads the data from a CSV into an in-memory DuckDB database. The CSV is loaded once for each executing thread (by defulat once only). The data is removed from memory after all tests are evaluated.

You can then refer to the table’s data in your test queries - the data will be loaded into a table with name data source name.file name without extension. In the example above, the table name will be FinancialData.Invoices.

Connection String

In Csv@2 provider, you can specify any of these:

  • absolute file to a CSV file

  • relative path to a CSV file (elative paths are resolved against the directory where your .cat.yaml project file is located)

  • absolute or relative path with wildcard * (wildcards are allowed only in file name, recursion is not allowed)

  • more of the above mentioned, separated by a comma

Data Sources:
- Name: My CSV data
  Provider: Csv@2
  Connection string: >
     DataToTest\*.csv,
     D:\Export\Financial\Invoices-2024.csv

CAT loads each file that matches the connection string into a separate table (they are NOT combined into one table). Let’s say DataToTest directory contains files Summary-2023.csv and Summary-2024.csv. You can refer to them in your tests in a query like this (example how to combine them):

SELECT * FROM "My CSV data"."Summary-2023"
UNION ALL
SELECT * FROM "My CSV data"."Summary-2024"

Settings

There are various settings you can use to tweak the CSV parsing and load. Example:

Data Sources:
- Name: 'FinancialData'
  Provider: Csv@2
  Connection string: 'DataToTest\Invoices.csv'
  Settings:  delim = '|', header = true, compression = 'gzip'

CAT uses DuckDB under the hood - for both loading the CSV file into a memory and for querying it. See all possible options here:

https://duckdb.org/docs/data/csv/overview.html#parameters

Querying

Because the CSVs are loaded into an in-memory DuckDB tables, you can use advanced SQL syntax. See these topics in our documentaiton:

Or refer to DuckDB documentation: https://duckdb.org/docs/sql/introduction