Csv@2

Csv@2 provider lets you run advanced SQL queries against CSV files, without any drivers installed.

🚀

This provider is in Preview! Any feedback is highly appreciated.

You can use the Feedback button in the bottom right corner of this site. It might speed up the preview phase. Thank you! 😻

Basic Usage

Data Sources:
- Name: 'FinancialData'
  Provider: Csv@2
  Connection string: 'DataToTest\Invoices.csv'

CAT loads the data from a CSV into an in-memory DuckDB database. The CSV is loaded once for each executing thread (by defulat once only). The data is removed from memory after all tests are evaluated.

You can then refer to the table’s data in your test queries - the data will be loaded into a table with name data source name.file name without extension. In the example above, the table name will be FinancialData.Invoices.

Connection String

In Csv@2 provider, you can specify any of these:

absolute file to a CSV file
relative path to a CSV file (elative paths are resolved against the directory where your .cat.yaml project file is located)
absolute or relative path with wildcard * (wildcards are allowed only in file name, recursion is not allowed)
more of the above mentioned, separated by a comma

Data Sources:
- Name: My CSV data
  Provider: Csv@2
  Connection string: >
     DataToTest\*.csv,
     D:\Export\Financial\Invoices-2024.csv

CAT loads each file that matches the connection string into a separate table (they are NOT combined into one table). Let’s say DataToTest directory contains files Summary-2023.csv and Summary-2024.csv. You can refer to them in your tests in a query like this (example how to combine them):

SELECT * FROM "My CSV data"."Summary-2023"
UNION ALL
SELECT * FROM "My CSV data"."Summary-2024"

Settings

There are various settings you can use to tweak the CSV parsing and load. Example:

Data Sources:
- Name: 'FinancialData'
  Provider: Csv@2
  Connection string: 'DataToTest\Invoices.csv'
  Settings:  delim = '|', header = true, compression = 'gzip'

CAT uses DuckDB under the hood - for both loading the CSV file into a memory and for querying it. See all possible options here:

https://duckdb.org/docs/data/csv/overview.html#parameters

Querying

Because the CSVs are loaded into an in-memory DuckDB tables, you can use advanced SQL syntax. See these topics in our documentaiton:

Or refer to DuckDB documentation: https://duckdb.org/docs/sql/introduction

Prerequisites

Install Microsoft Visual C++ Redistributable

This is a very common prerequisite, used by many other softwares. So chances are good you already have it. Anyway, follow these instuctions to verify it or install.

Go to this Microsoft webpage: Latest Microsoft Visual C++ Redistributable Version

Download and install “X64” version:

If you get a dialog with “Repair”, “Uninstall” and “Close” buttons, just close it - you already have this prerequisite. Otherwise install it (next, next, next…).

Internet connection

DuckDB loads the extensions from this URL: http://extensions.duckdb.org. This needs to be accessible, so ensure you are connected to the Internet and white-list this URL if needed.

If you need to be able to test CSV data when offline, please file a request using the “Get Help” button on the right on this site.

Csv@2

Basic Usage #

Connection String #

Settings #

Querying #

Prerequisites #

Install Microsoft Visual C++ Redistributable #

Internet connection #