Driverless testing of CSV files
Thanks to a new experimental provider, we are now able to test CSV files without installation of any drivers. Advanced SQL constructs are allowed. Compared to CsvOleDB@1 driver, this one is well documented, advanced, and seems to work well.
See supported SQL constructs, functions etc. here: https://duckdb.org/docs/sql/introduction
As a teaser how it might be used, check the example below. It reads metadata from two CSV files, JOINs them and compares the resultset with a database metadata:
Data Sources: - Provider: DuckDB@1 # this provider is in preview Connection string: 'DataSource=:memory:' Name: duckdb - Provider: SqlServer@1 Connection string: data source=localhost;integrated security=true;initial catalog=WideWorldImporters; Name: sql Tests: - Name: Check all tables were created Description: compares metadata with the existing tables in database First Data Source: duckdb First Query: | SELECT t.TABLE_SCHEMA, t.TABLE_NAME, c.COLUMN_NAME FROM read_csv_auto('metadata_tables.csv', delim=',', header=True) as t JOIN read_csv_auto('metadata_columns.csv', delim=',', header=True) as c ON t.TABLE_SCHEMA = c.TABLE_SCHEMA AND t.TABLE_NAME = c.TABLE_NAME WHERE t.TABLE_TYPE = 'BASE TABLE' ORDER BY t.TABLE_SCHEMA, t.TABLE_NAME, c.COLUMN_NAME Second Data Source: sql Second Query: | SELECT c.TABLE_SCHEMA, c.TABLE_NAME, c.COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS as c JOIN INFORMATION_SCHEMA.TABLES as t ON t.TABLE_SCHEMA = c.TABLE_SCHEMA AND t.TABLE_NAME = c.TABLE_NAME WHERE t.TABLE_TYPE = 'BASE TABLE' ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME Expectation: sets match
Please consider everything related to DuckDB is a preview.
Anything may change, even the concept as a whole. Your feedback is crucial, please try it and let us know.
At the moment, this cannot be used for reading MS Excel files, but soon it will be.
We invested considerable time into testing the whole CAT system - and that is and will be continuous ongoing process.