Generate tests
How to generate tests from metadata?
Why Should I Generate Tests?
This feature is a huge time-saver and helps you to generate detailed tests with almost no effort. There are many use cases when generating tests comes in handy.
Technical Use-Cases
You can check things like:
- every table has a primary key,
- every staging table contains some technical column,
- every fact table has clustered columnstore index,
- …
Business Use-Cases
It is common to verify some rules e.g.,
- for every contract,
- for every customer,
- for every combinantion of contract and metric,
- …
How to Define Metadata for Tests?
So you need to create a test for every XXX. What is XXX, where it is? For examples mentioned above, these may be
- list of tables,
SELECT * FROM INFORMATION_SCHEMA.TABLES
(potentiallyWHERE TABLE_SCHEMA = ...
) - list of columns in some tables (again, you can leverage
INFORMATION_SCHEMA.COLUMNS
) - data in your
Customers
table, - …
The data need to be in one of data sources defined in your project file. If it is missing, add it and give it a name. The metadata can be in any supported Provider (relational database, CSV or MS Excel file, tabular model, …).
Data Sources:
- Name: AeroDWH
Provider: SqlServer@1
Connection string: '%AERO_DWH_CONNECTION_STRING%' # environment variable used here
Named sets: # here you define metadata
- Name: staging tables
Data source: AeroDWH
Query: |
SELECT *
FROM INFORMATION_SCHEMA.TABLES AS t
WHERE t.TABLE_SCHEMA LIKE 'STAGE%'
Remember - this is just an example. Your metadata may be anywhere and in any format.
That’s it. You are ready to generate tests.
How to “Expand” Test Based on Metadata?
The easiest and most intuitive way to write generated tests is:
- define a test for one of the entries only and run it
- “expand” it
In the above example, we have a list of all staging tables, right? Let’s write a test for one of them, STAGE.AIRLINES
. We are checking the table contains column SYS_INSERTED_TIMESTAMP
:
Tests:
- Name: Staging table STAGE.AIRLINES has a timestamp column
Description: |
Table AIRLINES in schema STAGE
must contain column SYS_INSERTED_TIMESTAMP,
because it is obligatory for all staging tables.
Data source: AeroDWH
Query: |
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'STAGE'
AND TABLE_NAME = 'AIRLINES'
AND COLUMN_NAME = 'SYS_INSERTED_TIMESTAMP'
Expectation: set is not empty
This is your test for one table. It works, but you want it for all of these tables:
Now you need two simple things:
- Tell the test it should use metadata (
Metadata
keyword). - Replace hardcorded values with %ColumnName%.
Like this:
Tests:
- Name: Staging table %TABLE_SCHEMA%.%TABLE_NAME% has a timestamp column
Metadata: staging tables # don't forget this! It's the link to metadata query
Description: |
Table %TABLE_NAME% in schema %TABLE_SCHEMA%
must contain column SYS_INSERTED_TIMESTAMP,
because it is obligatory for all staging tables.
Data source: AeroDWH
Query: |
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = '%TABLE_SCHEMA%'
AND TABLE_NAME = '%TABLE_NAME%'
AND COLUMN_NAME = 'SYS_INSERTED_TIMESTAMP'
Expectation: set is not empty
That’s it. You’ll now have one test per each row returned by the metadata query. Might look complicated at first sight, but actualy it is trivial - give it a try.