How to generate tests from metadata?
Why Should I Generate Tests?
This feature is a huge time-saver and helps you to generate detailed tests with almost no effort. There are many use cases when generating tests comes in handy.
You can check things like:
- every table has a primary key,
- every staging table contains some technical column,
- every fact table has clustered columnstore index,
It is common to verify some rules e.g.,
- for every contract,
- for every customer,
- for every combinantion of contract and metric,
How to Define Metadata for Tests?
So you need to create a test for every XXX. What is XXX, where it is? For examples mentioned above, these may be
- list of tables,
SELECT * FROM INFORMATION_SCHEMA.TABLES(potentially
WHERE TABLE_SCHEMA = ...)
- list of columns in some tables (again, you can leverage
- data in your
The data need to be in one of data sources defined in your project file. If it is missing, add it and give it a name. The metadata can be in any supported Provider (relational database, CSV or MS Excel file, tabular model, …).
Data Sources: - Name: AeroDWH Provider: SqlServer@1 Connection string: '%AERO_DWH_CONNECTION_STRING%' # environment variable used here Named sets: # here you define metadata - Name: staging tables Data source: AeroDWH Query: | SELECT * FROM INFORMATION_SCHEMA.TABLES AS t WHERE t.TABLE_SCHEMA LIKE 'STAGE%'
Remember - this is just an example. Your metadata may be anywhere and in any format.
That’s it. You are ready to generate tests.
How to “Expand” Test Based on Metadata?
The easiest and most intuitive way to write generated tests is:
- define a test for one of the entries only and run it
- “expand” it
In the above example, we have a list of all staging tables, right? Let’s write a test for one of them,
STAGE.AIRLINES. We are checking the table contains column
Tests: - Name: Staging table STAGE.AIRLINES has a timestamp column Description: | Table AIRLINES in schema STAGE must contain column SYS_INSERTED_TIMESTAMP, because it is obligatory for all staging tables. Data source: AeroDWH Query: | SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'STAGE' AND TABLE_NAME = 'AIRLINES' AND COLUMN_NAME = 'SYS_INSERTED_TIMESTAMP' Expectation: set is not empty
This is your test for one table. It works, but you want it for all of these tables:
Now you need two simple things:
- Tell the test it should use metadata (
- Replace hardcorded values with %ColumnName%.
Tests: - Name: Staging table %TABLE_SCHEMA%.%TABLE_NAME% has a timestamp column Metadata: staging tables # don't forget this! It's the link to metadata query Description: | Table %TABLE_NAME% in schema %TABLE_SCHEMA% must contain column SYS_INSERTED_TIMESTAMP, because it is obligatory for all staging tables. Data source: AeroDWH Query: | SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = '%TABLE_SCHEMA%' AND TABLE_NAME = '%TABLE_NAME%' AND COLUMN_NAME = 'SYS_INSERTED_TIMESTAMP' Expectation: set is not empty
That’s it. You’ll now have one test per each row returned by the metadata query. Might look complicated at first sight, but actualy it is trivial - give it a try.