Generate tests

How to generate tests from metadata?

Why Should I Generate Tests?

This feature is a huge time-saver and helps you to generate detailed tests with almost no effort. There are many use cases when generating tests comes in handy.

Technical Use-Cases

You can check things like:

  • every table has a primary key,
  • every staging table contains some technical column,
  • every fact table has clustered columnstore index,

Business Use-Cases

It is common to verify some rules e.g.,

  • for every contract,
  • for every customer,
  • for every combinantion of contract and metric,

How to Define Metadata for Tests?

So you need to create a test for every XXX. What is XXX, where it is? For examples mentioned above, these may be

  • list of tables, SELECT * FROM INFORMATION_SCHEMA.TABLES (potentially WHERE TABLE_SCHEMA = ...)
  • list of columns in some tables (again, you can leverage INFORMATION_SCHEMA.COLUMNS)
  • data in your Customers table,

The data need to be in one of data sources defined in your project file. If it is missing, add it and give it a name. The metadata can be in any supported Provider (relational database, CSV or MS Excel file, tabular model, …).

Data Sources:
- Name: AeroDWH
  Provider: SqlServer@1
  Connection string: '%AERO_DWH_CONNECTION_STRING%' # environment variable used here
Named sets: # here you define metadata
- Name: staging tables
  Data source: AeroDWH
  Query: |
    SELECT  *
    FROM INFORMATION_SCHEMA.TABLES AS t
    WHERE t.TABLE_SCHEMA LIKE 'STAGE%'

Remember - this is just an example. Your metadata may be anywhere and in any format.

That’s it. You are ready to generate tests.

How to “Expand” Test Based on Metadata?

The easiest and most intuitive way to write generated tests is:

  1. define a test for one of the entries only and run it
  2. “expand” it

In the above example, we have a list of all staging tables, right? Let’s write a test for one of them, STAGE.AIRLINES. We are checking the table contains column SYS_INSERTED_TIMESTAMP:

Tests:
- Name: Staging table STAGE.AIRLINES has a timestamp column
  Description: |
      Table AIRLINES in schema STAGE
      must contain column SYS_INSERTED_TIMESTAMP,
      because it is obligatory for all staging tables.
  Data source: AeroDWH
  Query: |
    SELECT  *
    FROM    INFORMATION_SCHEMA.COLUMNS
    WHERE   TABLE_SCHEMA = 'STAGE'
            AND TABLE_NAME = 'AIRLINES'
            AND COLUMN_NAME = 'SYS_INSERTED_TIMESTAMP'
  Expectation: set is not empty

This is your test for one table. It works, but you want it for all of these tables:

Metadata for test

Now you need two simple things:

  1. Tell the test it should use metadata (Metadata keyword).
  2. Replace hardcorded values with %ColumnName%.

Like this:

Tests:
- Name: Staging table %TABLE_SCHEMA%.%TABLE_NAME% has a timestamp column
  Metadata: staging tables # don't forget this! It's the link to metadata query
  Description: |
      Table %TABLE_NAME% in schema %TABLE_SCHEMA%
      must contain column SYS_INSERTED_TIMESTAMP,
      because it is obligatory for all staging tables.
  Data source: AeroDWH
  Query: |
    SELECT  *
    FROM    INFORMATION_SCHEMA.COLUMNS
    WHERE   TABLE_SCHEMA = '%TABLE_SCHEMA%'
            AND TABLE_NAME = '%TABLE_NAME%'
            AND COLUMN_NAME = 'SYS_INSERTED_TIMESTAMP'
  Expectation: set is not empty

That’s it. You’ll now have one test per each row returned by the metadata query. Might look complicated at first sight, but actualy it is trivial - give it a try.