Complete example
Complex example of a CAT project file
This example shows a few data sources, a few tests, how to generate tests from metadata, load test definition from external sources, setting multiple outputs, configure CAT to run in more threads and many more.
Presenting many CAT features in a single file while keeping the example meaningful is challenging. Instead of focusing on whether the tests make sense, focus on the configuration possibilities this example demonstrates.
We’ve often received feedback that our documentation includes only short, focused snippets, which can make it hard for beginners to see how everything fits together. This example gives you the whole picture. Simply remove the parts that aren’t relevant to your project!
Data sources: # define one or more data sources
- Name: FlightsSystem # friendly name, you use this in test definitions to refer to this data source
Provider: SqlServer@2 # see Providers in the docs
Connection string: >
Server=sql-aero-flights-prod.database.windows.net;
Authentication=Active Directory Service Principal;
Database=sqldb-aero-flights-prod;
User Id=%FLIGHTS_TESTING_PRINCIPAL_ID%;
Password=%FLIGHTS_TESTING_PRINCIPAL_SECRET%;
- Name: PassengersSystem
Provider: Postgres@1
Connection string: "%PASSENGERS_CONNECTION_STRING%"
Named sets: # here you can define metadata for generating tests
- Name: all gates # in this case we want to generate one test for each Gate we have
Data source: AERO_PROD # must be name of an existing data source
Query: |
SELECT GATE_ID, GATE_NUMBER
FROM DIM.GATES
# You can instruct CAT to load definition of data sources or tests from other sources
Get list of data sources from:
# in this case we want to load data sources from directory DataSources/PROD.yaml (we use environment variable to change the location at runtime)
- Provider: Yaml@1
Connection string: "./DataSources/%ENVIRONMENT%.yaml"
Query: /Data sources # this is name of top-level node in the YAML file
Get list of tests from:
# we can load tests from database, either from database table or procedure
- Provider: SqlServer@1
Connection string: data source=localhost;integrated security=true;initial catalog=Testing
Query: SELECT * FROM [Test] WHERE Skipped = 0
# or you can get list of tests from a procedure:
# Query: EXEC Testing.GetListOfTests @ENVIRONMENT='DEV'
# but we also want to load tests defined in MS Excel file:
- Provider: Excel@1
Connection string: MyTests.xlsx # file in the same directory as your project file
Query: SELECT * FROM [Sheet1]
# Tests and data sources can also be defined directly in your project file:
Tests:
- Test suite: Consistency checks # optinal, for hierarchical categorization
Test case: flights vs passengers # optinal, for hierarchical categorization
Order: 1 # this is metadata only, it does not influence anything
Name: Number of passengers is consistent between systems # short friendly name
# Description is optional but highly recommended:
Description: >
This verifies that the flight system passengers table
does not significantly differ in count of rows from
the passengers table in the passengers system.
First Data Source: FlightsSystem
First Query: |
SELECT COUNT(*)
FROM dbo.Passenger
WHERE DeletedFlag = 0
Second Data source: PassengersSystem
Second Query: |
SELECT COUNT(*) FROM public.passenger
Expectation: sets match # set is empty, set is not empty, set rowcount, sets match, contains
Tolerance: 5.0 # this means 5.0 % (so it is exactly 5.0 when the base is 100)
Tolerance mode: percent # you can also use 'absolute'
Tags: DevOnly # comma or semicolon separated, optional
- Test suite: Migration to cloud checks
Test case: Passengers data
Name: Check important columns in passengers table
Description: >
Verifies the passengers data vere completely migrated, including
correct non-english characters etc.
First data source: PassengersSystem
First Query: |
SELECT p.passenger_id, p.first_name, p.last_name, p.passport_number, p.date_of_birth
FROM public.passenger as p
LEFT OUTER_JOIN public.passengers_do_not_migrate as dnm on p.passenger_id = dnm.passenger_id
WHERE dnm.passenger_id IS NULL
ORDER BY p.passenger_id
Second data source: FlightsSystem
SELECT PassengerID, FirstName, LastName, PassportNumber, DateOfBirth
FROM Dim.Passenger
WHERE IsActive = 1
ORDER BY PassengerID
Expectation: sets match
Key: PassengerID # Not obligatory but recommended.
Tags: Consistency checks
- Name: Update of table
Test Suite: Timeliness Data Quality Dimension
Description: >
This is to check if the staging table, with
the column LastProcessDay, was updated today
Data Source: DWH
Query: |
SELECT TOP(5) *
FROM Staging.Shipment
WHERE LastProcessDay > CAST(GETDATE() AS DATE);
Expectation: set is not empty
Tags: Monitoring
- Name: Actual flights for gate %GATE_NUMBER% are loaded
Test Suite: BusinessTests
Description: Flights in FACT.DEPARTURES table for gate %GATE_NUMBER% are loaded.
Data Source: AERO_PROD
Query: |
SELECT *
FROM FACT.DEPARTURES AS d
WHERE d.DATE_OF_FLIGHT BETWEEN GETDATE() AND DATEADD(DAY, 14, d.DATE_OF_FLIGHT)
AND d.GATE_ID = '%GATE_ID%'
Expectation: set is not empty
Metadata: all gates # this tells CAT this is a "template" for metadata driven tests
# CAT will generate a new test for each row returned by query in named set 'all gates'
# values like '%GATE_ID%' will be replaced by respective column value in each row.
- Name: There are no new errors
Description: >
Checks there are no new errors in the logging table.
Data Source: DWH
Query: |
SELECT *
FROM Audit.LogIngestError
WHERE IsResolved = 0
Expectation: set is empty
Maximum errors logged: 50
Log number of errors: true
Tags: Monitoring
# short syntax for outputs:
# Output: xlsx, json, junit
Output:
- File: LatestTestResults.xlsx
Format: xlsx
- File: TestResults/TestResults-{timestamp}.xml
Format: junit
- Database: FlightsSystem # name of a defined data source
Table: Tests.CatTestResults
# or
# Procedure: dbo.SaveCatTestResult
Threads: 2
You will find all necessary details for everything in our documentaiton. Still, here is a collection of brief tips:
-
You can define more data sources.
-
Both data sources use environment variables to hide secrets. FlightsSystem data source hides only service principal ID and secret. PassengersSystem data source hides the entire connection string. CAT will automatically pick up values from environment variables FLIGHTS_TESTING_PRINCIPAL_ID, FLIGHTS_TESTING_PRINCIPAL_SECRET and PASSENGERS_CONNECTION_STRING and use those values.
-
Note the first test uses
Test suite
,Test case
andName
. This is hierarchy of tests in CAT. One suite has more test cases, one test cases has more tests (test names). Synonyms and various naming conventions are supported, e.g. for suite you can useTEST_SUITE
,test suite
,suite
etc. AlsoOrder
is used - it has no influence on anything in CAT and is used only for identification purposes (it is pure metadata of a test). CAT enforces uniqueness on combinaion of suite, test case, order and name (so called “test full name”). -
The first test uses
Description
. It is not mandatory, but it is highly recommended to always use it. It should contain information about why the test exists, what it checks, what to do if it fails. -
The first test compares data from different systems. It counts rows from tables in Azure SQL database and from local PostgreSQL database. The test will pass if the difference is lower than 5 percent (the lower number is used as base). When tolerance is set, it applies to all numeric columns. It is for convenience, if you need different tolerance for different columns, split your test to more tests with different tolerance settings (checking different columns with different tolerance).
-
The second test compares data across systems. It checks important columns for passengers tables. The
sets match
requires the data to be sorted, note both queries haveORDER BY
clause. CAT can compare data without knowing which column is a unique key. However, specifying a unique key using the Key property improves performance and makes error messages more helpful. The name of the key column can be taken from first or second set, it does not matter. CAT does NOT do any mapping based on column names - you can use different column names in queries. CAT on the other hannd DOES require the columns come in the same order in both queries. -
In Outputs sections we defined we want results in MS Excel format to file LatestTestResults.xlsx, JUnit XML file (for Azure DevOps) in a specific directory and with a timestamp in the name. We also defined that we want to output results of each test into a database table. If the table does not exist, CAT will attempt to create it. Schema must exist.
-
Threads
setting means CAT will be evaluating tests in two threads. There will be a queue of ALL tests. Each thread will pick up next test from a queue, remove it from the queue and evaluate it (independently on other threads). Each thread will end once it finds out the queue is already empty. -
Note
Tags
property. For each test, you can define one or more tags, separated by comman or semicolon. Tags are optional, you don’t have to specify them. Tags can contain spaces. You can include/exclude tests with or without tags when you are running tests from your projects. -
The second to last test is metadata driven. CAT will execute query defined in named set ‘all gates’. For each row, it will create a test. All values between percent will be replaced with a value of that column for the respective row (if such column exist).
-
The last test uses
set is empty
expectation. Note we usedMaximum errors logged
to tweak how many rows we will get in the sample of erroneous rows (1 by default, 50 maximum, 0 for no data in message). That works for all expectations. There is one more setting that works only forset is empty
expectation:log number of errors
. That will enrich the output with total number of errors (just a number). This is different - how many erroneous rows I see in the message (maximum errors logged
) vs how many errors exist (log number of errors
). -
The example also demonstrates how you can retrieve test definitions from other sources, such as from relational database table or procedure or from MS Excel sheet. Whatever data you are able to test with CAT can also store test definitions for you. CAT expects same names for columns (in database table/procedure or MS Excel Sheet or CSV or whatever) you are used to from YAML.