Introduction
Introduction to basic concepts and terms in CAT. If you are new to CAT, read carefully.
What is CAT About
If you end up here, we expect you have some data and you want to trust them. CAT is about turning your assumptions about data into confidence - you will not assume the data is correct, you will know it is. Or you will exactly know what is wrong. You achieve that very simply by creating very simple automated tests.
This page summarizes basic terms and concepts used in CAT.
Data Source
Every company has lots of data. Very often, the data are in various formats:
The data are often “travelling” through the company systems, e.g., they are created in a CRM system, loaded into DWH and from DWH to Power BI wokspace for reporting. It is a best practice to have automated tests, that give you enough confidence that all data are trustworthy.
A Data Source in CAT is a definition of where the data is and how to connect to them. You give them a friendly name (such as CRM, DWH, Power BI, …). This friendly name is then used in your tests, where you define your expectation about the data in the system. This way you dont’ have to repeat all connection information in every test - you just specify, that your test examines data in e.g., DWH.
Example of a Data Source:
Data Sources:
- Provider: SqlServer@1
Connection string: data source=localhost;integrated security=true;initial catalog=DWH
Name: DWH
The example is in YAML, but you can have them defined also in a relational database table, in MS Excel file, …
For details, see Data Source.
Test Definition
A Test Definition describes your expectations about the data. E.g., you expect that a SQL statement issued against your CRM system (MySQL) returns exactly the same data as another SQL statement issued against your DWH (SQL Server). Or that none of your tables in a Power BI model is empty.
Example of a simple smoke test:
Tests:
- Test Suite: Smoke tests
Test Name: Dimension.Customer has data
Data Source: DWH
Query: SELECT * FROM Dimension.Customer
Expectation: set is not empty
And again - you can define your tests wherever you want - in YAML (as in the example), in MS Excel, in a relational database table.
For details, see Test Definition.
Provider
As mentioned, in CAT you crated named Data Sources - they tell to CAT where your data is. But that does not mean, that CAT can connect to any data source. CAT has predefined set of providers - they define to what types of data is CAT able to connect, such as MS SQL Server, ORACLE, PostgreSQL, MS Excel, CSV, Power BI, …
Note that in the data source definition the provider (and version) must be specified:
Data Sources:
- Provider: SqlServer@1 # without this line CAT wouldn't know HOW to connect (what drivers to use)
Connection string: data source=localhost;integrated security=true;initial catalog=DWH
Name: DWH
For details, see Providers
Expectation
In an Excpectation you express, what you expect from the dataset(s) returned by a query (or queries) you defined in a Test Definition.
CAT comes with predefined set of expectations, such as SetIsEmpty, SetRowCount or SetsMatch.
For details about Expectations, see Expectations.
For overview of all defined Expectations, see Expectations.
Output
CAT PowerShell module by default outputs the results of tests (and log messages) to console. For interactive testing sessions it is enough, but in various scenarios, you might need something else:
- Developers or business users often need results of tests in MS Excel format
- Azure DevOps or other automation tools need special file formats in order to display nice charts and details of test results
- You may be tasked to automate further processes, such as sending emails, insert results into database, invoke other processes etc.
CAT helps you significantly by creating various Outputs, such as MS Excel files, JSON, TRX or YAML files etc.
Output: xlsx, json
For details, see Outputs.
Project File
CAT needs to know somehow what data do you want to tests (data sources) and what exactly you need to check (tests).
For every testing project, you need to create a new CAT project file. You can do it simly with New-CatProject
command.
In simple scenarios, you simly define data sources and tests directly in that project file, something like this:
Data Sources: # where to connect
- Provider: SqlServer@1 # it is MS SQL server
Connection string: data source=localhost;integrated security=true;initial catalog=DWH
Name: DWH # friendly name for this data source
Tests:
- Name: Active customers have contracts # name for the test
Description: Every active customer has at least one active contract.
First Data Source: DWH #
First Query: |
SELECT CustomerID
FROM dbo.Customer as cust
WHERE cust.IsActive = 1
ORDER BY cust.CustomerID
Second Data Source: DWH
Second Query: |
SELECT DISTINCT c.CustomerID
FROM dbo.Contract as c
WHERE c.IsActive = 1
ORDER BY c.CustomerID
Expectation: SetsMatch # what do I expect from the result datasets
Output: xlsx
But often, you may need to load the test definitions (or even data source definitions) from e.g., relational database table or from a MS Excel file. You can do so, just specify what do you want in the project file.
In project files you also define outputs you need (if simple output to console window is not enough).
For full reference, see Project files.