Introduction

What is a Provider in CAT? Why do I need to know?

What is a Provider in CAT?

In CAT, you define Data sources: you tell CAT where the data you want to test lives and give each source a friendly name. But CAT also needs to know the format of the data: is it a CSV file, an MS Excel workbook, an Oracle database? CAT accesses data through Providers.

Simply put, a Provider is something that can return rows of data. CAT comes out of the box with many implemented providers; see Overview of implemented providers.

Where Do I Use Providers?

Data Sources Definitions

When you define the Data sources you want to test, you have to specify a provider for each of them. Example:

Data Sources:
- Name: DWH
  Provider: SqlServer@1
  Connection string: data source=localhost;integrated security=true;initial catalog=DWH
- Name: DwhModel
  Provider: PowerBI@1
  Connection string: "DwhModel"

See the Provider row in the DWH data source? It tells CAT to connect to a SQL Server database. In the other data source, you instruct CAT to connect to a locally open Power BI Desktop file (DwhModel.pbix), which gives you the opportunity to test your model before you share it with the rest of your team or with users.

List of Tests and/or Data sources

In CAT, you are free to choose where you define and maintain your tests and data sources. You can manage them in one or more YAML files, in a relational database, or wherever suits you.

In fact, any Provider implemented in CAT can be used to provide Data Sources and Tests definitions.

So if your tests are stored in a table in a PostgreSQL database, tell CAT like this:

Get list of tests from:
- Provider: Postgres@1
  Connection string: "%DWH_CONNECTION_STRING%" # environment variable is used here
  Query: select * from public.test_definitions;

You can use different providers for tests and for data sources, several providers for each, and so on. See Project files for more details.
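
As a minimal sketch of combining several providers for tests, the fragment below lists two entries under the same key, reusing only the provider names and keys from the examples above. It assumes that SqlServer@1 accepts a Query key the same way Postgres@1 does, and dbo.test_definitions is a hypothetical table name; check Project files for the exact syntax.

```yaml
Get list of tests from:
# Tests kept in PostgreSQL, as in the example above
- Provider: Postgres@1
  Connection string: "%DWH_CONNECTION_STRING%" # environment variable is used here
  Query: select * from public.test_definitions;
# Assumption: a second entry reads more test definitions from SQL Server;
# dbo.test_definitions is a hypothetical table name
- Provider: SqlServer@1
  Connection string: data source=localhost;integrated security=true;initial catalog=DWH
  Query: select * from dbo.test_definitions;
```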

Versions

OK, I get it, but what is the @1 at the end of the provider names in all the examples? Well, databases (and other software) evolve, and so do the necessary drivers. To make things even more complicated, there are usually multiple ways to connect to the data. These facts simply cannot be hidden, and a user must be aware of them.

The documentation describes exactly how CAT connects to the data, such as which driver it uses and in which version.

There are two primary reasons why versions were introduced to providers:

  • More ways to access data. E.g., for the SqlServer provider, there is a version that uses the .NET System.Data.SqlClient namespace (@1) and a version that uses the newer .NET Microsoft.Data.SqlClient namespace (@2, not yet implemented).

  • Backward compatibility. Whenever a new version of a driver causes backward-compatibility problems, we’ll keep the existing provider version and create a new one with a higher version number.

This also means that several versions of a single CAT provider can be supported at the same time.