What is a Provider in CAT? Why do I need to know?
What is a Provider in CAT?
In CAT, you define Data sources: you tell CAT where the data you want to test is located, and you give these sources friendly names. But CAT also needs to know the format of the data. Is it a CSV file, an MS Excel workbook, or an Oracle database? CAT accesses data using Providers.
Simply put, a Provider is something that can return rows of data. CAT comes with many providers implemented out of the box; see Overview of implemented providers.
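To make the idea of "something that can return rows of data" concrete, here is a minimal Python sketch. It is not CAT's implementation. The `Provider` interface, the `get_rows` method, the `CsvFileProvider` class and the `count_rows` helper are all hypothetical names chosen for illustration. The point is that code which only relies on "give me rows" works with any provider, whatever the underlying format is.

```python
import csv
from typing import Dict, Iterator, Protocol


class Provider(Protocol):
    """Hypothetical interface: anything that can return rows of data."""

    def get_rows(self, connection_string: str, query: str = "") -> Iterator[Dict[str, str]]:
        ...


class CsvFileProvider:
    """Hypothetical CSV provider: the connection string is a file path.

    CSV files have no query language, so `query` is ignored in this sketch.
    """

    def get_rows(self, connection_string: str, query: str = "") -> Iterator[Dict[str, str]]:
        with open(connection_string, newline="", encoding="utf-8") as f:
            # DictReader maps each data row to the column names in the header.
            yield from csv.DictReader(f)


def count_rows(provider: Provider, connection_string: str, query: str = "") -> int:
    """Works with any object that satisfies the Provider interface."""
    return sum(1 for _ in provider.get_rows(connection_string, query))
```

A database provider would fit the same interface by running `query` against the connection instead of reading a file.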
Where Do I Use Providers?
Data Source Definitions
When you define the Data sources you want to test, you have to specify a provider for each of them. Example:
Data Sources:
  - Name: DWH
    Provider: SqlServer@1
    Connection string: data source=localhost;integrated security=true;initial catalog=DWH
  - Name: DwhModel
    Provider: PowerBI@1
    Connection string: "DwhModel"
See the Provider row in the DWH data source? It tells CAT that it should connect to a SQL Server database. In the other data source, you instruct CAT to connect to a Power BI Desktop file (DwhModel.pbix) that is open locally. This gives you the opportunity to test your model before you share it with the rest of your team or with users.
List of Tests and/or Data sources
In CAT, you are free to choose where you define and maintain your tests and data sources. You can manage them in one or more YAML files, in a relational database, or wherever suits you.
In fact, any Provider implemented in CAT can be used to provide Data source and test definitions.
So if your tests are stored in a table in a PostgreSQL database, tell CAT like this:
Get list of tests from:
  - Provider: Postgres@1
    Connection string: "%DWH_CONNECTION_STRING%" # an environment variable is used here
    Query: select * from public.test_definitions;
You can use different providers for tests and for data sources, or even several providers for each. See Project files for more details.
Versions
OK, I get it, but what is the @1 at the end of the provider names in all the examples? Well, databases (and other software) evolve, and so do the drivers needed to access them. To make things even more complicated, there are usually multiple ways to connect to the data. These facts simply cannot be hidden, and users must be aware of them.
The documentation describes exactly how CAT connects to the data, such as which driver it uses and in which version.
There are two primary reasons why versions were introduced to providers:
- More ways to access data. For example, for the SqlServer provider, there is a version that uses the .NET System.Data.SqlClient namespace (@1) and a version that will use the newer .NET Microsoft.Data.SqlClient namespace (@2, not yet implemented).
- Backward compatibility. Whenever a new version of a driver causes backward compatibility problems, we'll keep the existing provider version and create a new one with a raised version number.
This also means that multiple versions of a single CAT provider can be supported at the same time.
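The following Python sketch illustrates how a `Name@Version` identifier such as `SqlServer@1` can be split and looked up so that several versions of one provider coexist. It is an assumption-based illustration, not CAT's actual code: `parse_provider_id`, `resolve` and the `REGISTRY` table are hypothetical, and the registry only lists the `SqlServer` driver namespaces mentioned above.

```python
from typing import Dict, Tuple


def parse_provider_id(provider_id: str) -> Tuple[str, int]:
    """Split an identifier such as 'SqlServer@1' into ('SqlServer', 1)."""
    name, sep, version = provider_id.rpartition("@")
    if not sep or not name or not version.isdigit():
        raise ValueError(f"expected 'Name@Version', got {provider_id!r}")
    return name, int(version)


# Hypothetical registry keyed by (provider name, version).
# Each version can use a different driver, so old and new versions coexist.
REGISTRY: Dict[Tuple[str, int], str] = {
    ("SqlServer", 1): "System.Data.SqlClient",
    # ("SqlServer", 2): "Microsoft.Data.SqlClient",  # not yet implemented
}


def resolve(provider_id: str) -> str:
    """Return the driver used by the requested provider version."""
    key = parse_provider_id(provider_id)
    if key not in REGISTRY:
        raise KeyError(f"provider {provider_id!r} is not implemented")
    return REGISTRY[key]
```

Keying by both name and version is what lets a new, incompatible driver be added as a new entry without changing the behavior of projects that still reference the old version.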