Explore sample project
In this tutorial, you will create a sample project and explore it.
Create a Sample Project
CAT offers templates for new projects. You can list them using this command:
Import-Module CAT;
Get-CatProjectTemplate -Online | Format-List;
Using PowerShell 7, you can create a sample project like this:
$myDocuments = [Environment]::GetFolderPath("MyDocuments");
New-CatProject -Template getStartedWindows -Name CatSampleProject `
-Path "$myDocuments\CAT" -Commented -Wrap -Online;
Explanation:
-Template
- The project template to use (do not change it for the purpose of this tutorial).
-Path
- Target directory where the new sample project should be created. Change it to any existing location on your file system.
-Name
- Name of your new project. Leave it as CatSampleProject for the purpose of this tutorial.
-Wrap
- The getStartedWindows template contains multiple files. This switch indicates that you want them placed in a directory (the directory will have the same name as the project).
-Online
- CAT will refresh the template from the CAT online server. Omit this one if you are on a machine with no access to the Internet.
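Because -Path has to point to an existing location, and -Online is optional, the command above can be adapted for a fresh or offline machine. The following is a minimal sketch under those assumptions: it creates the target directory with the built-in New-Item cmdlet and leaves out -Online. The variable name $targetPath is only illustrative and is not part of CAT.

```powershell
Import-Module CAT

# Build the target path under "My Documents" (same location as in the tutorial).
$myDocuments = [Environment]::GetFolderPath("MyDocuments")
$targetPath  = Join-Path $myDocuments "CAT"   # illustrative variable name

# -Path must point to an existing directory, so create it if it is missing.
# -Force lets New-Item succeed quietly when the directory already exists.
New-Item -ItemType Directory -Path $targetPath -Force | Out-Null

# Same command as above, minus -Online (no refresh from the CAT online server).
New-CatProject -Template getStartedWindows -Name CatSampleProject `
    -Path $targetPath -Commented -Wrap
```

With -Wrap in place, the project files should end up in a CatSampleProject folder inside $targetPath.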
Run the command. CAT will:
- create the CatSampleProject folder
- create the sample files in it (we’ll explore them in this tutorial)
Explore and Run Tests
Open the generated folder in a file explorer:
The CAT PowerShell module generated two CSV files with sample data, a .cat.yaml project file, a script for running the example from the command line, and read-me files.
The CatSampleProject.cat.yaml file is the main file for CAT - we call it a CAT project file. It holds the definitions of the tests and of the data sources. Open the file in your favorite editor (the configuration uses the YAML format).
Notice how simple it is to add a new test. You only need a name for the test, one or two SQL or DAX statements and an expectation.
OK, so we created a project file. It contains some test definitions that check the data in the CSV files in the same folder. Now, let’s run it.
This is the magic of automated data tests. You can run your tests again and again, whenever you like, using a single command:
Invoke-CatProject -Path "$myDocuments\CAT\CatSampleProject"
When you are in a folder that contains only one .cat.yaml file, it is enough to simply run Invoke-CatProject, without any parameters.
You will get this result:
CAT also created a new MS Excel file with the results of the tests - it is in the TestResults folder next to your .cat.yaml file. This is because the CAT project file contains this setting: Output: xlsx. Explore the generated file.
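To find the generated workbook without browsing for it, you can look up the newest Excel file in the TestResults folder. This is a minimal sketch, assuming the project was created under "$myDocuments\CAT" as in this tutorial and that the result files carry the .xlsx extension. The file naming scheme is not assumed, which is why the newest file is picked by its write time.

```powershell
# Folder with the test results, next to the .cat.yaml project file.
$myDocuments = [Environment]::GetFolderPath("MyDocuments")
$resultsDir  = Join-Path $myDocuments "CAT\CatSampleProject\TestResults"

# Pick the most recently written .xlsx file, if there is one.
$latest = Get-ChildItem -Path $resultsDir -Filter *.xlsx -ErrorAction SilentlyContinue |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1

if ($latest) {
    # Open the workbook in the application registered for .xlsx files.
    Invoke-Item -Path $latest.FullName
}
else {
    Write-Warning "No .xlsx results found in $resultsDir. Run Invoke-CatProject first."
}
```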
See Failed Test Details
Now, let’s experiment. Notice that one of the tests checks for numbers in surnames. If any last name contains a number, the test should fail. So far it is passing, so let’s simulate receiving wrong data. Open the XXX521260_Passengers.csv file. On row three, change the surname from “Brown” to “Br0wn1”. Save the CSV file and re-run all the tests.
Invoke-CatProject -Path "$myDocuments\CAT\CatSampleProject"
OK, now the test failed. You should see the details both on the command line and in the generated MS Excel file:
Notice the wrong data was found. Examine the entire output, including the details of the error message. For the failed test, you have all the details - the name and the description of the failed test, a sample of the erroneous data, and the SQL statement(s) used to define the test. You immediately know what is wrong and where, and you have at hand all the weapons you need to troubleshoot the problem in your data.
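To make the rule behind this test concrete, here is a small PowerShell illustration of the same idea: flag every surname that contains a digit. It is only a sketch and not how CAT runs the check (the real test is defined with SQL in the project file). The sample rows and the column names FirstName and Surname are hypothetical; the actual layout of the sample CSV may differ.

```powershell
# Hypothetical sample rows; the real CSV layout in the sample project may differ.
$csvText = @"
FirstName,Surname
Alice,Smith
John,Br0wn1
Maria,Garcia
"@

# Parse the CSV text into objects with FirstName and Surname properties.
$passengers = $csvText | ConvertFrom-Csv

# A surname is invalid when it contains at least one digit (regex \d).
$invalid = @($passengers | Where-Object { $_.Surname -match '\d' })

if ($invalid.Count -gt 0) {
    Write-Host "FAILED: $($invalid.Count) surname(s) contain a digit."
    $invalid | Format-Table -AutoSize
}
else {
    Write-Host "PASSED: no digits found in surnames."
}
```

As in the CAT output, the failing rows themselves are reported, so you can see right away which records need attention.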
Summary and What Next
You now have the CAT PowerShell module installed. You created a sample project and are able to explore the tests it contains and the details of a failed test. We did not explain much about what the tests actually do - we leave that to you as homework :-). Explore the structure of the CSVs and the details of the tests (names, descriptions, SQL queries).
But there is one much more important thing:
Think about everything that can go wrong with your data. Did yesterday’s pipeline run actually add any new data? Are there no new rows in your error-log table? Are all input files processed? Is your new Power BI measure correct? CAT can automate these checks for you - you will have all your answers at hand any time with a single click of a button. Again and again.
The next step is: think of only one thing you’d like to check in your data.
Automate YOUR first test!
Alternatively, if you need more examples of tests to get a better idea of what they may look like, create another sample project. There will be a new template named “AERO” (currently under construction) that contains many test examples. Explore that one to see what CAT can do.