Explore sample project

In this tutorial, you will create a sample project and explore it.

Create a Sample Project

CAT offers templates for new projects. To list them, run:

catcli new --list --online

To create a sample project with CAT CLI, run:

catcli new -t getStartedWindows -d "%USERPROFILE%\Documents\CAT" -n CatSampleProject -wo

The %USERPROFILE% variable works only in the Windows command line. If you run the examples from PowerShell (any version), replace it with $HOME.

Explanation:

-t or --template
The project template to use. Do not change it for the purpose of this tutorial.
-d or --directory
Target directory where the new sample project should be created. Change it to any existing location on your file system.
-n or --name
Name for your new project. Leave CatSampleProject for the purpose of this tutorial.
-w or --wrap
The getStartedWindows template contains multiple files. This switch tells CAT to put them into a directory with the same name as the project.
-o or --online
The -o switch means CAT will refresh the template from the CAT online server. Omit it if you are on a machine with no access to the Internet.
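If you prefer to script the project creation, the following is a minimal Python sketch that assembles the same command from the switches explained above. The function name build_new_command and its parameters are made up for this illustration; only the catcli switches themselves come from the tutorial. Resolving the home folder in Python sidesteps the %USERPROFILE% versus $HOME difference between shells.

```python
import os


def build_new_command(target_dir, name="CatSampleProject",
                      template="getStartedWindows", online=True):
    """Return the argument list for the `catcli new` call shown in this tutorial.

    Hypothetical helper: it only assembles switches documented above.
    """
    args = ["catcli", "new", "-t", template, "-d", target_dir, "-n", name]
    # -w wraps the generated files in a folder named after the project.
    # -o additionally refreshes the template online; drop it when offline.
    args.append("-wo" if online else "-w")
    return args


# expanduser("~") resolves the home folder regardless of the shell in use.
target = os.path.join(os.path.expanduser("~"), "Documents", "CAT")
command = build_new_command(target)
print(command)

# To actually create the project (requires catcli on PATH):
#   import subprocess; subprocess.run(command, check=True)
```

The sketch only prints the argument list; uncomment the last line once CAT CLI is installed and the target directory exists.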

Run the command. CAT will:

  • create the CatSampleProject folder
  • create the sample files in it (we’ll explore them in this tutorial)

Explore and Run Tests

Open the generated folder in a file explorer:

[Image: Generated content]

CAT CLI generated two CSV files with sample data, a .cat.yaml project file, a script for running the example from the command line, and read-me files.

The CatSampleProject.cat.yaml file is the main file for CAT - we call it a CAT project file. It holds the definitions for the tests and for the data sources. Open the file in your favorite editor (the configuration uses YAML format).

[Image: CAT project file]

Notice how simple it is to add a new test. You only need a name for the test, one or two SQL or DAX statements and an expectation.

So far, we have created a project file that contains some test definitions. These tests check the data in the CSV files in the same folder. Now, let’s run it.

This is the magic of automated data tests. You can run your tests at any time with a single command.

catcli run -p "%USERPROFILE%\Documents\CAT\CatSampleProject\CatSampleProject.cat.yaml"

If you are in the folder where your .cat.yaml project file is, you can just run catcli run (if there is only one .cat.yaml file, it will find it and run it).
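The "only one .cat.yaml file" rule can be illustrated with a short Python sketch. This is not CAT's implementation, only an assumption-laden illustration of the rule as described: the helper find_project_file is hypothetical, and raising an error when zero or several project files exist is this sketch's own choice, since the tutorial does not say what CAT does in that case. A wrapper script could use something like this to decide whether to pass an explicit -p path.

```python
import tempfile
from pathlib import Path


def find_project_file(folder):
    """Return the single *.cat.yaml file in `folder`.

    Hypothetical helper: raises ValueError if there is none or more than one.
    """
    candidates = sorted(Path(folder).glob("*.cat.yaml"))
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one .cat.yaml file, found {len(candidates)}"
        )
    return candidates[0]


# Demonstration in a throwaway folder that mimics the generated project.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "CatSampleProject.cat.yaml").write_text("# placeholder\n")
    (Path(tmp) / "readme.txt").write_text("not a project file\n")
    found_name = find_project_file(tmp).name
    print(found_name)
```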

You will get this result:

[Image: CAT CLI result of a run]

CAT also created a new MS Excel file with the results of the tests - it is in the TestResults folder next to your .cat.yaml file. This is because the CAT project file contains this: Output: xlsx. Explore the generated file.

See Failed Test Details

Now, let’s experiment. Notice that one of the tests checks for numbers in surnames: if any last name contains a number, the test should fail. So far it is passing, so let’s simulate receiving bad data. Open the XXX521260_Passengers.csv file. On row three, change the surname from “Brown” to “Br0wn1”. Save the CSV file and re-run all the tests. This time we are interested not only in the summary but also in the details of the failed test, so we will run with -l Error or -loggingLevel Error.

Change the current directory:

cd "%USERPROFILE%\Documents\CAT\CatSampleProject\"

Now, run:

catcli run -l Error

Now the test fails. You should see the details both in the command line and in the generated MS Excel file:

[Image: Failed test details]

Notice that the wrong data was found. Examine the entire output, including the details of the error message. For the failed test, you have all the details: the name and description of the failed test, a sample of the erroneous data, and the SQL statement(s) used to define the test. You immediately know what is wrong and where, and you have at hand all the weapons you need to troubleshoot the problem in your data.
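To make the logic of the failing check concrete, here is a rough Python sketch of what a "no digits in surnames" test expresses. It is not how CAT evaluates the test (CAT uses the SQL statements in the project file), and the sample data and the Surname column name are assumptions for illustration; the real CSV may be structured differently.

```python
import csv
import io

# Hypothetical sample data; the real CSV's column names may differ.
SAMPLE_CSV = """FirstName,Surname
Alice,Smith
Bob,Jones
Carol,Br0wn1
"""


def rows_with_digits(csv_text, column="Surname"):
    """Return the rows whose `column` value contains at least one digit."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if any(ch.isdigit() for ch in row[column])]


bad_rows = rows_with_digits(SAMPLE_CSV)
# Expectation: no offending rows. A non-empty list means the test fails,
# and the list itself is the "sample of erroneous data".
print("PASSED" if not bad_rows else f"FAILED: {bad_rows}")
```

As in the CAT output, a failure comes with the offending rows, which is what makes the problem quick to locate.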

Summary and What Next

You now have CAT CLI installed. You created a sample project, and you can explore the tests it contains and the details of a failed test. We did not explain much about what the tests actually check - we leave that to you as homework :-). Explore the structure of the CSVs and the details of the tests (names, descriptions, SQL queries).

But there is one much more important thing:

Think about everything that can go wrong with your data. Did yesterday’s pipeline run actually add any new data? Are there no new rows in your error-log table? Are all input files processed? Is your new Power BI measure correct? CAT can automate these checks for you - you will have all your answers at hand any time with a single click of a button. Again and again.
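As an illustration of the first question, here is a small Python sketch of a check built from one SQL statement and one expectation. It uses an in-memory SQLite database purely for demonstration; the sales table, its load_date column, and the rows are all hypothetical, and this is not CAT's project-file syntax.

```python
import sqlite3
from datetime import date, timedelta

yesterday = (date.today() - timedelta(days=1)).isoformat()

# Hypothetical table and rows standing in for a real data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, load_date TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "2000-01-01"), (2, yesterday), (3, yesterday)],
)

# The check: one SQL statement ...
(new_rows,) = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE load_date = ?", (yesterday,)
).fetchone()
conn.close()

# ... and one expectation: yesterday's load added at least one row.
check_passed = new_rows > 0
print(f"rows loaded yesterday: {new_rows}, passed: {check_passed}")
```

The same shape (a query plus an expectation about its result) applies to the other questions, such as counting new rows in an error-log table and expecting zero.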

The next step is: think of only one thing you’d like to check in your data.
Automate YOUR first test!

Alternatively, if you need more examples to get a better idea of what tests may look like, create another sample project. There will be a new template named “AERO” (currently under construction) that contains many test examples. Explore that one to see what CAT can do.