Can CAT Automate Tests Against my Databricks Data?
Yes. You can test data in Databricks Delta tables, and you can easily generate metadata-driven tests based on information in Unity Catalog. See the tutorial for how to set everything up.
We highly recommend running tests against your Databricks data from outside Databricks notebooks.
CAT’s goal is to check that the data is correct, which requires (among other things) cross-system comparison checks. From that perspective, it makes much more sense to run tests outside the Databricks workflow, from another place in your architecture.
Quick summary (details are described in the tutorial):
- find a suitable machine to run the tests from (your development machine or any machine in your architecture where it makes sense to run tests)
- install the Simba driver
- create an ODBC entry
- create a CAT project and start testing
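As a rough illustration of the ODBC step, the sketch below checks whether a DSN is defined in a unixODBC-style `odbc.ini` file. It assumes a Linux or macOS machine; on Windows, ODBC entries are managed through the ODBC Data Source Administrator instead. The helper `dsn_exists`, the DSN name `Databricks`, and the default file path are hypothetical and not part of CAT or the Simba driver.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if an odbc.ini-style file contains a [DSN] section header.
# Usage: dsn_exists <path-to-odbc.ini> <dsn-name>
dsn_exists() {
  # -F: fixed string, -x: match the whole line, -q: no output
  [ -f "$1" ] && grep -Fxq "[$2]" "$1"
}

# Assumed defaults; adjust both to your environment.
ODBC_INI="${ODBCINI:-$HOME/.odbc.ini}"
DSN_NAME="${DSN_NAME:-Databricks}"

if dsn_exists "$ODBC_INI" "$DSN_NAME"; then
  echo "DSN '$DSN_NAME' found in $ODBC_INI."
else
  echo "DSN '$DSN_NAME' not found in $ODBC_INI."
fi
```

This only confirms that the entry exists; it does not verify that the driver is installed or that the connection works.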
But I Want to Run Tests from Databricks Notebooks
It is possible. But we warned you: CAT is designed to verify that end users work with correct numbers. In our opinion, it makes little sense to run some tests from Databricks, such as:
- testing data in sources and reporting problems (e.g., data not ready)
- incremental load correctness
- all Power BI tests
- any cross-system checks
- and others.
Run the tests from a different place. Really. CAT is a witness that everything works as you designed it, end to end, so it makes little sense to place it directly into Databricks.
But it is technically possible. The prerequisites are:
- the .NET 8 runtime
- our pythonnet-based Python package, justcatit
Both can be installed on your cluster.
Libraries
Go to the Libraries settings of your cluster.
Install the justcatit package from PyPI using pip: https://pypi.org/project/justcatit/
.NET runtime
Create InitCluster.sh in your workspace or repository and add the code below. You can also merge the code into your existing cluster initialization script, if you have one.
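#!/bin/bash
# Cluster init script: installs .NET 8 so that CAT can run on the cluster.
echo 'Installation of .NET 8 runtime start.'
# The dotnet-sdk-8.0 package includes the .NET 8 runtime.
sudo apt-get update && \
sudo apt-get install -y dotnet-sdk-8.0
echo 'Installation of .NET 8 runtime end.'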
In the cluster configuration, edit your cluster, go to Advanced, and add the init script (type Workspace).
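Once the cluster has restarted with the init script, it may be worth confirming that the runtime is actually available. The following is a minimal sketch, assuming it is run on the cluster (for example in a `%sh` notebook cell); the helper name `has_dotnet8_runtime` is hypothetical, while `dotnet --list-runtimes` is the standard .NET CLI command for listing installed runtimes.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given `dotnet --list-runtimes` output
# contains a Microsoft.NETCore.App 8.x entry.
has_dotnet8_runtime() {
  printf '%s\n' "$1" | grep -Eq '^Microsoft\.NETCore\.App 8\.'
}

if command -v dotnet >/dev/null 2>&1; then
  if has_dotnet8_runtime "$(dotnet --list-runtimes)"; then
    echo '.NET 8 runtime found.'
  else
    echo '.NET 8 runtime not found.'
  fi
else
  echo 'dotnet is not on PATH.'
fi
```

If the check reports that `dotnet` is missing, the init script log for the cluster is the first place to look.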
Use CAT
If all of the above works for you, you can create a .cat.yaml project file in your workspace and run it using the invoke_project function. Everything works the same way as if you were using CAT from your local machine.