Databricks

Can I test Databricks data?

Can CAT Automate Tests Against my Databricks Data?

Yes. You can test data in Databricks Delta tables. You can also easily generate metadata-driven tests based on information in Unity Catalog. See the tutorial for how to set everything up.

Quick summary (details are described in the tutorial):

  • find a suitable machine to run the tests from (your development machine or any machine in your architecture where it makes sense to run tests)
  • install the Simba driver
  • create an ODBC entry
  • create a CAT project and start testing
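
Before creating the CAT project, it can help to confirm that the ODBC entry actually reaches Databricks. The following is a minimal sketch, not part of CAT: it assumes the third-party pyodbc package is installed on the machine, and the DSN name Databricks_CAT is hypothetical (use whatever name you gave your ODBC entry).

```python
def dsn_connection_string(dsn: str) -> str:
    """Build a DSN-only ODBC connection string, rejecting names that would break it."""
    if not dsn or ";" in dsn or "=" in dsn:
        raise ValueError(f"not a plain DSN name: {dsn!r}")
    return f"DSN={dsn}"


def check_dsn(dsn: str) -> bool:
    """Open the ODBC entry and run a trivial query to prove the connection works."""
    import pyodbc  # third-party (pip install pyodbc); imported lazily on purpose

    # autocommit=True because the connection is only used for a read-only probe
    conn = pyodbc.connect(dsn_connection_string(dsn), autocommit=True)
    try:
        row = conn.cursor().execute("SELECT 1").fetchone()
        return row is not None and row[0] == 1
    finally:
        conn.close()


if __name__ == "__main__":
    # "Databricks_CAT" is a hypothetical DSN name used only for illustration.
    print("DSN reachable:", check_dsn("Databricks_CAT"))
```

If the check fails, the problem is in the driver or the ODBC entry rather than in your CAT project.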

But I Want to Run Tests from Databricks Notebooks

It is possible. But be warned: CAT is designed to verify that end users work with correct numbers. In our opinion, it makes little sense to run some tests from Databricks, such as

  • testing data in sources and reporting problems (e.g., data not ready)
  • incremental load correctness
  • all Power BI tests
  • any cross-system checks
  • and others.

Run the tests from somewhere else. Really. CAT is a witness that everything works as you designed it, end to end - it makes little sense to place it directly into Databricks.

But it is technically possible. The prerequisites are:

  • installed: .NET 8 runtime
  • and our pythonnet package.

Both can be installed on your cluster.

Libraries

Go to the Libraries settings of your cluster.

Install the justcatit package using pip: https://pypi.org/project/justcatit/

.NET runtime

Create InitCluster.sh in your workspace or repository and add the code below. You can also merge the code into your existing cluster initialization script, if you have one.

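#!/bin/bash
# Cluster init script: makes .NET 8 available on every node at startup.
echo 'Installation of .NET 8 runtime start.'
# Refresh the package index first; install only if the update succeeded.
# Note: dotnet-sdk-8.0 is the full SDK, which includes the .NET 8 runtime.
sudo apt-get update && \
  sudo apt-get install -y dotnet-sdk-8.0
echo 'Installation of .NET 8 runtime end.'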

In the cluster configuration, edit your cluster, go to Advanced, and add the init script (type Workspace).
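
After the cluster restarts, you may want to confirm from a notebook cell that the init script really installed .NET 8. The sketch below is one way to do that, assuming the dotnet executable ends up on the PATH; it relies only on the standard `dotnet --list-runtimes` command, and the helper names are illustrative.

```python
import shutil
import subprocess


def parse_runtimes(listing: str) -> list:
    """Parse `dotnet --list-runtimes` output into (name, version) pairs."""
    runtimes = []
    for line in listing.splitlines():
        parts = line.split()
        # Each line looks like: "Microsoft.NETCore.App 8.0.4 [/path/to/shared/...]"
        if len(parts) >= 2:
            runtimes.append((parts[0], parts[1]))
    return runtimes


def has_dotnet8(listing: str) -> bool:
    """True if the listing contains a base .NET runtime with major version 8."""
    return any(
        name == "Microsoft.NETCore.App" and version.startswith("8.")
        for name, version in parse_runtimes(listing)
    )


def check_cluster() -> bool:
    """Run `dotnet --list-runtimes` on the driver node and look for .NET 8."""
    dotnet = shutil.which("dotnet")
    if dotnet is None:
        return False  # init script did not run, or dotnet is not on PATH
    result = subprocess.run(
        [dotnet, "--list-runtimes"], capture_output=True, text=True, check=False
    )
    return has_dotnet8(result.stdout)


if __name__ == "__main__":
    print(".NET 8 runtime available:", check_cluster())
```

This only checks the node where the notebook runs; if it reports False, look at the init script logs of the cluster first.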

Use CAT

If all of the above works for you, you can create a .cat.yaml project file in your workspace and run it using the invoke_project function. Everything will work the same way as if you were using CAT on your local machine.
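
If you keep several project files in the workspace, a small notebook helper can find them and run them one by one. The sketch below is only an illustration: find_cat_projects and run_projects are hypothetical helper names, and the runner argument stands in for a wrapper around invoke_project, whose exact import path and arguments are not shown here (check the justcatit documentation).

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


def find_cat_projects(root: str) -> list:
    """Return all *.cat.yaml files under root, sorted for a stable run order."""
    return sorted(Path(root).rglob("*.cat.yaml"))


def run_projects(
    projects: Iterable, runner: Callable[[str], object]
) -> Dict[str, Tuple[str, object]]:
    """Run each project through runner and collect the outcome per project file."""
    results = {}
    for project in projects:
        key = str(project)
        try:
            results[key] = ("ok", runner(key))
        except Exception as exc:
            # Keep going so one broken project does not hide the others.
            results[key] = ("failed", exc)
    return results
```

In a notebook, you would pass a function that calls invoke_project for the given path as runner, then inspect the returned dictionary for any "failed" entries.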