Test Databricks

How to test data in Databricks Delta tables?

CAT allows you to automate tests against your data in Databricks delta tables.

Before you write your fist test against a delta table, you’ll need to install a driver and create an ODBC entry. You need to do it only once. Testing of delta tables relies on Odbc@1 provider.

Do I Need a Driver?

Yes. Download and install 64-bit Simba ODBC driver for Windows from this URL (if you don’t have it installed yet):

https://www.databricks.com/spark/odbc-drivers-download

The installation is simple “next, next, next…”.

Access Token

You’ll need an access token. Go to your Databricks workspace and go to User settings (you can find it in the upper right corner under you user name). Then in the left menu go to Developer, to Access Tokens and click the Manage button. Click the Generate new token button. Get the value of the token, you’ll need it. Keep the value secret.

How to Create the ODBC Entry?

We assume you have a running cluster and have access to it. Go to your cluster page to Advanced options. You’ll need the yellow-highlighted information for creation of the ODBC entry:

Databricks cluster Advanced Setting

Now, you can create a 64-bit user ODBC entry. Ensure you are creating a 64 bit entry (NOT 32 bit) and you are creating User DSN (NOT System DSN).

Simba ODBC settings

Use the correct values from your cluster page here.

Field Value
Data source name a friendly name for your data source - you’ll use this one in CAT project file.
Host(s) value from Server Hostname
Port value from Port
Database default or name of the database you want to connect to
Mechanism User Name and Password
User Name token (this is “hardcoded value”, just use the word “token”)
Password Token value you generated in Databricks

Then you need two more things - click on SSL Options.. button and check Enable SSL. Then click on HTTP Options… button and fill in value from HTTP Path:

HTTP Path in ODBC

Test the ODBC entry, it should work without problems.

Tests Databricks Delta Tables

Data Sources:
- Name: My Databricks Data
  Provider: Odbc@1
  Connection string: DSN=AERO_DWH

The example assumes you have a 64bit ODBC entry named AERO_DWH.

You can then start with your tests:

Tests:
- Name: Account table is not empty
  Suite: Smoke tests
  Data source: My Databricks Data
  Query: |
      SELECT  *
      FROM    dim.account
      LIMIT   10
  Expectation: set is not empty

Feel free to compare data in delta tables with other sources, such as SQL Server, ORACLE, PostgreSQL etc.

Generate Tests from Unity Catalog Data

You can be very efficient if you use generate test feature in CAT. Data in the Unity catalog may be great source for such tests. Leverage data in information_schema.tables and information_schema.columns for generating your tests. Remember you can use data in any format for generating your tests (MS Excel, relational databases, …) - not only Unity catalog data.