Test Databricks
How to test data in Databricks Delta tables?
CAT allows you to automate tests against your data in Databricks delta tables.
Before you write your fist test against a delta table, you’ll need to install a driver and create an ODBC entry. You need to do it only once. Testing of delta tables relies on Odbc@1 provider.
Do I Need a Driver?
Yes. Download and install 64-bit Simba ODBC driver for Windows from this URL (if you don’t have it installed yet):
https://www.databricks.com/spark/odbc-drivers-download
The installation is simple “next, next, next…”.
Access Token
You’ll need an access token. Go to your Databricks workspace and go to User settings (you can find it in the upper right corner under you user name). Then in the left menu go to Developer, to Access Tokens and click the Manage button. Click the Generate new token button. Get the value of the token, you’ll need it. Keep the value secret.
How to Create the ODBC Entry?
We assume you have a running cluster and have access to it. Go to your cluster page to Advanced options. You’ll need the yellow-highlighted information for creation of the ODBC entry:
Now, you can create a 64-bit user ODBC entry. Ensure you are creating a 64 bit entry (NOT 32 bit) and you are creating User DSN (NOT System DSN).
Use the correct values from your cluster page here.
Field | Value |
---|---|
Data source name | a friendly name for your data source - you’ll use this one in CAT project file. |
Host(s) | value from Server Hostname |
Port | value from Port |
Database | default or name of the database you want to connect to |
Mechanism | User Name and Password |
User Name | token (this is “hardcoded value”, just use the word “token”) |
Password | Token value you generated in Databricks |
Then you need two more things - click on SSL Options.. button and check Enable SSL. Then click on HTTP Options… button and fill in value from HTTP Path:
Test the ODBC entry, it should work without problems.
Tests Databricks Delta Tables
Data Sources:
- Name: My Databricks Data
Provider: Odbc@1
Connection string: DSN=AERO_DWH
The example assumes you have a 64bit ODBC entry named AERO_DWH
.
You can then start with your tests:
Tests:
- Name: Account table is not empty
Suite: Smoke tests
Data source: My Databricks Data
Query: |
SELECT *
FROM dim.account
LIMIT 10
Expectation: set is not empty
Feel free to compare data in delta tables with other sources, such as SQL Server, ORACLE, PostgreSQL etc.
Generate Tests from Unity Catalog Data
You can be very efficient if you use generate test feature in CAT. Data in the Unity catalog may be great source for such tests. Leverage data in information_schema.tables
and information_schema.columns
for generating your tests. Remember you can use data in any format for generating your tests (MS Excel, relational databases, …) - not only Unity catalog data.