Test examples

I want to see examples of automated tests against data.

No problem, here you are. Just beware the details are explained later in the documentation. The configuration format is simple and self-explanatory, why not have a look right now…

Uniqueness

This test verifies the uniqueness of your data:

- Name: Check there are no duplicities
  Test Suite: Uniqueness Data Quality Dimension
  Description: This test finds if a column contains duplicate values
  Data Source: CRM
  Query: |
    SELECT customer_code, COUNT(*)
    FROM contacts.customer
    GROUP BY customer_code
    HAVING COUNT(*) > 1;
  Expectation: set is empty

Completness

This tests checks the completeness of your data:

- Name: Check there are no null values
  Test Suite: Completeness Data Quality Dimension
  Description : This test finds if a column contains null values
  Data Source: CRM
  Query: |
    SELECT *
    FROM contacts.customer
    WHERE full_name IS NULL OR full_name = '';
  Expectation: set is empty

Accuracy

These tests check that your data are accurate according to your standards:

- Name: Status can contain only values as Open, Closed, Suspended
  Test Suite: Completeness Data Quality Dimension
  Description : This test finds if column status has wrong value
  Data Source: Your data source
  Query: |
      SELECT *
      FROM staging.payments
      WHERE status NOT IN ('Open', 'Closed', 'Suspended');
 Expectation: set is empty
- Name: The code must have 14 characters.
  Test Suite: Completeness Data Quality Dimension
  Description : >
    This test finds if your super important code
    hasn't got an icorrect length.
  Data Source: DWH
  Query: |
       SELECT DISTINCT OrderCode
       FROM   [Staging].[Order]
       WHERE LENGTH(OrderCode) <> 14;
  Expectation: set is empty

Consistency

This test checks the consistency of your data. We need to make sure that the same values are consistent across our datasets or datasources:

- Name: Same numbers between source file and database
  Test Suite: Consistency Data Quality Dimension
  Description: >
    This tests checks that SUM of salery is same
    between source excel file and  database
  First Data Source: DWH
  First Query: |
      SELECT SUM(payments_first) FROM staging.payments
  Second Data source: ExcelInput
  Second Query: |
      SELECT SUM(payments_january) FROM "Excel"."Payments"
  Expectation: Sets match

Timeliness

This test checks the timeliness of your data by monitoring the last processing day:

- Name: Update of table
  Test Suite: Timeliness Data Quality Dimension
  Description: >
    This is to check if the staging table, with
    the column LastProcessDay, was updated today
  Data Source: DWH
  Query: |
        SELECT TOP(5) *
        FROM Staging.Shipment
        WHERE LastProcessDay > CAST(GETDATE() AS DATE);
  Expectation: set is not empty

The tests are simple, straightforward and take seconds to automate. You have a full power of raw SQL or DAX to express your expectations about your data. You can compare data across more different systems (such as MS Excel with ORACLE, SQL server with Power BI semantic model). As you’ll learn later in our tutorial, you can execute all (or part of) your tests with a single command or press of a button.