Introduction

You expect the results of two queries are the same.

The sets match is by far the most used and most useful type of expectation in CAT. In fact, you can rewrite any other expectation to sets match. But that does not mean you should - other types of expectation add clarity and readability to your tests, and that is crucial for their maintenance.

Example

You may want to verify, that all customers from your CRM system are also present in your analytical system:

Tests:
- Name: All customers are loaded
  Suite: smoke tests
  First Data Source: CRM
  First Query: |
    SELECT  CustomerID
    FROM    Contacts.Customer
    ORDER BY Customer_ID
  Second Data Source: DWH
  Second Query: |
    SELECT  CustomerID
    FROM    Dimension.Customer
    ORDER BY CustomerID
  Expectation: sets match

Results

If data returned by both queries are the same, there are no missing and no additional rows, the result is Passed.

If there is at least one difference, the result is Failed.

If the underlying provider of any of the queries throws an exception, the result is Error.

Remarks

Both queries must return sets with the same structure. Column names and data types may differ. Number of columns must be the same, otherwiset the test will immediatly end with Failed result.

CAT iterates simultaneously through the first set and through second set. The iteration goes row by row, that means CAT will NOT read the entire sets to RAM and the comparison is very efficent. You can safely iterate very large query results without worying about the machine resources.

🐱

The order of the data in both sets IS important. This is because of performance. The underlying providers of data usually can sort the data in an efficient way (use ORDER BY clause) and CAT can then perform the comparison very quickly.

If the sencod set runs out of rows sooner than the first, or viceversa, test result is Failed.

CAT compares both rows column by column for every row. Order of columns IS important, names of columns are NOT important, data types are NOT important.

When comparing two values, CAT uses .NET representation of the values (exactly the same what underlying provider returned).

CAT first tries to compare if the values are NULL or not - if one is not and the other is, result is Failed. If both are NULL, all is OK and the evaluation contiunes to the next pair of values.

CAT then tries compare the values as they are, as their string representations, as decimal numbers (if both can be converted to them), as datetimes (if both can be converted to them). Whenever the values are clearly different, CAT marks the test as Failed, otherwise continues.

Case is ignored. Some Data is the same as some data.

When CAT iterates through both sets and does not find any differences, the test result is Passed.

Introduction

Example #

Results #

Remarks #

Example

Results

Remarks