Sets Match - Error Message
This page explains the structure of the error message produced by the sets match expectation.
The sets match expectation is one of the most frequently used expectations. When such a test fails, the message contains:
- summary (do the sets match or not, how many errors were found)
- sample of erroneous rows (with values, can be turned off)
- info whether the rows in the sample are complete or incomplete
Summary
Example:
The sets do NOT match. 5 differences found.
Note that the number of differences found depends on whether or not you specify a Key column.
Sample of erroneous rows
When CAT finds a difference between rows, it stops evaluating the test. This is the default behavior; you can change it using the MaximumErrorsLogged property of a test, as described below.
By default, then, CAT stops the evaluation as soon as the very first difference between rows is found, and that row is included in the message:
┌────┬────┬────────┬────────────────┬─────┬─────────┬─────────────┬────────────┬─────────────┐
│(1) │(2) │ID      │PASSPORT_NUMBER │AGE  │GENDER   │TICKET_PRICE │TICKET_TYPE │LUGGAGE_TYPE │
╞════╪════╪════════╪════════════════╪═════╪═════════╪═════════════╪════════════╪═════════════╡
│ <  │    │2185893 │! 118463324     │! 73 │! Female │! 109        │! 2         │4            │
│    │ >  │2185893 │! 128463324     │! 17 │! Male   │! 106        │! 1         │4            │
╞════╪════╪════════╪════════════════╪═════╪═════════╪═════════════╪════════════╪═════════════╡
│(1) │(2) │ID      │PASSPORT_NUMBER │AGE  │GENDER   │TICKET_PRICE │TICKET_TYPE │LUGGAGE_TYPE │
└────┴────┴────────┴────────────────┴─────┴─────────┴─────────────┴────────────┴─────────────┘
Maximum errors logged
The default behavior aims at the best performance. For troubleshooting, however, you might want to see a bigger sample of erroneous rows, not just the first difference. Or, for compliance reasons, you might want to have NO data in the messages and logs. This is where the Maximum errors logged property comes in handy. It is a property of a test: the same way you define Name, First query, Second query, etc., add Maximum errors logged: 10. This influences what you see in the “sample of erroneous rows” section of the message. Maximum errors logged can be set to a value between 0 and 50.
If MaximumErrorsLogged is 0, you get this message instead of the table with data:
MaximumErrorsLogged is set to 0. Data related to the found problem is intentionally missing here.
Raise the MaximumErrorsLogged to 1 or higher, if security and compliance rules allow you to have data in logs.
Otherwise, you get a table with rows that caused the test to fail:
┌────┬────┬────────┬────────────────┬────┬───────┬─────────────┬────────────┬─────────────┬────────────┬──────────────┬─────────────────────┐
│(1) │(2) │ID      │PASSPORT_NUMBER │AGE │GENDER │TICKET_PRICE │TICKET_TYPE │LUGGAGE_TYPE │NATIONALITY │FLIGHT_NUMBER │SYS_INSERTED_DT      │
╞════╪════╪════════╪════════════════╪════╪═══════╪═════════════╪════════════╪═════════════╪════════════╪══════════════╪═════════════════════╡
│ !  │    │125156  │044468502       │17  │Female │93           │3           │1            │Pole        │XXX480481     │8/15/2023 2:15:03 PM │
│ !  │    │662636  │I49555994       │36  │Female │176          │3           │1            │Other       │XXX830110     │8/15/2023 2:15:24 PM │
│    │ !  │662636  │I49555994       │36  │Male   │175          │3           │1            │Other       │XXX830110     │8/15/2023 2:15:24 PM │
│ !  │    │1779187 │A16052548       │24  │Female │78           │3           │2            │Czech       │XXX513423     │8/15/2023 2:16:08 PM │
│    │ !  │1868884 │740629805       │60  │Male   │99           │3           │1            │German      │XXX823782     │8/15/2023 2:16:11 PM │
╞════╪════╪════════╪════════════════╪════╪═══════╪═════════════╪════════════╪═════════════╪════════════╪══════════════╪═════════════════════╡
│(1) │(2) │ID      │PASSPORT_NUMBER │AGE │GENDER │TICKET_PRICE │TICKET_TYPE │LUGGAGE_TYPE │NATIONALITY │FLIGHT_NUMBER │SYS_INSERTED_DT      │
└────┴────┴────────┴────────────────┴────┴───────┴─────────────┴────────────┴─────────────┴────────────┴──────────────┴─────────────────────┘
The first row contains the names of the columns from the first query. The last row contains the names of the columns from the second query. The column names may differ between the two queries: CAT expects the columns to be in the same order in both queries and does no mapping of columns.
An exclamation mark in the column (1) means the row was found in the first set but was NOT found in the second set.
An exclamation mark in the column (2) means the row was found in the second set but was NOT found in the first set.
It is highly recommended to specify a Key for comparison. When you don’t, you may under some circumstances also see question marks in columns (1) and (2). CAT is designed to find up to the first 50 differences between the two sets. The algorithm for comparing the sets without a key works like this:
- CAT expects both sets to be ordered.
- It reads row by row on each side. If the row is the same on both sides, it continues.
- Once CAT spots the first difference, it tries to find the same row on the other side. It reads up to 50 rows on each side, trying to find a match.
- If a match is found within 50 rows, the comparison continues.
- If not, it is not clear whether the “same” row will appear on the other side later. CAT has found the first 50 differences but cannot tell for sure that the row will not “come later”. Thus, it marks the rows from the buffer with question marks, because it simply cannot be sure the row is “missing”.
Clearly, the comparison is more reliable when you specify the key column.
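The steps above can be sketched in Python. This is only an illustration of the described idea, not CAT's actual implementation: the names `compare_without_key` and `find_resync` are hypothetical, rows are modeled as tuples, and treating unmatched rows as definitely missing when both sets end inside the look-ahead window is an assumption of this sketch.

```python
def find_resync(window_a, window_b):
    """Return offsets (di, dj) of the nearest row present in both windows, or None."""
    positions_b = {}
    for dj, row in enumerate(window_b):
        positions_b.setdefault(row, dj)  # remember the first occurrence only
    best = None
    for di, row in enumerate(window_a):
        dj = positions_b.get(row)
        if dj is not None and (best is None or di + dj < sum(best)):
            best = (di, dj)
    return best


def compare_without_key(first, second, lookahead=50):
    """Compare two ordered lists of rows; return a list of (mark, side, row)."""
    findings = []
    i = j = 0
    while i < len(first) and j < len(second):
        if first[i] == second[j]:
            i += 1
            j += 1
            continue
        # A difference: look ahead on both sides for a row to resynchronize on.
        window_a = first[i:i + lookahead]
        window_b = second[j:j + lookahead]
        match = find_resync(window_a, window_b)
        if match is None:
            # Assumption: if both sets end inside the window, the rows are
            # definitely missing; otherwise they might still "come later".
            at_end = i + lookahead >= len(first) and j + lookahead >= len(second)
            mark = "!" if at_end else "?"
            findings += [(mark, 1, row) for row in window_a]
            findings += [(mark, 2, row) for row in window_b]
            return findings
        di, dj = match
        findings += [("!", 1, row) for row in first[i:i + di]]
        findings += [("!", 2, row) for row in second[j:j + dj]]
        i += di
        j += dj
    # One side is exhausted: whatever remains on the other side has no partner.
    findings += [("!", 1, row) for row in first[i:]]
    findings += [("!", 2, row) for row in second[j:]]
    return findings
```

With a small `lookahead`, two sets that diverge for longer than the window produce `?` marks, which mirrors why question marks can appear when no Key is given.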
The table is a mess!
When your queries return lots of columns, you may face a problem: in narrow console windows or in Azure DevOps test details, the table layout may become distorted or misaligned due to limited horizontal space, causing columns and rows to appear scattered.
Here are a few tips to tackle that:
- In queries, skip columns you don’t need to compare.
- Use Key for comparison (see below): the sample will then contain only columns with at least one difference.
- Maximize your console window.
- Examine the results in the log files instead of in the console. (By default, CAT logs to the Logs directory in the same directory as your .cat.yaml file.)
- Copy and paste the table into your favorite text editor and examine it there.
By default, CAT can only recognize rows that are missing in the first set or missing in the second set. If you want CAT to also report rows that exist in both sets but differ, you need to provide a Key.
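To illustrate why a Key makes "different" rows detectable, here is a minimal Python sketch. It is not CAT's code: the function name `compare_with_key` and the `key_index` parameter are hypothetical, rows are compared by column position (as the text above says CAT does no mapping of columns), and keys are assumed to be unique within each set.

```python
def compare_with_key(first, second, key_index=0):
    """Classify rows (tuples) as only-in-first, only-in-second, or different."""
    # Assumption: the key value is unique within each set.
    first_by_key = {row[key_index]: row for row in first}
    second_by_key = {row[key_index]: row for row in second}

    only_in_first = [row for key, row in first_by_key.items()
                     if key not in second_by_key]
    only_in_second = [row for key, row in second_by_key.items()
                      if key not in first_by_key]

    # For keys present on both sides, record the positions of differing columns.
    different = {}
    for key, row in first_by_key.items():
        other = second_by_key.get(key)
        if other is not None and row != other:
            different[key] = [pos for pos, (a, b) in enumerate(zip(row, other))
                              if a != b]
    return only_in_first, only_in_second, different


# The two rows from the first sample table: same ID, several differing columns.
first = [(2185893, "118463324", 73, "Female", 109, 2, 4)]
second = [(2185893, "128463324", 17, "Male", 106, 1, 4)]
only_in_first, only_in_second, different = compare_with_key(first, second)
```

Without a key, the same two rows could only be reported as one row missing on each side; with the key, they pair up and only the last column (LUGGAGE_TYPE) is left unflagged.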
Complete or Incomplete
CAT does NOT compare the sets completely by default. This is controlled by the Maximum Errors Logged setting.
Based on that, you can get either:
The scan was not complete. There may be also other errors. Raise MaximumErrorsLogged setting if necessary.
This means that CAT did NOT compare the sets to the end, because it had already found the maximum number of errors it was configured to find.
The other message you may get is:
CAT scanned both sets completely - there are no other differences.
In this case, you can be sure there are no other differences (provided the data has not changed since the test was run). E.g., if you set Maximum errors logged to 20 and there are only 3 rows that do not match, CAT scanned both sets to the very end, trying to find the next difference.
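The relationship between the limit and the two messages can be sketched as follows. This is a simplified model, not CAT's implementation: `sample_differences` is a hypothetical name, the differences are assumed to arrive lazily from an iterator, and the value 0 (no data logged) is deliberately left out of the sketch.

```python
from itertools import islice

COMPLETE = "CAT scanned both sets completely - there are no other differences."
INCOMPLETE = ("The scan was not complete. There may be also other errors. "
              "Raise MaximumErrorsLogged setting if necessary.")


def sample_differences(differences, maximum_errors_logged):
    """Take at most `maximum_errors_logged` differences and pick the message."""
    if not 1 <= maximum_errors_logged <= 50:
        # 0 is a valid setting in CAT, but it is not modeled in this sketch.
        raise ValueError("this sketch supports values from 1 to 50")
    # islice stops pulling from the iterator once the limit is reached,
    # so the rest of the sets is never scanned.
    sample = list(islice(differences, maximum_errors_logged))
    # Fewer items than the limit means the iterator ran dry: the scan was complete.
    complete = len(sample) < maximum_errors_logged
    return sample, COMPLETE if complete else INCOMPLETE
```

In this model, a limit of 20 with only 3 differences exhausts the iterator and yields the "scanned completely" message, while a limit of 1 with 5 differences stops after the first one and yields the "not complete" message.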