Sets Match - Using Key

How and Why to Specify a Key for Comparison

Why Comparison Key?

When you use sets match expectation, you need to specify only the two queries and against what datasources they should be run. That’s it, nothing else. Both the sets need to be sorted (this is a job the database will do with the best performance).

CAT by default does the job and in fact, you don’t need to care. If you are happy with the error messages you get, this topic is of no interest for you.

Specifying a comparison key in fact provides only one additional benefit:

Without key, there is simply no way to recognize, whether the row is similar enough to be reported as different. When you don’t specify key, you only get report of

  • rows missing in the first set
  • rows missing in the second set

When you specify a comparison key, you get report of:

  • rows missing in the first set
  • rows missing in the second set
  • different rows.

Remember the Key is opitonal. It only gives you better error messages.

Where Do I Set the Comparison Key?

Just provide the Key in YAML file, in the database table, in the Excel sheet - whereever you store your tests. Example of a test in a project file:

Tests:
- Name: Compare staging table with fact table - Passengers
  Suite: Consistency checks
  First Data Source: DWH
  First Query: |
     SELECT  * FROM STAGE.PASSENGERS ORDER BY ID
  Second Data Source: DWH
  Second Query: |
     SELECT  * FROM FACT.PASSENGERS ORDER BY ID
  Expectation:  sets match
  Key: ID
  Maximum Errors Logged: 20

What the Difference Looks Like?

So OK, providing a key gives me better error message, but is it important?

It is up to you. Here you can see an error message without and with the key provided.

Key Was NOT Provided

β”Œβ”€β”€β”€β”€β”¬β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚(1) β”‚(2) β”‚ID      β”‚PASSPORT_NUMBER β”‚AGE β”‚GENDER β”‚TICKET_PRICE β”‚TICKET_TYPE β”‚LUGGAGE_TYPE β”‚NATIONALITY β”‚FLIGHT_NUMBER β”‚SYS_INSERTED_DT      β”‚
β•žβ•β•β•β•β•ͺ════β•ͺ════════β•ͺ════════════════β•ͺ════β•ͺ═══════β•ͺ═════════════β•ͺ════════════β•ͺ═════════════β•ͺ════════════β•ͺ══════════════β•ͺ═════════════════════║
β”‚ !  β”‚    β”‚2185893 β”‚118463324       β”‚73  β”‚Female β”‚109          β”‚2           β”‚4            β”‚Slovak      β”‚XXX331759     β”‚8/15/2023 2:16:23 PM β”‚
β”‚    β”‚ !  β”‚2185893 β”‚128463324       β”‚17  β”‚Male   β”‚106          β”‚1           β”‚1            β”‚British     β”‚XXX331759     β”‚8/15/2023 2:16:23 PM β”‚
β”‚ !  β”‚    β”‚2453078 β”‚K27742062       β”‚42  β”‚Female β”‚110          β”‚3           β”‚4            β”‚Czech       β”‚XXX260037     β”‚8/15/2023 2:16:34 PM β”‚
β”‚    β”‚ !  β”‚2453078 β”‚K27742062       β”‚42  β”‚Female β”‚110          β”‚3           β”‚4            β”‚Czech       β”‚XXX677662     β”‚8/15/2023 2:16:34 PM β”‚
β”‚    β”‚ !  β”‚2888700 β”‚A27834187       β”‚35  β”‚Female β”‚110          β”‚3           β”‚4            β”‚Czech       β”‚XXX948757     β”‚8/15/2023 2:16:50 PM β”‚
β”‚    β”‚ !  β”‚3048201 β”‚P77272017       β”‚39  β”‚Female β”‚172          β”‚3           β”‚4            β”‚Slovak      β”‚XXX594553     β”‚8/15/2023 2:16:57 PM β”‚
β”‚ !  β”‚    β”‚3104872 β”‚K21988419       β”‚43  β”‚Female β”‚247          β”‚2           β”‚1            β”‚Austrian    β”‚XXX303521     β”‚8/15/2023 2:16:59 PM β”‚
β”‚ !  β”‚    β”‚3435893 β”‚705598821       β”‚57  β”‚Female β”‚140          β”‚3           β”‚3            β”‚Czech       β”‚XXX697294     β”‚8/15/2023 2:17:12 PM β”‚
β”‚    β”‚ !  β”‚3590747 β”‚920512139       β”‚44  β”‚Female β”‚64           β”‚3           β”‚1            β”‚Czech       β”‚XXX825049     β”‚8/15/2023 2:17:17 PM β”‚
β”‚ !  β”‚    β”‚4164729 β”‚521221195       β”‚37  β”‚Male   β”‚91           β”‚3           β”‚2            β”‚British     β”‚XXX732170     β”‚8/15/2023 2:17:39 PM β”‚
β”‚    β”‚ !  β”‚4164729 β”‚521221195       β”‚37  β”‚Male   β”‚91           β”‚3           β”‚3            β”‚British     β”‚XXX732170     β”‚8/15/2023 2:17:39 PM β”‚
β•žβ•β•β•β•β•ͺ════β•ͺ════════β•ͺ════════════════β•ͺ════β•ͺ═══════β•ͺ═════════════β•ͺ════════════β•ͺ═════════════β•ͺ════════════β•ͺ══════════════β•ͺ═════════════════════║
β”‚(1) β”‚(2) β”‚ID      β”‚PASSPORT_NUMBER β”‚AGE β”‚GENDER β”‚TICKET_PRICE β”‚TICKET_TYPE β”‚LUGGAGE_TYPE β”‚NATIONALITY β”‚FLIGHT_NUMBER β”‚SYS_INSERTED_DT      β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key WAS provided

β”Œβ”€β”€β”€β”€β”¬β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚(1) β”‚(2) β”‚(ID)    β”‚PASSPORT_NUMBER β”‚AGE  β”‚GENDER   β”‚TICKET_PRICE β”‚TICKET_TYPE β”‚LUGGAGE_TYPE β”‚NATIONALITY β”‚FLIGHT_NUMBER β”‚
β•žβ•β•β•β•β•ͺ════β•ͺ════════β•ͺ════════════════β•ͺ═════β•ͺ═════════β•ͺ═════════════β•ͺ════════════β•ͺ═════════════β•ͺ════════════β•ͺ══════════════║
β”‚ <  β”‚    β”‚2185893 β”‚! 118463324     β”‚! 73 β”‚! Female β”‚! 109        β”‚! 2         β”‚! 4          β”‚! Slovak    β”‚XXX331759     β”‚
β”‚    β”‚ >  β”‚2185893 β”‚! 128463324     β”‚! 17 β”‚! Male   β”‚! 106        β”‚! 1         β”‚! 1          β”‚! British   β”‚XXX331759     β”‚
β”‚ <  β”‚    β”‚2453078 β”‚K27742062       β”‚42   β”‚Female   β”‚110          β”‚3           β”‚4            β”‚Czech       β”‚! XXX260037   β”‚
β”‚    β”‚ >  β”‚2453078 β”‚K27742062       β”‚42   β”‚Female   β”‚110          β”‚3           β”‚4            β”‚Czech       β”‚! XXX677662   β”‚
β”‚    β”‚ !  β”‚2888700 β”‚A27834187       β”‚35   β”‚Female   β”‚110          β”‚3           β”‚4            β”‚Czech       β”‚XXX948757     β”‚
β”‚    β”‚ !  β”‚3048201 β”‚P77272017       β”‚39   β”‚Female   β”‚172          β”‚3           β”‚4            β”‚Slovak      β”‚XXX594553     β”‚
β”‚ !  β”‚    β”‚3104872 β”‚K21988419       β”‚43   β”‚Female   β”‚247          β”‚2           β”‚1            β”‚Austrian    β”‚XXX303521     β”‚
β”‚ !  β”‚    β”‚3435893 β”‚705598821       β”‚57   β”‚Female   β”‚140          β”‚3           β”‚3            β”‚Czech       β”‚XXX697294     β”‚
β”‚    β”‚ !  β”‚3590747 β”‚920512139       β”‚44   β”‚Female   β”‚64           β”‚3           β”‚1            β”‚Czech       β”‚XXX825049     β”‚
β”‚ <  β”‚    β”‚4164729 β”‚521221195       β”‚37   β”‚Male     β”‚91           β”‚3           β”‚! 2          β”‚British     β”‚XXX732170     β”‚
β”‚    β”‚ >  β”‚4164729 β”‚521221195       β”‚37   β”‚Male     β”‚91           β”‚3           β”‚! 3          β”‚British     β”‚XXX732170     β”‚
β”‚ <  β”‚    β”‚4684428 β”‚124528923       β”‚36   β”‚Male     β”‚127          β”‚3           β”‚2            β”‚British     β”‚! XXX260038   β”‚
β”‚    β”‚ >  β”‚4684428 β”‚124528923       β”‚36   β”‚Male     β”‚127          β”‚3           β”‚2            β”‚British     β”‚! XXX260039   β”‚
β”‚ !  β”‚    β”‚4890294 β”‚225776255       β”‚5    β”‚Male     β”‚433          β”‚2           β”‚1            β”‚Czech       β”‚XXX889446     β”‚
β•žβ•β•β•β•β•ͺ════β•ͺ════════β•ͺ════════════════β•ͺ═════β•ͺ═════════β•ͺ═════════════β•ͺ════════════β•ͺ═════════════β•ͺ════════════β•ͺ══════════════║
β”‚(1) β”‚(2) β”‚(ID)    β”‚PASSPORT_NUMBER β”‚AGE  β”‚GENDER   β”‚TICKET_PRICE β”‚TICKET_TYPE β”‚LUGGAGE_TYPE β”‚NATIONALITY β”‚FLIGHT_NUMBER β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Remarks

Notice a few important things:

  • Key column is always first in the table and is enclosed in brackets.
  • When key is specified, only columns with difference are reported (notice the SYS_INSERTED_DT column is missing in the second example.
  • When key is specified, values with the difference have ! prefix.
  • Different rows (with the same key, but at least one different column) are marked as < (values from first set) and > (values from second set).
  • Because without the key specified rows are reported only as missing in first or missing in second, you get more errors with the key specified - even though the same Maximum Errors Logged setting was specified.