By default, QueryPairs execute with an exact-match algorithm applied to every column in the Source and Target queries. And while users have the ability to apply certain thresholds to affect the outcome of the cell-by-cell comparison, until now there has been no way in which they could choose to ignore a particular Source and Target column entirely. By specifying one or more ignorable columns, a QueryPair will pass even if data failures exist within these columns.
Specifying an Ignorable Column
To identify one or more columns to be ignored during analysis, open the QueryPair in question and navigate to the Properties tab. Expand the Ignorable Columns fieldset.
The Ignorable Column Origin indicates whether the column name is being parsed from the Source of Target query. Whichever origin you choose, the column in the equivalent ordinal position in the opposing query will be ignored, as well.
Click on the gear icon in the toolbar to automatically parse and add every column from your selected origin query.
Alternatively, click on the plus icon in the toolbar to manually add column names one-by-one.
Use the toggle buttons associated with each added column in the grid to ignore it during analysis.
Use the minus icon next to each added column in the grid to remove it. Alternatively, use the X icon in the toolbar to remove all columns.
As with any modification to a QueryPair property, click the Save button to persist your changes.
There are a few restrictions that apply to Ignorable Columns, primarily surrounding the concurrent use of shared key column definitions, column threshold options, and ignorable columns.
The restrictions are simple.
- If a column is configured as a shared key, it cannot be ignored during analysis.
- If a column has a threshold applied to it, it cannot be ignored during analysis.
A Worked Example
First, let's run a QueryPair with known failures in the address_type column. We'll do so without setting up any ignorable columns. This means you'll need to remove any columns you added, and set the Ignorable Column Origin to None.
This means all failures will be taken in account and will affect the overall result of the execution.
Execute the QueryPair
Navigate to the Design-Time Run tab and click the
After the execution completes, we can see that the comparison has failed. This is expected, since there were known data failures in the address_type column.
Click the View Query Results button and navigate to the Failures tab to view the data failures in detail.
Note the differences in the address_type column. The Source query returned SHIPPING, while the Target query returned SHIP. If these differences are known and acceptable within the context of your testing effort, you can mark this column to be ignored during analysis.
Setup an Ignorable Column
Navigate to the Properties tab of the QueryPair and expand the Ignorable Columns fieldset. Select the From Source Query radio button, and add the address_type column to the grid. Make sure you toggle the column to on so that it is ignored.
Remember to click the Save button to save your changes.
Re-execute the QueryPair
Now that you've added an ignorable column, navigate back to the Design-Time Run tab and click the Run Again button. When the QueryPair execution completes, the failures previously observed will have been ignored and the QueryPair passes.
Click the View Query Results button and navigate to either the Source or Target tab.
Note the information icon next to the address_type column, which indicates that this column was ignored during analysis. You'll also notice that the Failures tab is empty.
Data Intelligence Reporting
The addition of the Ignorable Column feature within a QueryPair carries over to several reports in the Data Intelligence Reporting section. Specifically, the following reports now make explicit use of the new setting.
- Root Cause Analysis Summary
- Root Cause Analysis Detail
- Root Cause Analysis Failures