Root Cause Analysis Reports in QuerySurge
Much of the reporting in QuerySurge offers users a look at the row-level metrics of test executions. The Root Cause Analysis reporting allows users to analyze their QueryPairs from a column-level perspective, as quicker way to isolate core data issues. Rather than focusing on the failing rows, this report informs users about QueryPair columns in which there was at least one mismatch. The report also provides a breakdown of the percentage of data failures that exist in each column, as well as the counts of null and unique values. This report can assist data architects, database administrators, and testers in tracking down the problematic columns and tables during ETL development.
There are three flavors of the Root Cause Analysis report (Summary, Detail, Failure). The Detail report starts with two summary-level tables: a listing of the the QueryPairs with the most failing columns, and a listing of the Source/Target column pairs with the greatest percentage of data failures. The body of the report shows a detailed analysis for each Source/Target column pair. It includes the data failures by count and percentage, as well as unique and null value counts and percentages. Lastly, any QueryPairs excluded from the analysis are listed in the final section of the report.
NOTE: Only QueryPairs that (a) have been configured with Shared Keys, and (b) which have no errors can be included in the Root Cause Analysis reporting.
The other two Root Cause Analysis reports (Summary and Failure) are derived from the Detail report. The Summary report provides the same information as the Detail report, minus the unique and null value metrics per column. The Failures report provides the same breadth of information as the Detail report, but includes only failing QueryPairs and Source/Target column pairs.
Top QueryPairs by Failing Column Count
Top QueryPairs by Failing Column Count summary table shows the QueryPairs in the Scenario with the greatest number of failing columns. This table will display up to ten QueryPairs, and is designed to give a summary-level view of your most problematic QueryPairs from the column perspective.
In the screenshot above, each of the three QueryPairs has five columns. The QueryPairs are shown in descending order of the number of failing columns.
Sample QP 2 has data failures in two of the five columns, which accounts for the 40% failure rate.
Sample QP 1 has data failures in only one of the columns, and thus has a smaller failure rate of 20%.
Sample QP 3 does not have any data failures in any columns.
Passing v. Failing Columns
A passing column is defined as any Source/Target column pair in which there were no data failures. With respect to that column, each row in the result set was a match between Source and Target. A failing column, therefore, is any Source/Target column in which there were one or more data failures.
The failure percentage is simply derived from the column-wise failure rate of your QueryPair. It compares the the number of failing columns to the total number of columns in the QueryPair as a percentage (the
Failure % column).
Top Columns by Data Failure Percentage
Top Columns by Data Failure Percentage summary table represents the column pairs in each of the QueryPairs in the Scenario with the greatest percentage of data failures. This table will display up to ten QueryPairs, and is designed to give a summary-level view of your most problematic columns (along with the associated QueryPair).
In the screenshot above, the columns with the greatest percentage of data failures (regardless of the QueryPair with which they are associated) are listed in descending order. The Source and Target columns are extracted from the respective Source and Target queries. The
Data Failure Count per Column and
Data Failure % per Column represent the number of rows for which a data failure occurred in that specific column.
Source and Target Column Pairs
Each row in the
Top Columns by Data Failure Percentage table represents a pair of columns (one from the Source and another from the Target) that were compared against one another during the test execution. Along with these column names, the table provides the associated QueryPair IDs and names for reference.
Data Failure Count per Column
Data Failure Count per Column represents the number of rows in the result set in which a data failure was identified in the specified column. Generally, for example, a relatively small number of data failures in a column may signal an anomaly that needs investigating, while a much larger number of data failures may indicate a more serious flaw in the ETL logic.
Data Failure Percentage per Column
The failure percentage is simply derived from the column-wise failure rate of your QueryPair. It compares the the number of failing columns to the total number of columns in the QueryPair as a percentage (see the column titled
Data Failure % per Column).
Column Analysis Summary
Column Analysis Summary table offers a broad range of information. It displays each Source/Target column pair, along with their
Non-Matching Rows, and
Data Failure Counts and
Percentages. Column pairs are grouped by their associated QueryPair, and QueryPairs are further grouped by their parent Test Suites.
In the screenshot above, the QueryPair (
Sample QP 2) contains five columns, listed in the order in which they appear in the Source and Target queries. Focusing on the Country with Fails and VAT Rate with Fails column pairs, users can see the Source and Target Row Counts (19), Source and Target Non-Matching Rows (0), Data Failure Counts (3 and 18, respectively), and Data Failure Percentages (15.79 and 94.74, respectively).
Detailed Column Analysis
Detailed Column Analysis table offers all of the information displayed in the
Column Analysis Summarytable, along with a breakdown of unique and null values in each column. For each column on the Source and Target side of the QueryPair, this table will provide row count, unique value count and percentage, and null value count and percentage figures. This table is only available in the Root Cause Analysis Detail and Root Cause Analysis Failure reports.
In the screenshot above, the QueryPair (
Sample QP 2) contains five columns, listed in the order in which they appear in the Source and Target queries. Focusing on the
Country with Fails and
VAT Rate with Fails column pairs, users can see the Source and Target Row Counts (19), Source and Target Non-Matching Rows (0), Data Failure Counts (3 and 18, respectively), and Data Failure Percentages (15.79 and 94.74, respectively).
In addition to the information provided in the
Column Analysis Summary table, the row count, unique count/percentage, and null count/percentage is broken down for each column in the Source and Target.
Excluded QueryPairs table lists any QueryPairs that were omitted from the report and the reason why they were omitted. QueryPairs can be excluded for two reasons:
- If a Shared Key Column Definition was not set in the Properties section of the QueryPair before execution, the QueryPair is omitted from the report.
- If the QueryPair returned an error during execution, the QueryPair is omitted from the report.
In the screenshot above,
Sample QP 4 w/o Shared Key was excluded because it was missing a Shared Key Column Definition.
Sample QP 5 w/ Syntax Error was excluded because it returned an error during execution.