Duplicate Data
Your data may have duplicates in it - rows that are identical in Source or in Target. To be clear, this does not mean rows that are identical aside from any key columns. It means rows that have no keys and are otherwise identical in their data content.
QuerySurge can analyze data containing duplicate rows on Source, on Target, or on both. There are, however, some important notes concerning this option, in terms of both configuration and QuerySurge performance.
QuerySurge supports all three of these cases: duplicates on Source only, on Target only, or on both.
Note: Using this QueryPair Property does not create a test that checks for duplicate rows. It executes your QueryPair on the assumption that it has duplicate rows on either Source or Target. If you need to create a test that checks for duplicate rows, see this Knowledge Base article.
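To make the distinction concrete, here is a minimal sketch of what a separate duplicate-row check could look like. It is not the method from the Knowledge Base article and does not use any QuerySurge API; it runs a plain GROUP BY / HAVING query against an in-memory SQLite database. The table name target_orders, its columns, and the helper find_duplicate_rows are hypothetical and exist only for illustration.

```python
import sqlite3


def find_duplicate_rows(conn, table, columns):
    """Return (column values..., count) for each row that occurs more than once.

    Hypothetical helper for illustration only. The table and column names
    are interpolated directly into the SQL, so they must come from a
    trusted source.
    """
    cols = ", ".join(columns)
    sql = (
        f"SELECT {cols}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


# Hypothetical sample data: the ("ann", 10) row appears twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO target_orders VALUES (?, ?)",
    [("ann", 10), ("bob", 20), ("ann", 10)],
)

duplicates = find_duplicate_rows(conn, "target_orders", ["customer", "amount"])
print(duplicates)
```

Grouping on every column and keeping groups with a count above one is the usual way to surface fully identical rows; the same query shape should carry over to most SQL dialects, though that is an assumption to verify against your own data store.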
Configuring QuerySurge QueryPairs for Duplicate Data
By default, QuerySurge assumes that you have no duplicate rows returning in your QueryPairs. This is because:
- It is a best practice to employ a key column for all of your QueryPairs - QuerySurge can perform the strongest comparison when you use keys.
- QuerySurge must perform additional tasks when checking for duplicates, which can affect performance significantly.
If you want to configure any QueryPair for duplicate checking, all you need to do is open the Properties tab for the QueryPair.
Then, open the Duplicate Row Options panel, and choose Enable Analysis Support for Duplicates. Save your configuration changes.
Alternatively, you can set the configuration for an entire folder by right-clicking on the folder and selecting Bulk Duplicate Row Check Update… on the menu. This will set the behavior for all QueryPairs in the selected folder.
It is important to note that this option does not test whether your data has duplicates in it - it assumes that you have duplicate rows in either Source or Target, and alters the comparison algorithm to handle this situation.
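The following sketch illustrates, in general terms, why duplicate rows call for a different comparison. It is not QuerySurge's actual algorithm, which this article does not describe; it only shows the idea of comparing rows as multisets (counting occurrences) rather than as sets. The function compare_with_duplicates and the sample rows are hypothetical.

```python
from collections import Counter


def compare_with_duplicates(source_rows, target_rows):
    """Compare two row collections as multisets.

    Illustrative only: each distinct row is counted, and the result holds
    the surplus occurrences found on each side.
    """
    src = Counter(map(tuple, source_rows))
    tgt = Counter(map(tuple, target_rows))
    return {
        "source_only": src - tgt,  # rows (with counts) unmatched on Target
        "target_only": tgt - src,  # rows (with counts) unmatched on Source
    }


# Hypothetical data: Source holds one extra copy of ("ann", 10).
source = [("ann", 10), ("ann", 10), ("bob", 20)]
target = [("ann", 10), ("bob", 20)]

result = compare_with_duplicates(source, target)
print(result["source_only"])
print(result["target_only"])

# A set-based comparison ignores how often a row occurs, so it would
# report these two collections as equal.
print(set(source) == set(target))
```

Counting every row is extra work compared with matching rows on a key, which is consistent with the performance cost this article describes, although the actual cost depends on how QuerySurge implements the comparison.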
Important Considerations
As noted above, asking QuerySurge to Check for Duplicates significantly increases the workload on the QuerySurge server. Because of this, use this option sparingly, and only when you are sure that your hardware can handle the additional load.
Keep the following considerations in mind before selecting the Check for Duplicates option for QueryPairs:
Note: Choosing the "Check for Duplicates" option can seriously affect QuerySurge performance even if your data has no duplicates. For this reason, we recommend that you use it carefully.
Note: Use Design Time Runs to gauge the load that "Check for Duplicates" places on your system before using this option for Scenario execution.
While QuerySurge does let you select "Check for Duplicates" for an entire folder, use this feature with due caution for the reasons noted above.