QuerySurge offers two options for accessing delimited Flat Files for querying and analysis as part of a QuerySurge Big Data or Data Warehouse data testing implementation: querying via a Flat File JDBC Driver or querying via Apache Drill. Which is best?
The answer, of course, depends on your needs - each of these options has its advantages. The purpose of this article is to indicate the strengths of each option and its weaknesses - so that you can make an informed decision about the best route to pursue.
File Support Comparison
|File Support||Flat File JDBC Driver||Apache Drill|
|Delimited File (single-character delimiter)||✓||✓|
|Delimited File (multi-character delimiter)||✓||-|
|Delimited File (Unicode delimiters)*||✓||-|
*Apache Drill only accepts delimiters in the ASCII range (0-127)
|Option||Flat File JDBC Driver||Apache Drill|
|Support Internal File Headers||✓||✓|
|Impose Auto-generated Headers||✓||✓|
|Impose Custom Headers||✓||✓|
|Support Non-Ascii File Encodings||✓||-|
|Impose Data Typing||✓||-|
|Support Compressed Files*||✓||✓|
|Support Joins between Files||✓||-|
|Support Custom Functions in SQL||✓||-|
|Support multiple file configurations per connection||-||✓|
*Compression options differ between drivers
Files with random data were generated and run with both the Flat File JDBC driver and Apache Drill, where drill ran on an external 16 GB Windows system on which the QuerySurge Agent also ran to establish a performance comparison.
Files were created with random data, using a variable number of rows to generate a specific file size. (See the Resources section below for full details of these executions.)
Results are as follows:
Conclusions from this comparison are straightforward:
- For ASCII encoded, delimited Flat Files, Apache Drill performs significantly better for larger files - by about a factor of about 3.5x for files in the multi-GB range.
- For ASCII encoded fixed-width Flat Files, the Flat File JDBC driver is the option.
- For Flat Files that require more highly customized options, including multi-character delimiters, non-ASCII file encodings, data typing, file joins, the Flat File JDBC driver is the option.
- If a custom SQL function is required as part of the QuerySurge execution, the Flat File JDBC driver is the option.
QuerySurge (5.1.14) was installed on a dedicated server. Apache Drill (v.1.11) was deployed on a desktop along with the QuerySurge Agent used. Hardware specs for these systems were:
|Apache Drill (v.1.11)||QuerySurge (5.1.14)|
|OS||Windows 7 Pro||Windows Server 2012 Standard|
|CPU Type||Intel i7-4770||Intel Xeon E5-2637 v4|
|CPU Speed||3.4 GHz||3.5 GHz|
|RAM||16 GB||64 GB|
|DB Data Disk||HDD||SSD|
|DB Temp Disk||-||SSD|
Files all contained 20 columns with the first column character text and the remaining columns integer digits. Sizes were as follows:
|File A||File B||File C||File D|
|File Size||9.92 MB||1.04 GB||5.21 GB||9.54 GB|