With the QuerySurge's built-in JDBC driver for Excel, data testers can use familiar SQL syntax to retrieve query data from a Microsoft Excel sheet and validate it against any QuerySurge-supported Source or Target.
In this document we discuss the approach for extracting data from a Microsoft Excel sheet. QuerySurge can extract data from both entire sheets or specific ranges within them. Data can also be extracted from joined excel sheet ranges either within the same sheet, or from different sheets, or from sheets in different Excel files.
Connection Setup
QuerySurge Excel connections can be set up in the QuerySurge Admin view, and be utilized in the same manner as any other Connection types (i.e. in QueryPairs and Staging Queries, and in Test Suite Connection overrides).
For details on setting up a Connection to Microsoft Excel, see the article Configuring Connections: Microsoft Excel via the QuerySurge JDBC Driver for Excel.
Querying Excel Sheets
The query syntax for QuerySurge Excel is based on H2 SQL and supports most of the H2 SQL grammar and its function calls, so SELECT-type queries against Excel follow a familiar form. The main difference is the FROM clause:SELECT * FROM [fileName].[sheetName$].[range]
- fileName (optional): A file within the Connection-specified directory path.
- sheetName: The sheet being queried. Case-sensitive.
- range (optional): The specific range within a sheet being queried. Note, the alpha characters specified for the range must be upper case. For example, [A1:D9] will work but [a1:d9] will generate an exception.
The fileName or sheetName should be enclosed in double-quotes if either starts with numeric characters. (i.e. ... FROM ["1fileName"].["2sheetName"$]). Note that only the name is enclosed in double quotes.
The following is a sample SQL statement against an example Excel sheet:
SELECT * FROM [financial_report.xlsx].[Sheet1$].[A1:D9]
When executed in QuerySurge, the following results are returned:
Note: Notice that eight rows are returned when nine were specified in the query's range. This is because the Excel Connection used has its Column Headers setting set to Header in first row. So while nine rows are extracted from the sheet, the first is used as header information.
Queries can include standard syntax like JOINs, GROUP BYs, single-valued functions, aggregate functions, aliases, and the like. Note that references to column names with spaces, special characters, and starting numerical characters must be surrounded by double quotes:
SELECT
rpt."City",
rpt."Quantity sold",
rpt."Pareto line",
rpt."Pareto 80%:"
FROM [financial_report.xlsx].[Sheet1$].[A1:D9] AS rpt;
Note: The QuerySurge JDBC driver for Excel does not support duplicate column names in your Excel sheet. Therefore, all your Excel column names must be unique in a sheet (or range, if specified).
Data Type Conversion
When an Excel Connection's Data Types setting is set to Use Excel Data Types, columns are returned with data types relative to their in-sheet formatting rules. Because these values must conform to the driver's underlying H2's data type formatting, values may not appear in the same format as in Excel. This setting allows the use of data type-specific transformation functions like SUM, TRIM, etc. See the H2 syntax documentation for more information.
Excel to H2 Format Conversion:
Excel Format String | Data type |
General | VARCHAR |
0 | INTEGER |
0.00 | DECIMAL |
#,##0 | INTEGER |
#,##0.00 | DECIMAL |
$#,##0_);($#,##0) | INTEGER |
$#,##0_);[Red]($#,##0) | INTEGER |
$#,##0.00);($#,##0.00) | DECIMAL |
$#,##0.00_);[Red]($#,##0.00) | DECIMAL |
0% | DECIMAL |
0.00% | DECIMAL |
0.00E+00 | DECIMAL |
# ?/? | DECIMAL |
# ??/?? | DECIMAL |
m/d/yy | TIMESTAMP |
d-mmm-yy | TIMESTAMP |
d-mmm | TIMESTAMP |
mmm-yy | TIMESTAMP |
h:mm AM/PM | TIMESTAMP |
h:mm:ss AM/PM | TIMESTAMP |
h:mm | TIMESTAMP |
h:mm:ss | TIMESTAMP |
m/d/yy h:mm | TIMESTAMP |
#,##0_);[Red](#,##0) | INTEGER |
#,##0.00_);(#,##0.00) | DECIMAL |
#,##0.00_);[Red](#,##0.00) | DECIMAL |
_(* #,##0_);_(* (#,##0);_(* \"-\"_);_(@_) | INTEGER |
_($* #,##0_);_($* (#,##0);_($* \"-\"_);_(@_) | INTEGER |
_(* #,##0.00_);_(* (#,##0.00);_(* \"-\"??_);_(@_) | DECIMAL |
_($* #,##0.00_);_($* (#,##0.00);_($* \"-\"??_);_(@_) | DECIMAL |
mm:ss | TIMESTAMP |
[h]:mm:ss | TIMESTAMP |
mm:ss.0 | TIMESTAMP |
##0.0E+0 | DECIMAL |
@ | VARCHAR |
Note: Columns with mismatching formats will return as the VARCHAR data type.
Empty Cells
There are cases when empty sheet cells might be queried, returning a result set with null cells, rows, or even entire ranges. If the specified range in a query exceeds a sheet's internal range (the range of available data), the excess cells and rows will come back as null. However, if the entire range in a query is outside a sheet's internal range, no rows will be returned.
Note: The QuerySurge JDBC driver for Excel does support merged cells, however only if Data in first row is selected under Data Settings > Column Headers on the Connection. If your Excel sheets contain merged cells, either specify Data in first Row or set ranges to exclude merged cells, or un-merge the cells.
Comments
0 comments
Article is closed for comments.