QuerySurge comes bundled with a Flat File connection option in the Connection Wizard, which will use sensible defaults for many scenarios. However, your Flat File may have certain characteristics that require additional handling not available in the Connection Wizard settings. In this case, you can create a generic connection using QuerySurge's Connection Extensibility feature. This approach allows you to take advantage of the power of the underlying connection URL to explicitly set the driver properties. There are some additional steps required to configure this connection:
- Create a schema file
- Create a JDBC connection using the proper Connection URL
- Create a QueryPair with a query that uses the file name as the table name
Schema XML File
A schema file is an XML document with a root tag of <schema> and (ordinarily) a single child tag of <table>. For joining multiple tables see Flat File SQL Joins.
The schema file is intended to specify data types and local properties for your CSV/text files. By default, it should be located in the same directory where CSV/text files are contained. To specify another path use the driver property "schema".
Delimited Flat File
In the example below, the schema file contains a table definition for a pipe delimited file with three columns.
<schema>
<table name = "MyFlatFile.txt" separator = "|" suppressHeaders = "false">
<column name = "ORDER_ID" pos = "1" type = "Integer"/>
<column name = "STATUS" pos = "2" type = "Varchar" size = "64"/>
<column name = "TOTAL" pos = "3" type = "Decimal"/>
</table>
</schema>
Note: The pos attribute defines the numeric position of the column within the flatfile.
Fixed-Width Flat File
In the example below, the schema file contains a table definition for a fixed-width file with three columns. The begin and end attributes define the initial and final positions for each of the columns.
<schema>
<table name = "MyFlatFileFixedWidth.prn" separator = "fixed" suppressHeaders = "false">
<column name = "ORDER_ID" begin = "1" end = "8" type = "Integer"/>
<column name = "STATUS" begin = "9" end = "16" type = "Varchar" size="32"/>
<column name = "TOTAL" begin = "17" end = "24" type = "Decimal"/>
</table>
</schema>
File Mask (Wildcards/Regular Expression)
If you have multiple Flat Files with the same format, you can implement a "file mask" to match all of those files with a single table mapping.
In the example below, we use the * wildcard character, which matches zero or more of any character. This schema can be applied to any files that have the ".txt" extension, and follow the defined column and data type layout.
<schema>
<table name = "*.txt" separator = "|" suppressHeaders = "false">
<column name = "ORDER_ID" pos = "1" type = "Integer"/>
<column name = "STATUS" pos = "2" type = "Varchar" size = "64"/>
<column name = "TOTAL" pos = "3" type = "Decimal"/>
</table>
</schema>
We can also use the ? character to define any single character. For example, we can match file names like "MyFlatFile0001.txt", "MyFlatFile0002.txt", or "MyFlatFile0003.txt" using the ? character.
<schema>
<table name = "MyFlatFile????.txt" separator = "|" suppressHeaders = "false">
<column name = "ORDER_ID" pos = "1" type = "Integer"/>
<column name = "STATUS" pos = "2" type = "Varchar" size = "64"/>
<column name = "TOTAL" pos = "3" type = "Decimal"/>
</table>
</schema>
The driver also allows the use of regular expressions to define a table name, by specifying "regex:" and then the desired expression. For example, we use a regular expression that will match files like "MyFlatFile2009.txt", "MyFlatFile2015.txt", "MyFlatFile2016.txt", etc.
<schema>
<table name = "regex:MyFlatFile_20[0-9][0-9]" separator = "|" suppressHeaders = "false">
<column name = "ORDER_ID" pos = "1" type = "Integer"/>
<column name = "STATUS" pos = "2" type = "Varchar" size = "64"/>
<column name = "TOTAL" pos = "3" type = "Decimal"/>
</table>
</schema>
Creating the Custom Connection
When creating your custom Connection, you will not utilize the built-in Flat File connection type in the Connection Wizard. Instead, use the Connection Wizard's Connection Extensibility feature. To do this, select the "All Other JDBC Connections (Connection Extensibility)" option from the drop-down menu.
For your Driver Class, enter jstels.jdbc.csv.CsvDriver2. If you did not select the Flat File drivers during your initial installation, you'll need to add them before continuing. You can do this with the QuerySurge installer. Make sure to use the same version installer as the version of your deployed QuerySurge.
Next, you'll need to enter your Connection URL. The URL requires the paths to schema and data files. A basic URL template is:
jdbc:jstels:csv:<PathToFlatFileFolder>?dbInMemory=false&commentLine=--&separator=|&suppressHeaders=false&fileExtension=.txt&schema=<PathToSchemaFile>&dateFormat=yyyy-MM-dd HH:mm:ss.SSS | yyyy-MM-dd HH:mm:ss | yyyy-MM-dd | HH:mm:ss.SSS | HH:mm:ss&escapeEOLInQuotes=false
To use this URL, the <PathToFlatFileFolder> and <PathToSchemaFile> need to be replaced by the actual paths to the data file folder and to the schema file.
Note: <PathToFlatFileFolder> is a path to a directory. <PathToSchemaFile> is the full path to the schema file, including the schema file name.
Note: Even when you define a separator in your schema file, you must include the schema value in your URL.
Common JDBC URL Properties
<PathtoFlatFileFolder> should be replaced with the actual path to the folder NOT including the file name. <PathToSchemaFile> should be replace with the actual path to the schema file including the file name.
commentLine specifies a string value used for comment lines. Multiple values can be added by using a "|" to separate them, e.g.: // | #. Default value is not defined.
separator defines the separator for delimited flat files. For delimited files (often termed CSV files) files you should set the property value to the delimiter. With Fixed Width flat files this should be set to "fixed" (the layout of a Fixed Width file is specified in the schema.xml; see above for an example).
suppressHeaders specifies whether or not the first row in the file contains column name headers. A value of false indicates that the first line contains headers (default is false).
fileExtension is used to set a file extension for the connection, if it is not specified explicitly in a file name (see below).
dateFormat is used to specify a format for date/time values. A value of this property is a sequence of date/time formats separated by the '|' character, e.g: "dd.MM.yy | dd.MM | dd". (Default is "yyyyMMddHH:mm:ss.SSS | yyyyMMddHH:mm:ss | yyyyMMdd | HH:mm:ss.SSS | HH:mm:ss").
escapeEOLInQuotes is used to escape row delimiters within double quotes.
trimBlanks is used to trim leading and trailing spaces for string values (default is true).
QueryPair Syntax
Once your schema file and Connection have been set up you are ready to create a QueryPair. The SQL statement should reference the file name as the table name. If the file extension has been defined in your schema or Connection URL then you can simply use the file name. Table names with any non-alphanumeric characters (i.e. special characters, including the period before a file extension) must be enclosed in double quotes.
Information abstracted from StelsCSV JDBC Driver v6.0 Documentation
Resources
Comments
0 comments
Please sign in to leave a comment.