QuerySurge Technical Whitepaper No. 3
QuerySurge and Data Security
QuerySurge, as a bulk data testing/verification tool for ETL, Data Migration and related implementations, is regularly used with sensitive data. Typically, data cannot be masked, as the actual “conditions” found in the data must be handled by the ETL code, and therefore must be tested via QuerySurge; masking is therefore not a viable option. The typical QuerySurge deployment is behind the organization’s firewall, and this has generally been considered to provide sufficient security for the data QuerySurge processes. However, as more and more organizations contend with new security challenges, interest in deploying QuerySurge behind the firewall with additional security measures is growing. This whitepaper addresses these needs.
Basic QuerySurge Security
In this section, we describe the basic security features of QuerySurge, which follow standard security approaches. It is important to underscore that the security design of QuerySurge in its standard deployment assumes that QuerySurge is installed behind the organization’s firewall, and has the protection afforded by an industry-standard firewall and networking protections.
QuerySurge is a Web 2.0 application, with three major components (the QuerySurge Application Server, the QuerySurge database and the QuerySurge Agent). A fourth component, the QuerySurge command-line execution API is also part of the ecosystem. Therefore, there are multiple points of security focus:
- Browser-to-QuerySurge Application Server roundtrip communications
- QuerySurge Agent-to-QuerySurge Application Server roundtrip communications
- QuerySurge Database storage
- Command-line execution API communications
Figure 1. QuerySurge with Standard Level Security
Since QuerySurge manages both database/data source connections and actual transactional query data, there are two broad categories of data that may be deemed sensitive: a) the credentials that QuerySurge stores as well as b) the data that QuerySurge processes as part of data quality testing (ETL, data migration, operations monitoring, development, etc.). Both credentials and data are transmitted over the network and stored in the QuerySurge database. These points are depicted in Figure 1. Since all of this activity (network and database) is conducted behind the organization’s firewall, this standard level of security for QuerySurge is deemed sufficient in many cases.
QuerySurge and QuerySurge Application Credentials
QuerySurge application user passwords are stored in the QuerySurge database for authentication at user sign-on. QuerySurge application passwords are not stored in plain text in the database. Standard hashing procedures are used for this storage.
However, QuerySurge is a Web 2.0 application, and under standard deployment conditions, QuerySurge application credentials are part of the HTTP communications between the browser and the QuerySurge application server. Therefore, by default, QuerySurge application credentials are in plain text on the wire when passwords are created or modified. As noted above, the default security assumption is that QuerySurge has been deployed behind a firewall, so credentials are secure by virtue of firewall protection.
QuerySurge and Source/Target Credentials
As indicated, QuerySurge stores Source and Target credentials. Source and Target credentials are not stored as plain text in the QuerySurge database, but are encrypted using current standards (AES-256). In addition, Source and Target credentials travel both between QuerySurge and the browser (when they are input or edited) and between QuerySurge and the QuerySurge Agent (at execution time). In the case of the QuerySurge-to-browser loop, Source and Target credentials are in plain text on the wire when they are first entered or when they are modified. Otherwise, they are not sent. Between QuerySurge and the QuerySurge Agent, Source and Target credentials remain in encrypted format on the wire.
QuerySurge and Data
Much of the data that QuerySurge is used to test is likely to be sensitive and not amenable to masking, as noted previously. In the standard deployment, data travels on the wire between the QuerySurge Agent(s) and the QuerySurge application server, and between the Browser and the QuerySurge application server, in unencrypted format. Again, this deployment pattern assumes that the typical behind-the-firewall deployment provides sufficient security.
In addition, all data processed by QuerySurge as part of a test is stored in the QuerySurge database. The idea is to allow users to keep data for as long as necessary, in order to compare older executions with newer executions and newly modified results with their older counterparts. None of this data is stored in an encrypted format by default.
Higher Level Security
The three areas of security focus indicated above can all be covered by a higher level of security in the typical behind-the-firewall deployment of QuerySurge. This is done by enhancing the security level of each of these areas by the deployment of additional standard security technologies, as described below.
QuerySurge HTTP Communications
The QuerySurge Application Server communicates with Browsers/Users, the QuerySurge Agent and the QuerySurge command-line API component by HTTP. The solution for higher security for QuerySurge HTTP communications is straightforward – QuerySurge can take advantage of standard Web security technology by using HTTPS (HTTP via SSL). The primary purpose of utilizing HTTPS in a QuerySurge instance is to encrypt all data passing between the QS Application Server and the other HTTP-based QuerySurge components. HTTPS is managed by a certificate, typically obtained from a recognized Certificate Authority, which confirms the identity of the server. Well-known Certificate Authorities include companies such as Verisign and Network Solutions, however there are numerous other authorities in the space. The certificate authorities vet companies and their servers, and then provide certificates. Once certificates are deployed to application servers, the server can confirm its identity to incoming requests via the certificate. Your organization will have policies concerning how certificates are to be obtained, which authority should be used, etc.
Browser-to-QuerySurge Application Server Roundtrip Communications
Once the QuerySurge Application Server is configured with an SSL certificate obtained by your organization, all communications between the Browser and QuerySurge will run with the extra layer of security afforded by HTTPS. All communications on the wire between the Browser and the QuerySurge Application Server are encrypted.
QuerySurge Agent-to-QuerySurge Application Server Roundtrip Communications
Higher security QuerySurge Agent communications with the QuerySurge Application Server can be implemented in much the same way that Browser security can be increased. QuerySurge Agents can be configured to run via HTTPS, in a manner similar to the Browser. There are some minor differences in implementation: running a Browser via HTTPS mainly requires configuration at the server, while running a QuerySurge Agent under HTTPS requires a modest amount of configuration on each Agent installation in addition to the server configuration. As with the Browser, running the QuerySurge Agent under HTTPS encrypts all Agent communications and therefore secures the internal API that the Agent uses.
QuerySurge Execution API-to-QuerySurge Application Server Roundtrip Communications
The QuerySurge Execution API components (if deployed) require similar local configuration steps to those required for the QuerySurge Agent in order to run under HTTPS. As with the Browser, running the QuerySurge Execution API under HTTPS encrypts all API communications and therefore secures the API that the Execution API component uses.
QuerySurge Application Server-to-QuerySurge Database Roundtrip Communications
The QuerySurge App server connects to the QuerySurge MySQL database using standard JDBC connections. MySQL can be set up to support encrypted JDBC connections.
QuerySurge Database Storage
Database security presents a somewhat different set of issues than network communications present, as the database stores all QuerySurge data on disk. QuerySurge uses an "embedded" MySQL database, and in order to attain a higher level of security, the database files that back the tables need to be encrypted. Fortunately, there are a number of options available that can be used with MySQL to handle disk-level encryption. Following is a listing of tools that support MySQL disk encryption. Please note that this is not an exhaustive listing. Your organization may already have some of these tools available or similar tools in place that can be applied.
With the standard level of security, QuerySurge stores neither QuerySurge application passwords nor Source and Target credentials as plain text. In addition, Source and Target credentials are not sent in plain text on the wire to the QuerySurge Agent. These basic security measures assume a typical behind-the-firewall deployment.
Figure 2. QuerySurge with Higher Level Security
Where a higher level of security is desired even behind the firewall, the following additional approaches can be implemented:
- The QuerySurge application server can be configured to run under HTTPS, both with the Browser (for user access) and with the QuerySurge Agent(s), and the command-line execution API.
- The QuerySurge database server can be configured to run with disk encryption products, to ensure that data stored on disk by the database is fully encrypted.
Figure 2 shows both of these approaches, imposed on the QuerySurge architecture diagram. These security measures are independent of one another, and either of them can be implemented without the other. So, if the major security concern is about data on the wire, HTTPS can be implemented alone. Similarly, if the major concern is database disk storage, disk encryption can be implemented without HTTPS.
Finally, it is important to note that use of the enhanced security measures discussed in this whitepaper may result in performance issues which could be significant. It is advised that trials are run both with and without these enhanced security measures to insure that any performance effects are acceptable to the organization.
 Regarding the trip from the database/data source to the QuerySurge Agent, the data format is controlled by the JDBC driver.
 Certificates may also be generated by an organization. These certificates lack the trustworthiness of certificates from a recognized Certificate Authority.
 For details, see: dev.mysql.com/doc/refman/5.5/en/using-encrypted-connections.html.
 RTTS neither endorses nor recommends these products. This listing is for informational purposes only.
 Recent versions of Windows offer disk-level encryption. See, for example: windows.microsoft.com/en-us/windows/encrypt-decrypt-folder-file#1TC=windows-7
 Outside-the-firewall QuerySurge deployment is not discussed here, as the data security issues go well beyond the basic discussion in this whitepaper. Suffice it to say that such a deployment should only be considered after all the data security needs have been fully considered along with costs for enhanced security required for private data transmission on the public internet.