QuerySurge Technical Whitepaper No. 3

QuerySurge and Data Security

Introduction

QuerySurge, as a bulk data testing/verification tool for ETL, Data Migration and related implementations, is regularly used with sensitive data. Typically, data cannot be masked, as the actual “conditions” found in the data must be handled by the ETL code, and therefore must be tested via QuerySurge; masking is therefore not a viable option. The typical QuerySurge deployment is behind the organization’s firewall, and this has generally been considered to provide sufficient security for the data QuerySurge processes. However, as more and more organizations contend with new security challenges, interest in deploying QuerySurge behind the firewall with additional security measures is growing. This whitepaper addresses these needs.

Basic QuerySurge Security

In this section, we describe the basic security features of QuerySurge, which follow standard security approaches. It is important to underscore that the security design of QuerySurge in its standard or default deployment assumes that QuerySurge is installed behind the organization’s firewall, and has the protection afforded by an industry-standard firewall and networking protections. For organizations that require a higher security level, see the section Enhanced Security Options.

QuerySurge is a Web 2.0 application, with three major components (the QuerySurge Application Server, the QuerySurge database and the QuerySurge Agent). Additional components including the QuerySurge command-line execution API, the QuerySurge RESTful API, and QueryMerge utility, are also part of the ecosystem. Therefore, there are multiple points of security focus:

Browser-to-QuerySurge Application Server roundtrip communications
QuerySurge Agent-to-QuerySurge Application Server roundtrip communications
QuerySurge Database communications and storage
API communications

Figure 1. QuerySurge with Standard Level Security

Since QuerySurge manages both database/data source connections and actual transactional query data, there are two broad categories of data that may be deemed sensitive: a) the credentials that QuerySurge stores as well as b) the data that QuerySurge processes as part of data quality testing (ETL, data migration, operations monitoring, development, etc.). Both credentials and data are transmitted over the network and stored in the QuerySurge database. These points are depicted in Figure 1. Since all of this activity (network and database) is conducted behind the organization’s firewall, this standard level of security for QuerySurge is deemed sufficient in many cases.

QuerySurge and QuerySurge Application Credentials

QuerySurge application user passwords are stored in the QuerySurge database for authentication at user sign-on. QuerySurge application passwords are stored in the QuerySurge database using industry-standard methods.

However, QuerySurge is a Web 2.0 application, and under standard deployment conditions, QuerySurge application credentials are part of the HTTP communications between the browser and the QuerySurge application server. Therefore, by default, QuerySurge application credentials are in plain text on the wire when passwords are created or modified. As noted above, the default security assumption is that QuerySurge has been deployed behind a firewall, so credentials are secure by virtue of firewall protection.

QuerySurge and Source/Target Credentials

As indicated, QuerySurge stores Source and Target credentials. Source and Target credentials are stored in the QuerySurge database, using industry-standard encryption techniques. In addition, Source and Target credentials travel both between QuerySurge and the browser (when they are input or edited) and between QuerySurge and the QuerySurge Agent (at execution time). In the case of the QuerySurge-to-browser loop, Source and Target credentials are in plain text on the wire when they are first entered or when they are modified. Otherwise, they are not sent. Between QuerySurge and the QuerySurge Agent, Source and Target credentials are encrypted on the wire.

QuerySurge and Data

Much of the data that QuerySurge is used to test is likely to be sensitive and not amenable to masking, as noted previously. In the standard deployment, data travels on the wire between the QuerySurge Agent(s) and the QuerySurge application server, and between the Browser and the QuerySurge application server, in unencrypted format.[1] Again, this deployment pattern assumes that the typical behind-the-firewall deployment provides sufficient security.

In addition, all data processed by QuerySurge as part of a test is stored in the QuerySurge database. From an application perspective, the idea is to allow users to keep data for as long as necessary, in order to compare older executions with newer executions and newly modified results with their older counterparts. None of this data is stored in an encrypted format by default.

Enhanced Optional Security

The three areas of security focus indicated above can all be covered by a higher level of security in the typical behind-the-firewall deployment of QuerySurge. This is done by enhancing the security level of each of these areas by the deployment of additional standard security technologies, as described below.

QuerySurge HTTP Communications

The QuerySurge Application Server communicates with Browsers/Users, the QuerySurge Agent and the QuerySurge API components by HTTP. The solution for higher security for QuerySurge HTTP communications is straightforward – QuerySurge can take advantage of standard Web security technology by using HTTPS (HTTP via SSL/TLS). The primary purpose of utilizing HTTPS in a QuerySurge instance is to encrypt all data passing between the QS Application Server and the other HTTP-based QuerySurge components. HTTPS is managed by a certificate, typically obtained from a recognized Certificate Authority, which confirms the identity of the server.[2] Well-known Certificate Authorities include companies such as Verisign and Network Solutions, however there are numerous other authorities in the space. The certificate authorities vet companies and their servers, and then provide certificates. Once certificates are deployed to application servers, the server can confirm its identity to incoming requests via the certificate. Your organization will have policies concerning how certificates are to be obtained, which authority should be used, etc. QuerySurge supports TLSv1.2 (QuerySurge 6.3 - 10.0) and TLSv1.3 (QuerySurge 10.1 and higher).

Browser-to-QuerySurge Application Server Roundtrip Communications

Once the QuerySurge Application Server is configured with an SSL certificate obtained by your organization, all communications between the Browser and QuerySurge will run with the extra layer of security afforded by HTTPS. All communications on the wire between the Browser and the QuerySurge Application Server are encrypted.

QuerySurge Agent-to-QuerySurge Application Server Roundtrip Communications

Higher security QuerySurge Agent communications with the QuerySurge Application Server can be implemented in much the same way that Browser security can be increased. QuerySurge Agents can be configured to run via HTTPS, in a manner similar to the Browser. There are some minor differences in implementation: running a Browser via HTTPS mainly requires configuration at the server, while running a QuerySurge Agent under HTTPS requires a modest amount of configuration on each Agent installation in addition to the server configuration. As with the Browser, running the QuerySurge Agent under HTTPS encrypts all Agent communications and therefore secures the internal API that the Agent uses.

QuerySurge Execution API-to-QuerySurge Application Server Roundtrip Communications

The QuerySurge Execution API components (if deployed) require similar local configuration steps to those required for the QuerySurge Agent in order to run under HTTPS. As with the Browser, running the QuerySurge Execution API under HTTPS encrypts all API communications and therefore secures the API that the Execution API component uses.

QuerySurge Application Server-to-QuerySurge Database Roundtrip Communications

The QuerySurge App server connects to the QuerySurge database using standard JDBC connections. The database can be set up to support encrypted JDBC connections.[3]

QuerySurge Agent-to-QuerySurge Database Communications

The Agent connects directly to the QuerySurge database in one case only: the execution of Staging Queries. This is performed using a standard JDBC connection. As noted, the database can be set up to support encrypted JDBC connections.[3]

QuerySurge Database Storage

"Data-at-rest" security presents a different set of issues than "data-in-motion" presents. The QuerySurge database stores all QuerySurge test data on disk (subject to user configuration). In order to attain a higher level of security than the default, the database backing files may be encrypted. Fortunately, there are a number of products available that can be used to handle disk-level encryption.

Following is a listing of tools that support disk encryption for the QuerySurge database. Please note that this is not an exhaustive listing. Your organization may already have some of these tools available or similar tools in place that can be applied.[4]

Vormetric
http://www.vormetric.com/products/encryption/database-encryption
http://www.vormetric.com/data-security-solutions/data-protection/database-protection

Netlib (Windows[5])
http://www.netlib.com/mysql-encryption.asp

Gazzang (Linux)
http://www.gazzang.com/

Summary

With the standard level of security, QuerySurge stores QuerySurge application passwords and Source and Target credentials using industry-standard security methods. In addition, Source and Target credentials are encrypted on the wire to the QuerySurge Agent. These basic security measures assume a typical behind-the-firewall deployment.

Figure 2. QuerySurge with Higher Enhanced Security Options

Where an enhanced level of security is desired even behind the firewall,[6] the following additional approaches can be implemented:

The QuerySurge application server can be configured to run under HTTPS, both with the Browser (for user access) and with the QuerySurge Agent(s), and the APIs.
The QuerySurge database server can be configured to run with disk encryption products, to ensure that data stored on disk by the database is fully encrypted.
The QuerySurge database can be configured to run with SSL/TLS encrypted connections.
Many Source/Target database products support SSL/TLS on the wire.

Figure 2 shows all of these approaches, imposed on the QuerySurge architecture diagram. These security measures are independent of one another, and either of them can be implemented without the other. So, if the major security concern is about data on the wire, HTTPS/SSL/TLS can be implemented alone. Similarly, if the major concern is database disk storage, disk encryption can be implemented without wire encryption. In addition, it may be helpful to note that QuerySurge has a number of application-level security features, which are described in this article.

Finally, it is important to consider that use of the enhanced security measures discussed in this whitepaper may result in performance issues which could be significant. It is advised that comparison trials are run both with and without these enhanced security measures to insure that any performance effects are acceptable to the organization.

[1] Regarding the trip from the database/data source to the QuerySurge Agent, the data format is controlled by the JDBC driver.

[2] Certificates may also be generated by an organization. These certificates lack the trustworthiness of certificates from a recognized Certificate Authority.

[3] For an example, see this article.

[4] RTTS neither endorses nor recommends these products. This listing is for informational purposes only.

[5] Recent versions of Windows offer disk-level encryption. See, for example this article on BitLocker Drive Encryption.

[6] Outside-the-firewall QuerySurge deployment is not discussed here, as the data security issues go well beyond the basic discussion in this whitepaper. Suffice it to say that such a deployment should only be considered after all the data security needs have been fully considered along with costs for enhanced security required for private data transmission on the public internet.

Articles in this section