Apache Cassandra, a prominent NoSQL database, is supported by the DataStax driver stack. DataStax provides a Java driver (among other language bindings), but that driver is not a JDBC driver. QuerySurge, like many other data tools, requires a JDBC driver to connect to Cassandra. This need is filled by the open-source Wu wrapper around the DataStax Java driver (available on GitHub). Note that this wrapper is released under the Apache 2.0 license. Setup details for using this Cassandra JDBC wrapper for the DataStax Java Driver with QuerySurge follow.
Note: The documentation for this Cassandra JDBC wrapper contains the following advice: "You may find this [driver] helpful if you came from RDBMS world and hoping to get your hands on Apache Cassandra right away. Having said that, it is NOT recommended to use this for production but development and research. You should use DataStax Java Driver, Spark and maybe Presto if you want to do something serious."
Note: Cassandra uses Cassandra Query Language (CQL), a variant of SQL. For more information, please see the DataStax CQL documentation.
Setting up a QuerySurge Connection with the Cassandra JDBC Wu DataStax Wrapper
A QuerySurge Connection to Cassandra is set up using the Connection Extensibility feature of the QuerySurge Connection Wizard. Following are the details you'll need to set up your QuerySurge Connection to Cassandra with the "shaded" DataStax wrapper.
- Download the "shaded" DataStax JDBC wrapper. You can download the shaded driver wrapper, bundled with all of its dependencies, here (multiple versions are available for download).
- Deploy the Cassandra JDBC driver to your Agent(s). The procedure for deploying a new driver to a QuerySurge Agent is here (for Agents on Windows) and here (for Agents on Linux).
- Log into QuerySurge as a QuerySurge Admin user, and navigate to the Admin view. Steps for using the Connection Extensibility feature can be found here. To use the Connection Extensibility option in the Connection Wizard with the DataStax wrapper, you'll need the following information:
You'll need to provide the server and the port number. For the server, use either the fully qualified name of the server or the server IP address. Verify the port with a Cassandra admin or another knowledgeable resource; 9042 is the default Cassandra port. You should also provide the keyspace that you want to connect to. Authentication credentials are typically needed as well.
Note: You may want or need a more elaborate JDBC URL; see the GitHub documentation for details.
- If you have a CQL Test Query, enter it to verify that your Connection parameters are correct. It should be a simple query that returns a small amount of data; one row is enough. Once you've entered a Test Query, you can run it with the Test Connection button.
- Once you have the information entered and (optionally) verified, click the Save button to save the Connection. You're ready to use the Connection in a QueryPair.
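Outside QuerySurge, the values entered in the Connection Wizard map onto an ordinary JDBC connection. The Java sketch below builds a URL from the server, port and keyspace, then runs a one-row test query through `java.sql.DriverManager`, much like the Test Connection button does. The `jdbc:c*:datastax://` prefix, the host name, the keyspace and the credentials are assumptions for illustration only; confirm the exact URL syntax in the wrapper's GitHub documentation, and put the shaded driver JAR on the classpath before running `main`.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CassandraConnectionCheck {

    // ASSUMPTION: URL prefix for the Wu DataStax wrapper; verify it against the
    // driver's GitHub documentation for the version you downloaded.
    public static final String URL_PREFIX = "jdbc:c*:datastax://";

    // Builds a JDBC URL from the same server, port, and keyspace entered in the wizard.
    public static String buildUrl(String server, int port, String keyspace) {
        return URL_PREFIX + server + ":" + port + "/" + keyspace;
    }

    // Runs the test query and reports whether it returned at least one row.
    public static boolean testConnection(String url, String user, String password,
                                         String testQuery) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(testQuery)) {
            return rs.next();
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical server, keyspace, and credentials; replace them with your own.
        String url = buildUrl("cassandra01.example.com", 9042, "my_keyspace");
        // system.local holds a single row describing the node you are connected to.
        String testQuery = "SELECT release_version FROM system.local";
        System.out.println("URL: " + url);
        System.out.println("Test query returned a row: "
                + testConnection(url, "qs_user", "qs_password", testQuery));
    }
}
```

Keeping `buildUrl` separate from the connection logic lets you check the URL string on its own before any network access is attempted.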
Cassandra JDBC Wu DataStax Wrapper Driver Configuration
By default, the Wu DataStax wrapper JDBC driver limits the number of rows returned to 10K (10,000). There are two ways to modify this limit (additional details can be found here).
- Use the "Magic Comments" feature to set the no_limit property as part of the query.
This will need to be done for each QueryPair that needs to avoid the default 10K record limit.
- Change the rowLimit parameter in the driver's config.yaml configuration file. To do this, you will need a tool such as 7-Zip that lets you navigate the internals of the driver's JAR file.
- Shut down the Agent service (https://querysurge.zendesk.com/hc/en-us/articles/218053543-Stopping-and-Starting-QuerySurge-Services)
- Open the driver with 7-Zip File Manager (7-Zip must be run as administrator)
- Locate the config.yaml file and right-click to select the Edit option
- Change the rowLimit parameter value from 10000 to 0
- Save and close the config.yaml file
- Click OK on the 7-Zip confirmation
- Restart the Agent service
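This removes the 10K record limit for all QueryPairs that use the Connection, as well as any other Connection that uses this driver JAR on the same Agent. Repeat the change on every Agent where the driver is deployed.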
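If you prefer to script the config.yaml change rather than edit the JAR by hand in 7-Zip, the following Java sketch does the same thing with the JDK's zip file system provider. It assumes that config.yaml sits at the root of the JAR and that the limit appears on a line of the form `rowLimit: <number>`; check both against your driver version before using it. Run it only while the Agent service is stopped, then restart the service as in the steps above.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowLimitPatcher {

    // Matches "rowLimit: <digits>" at the start of any line, keeping its indentation.
    private static final Pattern ROW_LIMIT =
            Pattern.compile("(?m)^(?<prefix>[ \\t]*rowLimit[ \\t]*:[ \\t]*)\\d+");

    // Returns the YAML text with the rowLimit value replaced by newLimit.
    public static String setRowLimit(String yaml, int newLimit) {
        Matcher m = ROW_LIMIT.matcher(yaml);
        if (!m.find()) {
            throw new IllegalArgumentException("no rowLimit entry found in config.yaml");
        }
        // replaceAll resets the matcher, so the find() above does not skip a match.
        return m.replaceAll("${prefix}" + newLimit);
    }

    public static void main(String[] args) throws IOException {
        // Path to the shaded driver JAR on the Agent, passed on the command line.
        Path jar = Paths.get(args[0]);
        // Open the JAR as a zip file system so config.yaml can be rewritten in place.
        URI uri = URI.create("jar:" + jar.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, Map.of())) {
            // ASSUMPTION: config.yaml is stored at the root of the JAR.
            Path config = zipFs.getPath("/config.yaml");
            String yaml = Files.readString(config);
            // 0 removes the limit, matching the manual 7-Zip edit.
            Files.writeString(config, setRowLimit(yaml, 0));
        } // Closing the file system writes the updated entry back into the JAR.
    }
}
```

With Java 11 or later it can be run directly from source, for example `java RowLimitPatcher.java <path-to-driver-jar>`, where the JAR path is whatever file you deployed to the Agent.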
Data Type Considerations
The JDBC driver handles certain data types in ways that require a bit of explanation. Following are notes on a few specific data types that should be helpful when writing queries against Cassandra.
- timeuuid and uuid types. The JDBC driver returns timeuuid and uuid values as an object type; when using them in QuerySurge, cast them to either the ascii or the text type. For more information, see the DataStax CQL data types reference.
- date type. In addition to ascii and text, the date type can be cast to a timestamp. Keep in mind, however, that Cassandra takes your Agent's local UTC offset into account when converting a date value to a timestamp. For example, the date "12-01-2017" is read as "12-01-2017 00:00:00" at UTC+0, so when an Agent located in UTC-5 casts it to a timestamp, the result is "11-30-2017 19:00:00" because of the five-hour difference.
- text, ascii, varchar types. The CQL text, ascii, varchar and inet types are handled as values of up to 2 GB in size, which qualifies them as CLOB types. (In CQL, text and varchar are synonyms; ascii is a separate type restricted to US-ASCII characters.) For more information on how QuerySurge handles CLOBs, see our Knowledge Base article on QuerySurge and CLOB Data.
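The date-to-timestamp shift described in the date type note can be reproduced with `java.time`. This sketch only models the arithmetic (midnight UTC re-expressed at the Agent's UTC offset); it is not the driver's own conversion code, and the -5 hour offset simply matches the example in the text.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class DateToTimestamp {

    // Treats a CQL date as midnight UTC, then expresses that same instant
    // as wall-clock time at the Agent's UTC offset.
    public static LocalDateTime asAgentLocal(LocalDate date, ZoneOffset agentOffset) {
        return date.atStartOfDay()
                   .atOffset(ZoneOffset.UTC)
                   .withOffsetSameInstant(agentOffset)
                   .toLocalDateTime();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2017, 12, 1);
        // An Agent at UTC-5 sees the evening of the previous day.
        System.out.println(asAgentLocal(date, ZoneOffset.ofHours(-5))); // prints 2017-11-30T19:00
    }
}
```

When comparing such values against a source that stores dates without a time zone, it can help to compute the expected shift this way before deciding whether to cast the column to text instead.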
Note: The JDBC driver discussed in this article is open-source software that may or may not be maintained for future Apache Cassandra releases.