Apache Kafka is an open source data streaming platform that provides handling for real-time data feeds. Confluent offers the open-source KSQL engine which supports SQL-like querying of Kafka. This article reviews how to connect QuerySurge to Kafka via the KSQL engine, and how to query data from KSQL streams and tables.
Note: Querying against stream data differs from standard relational database systems, for a discussion on data retrieval techniques, see this Knowledge Base article.
The KSQL JDBC driver
To issue queries against your KSQL instance from QuerySurge, you will need to build and deploy the open -source KSQL JDBC driver to your Agent(s). This Github repository contains the information required to build the driver from source.
Once the driver is built, you'll need to deploy it to your QuerySurge Agents. For instructions on how to deploy a JDBC driver to a QuerySurge Agent, see this article for Agents on Windows, and this article for Agents on Linux.
Creating a Connection
To create a connection using the KSQL JDBC driver, we use QuerySurge's Connection Extensibility feature. For more details on using this feature, see this Knowledge Base article. Key points specific for this driver are:
- Class: com.github.mmolimar.ksql.jdbc.KsqlDriver
- URL: jdbc:ksql://<ksql-host>:<port>
- KSQL port (the default KSQL port is 8088)
- If you are using SSL append ?secured=true to the URL
In the example below, we use the simplest KSQL URL: jdbc:ksql://192.168.0.37:8088, however more elaborate URLs may be required to deal with query termination (see discussion below).
You can enter an optional test query to check the connectivity between QuerySurge and the KSQL engine. A generic KSQL query to try is: SHOW TABLES
Once the connection has been created, you can query data from your KSQL tables or streams using QuerySurge QueryPairs, using the SQL dialect supported by the KSQL driver.
Query "Configuration" and the KSQL JDBC URL
A critical aspect of KSQL querying is that KSQL queries are by definition non-terminating. Therefore, if you issue a KSQL query from QuerySurge (or any other standard query tool) without any instructions about termination, the connection will open but the query will never terminate for the usual post-processing. With KSQL, there are a few options for query termination. One is to use a limit clause on each of your queries and the second is to put a connection-level timeout on your JDBC URL. The URL timeout parameter is set using the following syntax (note that the timeout is in ms, not sec):
Note: The value you choose for your timeout should be chosen with your testing goals in mind; we choose 5 sec here only as an illustration. See this Knowledge Base article for more discussion of query termination.
Another important aspect of querying KSQL is in regard to where the KSQL engine should report from. That is, should the engine look for the most recent message or for the oldest message in a topic (this is called the query offset)? The default offset is for KSQL to start with the 'latest' message in the topic, but for data testing purposes it is common to query from the 'earliest' message. This value can be set on your KSQL URL, as follows:
Note that two key-value pairs must be set on the URL: the properties key-value pair must be set to 'true' and the auto.offset.reset must be set to 'earliest'.
Finally, combining the timeout parameter and the offset parameters on the URL gives the following: