This procedure is for QuerySurge Agents deployed on Windows. It is based on the Cloudera Hive JDBC drivers, which can be downloaded from the Cloudera website; use a current version of these drivers. The setup assumes that your Hive server uses Kerberos authentication with an LDAP server in the background.
1. JDBC Driver files and related files
1a. Download the Cloudera driver zip, and select the JDBC 4.1 drivers. The "Cloudera-JDBC Driver-for-Apache-Hive-Install-Guide" PDF included in the driver download fully describes the driver setup and options; this article is based in part on those instructions. You will need to distribute the driver jar files to each QuerySurge Agent (depending on the version, your download may contain a single jar file or several).
Deploy the driver jars to your Agent(s); you can find instructions for this procedure here.
Note: If you have previously deployed Hive driver jars, you should remove them from the Agent jdbc directory before you deploy new driver jar(s).
2. Kerberos Configuration Files
2a. krb5.conf (krb5.ini) file - From your Kerberos admin or other knowledgeable resource, obtain a krb5.conf file. On Windows, the file may be called krb5.ini.
2b. The keytab file - If you're authenticating to Kerberos via a keytab, you'll need to obtain a keytab file (usually generated by a Kerberos admin or other knowledgeable resource), and the user principal associated with the keytab.
2c. You'll need a gss-jaas.conf file, which points to the keytab file. The basic gss-jaas.conf file layout is below. Note the dummy path to the keytab and the dummy principal in this file template:
keyTab="<QuerySurge Install Dir>/QuerySurge/agent/mykeytab.keytab"
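The keyTab entry above belongs inside a JAAS login configuration entry. As a minimal sketch only, assuming the JDK's standard Krb5LoginModule, a login context named Client, and a hypothetical placeholder principal (myuserprin@MY.REALM.COM), a complete gss-jaas.conf could look something like this:

```text
// Sketch of a gss-jaas.conf. All values are placeholders (assumptions),
// not settings taken from a real environment.
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="<QuerySurge Install Dir>/QuerySurge/agent/mykeytab.keytab"
  principal="myuserprin@MY.REALM.COM"
  doNotPrompt=true;
};
```

useKeyTab, keyTab, principal and doNotPrompt are options of Krb5LoginModule. Whether you need further options (for example useTicketCache or storeKey) depends on your Kerberos setup, so treat this layout as a starting point to review with your Kerberos admin.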
Note: Specify your keytab path with forward slashes (even though your deployment is on Windows). Using the default install path on Windows, this becomes:
Note: A principal has three parts: the primary, the instance, and the realm. The format of a typical Kerberos V5 principal is: primary/instance@REALM. The principal in your gss-jaas.conf may need the full notation (primary/instance@REALM), or it may require only the instance@REALM form.
Note: The instance field of the principal may be a user name (e.g. myuserprin) or a fully qualified name (e.g. myuserprin.dom.com), depending on your configuration.
Note: The principal used in your gss-jaas.conf is usually a user principal, while the principal used in the JDBC URL (see below) is usually a service principal, depending on your Kerberos configuration.
Note: In the example above, the leading name Client, which appears before the opening curly brace, is the Login Context. Your Login Context may be different; check with a Kerberos-knowledgeable resource in your organization.
Note: A keytab file may contain a single key for a single principal, or it may contain multiple keys.
3. Agent Setup
Note: We recommend using the QuerySurge Agent directories for the file locations in this setup step, since this should be the same for all your Agents. However, there is no requirement to use these specific directories.
3a. Go to the directory: <QuerySurge Install Dir>\QuerySurge\agent\, and run QuerySurgeAgentw.exe as Administrator. (Make sure to run QuerySurgeAgentw.exe and not QuerySurgeAgent.exe.) On the General tab, use the Stop button (lower left) to stop the Agent.
3b. In <QuerySurge Install Dir>\QuerySurge\agent\jdbc, delete any hadoop or hive jar files.
Copy the jar files you downloaded (step 1a) to this directory.
3c. Copy your krb5.ini (or krb5.conf), gss-jaas.conf and keytab files to <QuerySurge Install Dir>\QuerySurge\agent\.
3d. Create a Windows environment variable for the Kerberos cache. Right-click on the computer icon and select Properties. Click on Advanced System Settings. In the System Properties dialog, click on the Advanced tab. Click on the Environment Variables button. Under System Variables, click New. In the New System Variable dialog, type KRB5CCNAME in the Variable Name field. In the Variable Value field, type the path for your credential cache file (e.g. C:\TEMP\krb5cache). Click the OK buttons to save your changes. A system restart is advisable.
3e. You may need to run a kinit command locally on your Agent box(es). You can use the kinit command that is part of the Java distribution that comes with QuerySurge (found at: <QuerySurge Install Dir>\QuerySurge\java\bin\kinit.exe), or you may install a KDC client on the Agent box(es). Not all clients are compatible with all Kerberos implementations, so this should be done in consultation with a resource knowledgeable about your organization's Kerberos setup.
4. Agent Configuration
4a. In QuerySurgeAgentw.exe, click on the Java tab.
In the Java Options box, add the following on separate lines. Do not delete anything from the Java Options box! Note that some of the values are file paths, which are dummy paths in the sample below:
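As a hedged sketch, the entries usually needed for a Kerberos-enabled JVM are the standard Java system properties shown here. The file names and locations are assumptions that follow step 3c (krb5.ini and gss-jaas.conf copied to the agent directory), and whether the useSubjectCredsOnly setting is required depends on your driver and environment:

```text
-Djava.security.krb5.conf=<QuerySurge Install Dir>\QuerySurge\agent\krb5.ini
-Djava.security.auth.login.config=<QuerySurge Install Dir>\QuerySurge\agent\gss-jaas.conf
-Djavax.security.auth.useSubjectCredsOnly=false
```

The first property points the JVM at the Kerberos configuration file, the second at the JAAS login configuration, and the third allows the GSS layer to obtain credentials through the JAAS login entry rather than only from the current Subject.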
Note: You'll need to replace the dummy paths in these settings with actual paths.
Note: During setup, a high level of debug output is often helpful. Add the following command-line options while you are debugging your connection:
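Two JDK debug switches commonly used for Kerberos troubleshooting are sketched here. They are standard Java system properties rather than QuerySurge-specific settings, and the set that is most useful in your environment may differ:

```text
-Dsun.security.krb5.debug=true
-Dsun.security.jgss.debug=true
```

Each goes on its own line in the Java Options box. Because these switches produce verbose output, it is reasonable to remove them once the connection works.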
4b. In QuerySurgeAgentw.exe, click on the General tab.
Use the Start button (lower left) to start your Agent.
5. QuerySurge Connection Wizard (using the Connection Extensibility option)
5a. Open the Connection Wizard in the QuerySurge Admin view. Select the Connection Extensibility option in the Data Source dropdown. Use the following values and templates:
Hive JDBC URL:
A simple sample dummy Hive URL:
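As an illustration only, a Kerberos-enabled URL for the Cloudera Hive JDBC driver generally follows the pattern sketched here, using the AuthMech, KrbRealm, KrbHostFQDN and KrbServiceName properties described in the Cloudera install guide. The host name, port, schema and realm are hypothetical placeholders; confirm the property names against the guide shipped with your driver version:

```text
Template:
jdbc:hive2://<host>:<port>/<schema>;AuthMech=1;KrbRealm=<REALM>;KrbHostFQDN=<HiveServer2 FQDN>;KrbServiceName=<service name>

Sample (dummy values):
jdbc:hive2://myhiveserver.dom.com:10000/default;AuthMech=1;KrbRealm=MY.REALM.COM;KrbHostFQDN=myhiveserver.dom.com;KrbServiceName=hive
```

In this sketch, AuthMech=1 selects Kerberos authentication, and KrbServiceName, KrbHostFQDN and KrbRealm together identify the service principal (hive/myhiveserver.dom.com@MY.REALM.COM in the sample), as distinct from the user principal in gss-jaas.conf.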
Once you have set up a URL, try a Connection Test or a QueryPair with a simple query of the form:
SELECT * FROM mydatabase.mytable LIMIT 5
Notes and General Comments:
- You may need to deploy unlimited strength policy jars to the Java installation that ships with QuerySurge. (An error output of: found unsupported keytype (18) may indicate the need for unlimited strength jars.) For Java 7, these jars (local_policy.jar, US_export_policy.jar) are available here. For Java 8, the jars are available here. The jars should be deployed to <QuerySurge Install Dir>\QuerySurge\java\lib\security. Back up the existing policy jars to another folder before replacing them.
- The Connection test is subject to the browser timeout, so a timeout message during the Connection test may not indicate a true connection timeout. If you receive a timeout message, create a QueryPair that uses your test query for both the Source and Target queries, and run it to check the connection.
- Kerberos can be set up in many ways. These instructions follow a common path, but they may not be completely correct for your environment. You will likely need to consult a Kerberos admin or other knowledgeable resource on this setup.
- This procedure should be performed on all Agents that you have deployed.