This procedure is for QuerySurge Agents deployed on Linux. The drivers on which this procedure is based are the standard Apache Hive JDBC drivers. We strongly encourage using the specific Apache drivers from your Hadoop distro and version.
1. JDBC Driver files and related files
1a. Get the JDBC jars for your Hive distro from your Hive server (asterisks indicate distribution-specific notations):
You may also need:
For HDP distros ('X's refer to version numbers):
For CDH distros ('X's refer to version and distro numbers):
For Other distros:
If your distro and version are different, you'll need to find paths and files corresponding to your version, and copy the files to your QuerySurge Agent.
Deploy the driver jars to your Agent; you can find instructions for this procedure here.
Note: If you have previously deployed Hive driver jars, you should delete them from the Agent driver directory before you deploy the driver jars from your Hive server.
2. Kerberos Configuration Files
2a. krb5.conf file - From your Kerberos admin or other knowledgeable resource, obtain a krb5.conf file.
2b. The keytab file - If you're authenticating to Kerberos via a keytab, you'll need to obtain a keytab file (usually generated by a Kerberos admin or other knowledgeable resource), and the user principal associated with the keytab.
2c. You'll need a gss-jaas.conf file, which points to the keytab file. The basic gss-jaas.conf file layout is below. Note the dummy path to the keytab and the dummy principal in this file template:
keyTab="/<QuerySurge Install Dir>/QuerySurge/agent/mykeytab.keytab"
Note: A principal has three parts: the primary, the instance, and the realm. The format of a typical Kerberos V5 principal is: primary/instance@REALM. Your principal for your gss-jaas.conf may need both the primary and the instance:
or it may only require the instance (as shown above) :
Note: The instance field of the principal may be a user name myuserprin or it may be a fully qualified name myuserprin.dom.com, depending on your configuration.
Note: The principal used in your gss-jaas.conf may need to be a user principal, while the principal used in the JDBC URL (see below) may need to be a service principal, depending on your Kerberos configuration.
Note: In the example above, the leading name com.sun.security.jgss.initiate, which appears before the opening curly brace, is the Login Context. Your Login Context may be different; check with a Kerberos-knowledgeable resource in your organization.
Note: keytab files may have a single key for a single principal or may have multiple keys.
3. Agent Setup
Note: We recommend using the QuerySurge Agent directories for the file locations in this setup step, since this should be the same for all your Agents. However, there is no requirement to use these specific directories.
3a. Stop the Agent with the command:
[sudo] service QuerySurge stop agent
3b. Make sure that the proper set of JDBC jar files is deployed to <QuerySurge Install Dir>/QuerySurge/agent/jdbc per step 1a.
3c. Copy your krb5.conf, gss-jaas.conf and keytab files to /<QuerySurge Install Dir>/QuerySurge/agent/
3d. You may need to use the Kerberos client to run a kinit command locally, on your Agent box(es). This requires installation of a KDC client on the Agent box. The command has the following form:
kinit -kt mykeytab.keytab user@PRINCIPAL.DOM.COM
where the principal is the user principal for the user that is connecting to Hive.
Note: You may need to use the full path to the keytab file.
Note: You may have to copy the krb5.conf to /etc directory.
4. Agent Configuration
4a. Make a copy of /<QuerySurge Install Dir>/QuerySurge/agent/ctl.sh:
[sudo] cp ctl.sh ctl.sh.orig
4b. CAREFULLY edit your ctl.sh to add the following to the line starting with nohup nice. These switches
should be added immediately after -XX:-UseSplitVerifier and before -cp (with single-space separation):
-Djava.security.krb5.conf=/<QuerySurge Install Dir>/QuerySurge/agent/krb5.conf
-Djava.security.auth.login.config=/<QuerySurge Install Dir>/QuerySurge/agent/gss-jaas.conf
Note: You'll have to replace the dummy paths in these settings with actual paths to the files.
Note: If you do not need a gss-jaas.conf file, then omit the
Note: During setup, a high level of debug output is often helpful. This can be implemented by adding the following command-line option:
For the default installation in /opt/QuerySurge, the switches are:
You can download a sample copy of ctl.kerb.sh with these default values (see the Resources section at the end of the article). The purpose of this file is for illustration of the changes required, not for deployment to your instance.
See Resources section: ctl.kerb.sh
4c. Start your Agent:
[sudo] service QuerySurge start agent
5. QuerySurge Connection Wizard (using the Connection Extensibility option)
5a. Open the Connection Wizard in the QuerySurge Admin view. Select the Connection Extensibility option in the Data Source dropdown. Use the following values and templates:
org.apache.hive.jdbc.HiveDriver (do not use the legacy value org.apache.hadoop.hive.HiveDriver)
Hive JDBC URL:
A sample dummy Hive URL:
A more elaborate Hive URL including the user looks like:
A Zookeeper URL may be used (if enabled). Examples include:
Note: The principal used in the JDBC URL often needs a service principal; however depending on your Kerberos configuration, the URL may require a user principal.
Once you have set up a url, try a Connection Test or a QueryPair with a simple query of the form:
SELECT * FROM mydatabase.mytable LIMIT 5
Notes and General Comments:
- You may need to deploy unlimited strength policy jars to your QuerySurge java. (An error output of: found unsupported keytype (18) may indicate the need for unlimited strength jars.) For Java 7, these jars (local_policy.jar, US_export_policy.jar) are available here. The jars should be deployed to /<QuerySurge Install Dir>/QuerySurge/java/lib/security. The existing policy jars should be cached in another folder.
- The Connection test is subject to the browser timeout. If you get a timeout message during the Connection test, that may not be an indication of a true timeout. If you receive a timeout message, create a QueryPair using your test query for both Source and Target queries.
- Kerberos has many possible ways that it can be set up. These instructions go through a common path, but it may not be completely correct for your environment. You will likely have to consult with a Kerberos admin or other knowledgeable resource on this setup.
- This procedure should be done for all QuerySurge Agents that you have deployed.
1. Agent ctl.sh shell modified (file: ctl.kerb.sh) with Kerberos switches using the default install paths.