This procedure is for QuerySurge Agents deployed on Linux. The drivers on which this procedure is based are the standard Apache Hive JDBC drivers. We strongly encourage using the specific Apache drivers from your Hadoop distro and version.
1. JDBC Driver files and related files
Note: Due to changes in recent releases of the Apache Hive "standalone" driver jar (see 1a), depending on your version, the list of required jars shown may not be correct or complete for kerberized Hive connections. The typical result in this case is that your connection attempt will error out with either a NoClassDefFoundError or a ClassNotFoundException. If you receive this outcome, we recommend trying one of the other options listed below (1b or 1c).
1a. Apache Hive "standalone" driver. Get the JDBC jars for your Hive distro from your Hive server (asterisks indicate distribution-specific notations):
hive-jdbc-***-standalone.jar
hadoop-common-***.jar
hadoop-auth-***.jar
You may also need:
commons-configuration-X.X.jar
xercesImpl-X.X.X.jar
xml-apis-X.X.X.jar
Note: For Zookeeper URLs (see Step 5), additional jar files may be required: curator-client-X.X.X.jar, curator-framework-X.X.X.jar.
1b. Commercial JDBC drivers. QuerySurge's vendor, RTTS, partners with CData; you can see information about the CData driver offering here.
1c. Tim Veil's Hive JDBC "uber" driver. This is a true "standalone" driver jar. You can download this driver from this github. Note that you'll need to match your Hive version to the driver version that you download.
Deploy the driver jars to your Agent; you can find instructions for this procedure here.
Note: If you have previously deployed Hive driver jars, you should delete them from the Agent driver directory before you deploy the driver jars from your Hive server.
2. Kerberos Configuration Files
2a. krb5.conf file - From your Kerberos admin or other knowledgeable resource, obtain a krb5.conf file.
2b. The keytab file - If you're authenticating to Kerberos via a keytab, you'll need to obtain a keytab file (usually generated by a Kerberos admin or other knowledgeable resource), and the user principal associated with the keytab.
2c. You'll need a gss-jaas.conf file, which points to the keytab file. The basic gss-jaas.conf file layout is below. Note the dummy path to the keytab and the dummy principal in this file template:
com.sun.security.jgss.initiate {
com.sun.security.auth.module.Krb5LoginModule required
useTicketCache=false
doNotPrompt=true
useKeyTab=true
keyTab="/<QuerySurge Install Dir>/QuerySurge/agent/mykeytab.keytab"
principal="myuserprin@PRINCIPAL.DOM.COM"
debug=true;
};
Note: A principal has three parts: the primary, the instance, and the realm. The format of a typical Kerberos V5 principal is: primary/instance@REALM. Your principal for your gss-jaas.conf may need both the primary and the instance:principal="hive/myuserprin@PRINCIPAL.DOM.COM"
or it may only require the instance (as shown above) :principal="myuserprin@PRINCIPAL.DOM.COM"
Note: The instance field of the principal may be a user name myuserprin or it may be a fully qualified name myuserprin.dom.com, depending on your configuration.
Note: The principal used in your gss-jaas.conf may need to be a user principal, while the principal used in the JDBC URL (see below) may need to be a service principal, depending on your Kerberos configuration.
Note: In the example above, the leading name com.sun.security.jgss.initiate, which appears before the opening curly brace, is the Login Context. Your Login Context may be different; check with a Kerberos-knowledgeable resource in your organization.
Note: keytab files may have a single key for a single principal or may have multiple keys.
3. Agent Setup
Note: We recommend using the QuerySurge Agent directories for the file locations in this setup step, since this should be the same for all your Agents. However, there is no requirement to use these specific directories.
3a. Stop the Agent with the command:[sudo] service QuerySurge stop agent
3b. Make sure that the proper set of JDBC jar files is deployed to <QuerySurge Install Dir>/QuerySurge/agent/jdbc per step 1a.
3c. Copy your krb5.conf, gss-jaas.conf and keytab files to /<QuerySurge Install Dir>/QuerySurge/agent/
3d. You may need to use the Kerberos client to run a kinit command locally, on your Agent box(es). This requires installation of a KDC client on the Agent box. The command has the following form:kinit -kt mykeytab.keytab user@PRINCIPAL.DOM.COM
where the principal is the user principal for the user that is connecting to Hive.
Note: You may need to use the full path to the keytab file.
Note: You may have to copy the krb5.conf to /etc directory.
4. Agent Configuration
4a. Make a copy of /<QuerySurge Install Dir>/QuerySurge/agent/ctl.sh:[sudo] cp ctl.sh ctl.sh.orig
4b. CAREFULLY edit your ctl.sh to add the following to the line starting with nohup nice. These switches should be added immediately before-cp (with single-space separation):-Djavax.security.auth.useSubjectCredsOnly=false
-Djava.security.krb5.conf=/<QuerySurge Install Dir>/QuerySurge/agent/krb5.conf
-Djava.security.auth.login.config=/<QuerySurge Install Dir>/QuerySurge/agent/gss-jaas.conf
Note: You'll need to replace the dummy paths in these settings with actual paths to the files.
Note: If you do not need a gss-jaas.conf file, then omit the -Djava.security.auth.login.config switch.
Note: During setup, a high level of debug output is often helpful. Add the following command-line options while you are debugging your connection:-Dsun.security.krb5.debug=true
-Djava.security.debug=gssloginconfig,configfile,configparser,logincontext
You can download a sample copy of Agent ctl.sh files with the default values (see the Resources section at the end of the article). The purpose of this file is for illustration of the changes required, not for deployment to your instance.
See Resources section: ctl.kerb.sh
4c. Start your Agent:[sudo] service QuerySurge start agent
5. QuerySurge Connection Wizard (using the Connection Extensibility option)
5a. Open the Connection Wizard in the QuerySurge Admin view. Select the Connection Extensibility option in the Data Source dropdown. Use the values and templates below to set up your connection.
5b. Driver Class:
org.apache.hive.jdbc.HiveDriver (do not use the legacy value org.apache.hadoop.hive.HiveDriver)
5c. Hive JDBC URL:jdbc:hive2://[my_hive_server]:[port]/database;principal=prin.dom.com@APRINCIPAL.DOM.COM
A sample dummy Hive URL:jdbc:hive2://server.dom.com:10000/mydatabase;principal=hive/principal.dom.com@PRINCIPAL.DOM.COM
A sample dummy Hive URL using the _HOST specifier:jdbc:hive2://server.dom.com:10000/mydatabase;principal=hive/_HOST@PRINCIPAL.DOM.COM
Note: For compliant systems, if _HOST is used in the principal, it will be resolved to the actual hostname of the HiverServer2 instance. Check your documentation.
A more elaborate Hive URL including the user looks like:
jdbc:hive2://server.dom.com:10000/mydatabase;user=someuser@PRINCIPAL.DOM.COM;principal=hive/principal.dom.com@PRINCIPAL.DOM.COM
A Zookeeper URL may be used (if enabled). Examples include:
jdbc:hive2://server1.dom.com:2181,server2.dom.com:2181/mydatabase;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
jdbc:hive2://server1.dom.com:2181,server2.dom.com:2181/mydatabase;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;kerberosAuthType=fromSubject;auth=kerberos
jdbc:hive2://server1.dom.com:2181,server2.dom.com:2181/mydatabase;serviceDiscoveryMode=zooKeeper;principal=hive/principal.dom.com@PRINCIPAL.DOM.COM
Note: The principal used in the JDBC URL typically must be a service principal; however depending on your Kerberos configuration, the URL may require a user principal.
5d. Once you have set up a url, try a Connection Test or a QueryPair with a simple query of the form:SELECT * FROM mydatabase.mytable LIMIT 5
Notes and General Comments:
- You may need to deploy unlimited strength policy jars to your QuerySurge java. (An error output of: found unsupported keytype (18) may indicate the need for unlimited strength jars.) For Java 7, these jars (local_policy.jar, US_export_policy.jar) are available here. For Java 8, the jars are available here. The jars should be deployed to /<QuerySurge Install Dir>/QuerySurge/java/lib/security. The existing policy jars should be cached in another folder.
- The Connection test is subject to the browser timeout. If you get a timeout message during the Connection test, that may not be an indication of a true timeout. If you receive a timeout message, create a QueryPair using your test query for both Source and Target queries.
- Kerberos has many possible ways that it can be set up. These instructions go through a common path, but it may not be completely correct for your environment. You will likely have to consult with a Kerberos admin or other knowledgeable resource on this setup.
- This procedure should be done for all QuerySurge Agents that you have deployed.
Resources
1. Agent ctl.sh shell modified (file: ctl.kerb.sh) with Kerberos switches using the default install paths.