This procedure is for QuerySurge Agents deployed on Windows. The drivers on which this procedure is based are the standard Apache Hive JDBC drivers. We strongly encourage using the specific Apache drivers from your Hadoop distro and version.
1. JDBC Driver files and related files
Note: Due to changes in recent releases of the Apache Hive "standalone" driver jar (see 1a), depending on your version, the list of required jars shown may not be correct or complete for kerberized Hive connections. The typical result in this case is that your connection attempt will error out with either a NoClassDefFoundError or a ClassNotFoundException. If you receive this outcome, we recommend trying one of the other options listed below (1b or 1c).
1a. Apache Hive "standalone" driver. Get the JDBC driver jars for your Hive distro from your Hive server (asterisks indicate distribution-specific notations):
hive-jdbc-***-standalone.jar
hadoop-common-***.jar
hadoop-auth-***.jar
You may also need:
commons-configuration-X.X.jar
xercesImpl-X.X.X.jar
xml-apis-X.X.X.jar
Note: For Zookeeper URLs (see Step 5), additional jar files may be required: curator-client-X.X.X.jar, curator-framework-X.X.X.jar.
1b. Commercial JDBC drivers. QuerySurge's vendor, RTTS, partners with CData; you can see information about the CData driver offering here.
1c. Tim Veil's Hive JDBC "uber" driver. This is a true "standalone" driver jar. You can download this driver from this github. Note that you'll need to match your Hive version to the driver version that you download.
Deploy the driver jar(s) to your Agent; you can find instructions for this procedure here.
Note: If you have previously deployed Hive driver jars, you should delete them from the Agent driver directory before you deploy the driver jars from your Hive server.
You will also need hadoop winutils.exe for 1a and 1b.
Note: winutils.exe is not distributed with Hadoop. You can download it from Hortonworks here or (at the time of this writing) from this github.
2. Kerberos Configuration Files
2a. krb5.conf (krb5.ini) file - From your Kerberos admin or other knowledgeable resource, obtain a krb5.conf file. On Windows, the file may be called krb5.ini.
2b. The keytab file - If you're authenticating to Kerberos via a keytab, you'll need to obtain a keytab file (usually generated by a Kerberos admin or other knowledgeable resource), and the user principal associated with the keytab.
2c. You'll need a gss-jaas.conf file, which points to the keytab file. The basic gss-jaas.conf file layout is below. Note the dummy path to the keytab and the dummy principal in this file template:com.sun.security.jgss.initiate {
com.sun.security.auth.module.Krb5LoginModule required
useTicketCache=false
doNotPrompt=true
useKeyTab=true
keyTab="<QuerySurge Install Dir>/QuerySurge/agent/mykeytab.keytab"
principal="myuserprin@PRINCIPAL.DOM.COM"
debug=true;
};
Note: Specify your keytab path with forward slashes (even though your deployment is on Windows). Using the default install path on Windows, this becomes:keyTab="C:/Program Files/QuerySurge/agent/mykeytab.keytab"
Note: A principal has three parts: the primary, the instance, and the realm. The format of a typical Kerberos V5 principal is: primary/instance@REALM. Your principal for your gss-jaas.conf may need both the primary and the instance:principal="hive/myuserprin@PRINCIPAL.DOM.COM"
or it may only require the instance (as shown above) :principal="myuserprin@PRINCIPAL.DOM.COM"
Note: The instance field of the principal may be a user name myuserprin or it may be a fully qualified name myuserprin.dom.com, depending on your configuration.
Note: The principal used in your gss-jaas.conf may need to be a user principal, while the principal used in the JDBC URL (see below) may need to be a service principal, depending on your Kerberos configuration.
Note: In the example above, the leading name com.sun.security.jgss.initiate, which appears before the opening curly brace, is the Login Context. Your Login Context may be different; check with a Kerberos-knowledgeable resource in your organization.
Note: keytab files may have a single key for a single principal or may have multiple keys.
3. Agent Setup
Note: We recommend using the QuerySurge Agent directories for the file locations in this setup step, since this should be the same for all your Agents. However, there is no requirement to use these specific directories.
3a. Go to the directory: <QuerySurge Install Dir>\QuerySurge\agent\, and run QuerySurgeAgentw.exe as Administrator. (Make sure to run QuerySurgeAgentw.exe and not QuerySurgeAgent.exe.) On the General tab, use the Stop button (lower left) to stop the Agent.
3b. Make sure that the proper set of JDBC jar files is deployed to <QuerySurge Install Dir>\QuerySurge\agent\jdbc, per step 1a.
3c. In <QuerySurge Install Dir>\QuerySurge\agent\, create a bin\ directory.
Copy the winutil.exe that you downloaded (step 1b) to <QuerySurge Install Dir>\QuerySurge\agent\bin\.
3d. Copy your krb5.ini (or krb5.conf), gss-jaas.conf and keytab files to <QuerySurge Install Dir>\QuerySurge\agent\
3e. You may need to use a Kerberos client to run a kinit command locally, on your Agent box(es). This usually requires installation of a KDC client on the Agent box. Not all clients are compatible with all Kerberos implementations, so this should be done in consultation with a resource knowledgeable about your organizations's Kerberos setup.
4. Agent Configuration
4a. In QuerySurgeAgentw.exe, click on the Java tab.
In the Java Options box, add the following on separate lines. Do not delete anything from the Java Options box! Note that some of the values are file paths, which are dummy paths in the sample below:-Djavax.security.auth.useSubjectCredsOnly=false
-Djava.security.krb5.conf=<QuerySurge Install Dir>\QuerySurge\agent\krb5.ini
-Djava.security.auth.login.config=<QuerySurge Install Dir>\QuerySurge\agent\gss-jaas.conf
-Dhadoop.home.dir=<QuerySurge Install Dir>\QuerySurge\agent
Note: You'll need to replace the dummy paths in these settings with actual paths.
Note: If you do not need a gss-jaas.conf file, then omit the-Djava.security.auth.login.config switch.
Note: During setup, a high level of debug output is often helpful. Add the following command-line options while you are debugging your connection:-Dsun.security.krb5.debug=true
-Djava.security.debug=gssloginconfig,configfile,configparser,logincontext
4b. In QuerySurgeAgentw.exe, click on the General tab.
Use the Start button (lower left) to start your Agent.
5. QuerySurge Connection Wizard (using the Connection Extensibility option)
5a. Open the Connection Wizard in the QuerySurge Admin view. Select the Connection Extensibility option in the Data Source dropdown. Use the values and templates below to set up your connection.
5b. Driver Class:
org.apache.hive.jdbc.HiveDriver (do not use the legacy value org.apache.hadoop.hive.HiveDriver)
5c. Hive JDBC URL:jdbc:hive2://[my_hive_server]:[port]/[database];principal=prin.dom.com@APRINCIPAL.DOM.COM
A simple sample dummy Hive URL:jdbc:hive2://server.dom.com:10000/mydatabase;principal=hive/principal.dom.com@PRINCIPAL.DOM.COM
A sample dummy Hive URL using the _HOST specifier:jdbc:hive2://server.dom.com:10000/mydatabase;principal=hive/_HOST@PRINCIPAL.DOM.COM
Note: For compliant systems, if _HOST is used in the principal, it will be resolved to the actual hostname of the HiverServer2 instance. Check your documentation.
A more elaborate Hive URL, including the user looks like:
jdbc:hive2://server.dom.com:10000/mydatabase;user=someuser@PRINCIPAL.DOM.COM;principal=hive/principal.dom.com@PRINCIPAL.DOM.COM
A Zookeeper URL may be used (if enabled). Examples include:
jdbc:hive2://server1.dom.com:2181,server2.dom.com:2181/mydatabase;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
jdbc:hive2://server1.dom.com:2181,server2.dom.com:2181/mydatabase;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;kerberosAuthType=fromSubject;auth=kerberos
jdbc:hive2://server1.dom.com:2181,server2.dom.com:2181/mydatabase;serviceDiscoveryMode=zooKeeper;principal=hive/principal.dom.com@PRINCIPAL.DOM.COM
Note: The principal used in the JDBC URL typically must be a service principal; however depending on your Kerberos configuration, the URL may require a user principal.
Note: Your URL may, in some cases (depending on your configuration), contain a value with spaces (e.g. a file path). Spaces will most likely need to be encoded (i.e. represented as the corresponding hex value '%20') or you will receive an error: IllegalArgumentException: Illegal character in path at index... .
5d. Once you have set up a URL, try a Connection Test or a QueryPair with a simple query of the form:SELECT * FROM mydatabase.mytable LIMIT 5
Notes and General Comments:
- You may need to deploy unlimited strength policy jars to your QuerySurge java. (An error output of: found unsupported keytype (18) may indicate the need for unlimited strength jars.) For Java 7, these jars (local_policy.jar, US_export_policy.jar) are available here. For Java 8, the jars are available here. The jars should be deployed to <QuerySurge Install Dir>\QuerySurge\java\lib\security. The existing policy jars should be cached in another folder.
- The Connection test is subject to the browser timeout. If you get a timeout message during the Connection test, that may not be an indication of a true timeout. If you receive a timeout message, create a QueryPair using your test query for both Source and Target queries.
- Kerberos has many possible ways that it can be set up. These instructions go through a common path, but it may not be completely correct for your environment. You will likely have to consult with a Kerberos admin or other knowledgeable resource on this setup.
- This procedure should be performed on all Agents that you have deployed.