BigQuery is Google's serverless, scalable, enterprise data warehouse. This article describes the use of QuerySurge with Google BigQuery to analyze data stored in BigQuery data sets and also data stored in Google cloud storage and Google drive.
Connecting QuerySurge to BigQuery
As with all data stores, QuerySurge connects via JDBC. Google has released BigQuery JDBC drivers from a partnership with Simba Technologies Inc. You can find background and download links for the BigQuery JDBC drivers here.
Configuring BigQuery Authentication
The Simba JDBC Driver for Google BigQuery uses the OAuth 2.0 protocol for authentication. There are four Google OAuth API methods to provide credentials and authenticate a connection to the data warehouse, This article focuses on using a Google Service Account. All four authentication options are documented here.
Create a Service Account
In the dashboard of your Google Cloud Platform project, using the navigation menu at the top left, select IAM & Admin, and then, in the drop down menu select Service accounts.
Create a new Service account by adding a name, ID and description. Assign a role to the BigQuery instance within your project, the minimum permission needed to query data is BigQuery User.
The account that creates this service account by default has access to its settings, you may add user access to the service account if needed. Once the Service account is created, note the account's Email address (which should have the following form):
<Service Account Name>@<Your Project ID>.iam.gserviceaccount.com
Generate the Key File
Next, download a key file for the Service account. In the Service Accounts tab, under the actions column of the service account you created, click the three dot dropdown and select Create Key. Simba supports both JSON and P12 key file formats.
Note: The key file contains your private key. It is critical that the key file has appropriate access limitations once it is deployed, because connection security depends on it. Because you may need to deploy this file to multiple QuerySurge Agents, appropriate access on all Agent machines must be arranged. Note that key file access must be provided for the user under which your QuerySurge Agent(s) run. (See more information about Agents and the user logins under which they run here.)
Deploy the Simba JDBC Driver and oAuth Jars
For QuerySurge to use the BigQuery Simba JDBC driver, the driver jar and the authentication jars must be deployed to all QuerySurge Agents that you plan to use for BigQuery tests. The driver and oAuth jars can be downloaded from the JDBC 4.1-compatible link found here.
Note: The Simba drivers are only compatible with QuerySurge 5.x.x or higher as they are compiled with Java 7.
Note: Ensure the oAuth Jars are also deployed to all Agents as they are required for authentication.
Create a QuerySurge Connection to BigQuery
To create a new BigQuery Connection in QuerySurge, use the Connection Extensibility feature of the QuerySurge Connection Wizard. You can find a description of how to use Extensibility in this article.
The Driver class for the connection is:
The connection string format is:
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<Your Project ID>;OAuthType=0;OAuthServiceAcctEmail=<Service Email Address>;OAuthPvtKeyPath=<Path to Key File>;
Note: The <Path to Key File> parameter is the full path including the file name, not the directory path without the file name (for example: /home/keyfiles/keyfile.json)
Query Tables in BigQuery with QuerySurge
Tables can be queried by specifying the data set followed by the table in the FROM clause.
SELECT ... FROM <Data Set>.<Table Name>
Note: Simba supports BigQuery's Standard SQL; a full guide can be found here.
Note: Keep in mind that the Simba driver returns all string values as CLOBs. Because QuerySurge respects the clob type (see this Knowledge Base article), users should consider implementing with casting to avoid multiple CLOB datatypes in your results.
Query Files in Google Cloud Storage with QuerySurge and BigQuery
BigQuery allows you to query data from files stored in Google Cloud Storage. First you need to create a table that points to your Cloud Storage location. In a data set of your choosing, select Create a new table.
In the menu for the create table from combo box, select Google Cloud Storage. Next, define the schema for the table. This can be done either manually or automatically.
Note: You can pull multiple files of the same type into one table by using the wildcard character (*) in the Select file from GCS bucket edit box. In the example above, if you have /sample001.csv, /sample002.csv etc. in your bucket, you can use /../sample* to pull all files matching the file name wildcard into one table. Note that these files all should have the same layout so that the same table schema applies across all the files.
Once a table has been created that points at the data in Cloud Storage, you can query it in QuerySurge like any other BigQuery table:
Query Files in Google Drive with QuerySurge and BigQuery
BigQuery also supports querying data from files stored in a Google Drive. The process is similar to querying files in Cloud Storage. A permanent external table in BigQuery must be created. First you'll need a Drive URI. In Google Drive, right click your file and click Get shareable link. The URI will be copied to the clipboard.
In a data set of your choosing select Create table.
In the menu for the Create table from combo box, select Drive. Paste the link into the Select Drive URI field and choose a file format.
Next, enter a table name and define your schema or allow Google to auto-detect it for you and create the table (not shown). Once a table has been created pointing at the data in Drive, you can query it in QuerySurge like any other BigQuery table:
We've shown here that connecting QuerySurge to BigQuery offers multiple avenues to access your data stored in Google Cloud Storage services using a standard SQL dialect. As mentioned earlier, the full guide for Google's standard SQL syntax can be found here.