BigQuery Integration for Nodus

1. Introduction to the BigQuery Integration

What Is This Integration?

The BigQuery integration connects your Google BigQuery data warehouse with Nodus, allowing you to extract, analyze, and transform data from BigQuery tables and views. It functions as a source connector, bringing your BigQuery data into the Nodus ecosystem for advanced analytics and business intelligence.

Prerequisites:

  • A Google Cloud project with BigQuery enabled
  • A BigQuery dataset that you want to extract data from
  • A Google Cloud Service Account with the following roles:
    • BigQuery User
    • BigQuery Data Editor
    • BigQuery Job User
  • A Service Account Key in JSON format

Connection Overview:

The integration uses Google's BigQuery API to extract data and perform queries. Authentication is handled through a Service Account Key, and data is extracted based on user-defined configurations that specify which datasets and tables to retrieve.
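
For reference, the sketch below shows the same flow performed directly with the google-cloud-bigquery Python client: authenticating with a service account key and running a query. The key file path, project, and query are placeholders; Nodus performs the equivalent steps internally.

  from google.cloud import bigquery
  from google.oauth2 import service_account

  # Load the service account key downloaded from Google Cloud (path is a placeholder).
  credentials = service_account.Credentials.from_service_account_file("service-account-key.json")

  # Every request made by this client is authenticated as the service account.
  client = bigquery.Client(credentials=credentials, project=credentials.project_id)

  # Extract rows from a table via a standard SQL query.
  query = """
      SELECT event_name, COUNT(*) AS events
      FROM `my-project-123456.analytics_data.event_logs`
      GROUP BY event_name
  """
  for row in client.query(query).result():
      print(row.event_name, row.events)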

2. Platform Setup Documentation (Setup Form for BigQuery)

Purpose & Scope

This section covers how to set up the initial connection between Nodus and BigQuery by providing the necessary authentication credentials and project details.

Field-by-Field Breakdown:

Integration Name

  • Field Name & Label: Integration Name
  • Description & Purpose: A descriptive name to identify this BigQuery integration within your Nodus account.
  • Validation Rules & Format: Text string, required field.
  • Examples: "Production BigQuery", "Analytics Data Warehouse"
  • Troubleshooting Tips: Use a descriptive name that clearly identifies the specific BigQuery project or dataset.

Project ID

  • Field Name & Label: Project ID
  • Description & Purpose: The unique identifier for your Google Cloud project that contains the BigQuery instance.
  • Validation Rules & Format: Lowercase letters, digits, and hyphens (6–30 characters, starting with a letter); required field.
  • Examples: "my-analytics-project-123456"
  • Troubleshooting Tips: You can find your Project ID in the Google Cloud Console by clicking on your project name at the top of the screen. The Project ID is different from the Project Name.

Credential JSON

  • Field Name & Label: Credential JSON
  • Description & Purpose: The Google Cloud Service Account Key in JSON format, used for authentication.
  • Validation Rules & Format: JSON string, required field.
  • Examples:
  {
    "type": "service_account",
    "project_id": "my-project-123456",
    "private_key_id": "abcdef1234567890",
    "private_key": "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----\n",
    "client_email": "my-service-account@my-project-123456.iam.gserviceaccount.com",
    "client_id": "123456789012345678901",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/my-service-account%40my-project-123456.iam.gserviceaccount.com"
  }

  • Troubleshooting Tips: Copy the entire contents of the Service Account Key JSON file, including the opening and closing braces, and confirm the Service Account has the necessary BigQuery permissions. The sketch below shows a quick way to validate the pasted key.
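
A minimal validation sketch, assuming the key file sits on disk (the file name is a placeholder):

  import json

  # json.load raises ValueError if the key file is truncated or malformed.
  with open("service-account-key.json") as f:
      key = json.load(f)

  # Every valid service account key carries these fields.
  required = {"type", "project_id", "private_key", "client_email"}
  missing = required - key.keys()
  assert not missing, f"Key is missing fields: {missing}"
  assert key["type"] == "service_account", "Not a service account key"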

Step-by-Step Guide:

  1. Create a Service Account in Google Cloud Console:
    • Navigate to IAM & Admin > Service Accounts
    • Click "Create Service Account"
    • Enter a name and description for the service account
    • Assign the required roles (BigQuery User, BigQuery Data Editor, BigQuery Job User)
    • Click "Create Key" and select JSON format
    • Download the JSON key file
  2. Enter your integration details in the Nodus setup form:
    • Provide a meaningful Integration Name
    • Enter your Google Cloud Project ID
    • Copy and paste the entire contents of the downloaded JSON key file into the Credential JSON field
  3. Save the configuration and test the connection (a dry-run sketch for verifying access follows below)
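
One inexpensive way to test the connection is a dry-run query, which checks credentials, permissions, and table names without scanning any data. A sketch using the Python client (the table name is a placeholder):

  from google.cloud import bigquery

  client = bigquery.Client()  # assumes credentials are already configured

  # dry_run validates the query and required permissions without executing it or incurring cost.
  job_config = bigquery.QueryJobConfig(dry_run=True)
  job = client.query(
      "SELECT * FROM `my-project-123456.analytics_data.event_logs`",
      job_config=job_config,
  )
  print(f"Query would process {job.total_bytes_processed} bytes")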

3. Extraction/Query Configuration (Extraction Form for BigQuery)

Purpose & Overview

This section explains how to configure data extraction from BigQuery. The form allows you to select specific schemas and tables, define extraction strategies, and specify column filters for optimized data extraction.

Template & Field Documentation:

Source Schema and Table Selection

  • Field Name & Label: Source Schema / Source Table
  • Description & Purpose: Defines which dataset (schema) and table to extract data from
  • Validation Rules & Format: Dropdown selection, required field
  • Troubleshooting Tips: The schema list is populated with all available datasets in your BigQuery project, and the table list is populated based on the selected dataset. The sketch below reproduces the same listing via the API.
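
The dropdowns reflect what the BigQuery API reports for the project. This sketch reproduces the listing with the Python client, which can help confirm what the form should display (the project ID is a placeholder):

  from google.cloud import bigquery

  client = bigquery.Client(project="my-project-123456")

  # Datasets correspond to the Source Schema dropdown.
  for dataset in client.list_datasets():
      print("dataset:", dataset.dataset_id)
      # Tables correspond to the Source Table dropdown for that dataset.
      for table in client.list_tables(dataset.dataset_id):
          print("  table:", table.table_id)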

Destination Configuration

  • Field Name & Label: Destination Schema / Destination Table
  • Description & Purpose: Defines where extracted data will be stored in the Nodus data warehouse
  • Validation Rules & Format: Dropdown selection, required field
  • Troubleshooting Tips: You can create a new table or select an existing one.

Write Strategy

  • Field Name & Label: Write Strategy
  • Description & Purpose: Defines how data should be written to the destination
  • Validation Rules & Format: Radio button selection, required field
  • Available Options:
    • Upsert - Update existing records and insert new ones
    • Append - Add new records to the existing table
    • Replace - Drop and recreate the table with new data
  • Troubleshooting Tips: Choose the strategy that matches your use case; Upsert requires a primary key. The sketch below shows how each strategy maps onto BigQuery operations.
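
Hypothetically, if you drove the same behavior through the BigQuery Python client yourself, Append and Replace would map onto load-job write dispositions, and Upsert onto a MERGE statement keyed on the primary key. Table and column names below are placeholders; Nodus handles this internally:

  from google.cloud import bigquery

  client = bigquery.Client()

  # Append and Replace correspond to BigQuery write dispositions.
  append_config = bigquery.LoadJobConfig(write_disposition="WRITE_APPEND")     # Append
  replace_config = bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE")  # Replace

  # Upsert is expressed as a MERGE keyed on the primary key (here: user_id).
  merge_sql = """
      MERGE `dest.user_profiles` AS t
      USING `staging.user_profiles` AS s
      ON t.user_id = s.user_id
      WHEN MATCHED THEN UPDATE SET t.email = s.email
      WHEN NOT MATCHED THEN INSERT (user_id, email) VALUES (s.user_id, s.email)
  """
  client.query(merge_sql).result()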

Workflow & Examples:

  1. Select the source schema (dataset) and table from the cascading dropdowns
  2. Select or create a destination schema and table
  3. Choose a write strategy
  4. Execute extraction

Example Use Cases:

Event Data Analysis:

  • Source Schema: analytics_data
  • Source Table: event_logs
  • Write Strategy: Append

User Data Synchronization:

  • Source Schema: user_data
  • Source Table: user_profiles
  • Write Strategy: Upsert (based on user_id)

4. Troubleshooting & FAQs for BigQuery

Common Issues & Error Messages

Authentication Failures

  • Error: "Invalid JSON key file" or "Invalid service account credentials"
  • Solution: Verify the JSON key is complete, properly formatted, and not corrupted. Ensure the service account has not been deleted or disabled.

Permission Issues

  • Error: "Access denied" or "Permission denied for dataset"
  • Solution: Verify that the service account has the required roles (BigQuery User, BigQuery Data Editor, BigQuery Job User). Check if the dataset has specific access controls that might be restricting access.

Query Failures

  • Error: "Query failed: Syntax error" or "Dataset not found"
  • Solution: Verify the schema and table names are correct, and check for special characters that need escaping, as in the sketch below.
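
BigQuery SQL requires backticks around identifiers containing special characters, including hyphenated project IDs, so a common fix for "Dataset not found" errors is to fully qualify and quote the table name (names here are placeholders):

  from google.cloud import bigquery

  client = bigquery.Client()

  # Backticks protect the hyphenated project ID in the fully qualified name.
  table = "`my-project-123456.analytics_data.event_logs`"
  rows = client.query(f"SELECT COUNT(*) AS n FROM {table}").result()
  print(next(iter(rows)).n)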

Missing Data

  • Common Causes:
    • Incorrect filter conditions
    • Timezone issues with timestamp filters
    • Table partitioning affecting query results
  • Solution: Verify filter conditions, ensure timestamp comparisons account for timezone differences (see the sketch below), and check whether the table is partitioned.
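
For the timezone case, passing a timezone-aware query parameter removes any ambiguity about which timezone a filter uses. A sketch with the Python client (table and column names are placeholders):

  from datetime import datetime, timezone

  from google.cloud import bigquery

  client = bigquery.Client()

  # An explicit UTC timestamp avoids implicit local-time conversions in the filter.
  start = datetime(2024, 1, 1, tzinfo=timezone.utc)
  job_config = bigquery.QueryJobConfig(
      query_parameters=[bigquery.ScalarQueryParameter("start", "TIMESTAMP", start)]
  )
  sql = """
      SELECT COUNT(*) AS n
      FROM `my-project-123456.analytics_data.event_logs`
      WHERE event_time >= @start
  """
  print(next(iter(client.query(sql, job_config=job_config).result())).n)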

Contact & Support Information