SQL Runner
Purpose and Overview
The SQL Runner is a query development environment built into the Nodus platform for advanced data exploration and transformation. It lets technical users work with their data directly through SQL, supporting complex analysis, data preparation, and the development of transformation logic.
Core Capabilities
- Interactive Query Development: Real-time SQL authoring with syntax highlighting
- Query Execution: Immediate processing against connected data sources
- Result Visualization: Tabular representation of query outputs
- Worksheet Management: Organizational structure for query development
- Publishing Mechanism: Promotion of validated queries to production pipelines
- Integration with DataFlow Canvas: Incorporation of SQL transformations into data pipelines
Interface Components
The SQL Runner interface consists of several key components, each serving a specific function in the query development process:
Worksheet Navigator
The left panel presents an organizational structure for SQL queries:
- Database Selector: Choose the target database for execution
- Folder Hierarchy: Logical organization of related worksheets
- Search Functionality: Quick location of specific worksheets
- Creation Controls: Add new worksheets and folders
Query Editor
The central panel provides a Monaco-based code editor optimized for SQL development:
- Syntax Highlighting: Color-coding for SQL keywords and structures
- Auto-completion: Context-aware suggestions for table and column names
- Multiple Statement Support: Execution of sequential SQL commands
Results Panel
The bottom panel displays the output of executed queries:
- Tabular Results: Grid representation of returned data
- Export Options: Data extraction to various formats
- Error Reporting: Detailed diagnostic information
- Pagination: Navigation through large result sets
Creating a New Worksheet
- Navigate to the SQL Runner interface (accessible from the Data Explorer)
- Click the "New Worksheet" button in the navigator panel
- In the creation dialog:
- Select the target database
- Choose an existing folder or create a new one
- Provide a descriptive name for the worksheet
- The new worksheet opens in the editor pre-populated with a comment template
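The exact template may vary by deployment, but a new worksheet typically opens with a comment header along these lines (illustrative only):

-- =====================================================
-- Worksheet: <worksheet name>
-- Database:  <target database>
-- Purpose:   <describe what this query produces>
-- =====================================================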
Writing and Executing SQL
Basic Query Execution
- Enter valid SQL in the editor panel
- Click the "Run" button in the toolbar to execute the query
- View results in the results panel
- Analyze returned data and execution statistics
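For example, a quick exploratory query might look like the following (the orders table and its columns are illustrative):

-- Preview the most recent completed orders
SELECT order_id, customer_id, order_date, order_value
FROM orders
WHERE order_status = 'completed'
ORDER BY order_date DESC
LIMIT 100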
Partial Query Execution
For complex multi-statement scripts:
- Select the specific SQL statement to execute
- Click the "Run Selected" button
- Only the highlighted portion will be executed
- Results will display in the results panel
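For instance, in a script like the one below (table names are illustrative), you could highlight only the second statement and click "Run Selected" to execute it in isolation:

-- Statement 1: stage high-value orders in a scratch table
CREATE TABLE tmp_high_value_orders AS
SELECT * FROM orders WHERE order_value > 1000;

-- Statement 2: sanity-check the staged rows
SELECT COUNT(*) AS high_value_count FROM tmp_high_value_orders;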
Publishing Worksheets
The Publishing Process
Publishing transforms a development worksheet into a production-ready transformation that can be incorporated into DataFlow Canvas pipelines:
- Develop and test the SQL query in a worksheet
- Click the "Publish" button in the worksheet toolbar
- In the publication dialog:
- Verify the worksheet name (used as the reference name)
- Provide a description of the transformation's purpose
- Select the appropriate version tag (e.g., Development, Staging, Production)
- Configure additional metadata as needed
- Confirm the publication
- The worksheet transitions to "Published" status with a visual indicator
Working with Published Worksheets
Published worksheets have special characteristics:
- Version Control: Maintains history of changes with appropriate metadata
- Production Status: Indicates current deployment status
- Usage Tracking: Shows where the transformation is being utilized
- Access Control: May have restricted edit permissions based on status
- DataFlow Canvas Integration: Allows the transformation to be incorporated into DataFlow Canvas pipelines
Limitations and Best Practices
Performance Considerations
- Row Limits: Results are automatically limited to 500 rows to prevent excessive resource consumption
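Because the cap applies to each execution, one common way to inspect a specific slice of a larger result is to page through it explicitly with LIMIT/OFFSET (keyword support varies by database; table names are illustrative):

-- Fetch rows 501-1000 of the ordered result; adjust OFFSET to page further
SELECT order_id, customer_id, order_value
FROM orders
ORDER BY order_id
LIMIT 500 OFFSET 500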
Query Optimization
- Filter early with selective WHERE clauses to reduce the volume of data scanned
- List columns explicitly rather than using SELECT * (see the sketch after this list)
- Consider materialized views for frequently queried complex aggregations
- Use EXPLAIN (or your database's equivalent, such as Oracle's EXPLAIN PLAN) to identify performance bottlenecks
- Ensure frequently filtered and joined columns are appropriately indexed
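As a concrete sketch of the first two points (table and column names are hypothetical):

-- Avoid: unfiltered SELECT * scans every column and row
-- SELECT * FROM orders

-- Prefer: explicit columns plus a selective filter
SELECT order_id, customer_id, order_value
FROM orders
WHERE order_date >= '2023-01-01'
  AND order_status = 'completed'

-- Inspect the execution plan before optimizing further
EXPLAIN
SELECT order_id, customer_id, order_value
FROM orders
WHERE order_date >= '2023-01-01'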
Data Export
- Small result sets can be exported directly from the interface
- Larger exports are processed asynchronously with notification upon completion
- Very large exports may require dedicated extraction jobs through DataFlow Canvas
Security Considerations
- SQL Runner enforces all database-level access controls
- Avoid storing sensitive information in worksheet comments
- Published worksheets may require approval in regulated environments
- Data masking applies to sensitive columns based on user permissions
Integration with DataFlow Canvas
SQL Runner worksheets form a critical component of the Nodus transformation ecosystem:
- Published worksheets appear as available transformations in the DataFlow Canvas
- Within a DataFlow:
- Add a Transformation task group
- Select a published SQL worksheet as the transformation logic
- Configure execution parameters and dependencies
- Connect appropriate source and destination nodes
This integration enables:
- Development-to-Production Workflow: Seamless transition from exploration to production
- Reusable Transformation Logic: Consistent application of business rules
- Parameterized Execution: Dynamic processing based on runtime variables (see the sketch after this list)
- Scheduled Processing: Automated execution on defined schedules
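As a sketch of parameterized execution, assuming a Jinja-style placeholder syntax (the actual Nodus parameter syntax may differ), a published worksheet could accept date bounds supplied by the pipeline at runtime:

-- start_date and end_date are supplied by the DataFlow at runtime
-- (the placeholder syntax shown here is illustrative)
SELECT
  campaign_name,
  SUM(clicks) AS clicks,
  SUM(cost) AS spend
FROM google_ads_campaigns
WHERE date BETWEEN '{{ start_date }}' AND '{{ end_date }}'
GROUP BY campaign_name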
Example Use Cases
Data Cleaning and Standardization
-- Standardize customer address data
SELECT
  customer_id,
  UPPER(TRIM(first_name)) AS first_name,
  UPPER(TRIM(last_name)) AS last_name,
  -- Keep digits only; some dialects (e.g., PostgreSQL) need a 'g' flag to replace every match
  REGEXP_REPLACE(phone, '[^0-9]', '') AS normalized_phone,
  CASE
    -- Restore the leading zero lost when postal codes were stored as numbers
    WHEN LENGTH(postal_code) = 4 THEN LPAD(postal_code, 5, '0')
    ELSE postal_code
  END AS standardized_postal_code
FROM raw_customer_data
WHERE customer_id IS NOT NULL
Cross-Source Aggregation
-- Combine marketing performance across channels
WITH google_data AS (
SELECT
date,
'Google' AS source,
campaign_name,
SUM(impressions) AS impressions,
SUM(clicks) AS clicks,
SUM(cost) AS spend
FROM google_ads_campaigns
WHERE date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY date, campaign_name
),
facebook_data AS (
SELECT
date,
'Facebook' AS source,
campaign_name,
SUM(impressions) AS impressions,
SUM(link_clicks) AS clicks,
SUM(spend) AS spend
FROM facebook_ads_campaigns
WHERE date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY date, campaign_name
)
SELECT * FROM google_data
UNION ALL
SELECT * FROM facebook_data
ORDER BY date, source, campaign_name
Complex Business Logic Implementation
-- Calculate customer lifetime value with segmentation
-- Note: DATE_DIFF('day', start, end) is Presto/Trino syntax;
-- other dialects differ (e.g., BigQuery uses DATE_DIFF(end, start, DAY))
WITH purchase_history AS (
SELECT
customer_id,
SUM(order_value) AS total_spend,
COUNT(DISTINCT order_id) AS order_count,
MIN(order_date) AS first_purchase_date,
MAX(order_date) AS last_purchase_date,
DATE_DIFF('day', MIN(order_date), MAX(order_date)) AS customer_tenure_days
FROM orders
WHERE order_status = 'completed'
GROUP BY customer_id
)
SELECT
customer_id,
total_spend,
order_count,
customer_tenure_days,
total_spend / NULLIF(order_count, 0) AS avg_order_value,
CASE
WHEN customer_tenure_days > 365 THEN 'Established'
WHEN customer_tenure_days > 90 THEN 'Developing'
ELSE 'New'
END AS customer_segment,
CASE
WHEN DATE_DIFF('day', last_purchase_date, CURRENT_DATE) > 180 THEN 'At Risk'
WHEN DATE_DIFF('day', last_purchase_date, CURRENT_DATE) > 90 THEN 'Cooling'
ELSE 'Active'
END AS activity_status
FROM purchase_history
ORDER BY total_spend DESC