SQL Runner
Purpose and Overview
The SQL Runner is a query development environment built into the Nodus platform for advanced data exploration and transformation. It lets technical users work with their data directly through SQL, supporting complex analysis, data preparation, and the development of transformation logic.
Core Capabilities
- Interactive Query Development: Real-time SQL authoring with syntax highlighting
- Query Execution: Immediate processing against connected data sources
- Result Visualization: Tabular representation of query outputs
- Worksheet Management: Organizational structure for query development
- Publishing Mechanism: Promotion of validated queries to production pipelines
- Integration with DataFlow Canvas: Incorporation of SQL transformations into data pipelines
Interface Components
The SQL Runner interface consists of several key components, each serving a specific function in the query development process:
Worksheet Navigator
The left panel presents an organizational structure for SQL queries:
- Database Selector: Choose the target database for execution
- Folder Hierarchy: Logical organization of related worksheets
- Search Functionality: Quick location of specific worksheets
- Creation Controls: Add new worksheets and folders
Query Editor
The central panel provides a Monaco-based code editor optimized for SQL development:
- Syntax Highlighting: Color-coding for SQL keywords and structures
- Auto-completion: Context-aware suggestions for table and column names
- Multiple Statement Support: Execution of sequential SQL commands
Results Panel
The bottom panel displays the output of executed queries:
- Tabular Results: Grid representation of returned data
- Export Options: Data extraction to various formats
- Error Reporting: Detailed diagnostic information
- Pagination: Navigation through large result sets
Creating a New Worksheet
- Navigate to the SQL Runner interface (accessible from the Data Explorer)
- Click the "New Worksheet" button in the navigator panel
- In the creation dialog:
- Select the target database
- Choose an existing folder or create a new one
- Provide a descriptive name for the worksheet
- The new worksheet opens in the editor pre-populated with a comment template
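The exact template may vary by deployment, but a new worksheet typically opens with a comment header along these lines (illustrative only):

-- =====================================================
-- Worksheet: <worksheet name>
-- Database:  <target database>
-- Purpose:   <describe what this query produces>
-- =====================================================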
Writing and Executing SQL
Basic Query Execution
- Enter valid SQL in the editor panel
- Click the "Run" button in the toolbar to execute the query
- View results in the results panel
- Analyze returned data and execution statistics
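For example, a quick exploratory query might look like the following (the orders table and its columns are illustrative):

-- Preview the most recent completed orders
SELECT order_id, customer_id, order_date, order_value
FROM orders
WHERE order_status = 'completed'
ORDER BY order_date DESC
LIMIT 100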
Partial Query Execution
For complex multi-statement scripts:
- Select the specific SQL statement to execute
- Click the "Run Selected" button
- Only the highlighted portion will be executed
- Results will display in the results panel
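For instance, in a script like the one below (table names are illustrative), you could highlight only the second statement and click "Run Selected" to execute it in isolation:

-- Statement 1: stage high-value orders in a scratch table
CREATE TABLE tmp_high_value_orders AS
SELECT * FROM orders WHERE order_value > 1000;

-- Statement 2: sanity-check the staged rows
SELECT COUNT(*) AS high_value_count FROM tmp_high_value_orders;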
Publishing Worksheets
The Publishing Process
Publishing transforms a development worksheet into a production-ready transformation that can be incorporated into DataFlow Canvas pipelines:
- Develop and test the SQL query in a worksheet
- Click the "Publish" button in the worksheet toolbar
- In the publication dialog:
- Verify the worksheet name (used as the reference name)
- Provide a description of the transformation's purpose
- Select the appropriate version tag (e.g., Development, Staging, Production)
- Configure additional metadata as needed
- Confirm the publication
- The worksheet transitions to "Published" status with a visual indicator
Working with Published Worksheets
Published worksheets have special characteristics:
- Version Control: Maintains history of changes with appropriate metadata
- Production Status: Indicates current deployment status
- Usage Tracking: Shows where the transformation is being utilized
- Access Control: May have restricted edit permissions based on status
- DataFlow Canvas Integration: Allows the transformation to be incorporated into DataFlow Canvas pipelines
Limitations and Best Practices
Performance Considerations
- Row Limits: Results are automatically limited to 500 rows to prevent excessive resource consumption
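Because the cap applies to each execution, one common way to inspect a specific slice of a larger result is to page through it explicitly with LIMIT/OFFSET (keyword support varies by database; table names are illustrative):

-- Fetch rows 501-1000 of the ordered result; adjust OFFSET to page further
SELECT order_id, customer_id, order_value
FROM orders
ORDER BY order_id
LIMIT 500 OFFSET 500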
Query Optimization
- Filter early with selective WHERE clauses to reduce the volume of data scanned
- List columns explicitly rather than using SELECT * (see the sketch after this list)
- Consider materialized views for frequently queried complex aggregations
- Use EXPLAIN (or your database's equivalent, such as Oracle's EXPLAIN PLAN) to identify performance bottlenecks
- Ensure frequently filtered and joined columns are appropriately indexed
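As a concrete sketch of the first two points (table and column names are hypothetical):

-- Avoid: unfiltered SELECT * scans every column and row
-- SELECT * FROM orders

-- Prefer: explicit columns plus a selective filter
SELECT order_id, customer_id, order_value
FROM orders
WHERE order_date >= '2023-01-01'
  AND order_status = 'completed'

-- Inspect the execution plan before optimizing further
EXPLAIN
SELECT order_id, customer_id, order_value
FROM orders
WHERE order_date >= '2023-01-01'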
Data Export
- Small result sets can be exported directly from the interface
- Larger exports are processed asynchronously with notification upon completion
- Very large exports may require dedicated extraction jobs through DataFlow Canvas
Security Considerations
- SQL Runner enforces all database-level access controls
- Avoid storing sensitive information in worksheet comments
- Published worksheets may require approval in regulated environments
- Data masking applies to sensitive columns based on user permissions
Integration with DataFlow Canvas
SQL Runner worksheets form a critical component of the Nodus transformation ecosystem:
- Published worksheets appear as available transformations in the DataFlow Canvas
- Within a DataFlow:
- Add a Transformation task group
- Select a published SQL worksheet as the transformation logic
- Configure execution parameters and dependencies
- Connect appropriate source and destination nodes
This integration enables:
- Development-to-Production Workflow: Seamless transition from exploration to production
- Reusable Transformation Logic: Consistent application of business rules
- Parameterized Execution: Dynamic processing based on runtime variables (see the sketch after this list)
- Scheduled Processing: Automated execution on defined schedules
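As a sketch of parameterized execution, assuming a Jinja-style placeholder syntax (the actual Nodus parameter syntax may differ), a published worksheet could accept date bounds supplied by the pipeline at runtime:

-- start_date and end_date are supplied by the DataFlow at runtime
-- (the placeholder syntax shown here is illustrative)
SELECT
  campaign_name,
  SUM(clicks) AS clicks,
  SUM(cost) AS spend
FROM google_ads_campaigns
WHERE date BETWEEN '{{ start_date }}' AND '{{ end_date }}'
GROUP BY campaign_name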
Example Use Cases
Data Cleaning and Standardization
-- Standardize customer address data
SELECT
  customer_id,
  UPPER(TRIM(first_name)) AS first_name,
  UPPER(TRIM(last_name)) AS last_name,
  -- Keep digits only; some dialects (e.g., PostgreSQL) need a 'g' flag to replace every match
  REGEXP_REPLACE(phone, '[^0-9]', '') AS normalized_phone,
  CASE
    -- Restore the leading zero lost when postal codes were stored as numbers
    WHEN LENGTH(postal_code) = 4 THEN LPAD(postal_code, 5, '0')
    ELSE postal_code
  END AS standardized_postal_code
FROM raw_customer_data
WHERE customer_id IS NOT NULL
Cross-Source Aggregation
-- Combine marketing performance across channels
WITH google_data AS (
SELECT
date,
'Google' AS source,
campaign_name,
SUM(impressions) AS impressions,
SUM(clicks) AS clicks,
SUM(cost) AS spend
FROM google_ads_campaigns
WHERE date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY date, campaign_name
),
facebook_data AS (
SELECT
date,
'Facebook' AS source,
campaign_name,
SUM(impressions) AS impressions,
SUM(link_clicks) AS clicks,
SUM(spend) AS spend
FROM facebook_ads_campaigns
WHERE date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY date, campaign_name
)
SELECT * FROM google_data
UNION ALL
SELECT * FROM facebook_data
ORDER BY date, source, campaign_name
Complex Business Logic Implementation
-- Calculate customer lifetime value with segmentation
-- Note: DATE_DIFF('day', start, end) is Presto/Trino syntax;
-- other dialects differ (e.g., BigQuery uses DATE_DIFF(end, start, DAY))
WITH purchase_history AS (
SELECT
customer_id,
SUM(order_value) AS total_spend,
COUNT(DISTINCT order_id) AS order_count,
MIN(order_date) AS first_purchase_date,
MAX(order_date) AS last_purchase_date,
DATE_DIFF('day', MIN(order_date), MAX(order_date)) AS customer_tenure_days
FROM orders
WHERE order_status = 'completed'
GROUP BY customer_id
)
SELECT
customer_id,
total_spend,
order_count,
customer_tenure_days,
total_spend / NULLIF(order_count, 0) AS avg_order_value,
CASE
WHEN customer_tenure_days > 365 THEN 'Established'
WHEN customer_tenure_days > 90 THEN 'Developing'
ELSE 'New'
END AS customer_segment,
CASE
WHEN DATE_DIFF('day', last_purchase_date, CURRENT_DATE) > 180 THEN 'At Risk'
WHEN DATE_DIFF('day', last_purchase_date, CURRENT_DATE) > 90 THEN 'Cooling'
ELSE 'Active'
END AS activity_status
FROM purchase_history
ORDER BY total_spend DESC