
DataFlow Canvas

Purpose and Overview

The DataFlow Canvas is the central orchestration environment within the Nodus platform, providing a visual, low-code interface for designing, implementing, and managing end-to-end data workflows. This module enables the creation of sophisticated data pipelines through an intuitive drag-and-drop interface, facilitating the seamless movement and transformation of data across various systems and repositories.

Core Capabilities

  • Visual Workflow Design: Intuitive drag-and-drop interface for pipeline construction
  • Multi-stage Processing: Sequential and parallel execution of data operations
  • Comprehensive Integration: Connection to diverse data sources and destinations
  • Transformation Logic: Implementation of complex data manipulation requirements
  • Execution Control: Flexible scheduling and triggering mechanisms
  • Monitoring and Logging: Real-time visibility into pipeline execution status
  • Error Handling: Sophisticated failure management and recovery options

Canvas Components

The DataFlow Canvas architecture is built around a modular component structure, enabling flexible pipeline composition:

Task Group Blocks

Task Group Blocks represent functional categories of operations within a data pipeline:

  1. Source/Extraction Blocks: Components that retrieve data from external systems

    • Platform integrations (e.g., Google Analytics, Shopify, LinkedIn)
    • Database connectors (e.g., PostgreSQL, MySQL, Oracle)
  2. Transformation Blocks: Components that modify, reshape, or enhance data

    • SQL transformations (from SQL Runner)
  3. Destination Blocks: Components that deliver data to target systems

    • Data warehouses (e.g., Snowflake, BigQuery, Redshift)
    • Data lakes (e.g., S3, Azure Data Lake)
    • Operational databases
    • File exports

Task Blocks

Within each Task Group Block, individual Task Blocks represent specific operational units. Each Task Block is defined by the following properties (see the sketch after this list):

  • Configuration Parameters: Settings that control behavior
  • Input/Output Schema: Data structure specifications
  • Execution Settings: Runtime behavior controls
  • Schedule Definition: Temporal execution parameters
  • Dependency Links: Relationships with other tasks
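
Concretely, these properties can be pictured as fields on a small data structure. The sketch below is purely illustrative; the TaskBlock class and its field names are assumptions made for explanation, not part of the Nodus platform.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskBlock:
    """Hypothetical representation of a Task Block's properties (illustrative only)."""
    name: str
    config: dict = field(default_factory=dict)        # Configuration Parameters: settings that control behavior
    input_schema: dict = field(default_factory=dict)  # Input/Output Schema: expected data structures
    output_schema: dict = field(default_factory=dict)
    execution: dict = field(default_factory=dict)     # Execution Settings: runtime behavior controls
    schedule: Optional[str] = None                    # Schedule Definition: e.g., a cron expression
    depends_on: list = field(default_factory=list)    # Dependency Links: upstream task names

# Example: a source extraction task scheduled daily at 2:00 AM
ga4_extract = TaskBlock(
    name="extract_ga4_sessions",
    config={"platform": "Google Analytics 4", "report": "sessions"},
    schedule="0 2 * * *",
)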

Connections

Connections are visual links between components that define the following (see the sketch after this list):

  • Data Flow Direction: Sequence of operations
  • Dependency Relationships: Execution prerequisites
  • Parameter Passing: Context propagation between steps
  • Conditional Paths: Branching logic based on conditions
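
A connection can likewise be pictured as a directed edge that carries an optional parameter mapping and branching condition. The Connection class below is a hypothetical sketch; its fields are illustrative names only.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Connection:
    """Hypothetical directed link between two tasks (illustrative only)."""
    source: str                                     # upstream task: defines data flow direction and prerequisite
    target: str                                     # downstream task
    parameters: dict = field(default_factory=dict)  # context propagated between steps
    condition: Optional[str] = None                 # optional branching condition

# Example: only run the load step when the transformation produced rows
edge = Connection(
    source="transform_campaign_metrics",
    target="load_to_warehouse",
    parameters={"run_date": "2024-01-01"},
    condition="row_count > 0",
)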

Building DataFlows

Creating a New DataFlow

  1. Navigate to the DataFlow Canvas from the main navigation
  2. Click "Create New DataFlow" or select from available templates
  3. Provide a descriptive name and optional description
  4. Select the appropriate workspace context
  5. The canvas will initialize with a blank design surface

Adding Task Group Blocks

  1. Click the "+" button on the canvas
  2. Select the appropriate task group category:
    • Source/Extraction
    • Transformation
    • Destination
  3. Position the task group block on the canvas
  4. Configure group-level settings as needed

Adding Task Blocks

  1. Within a task group block, click "Add Task"

  2. Select from available task types based on the group category

  3. Configure task-specific parameters:

    For Source/Extraction tasks:

    • Select the integrated platform (e.g., Google Analytics 4)
    • Choose the extraction template or query
    • Configure data selection criteria
    • Set incremental extraction parameters

    For Transformation tasks:

    • Select a published SQL Runner worksheet
    • Configure input parameter mappings
    • Specify execution options
    • Define error handling behavior

    For Destination tasks:

    • Select the target system
    • Configure connection parameters
    • Specify target table or location
    • Choose write mode (append/replace/upsert)
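
The canvas collects these parameters through configuration forms; conceptually, each task type reduces to a small set of key/value settings. The dictionaries below are a hypothetical sketch of what each task type captures; none of the keys or values are official Nodus field names.

# Hypothetical parameter sets for the three task categories (illustrative field names)

source_task = {
    "platform": "Google Analytics 4",       # integrated platform
    "template": "daily_sessions_report",    # extraction template or query
    "filters": {"country": "US"},           # data selection criteria
    "incremental": {"cursor_field": "date", "lookback_days": 3},  # incremental extraction
}

transformation_task = {
    "worksheet": "clean_campaign_metrics",  # published SQL Runner worksheet
    "parameters": {"run_date": "2024-01-01"},  # input parameter mappings
    "timeout_minutes": 30,                  # execution options
    "on_error": "fail_pipeline",            # error handling behavior
}

destination_task = {
    "target": "Snowflake",                  # target system
    "connection": "analytics_wh",           # connection reference
    "table": "marketing.campaign_metrics",  # target table or location
    "write_mode": "upsert",                 # append / replace / upsert
}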

Configuring Execution Controls

For each task or task group:

  1. Select the task or group to access its properties panel
  2. Configure execution settings:
    • Schedule (frequency, time window, exclusions)
    • Dependencies (upstream task completion)
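
For illustration, the execution settings for a single task might combine a schedule (with a time window and exclusions) and an upstream dependency. The structure below is a hypothetical sketch, not a documented configuration format.

# Hypothetical execution settings for one task (illustrative keys)
execution_settings = {
    "schedule": {
        "frequency": "daily",
        "time_window": {"start": "02:00", "end": "04:00"},  # allowed run window
        "exclusions": ["2024-12-25", "2025-01-01"],          # dates to skip
    },
    "dependencies": ["extract_ga4_sessions"],  # upstream tasks that must complete first
}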

Execution Modes

The DataFlow Canvas supports multiple execution paradigms to accommodate various operational requirements:

Sequential Processing

Tasks execute in a defined order, with each step waiting for the previous one to complete:

Source Extraction → Transformation → Destination Loading

This approach ensures data consistency throughout the pipeline and is suitable for processes where each step depends on the complete results of the previous step.
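
In code terms, a sequential pipeline is a chain of calls in which each step consumes the complete output of the previous one. A minimal sketch with hypothetical step functions:

def extract():        # stand-in for a Source Extraction task
    return [{"campaign": "spring_sale", "clicks": 120}]

def transform(rows):  # stand-in for a Transformation task
    return [{**r, "clicks_per_day": r["clicks"] / 30} for r in rows]

def load(rows):       # stand-in for a Destination Loading task
    print(f"loaded {len(rows)} rows")

# Each step waits for the previous one to finish before starting
load(transform(extract()))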

Parallel Processing

Multiple tasks execute simultaneously when there are no interdependencies:

Source A Extraction ──→ Transformation A ──→ Destination X
Source B Extraction ──→ Transformation B ──→ Destination Y
Source C Extraction ──→ No Transformation ──→ Destination Z

This approach maximizes throughput and reduces overall execution time.
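
In code terms, independent pipelines can be dispatched concurrently. The sketch below uses Python's standard concurrent.futures module to run three hypothetical pipeline functions at the same time:

from concurrent.futures import ThreadPoolExecutor

def run_pipeline(source: str) -> str:
    # Placeholder for extract -> transform -> load for one source
    return f"{source}: done"

sources = ["Source A", "Source B", "Source C"]

# Independent pipelines execute simultaneously; total time is bounded
# by the slowest pipeline rather than the sum of all three.
with ThreadPoolExecutor(max_workers=3) as pool:
    for result in pool.map(run_pipeline, sources):
        print(result)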

Hybrid Orchestration

Tasks combine sequential and parallel execution based on logical dependencies:

                    ┌──→ Transformation A ──→┐
Source Extraction ──┤                        ├──→ Destination Loading
                    └──→ Transformation B ──→┘

This approach optimizes resource utilization while maintaining necessary processing order.
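
Hybrid orchestration amounts to a topological ordering of the dependency graph: tasks whose prerequisites are complete run in parallel, and everything else waits. A minimal sketch using Python's standard graphlib module, with task names taken from the diagram above:

from graphlib import TopologicalSorter

# Dependency graph for the diagram above: both transformations depend on the
# extraction, and the load depends on both transformations.
graph = {
    "transformation_a": {"source_extraction"},
    "transformation_b": {"source_extraction"},
    "destination_loading": {"transformation_a", "transformation_b"},
}

ts = TopologicalSorter(graph)
ts.prepare()
while ts.is_active():
    ready = list(ts.get_ready())      # tasks whose prerequisites are all done
    print("run in parallel:", ready)  # these could be dispatched concurrently
    for task in ready:
        ts.done(task)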

Scheduling Options

DataFlow Canvas provides flexible scheduling mechanisms to automate pipeline execution:

Time-Based Scheduling

  • Fixed Schedule: Execute at specific times (e.g., daily at 2:00 AM)
  • Interval-Based: Execute at regular intervals (e.g., every 6 hours)
  • Cron Expression: Advanced scheduling using cron syntax
  • Calendar Integration: Alignment with business calendars and exclusions
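
For reference, a cron expression has five fields: minute, hour, day of month, month, and day of week. The examples below show how the scheduling options above are typically expressed in cron syntax; the surrounding dictionary is only an illustration.

# Cron field order: minute  hour  day-of-month  month  day-of-week
schedules = {
    "daily_at_2am":       "0 2 * * *",     # fixed schedule: every day at 02:00
    "every_6_hours":      "0 */6 * * *",   # interval-based: at minute 0 of every 6th hour
    "weekdays_at_6_30am": "30 6 * * 1-5",  # advanced: Monday through Friday at 06:30
    "first_of_month":     "0 0 1 * *",     # midnight on the first day of each month
}

for name, expr in schedules.items():
    print(f"{name:22s} {expr}")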

Event-Driven Execution

  • Dependency Completion: Start when prerequisite workflows complete
  • Manual Triggering: Ad-hoc execution initiated by authorized users

Monitoring and Observability

The DataFlow Canvas provides comprehensive visibility into execution status and performance:

  • Real-time Execution Tracking: Visual indication of current processing stage
  • Historical Performance Analysis: Trend analysis for execution metrics
  • Detailed Logging: Comprehensive record of all operations

Use Cases

Marketing Data Integration

Scenario: Consolidate marketing performance data from multiple channels for unified analysis.

Implementation:

  • Extract campaign data from Google Analytics, Facebook Ads, and LinkedIn
  • Transform metrics to standardized format with consistent dimensions
  • Load unified dataset to data warehouse for cross-channel analysis
  • Schedule daily execution with appropriate incremental processing
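
Taken together, this use case is three parallel extractions feeding one transformation and one warehouse load, executed daily. The definition below is a hypothetical sketch of such a flow; the structure and field names are illustrative and do not represent a Nodus export format.

# Hypothetical DataFlow definition for the marketing integration (illustrative only)
marketing_dataflow = {
    "name": "marketing_performance_daily",
    "schedule": "0 3 * * *",  # daily, after upstream platforms finalize their data
    "tasks": {
        "extract_ga4":      {"type": "source", "platform": "Google Analytics 4"},
        "extract_facebook": {"type": "source", "platform": "Facebook Ads"},
        "extract_linkedin": {"type": "source", "platform": "LinkedIn"},
        "standardize":      {"type": "transformation",
                             "worksheet": "unify_campaign_metrics",
                             "depends_on": ["extract_ga4", "extract_facebook", "extract_linkedin"]},
        "load_warehouse":   {"type": "destination", "target": "Snowflake",
                             "table": "marketing.unified_performance",
                             "write_mode": "upsert",
                             "depends_on": ["standardize"]},
    },
}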