
DataFlow Canvas

Purpose and Overview

The DataFlow Canvas is the central orchestration environment within the Nodus platform, providing a visual, low-code interface for designing, implementing, and managing end-to-end data workflows. This module enables the creation of sophisticated data pipelines through an intuitive drag-and-drop interface, facilitating the seamless movement and transformation of data across various systems and repositories.

Core Capabilities

  • Visual Workflow Design: Intuitive drag-and-drop interface for pipeline construction
  • Multi-stage Processing: Sequential and parallel execution of data operations
  • Comprehensive Integration: Connection to diverse data sources and destinations
  • Transformation Logic: Implementation of complex data manipulation requirements
  • Execution Control: Flexible scheduling and triggering mechanisms
  • Monitoring and Logging: Real-time visibility into pipeline execution status
  • Error Handling: Sophisticated failure management and recovery options

Canvas Components

The DataFlow Canvas architecture is built around a modular component structure, enabling flexible pipeline composition:

Task Group Blocks

Task Group Blocks represent functional categories of operations within a data pipeline:

  1. Source/Extraction Blocks: Components that retrieve data from external systems

    • Platform integrations (e.g., Google Analytics, Shopify, LinkedIn)
    • Database connectors (e.g., PostgreSQL, MySQL, Oracle)
  2. Transformation Blocks: Components that modify, reshape, or enhance data

    • SQL transformations (from SQL Runner)
  3. Destination Blocks: Components that deliver data to target systems

    • Data warehouses (e.g., Snowflake, BigQuery, Redshift)
    • Data lakes (e.g., S3, Azure Data Lake)
    • Operational databases
    • File exports

Task Blocks

Within each Task Group Block, individual Task Blocks represent specific operational units. Each Task Block is defined by the following properties (see the sketch after this list):

  • Configuration Parameters: Settings that control behavior
  • Input/Output Schema: Data structure specifications
  • Execution Settings: Runtime behavior controls
  • Schedule Definition: Temporal execution parameters
  • Dependency Links: Relationships with other tasks
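
Concretely, these properties can be pictured as fields on a small data structure. The sketch below is purely illustrative; the TaskBlock class and its field names are assumptions made for explanation, not part of the Nodus platform.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskBlock:
    """Hypothetical representation of a Task Block's properties (illustrative only)."""
    name: str
    config: dict = field(default_factory=dict)        # Configuration Parameters: settings that control behavior
    input_schema: dict = field(default_factory=dict)  # Input/Output Schema: expected data structures
    output_schema: dict = field(default_factory=dict)
    execution: dict = field(default_factory=dict)     # Execution Settings: runtime behavior controls
    schedule: Optional[str] = None                    # Schedule Definition: e.g., a cron expression
    depends_on: list = field(default_factory=list)    # Dependency Links: upstream task names

# Example: a source extraction task scheduled daily at 2:00 AM
ga4_extract = TaskBlock(
    name="extract_ga4_sessions",
    config={"platform": "Google Analytics 4", "report": "sessions"},
    schedule="0 2 * * *",
)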

Connections

Connections are visual links between components that define the following (see the sketch after this list):

  • Data Flow Direction: Sequence of operations
  • Dependency Relationships: Execution prerequisites
  • Parameter Passing: Context propagation between steps
  • Conditional Paths: Branching logic based on conditions
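
A connection can likewise be pictured as a directed edge that carries an optional parameter mapping and branching condition. The Connection class below is a hypothetical sketch; its fields are illustrative names only.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Connection:
    """Hypothetical directed link between two tasks (illustrative only)."""
    source: str                                     # upstream task: defines data flow direction and prerequisite
    target: str                                     # downstream task
    parameters: dict = field(default_factory=dict)  # context propagated between steps
    condition: Optional[str] = None                 # optional branching condition

# Example: only run the load step when the transformation produced rows
edge = Connection(
    source="transform_campaign_metrics",
    target="load_to_warehouse",
    parameters={"run_date": "2024-01-01"},
    condition="row_count > 0",
)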

Building DataFlows

Creating a New DataFlow

  1. Navigate to the DataFlow Canvas from the main navigation
  2. Click "Create New DataFlow" or select from available templates
  3. Provide a descriptive name and optional description
  4. Select the appropriate workspace context
  5. The canvas will initialize with a blank design surface

Adding Task Group Blocks

  1. Click the "+" button on the canvas
  2. Select the appropriate task group category:
    • Source/Extraction
    • Transformation
    • Destination
  3. Position the task group block on the canvas
  4. Configure group-level settings as needed

Adding Task Blocks

  1. Within a task group block, click "Add Task"

  2. Select from available task types based on the group category

  3. Configure task-specific parameters:

    For Source/Extraction tasks:

    • Select the integrated platform (e.g., Google Analytics 4)
    • Choose the extraction template or query
    • Configure data selection criteria
    • Set incremental extraction parameters

    For Transformation tasks:

    • Select a published SQL Runner worksheet
    • Configure input parameter mappings
    • Specify execution options
    • Define error handling behavior

    For Destination tasks:

    • Select the target system
    • Configure connection parameters
    • Specify target table or location
    • Choose write mode (append/replace/upsert)
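
The canvas collects these parameters through configuration forms; conceptually, each task type reduces to a small set of key/value settings. The dictionaries below are a hypothetical sketch of what each task type captures; none of the keys or values are official Nodus field names.

# Hypothetical parameter sets for the three task categories (illustrative field names)

source_task = {
    "platform": "Google Analytics 4",       # integrated platform
    "template": "daily_sessions_report",    # extraction template or query
    "filters": {"country": "US"},           # data selection criteria
    "incremental": {"cursor_field": "date", "lookback_days": 3},  # incremental extraction
}

transformation_task = {
    "worksheet": "clean_campaign_metrics",  # published SQL Runner worksheet
    "parameters": {"run_date": "2024-01-01"},  # input parameter mappings
    "timeout_minutes": 30,                  # execution options
    "on_error": "fail_pipeline",            # error handling behavior
}

destination_task = {
    "target": "Snowflake",                  # target system
    "connection": "analytics_wh",           # connection reference
    "table": "marketing.campaign_metrics",  # target table or location
    "write_mode": "upsert",                 # append / replace / upsert
}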

Configuring Execution Controls

For each task or task group:

  1. Select the task or group to access its properties panel
  2. Configure execution settings:
    • Schedule (frequency, time window, exclusions)
    • Dependencies (upstream task completion)
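
For illustration, the execution settings for a single task might combine a schedule (with a time window and exclusions) and an upstream dependency. The structure below is a hypothetical sketch, not a documented configuration format.

# Hypothetical execution settings for one task (illustrative keys)
execution_settings = {
    "schedule": {
        "frequency": "daily",
        "time_window": {"start": "02:00", "end": "04:00"},  # allowed run window
        "exclusions": ["2024-12-25", "2025-01-01"],          # dates to skip
    },
    "dependencies": ["extract_ga4_sessions"],  # upstream tasks that must complete first
}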

Execution Modes

The DataFlow Canvas supports multiple execution paradigms to accommodate various operational requirements:

Sequential Processing

Tasks execute in a defined order, with each step waiting for the previous one to complete:

Source Extraction → Transformation → Destination Loading

This approach ensures data consistency throughout the pipeline and is suitable for processes where each step depends on the complete results of the previous step.
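
In code terms, a sequential pipeline is a chain of calls in which each step consumes the complete output of the previous one. A minimal sketch with hypothetical step functions:

def extract():        # stand-in for a Source Extraction task
    return [{"campaign": "spring_sale", "clicks": 120}]

def transform(rows):  # stand-in for a Transformation task
    return [{**r, "clicks_per_day": r["clicks"] / 30} for r in rows]

def load(rows):       # stand-in for a Destination Loading task
    print(f"loaded {len(rows)} rows")

# Each step waits for the previous one to finish before starting
load(transform(extract()))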

Parallel Processing

Multiple tasks execute simultaneously when there are no interdependencies:

Source A Extraction ──→ Transformation A ──→ Destination X
Source B Extraction ──→ Transformation B ──→ Destination Y
Source C Extraction ──→ No Transformation ──→ Destination Z

This approach maximizes throughput and reduces overall execution time.
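
In code terms, independent pipelines can be dispatched concurrently. The sketch below uses Python's standard concurrent.futures module to run three hypothetical pipeline functions at the same time:

from concurrent.futures import ThreadPoolExecutor

def run_pipeline(source: str) -> str:
    # Placeholder for extract -> transform -> load for one source
    return f"{source}: done"

sources = ["Source A", "Source B", "Source C"]

# Independent pipelines execute simultaneously; total time is bounded
# by the slowest pipeline rather than the sum of all three.
with ThreadPoolExecutor(max_workers=3) as pool:
    for result in pool.map(run_pipeline, sources):
        print(result)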

Hybrid Orchestration

Tasks combine sequential and parallel execution based on logical dependencies:

                    ┌──→ Transformation A ──→┐
Source Extraction ──┤                        ├──→ Destination Loading
                    └──→ Transformation B ──→┘

This approach optimizes resource utilization while maintaining necessary processing order.
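
Hybrid orchestration amounts to a topological ordering of the dependency graph: tasks whose prerequisites are complete run in parallel, and everything else waits. A minimal sketch using Python's standard graphlib module, with task names taken from the diagram above:

from graphlib import TopologicalSorter

# Dependency graph for the diagram above: both transformations depend on the
# extraction, and the load depends on both transformations.
graph = {
    "transformation_a": {"source_extraction"},
    "transformation_b": {"source_extraction"},
    "destination_loading": {"transformation_a", "transformation_b"},
}

ts = TopologicalSorter(graph)
ts.prepare()
while ts.is_active():
    ready = list(ts.get_ready())      # tasks whose prerequisites are all done
    print("run in parallel:", ready)  # these could be dispatched concurrently
    for task in ready:
        ts.done(task)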

Scheduling Options

DataFlow Canvas provides flexible scheduling mechanisms to automate pipeline execution:

Time-Based Scheduling

  • Fixed Schedule: Execute at specific times (e.g., daily at 2:00 AM)
  • Interval-Based: Execute at regular intervals (e.g., every 6 hours)
  • Cron Expression: Advanced scheduling using cron syntax
  • Calendar Integration: Alignment with business calendars and exclusions
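
For reference, a cron expression has five fields: minute, hour, day of month, month, and day of week. The examples below show how the scheduling options above are typically expressed in cron syntax; the surrounding dictionary is only an illustration.

# Cron field order: minute  hour  day-of-month  month  day-of-week
schedules = {
    "daily_at_2am":       "0 2 * * *",     # fixed schedule: every day at 02:00
    "every_6_hours":      "0 */6 * * *",   # interval-based: at minute 0 of every 6th hour
    "weekdays_at_6_30am": "30 6 * * 1-5",  # advanced: Monday through Friday at 06:30
    "first_of_month":     "0 0 1 * *",     # midnight on the first day of each month
}

for name, expr in schedules.items():
    print(f"{name:22s} {expr}")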

Event-Driven Execution

  • Dependency Completion: Start when prerequisite workflows complete
  • Manual Triggering: Ad-hoc execution initiated by authorized users

Monitoring and Observability

The DataFlow Canvas provides comprehensive visibility into execution status and performance:

  • Real-time Execution Tracking: Visual indication of current processing stage
  • Historical Performance Analysis: Trend analysis for execution metrics
  • Detailed Logging: Comprehensive record of all operations

Use Cases

Marketing Data Integration

Scenario: Consolidate marketing performance data from multiple channels for unified analysis.

Implementation:

  • Extract campaign data from Google Analytics, Facebook Ads, and LinkedIn
  • Transform metrics to standardized format with consistent dimensions
  • Load unified dataset to data warehouse for cross-channel analysis
  • Schedule daily execution with appropriate incremental processing
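
Taken together, this use case is three parallel extractions feeding one transformation and one warehouse load, executed daily. The definition below is a hypothetical sketch of such a flow; the structure and field names are illustrative and do not represent a Nodus export format.

# Hypothetical DataFlow definition for the marketing integration (illustrative only)
marketing_dataflow = {
    "name": "marketing_performance_daily",
    "schedule": "0 3 * * *",  # daily, after upstream platforms finalize their data
    "tasks": {
        "extract_ga4":      {"type": "source", "platform": "Google Analytics 4"},
        "extract_facebook": {"type": "source", "platform": "Facebook Ads"},
        "extract_linkedin": {"type": "source", "platform": "LinkedIn"},
        "standardize":      {"type": "transformation",
                             "worksheet": "unify_campaign_metrics",
                             "depends_on": ["extract_ga4", "extract_facebook", "extract_linkedin"]},
        "load_warehouse":   {"type": "destination", "target": "Snowflake",
                             "table": "marketing.unified_performance",
                             "write_mode": "upsert",
                             "depends_on": ["standardize"]},
    },
}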