
Workflow Specification (spec.yaml)

A workflow specification defines the tasks, dependencies, and configuration for a compiled workflow. This page is a complete reference for the spec.yaml format consumed by wt-compiler.


Root Structure

id: my_workflow

requirements:
  - name: my-tasks-package
    version: ">=1.0"

rjsf-overrides: {}          # optional
task-instance-defaults: {}   # optional

workflow:
  - id: step_one
    task: do_something
    # ...

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Unique workflow identifier. Must be a valid Python identifier (max 64 chars). Cannot collide with Python keywords, builtins, or any task ID in the workflow. |
| requirements | list | yes | Conda and PyPI packages the workflow needs at runtime. See Requirements. |
| rjsf-overrides | object | no | Overrides for React JSON Schema Form rendering. See RJSF Overrides. |
| task-instance-defaults | object | no | Defaults applied to every task instance. See Task Instance Defaults. |
| workflow | list | yes | Ordered list of task instances and task groups. Must be in topological order: every dependency appears before its dependent. |
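
The identifier rules for id can be approximated with the Python standard library. The following is a minimal sketch, not the compiler's actual validator; the function name check_identifier and its messages are hypothetical, and the 64-character default mirrors the limit stated for the workflow id.

```python
import builtins
import keyword


def check_identifier(name: str, max_len: int = 64) -> list[str]:
    """Return a list of problems with `name`; an empty list means it passes."""
    problems = []
    if not name.isidentifier():
        problems.append("not a valid Python identifier")
    if len(name) > max_len:
        problems.append(f"longer than {max_len} characters")
    if keyword.iskeyword(name):
        problems.append("is a Python keyword")
    if hasattr(builtins, name):
        problems.append("shadows a Python builtin")
    return problems


print(check_identifier("my_workflow"))  # passes every check
print(check_identifier("class"))        # keyword
print(check_identifier("list"))         # builtin
```

The check against task IDs in the workflow is not covered here, because it needs the full spec rather than the name alone.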

Validation rules

  • All task IDs must be globally unique.
  • No task ID may equal the spec id.
  • Every ${{ workflow.<id>.return }} reference must point to a task defined earlier in the list.
  • Circular dependencies are forbidden.
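
These rules can be checked in a single pass over the task list. The sketch below is illustrative only and assumes task groups have already been flattened into a plain list of task dicts; check_workflow, find_refs, and the error strings are hypothetical names, not part of wt-compiler.

```python
import re

# Matches ${{ workflow.<id>.return }} and captures <id>.
REF = re.compile(r"\$\{\{\s*workflow\.(\w+)\.return\s*\}\}")


def find_refs(value):
    """Yield every task ID referenced anywhere inside a (possibly nested) value."""
    if isinstance(value, str):
        yield from REF.findall(value)
    elif isinstance(value, dict):
        for v in value.values():
            yield from find_refs(v)
    elif isinstance(value, list):
        for v in value:
            yield from find_refs(v)


def check_workflow(spec_id, tasks):
    """Return a list of rule violations for an ordered list of task dicts."""
    errors = []
    seen = set()
    for task in tasks:
        tid = task["id"]
        if tid == spec_id:
            errors.append(f"{tid}: task ID equals the spec id")
        if tid in seen:
            errors.append(f"{tid}: duplicate task ID")
        body = {k: v for k, v in task.items() if k != "id"}
        for ref in find_refs(body):
            if ref not in seen:
                errors.append(f"{tid}: references '{ref}' before it is defined")
        seen.add(tid)
    return errors
```

Because every reference must point to an earlier task, a list that passes this in-order check cannot contain a cycle, so no separate cycle detection is needed in the sketch.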

requirements

Each entry describes a package needed at runtime. Requirements come in two flavors — conda and PyPI — and can be mixed freely.

Conda requirements

Conda packages are resolved from conda channels during environment creation.

Restricted conda channels

The compiler only supports a fixed set of conda channels (conda-forge, microsoft, the ecoscope-workflows prefix.dev channels, and local file-based development channels). Using an unsupported channel will raise a validation error.

requirements:
  - name: python
    version: ">=3.10,<4"
  - name: ecoscope-workflows-core
    version: ">=1.0"
    channel: "https://repo.prefix.dev/ecoscope-workflows"

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | | Package name. |
| version | string | yes | | Version constraint (e.g. ">=3.10", "==1.2.3", "*"). |
| channel | string | no | "conda-forge" | Conda channel. |

PyPI requirements

PyPI requirements reference packages from a local path, a Git repository, or a direct URL. They are installed via uv pip install into the conda environment during task discovery and appear in the compiled workflow's pixi.toml under [pypi-dependencies].

The compiler distinguishes PyPI requirements from conda requirements automatically: any entry with a git, path, or url key is treated as PyPI.
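
The classification rule is simple enough to state as a one-line predicate. This is a sketch of the rule as described, with a hypothetical function name; it is not taken from the compiler source.

```python
PYPI_SOURCE_KEYS = {"git", "path", "url"}


def requirement_kind(entry: dict) -> str:
    """Classify a requirements entry by the keys it carries."""
    # Any of git/path/url marks the entry as PyPI; everything else is conda.
    return "pypi" if entry.keys() & PYPI_SOURCE_KEYS else "conda"


print(requirement_kind({"name": "python", "version": ">=3.10,<4"}))
print(requirement_kind({"name": "my-tasks", "path": "/home/user/my-tasks"}))
```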

Local path — simplest option for local development:

requirements:
  - name: my-tasks
    path: /home/user/my-tasks
  - name: my-other-tasks
    path: /home/user/my-other-tasks
    editable: true

Git repository:

requirements:
  - name: my-tasks
    git: https://github.com/org/my-tasks.git
  - name: my-other-tasks
    git: https://github.com/org/my-other-tasks.git
    tag: v1.0.0

Direct URL:

requirements:
  - name: my-tasks
    url: https://example.com/my_tasks-1.0.0-py3-none-any.whl

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | | Package name. |
| git | string | one of git/path/url | | Git repository URL. |
| rev | string | no | | Git commit hash (only with git). |
| branch | string | no | | Git branch name (only with git). |
| tag | string | no | | Git tag name (only with git). |
| path | string | one of git/path/url | | Absolute local filesystem path (not file:// URLs, not relative). |
| editable | bool | no | | Install in editable mode (only with path). |
| url | string | one of git/path/url | | Direct URL to a wheel or sdist. |
| subdirectory | string | no | | Subdirectory within the source to install from. |
| extras | list | no | | List of extras to install (e.g. ["dev", "test"]). |

Validation rules:

  • Exactly one of git, path, or url must be set.
  • At most one of rev, branch, or tag may be set (and only with git).
  • editable is only valid with path.
  • path must be an absolute filesystem path (not relative, not a file:// URL).
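
The four rules above translate directly into checks on a single requirement entry. The following sketch is an illustration under assumptions: validate_pypi_requirement and its messages are hypothetical, and absolute paths are tested with POSIX semantics only.

```python
from pathlib import PurePosixPath


def validate_pypi_requirement(entry: dict) -> list[str]:
    """Return the rule violations for one PyPI requirement entry."""
    errors = []
    sources = [k for k in ("git", "path", "url") if k in entry]
    if len(sources) != 1:
        errors.append("exactly one of git, path, or url must be set")
    pins = [k for k in ("rev", "branch", "tag") if k in entry]
    if len(pins) > 1:
        errors.append("at most one of rev, branch, or tag may be set")
    if pins and "git" not in entry:
        errors.append("rev, branch, and tag are only valid with git")
    if "editable" in entry and "path" not in entry:
        errors.append("editable is only valid with path")
    # A file:// URL or a relative path does not start with "/", so it fails here.
    if "path" in entry and not PurePosixPath(entry["path"]).is_absolute():
        errors.append("path must be an absolute filesystem path")
    return errors
```

For example, an entry that sets both tag and branch yields a single "at most one" error, while a path given as file:///home/user/my-tasks is rejected as non-absolute.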

Mixed requirements

Conda and PyPI requirements can appear together:

requirements:
  - name: python
    version: ">=3.10,<4"
  - name: pandas
    version: ">=2.0"
  - name: my-tasks
    path: /home/user/my-tasks

rjsf-overrides

Customizes how workflow parameters are rendered in React JSON Schema Form (RJSF) UIs. Uses dotted-key notation to target nested schema paths.

rjsf-overrides:
  properties:
    get_events_data.properties.event_types:
      title: "Event Types"
      description: "Select one or more event types to include"
      items:
        type: string
        enum: ["immobility", "mortality", "geofence_break"]

  uiSchema:
    get_events_data.event_types:
      ui:widget: "select"
      ui:options:
        displayLabel: false

  $defs:
    ValueGrouper.oneOf:
      - const: "event_category"
        title: "Event Category"
      - const: "event_type"
        title: "Event Type"

| Section | Purpose |
|---|---|
| properties | Override JSON Schema properties (titles, descriptions, defaults, enums, constraints). |
| uiSchema | Override RJSF UI rendering (widgets, layout, help text). |
| $defs | Override shared JSON Schema definitions (e.g. constrain model oneOf choices). |
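
To make the dotted-key notation concrete, the sketch below walks a dotted key into a nested dict and applies an override at the addressed node. The merge behaviour is an assumption made for illustration (mappings are shallow-merged, anything else replaces the existing value); the compiler's real semantics may differ, and apply_override is a hypothetical name.

```python
def apply_override(section: dict, dotted_key: str, override) -> None:
    """Apply `override` at the node that `dotted_key` addresses inside `section`."""
    *parents, leaf = dotted_key.split(".")
    node = section
    for part in parents:
        # Create intermediate levels on demand (assumed behaviour).
        node = node.setdefault(part, {})
    if isinstance(override, dict) and isinstance(node.get(leaf), dict):
        node[leaf].update(override)  # assumed: shallow merge for mappings
    else:
        node[leaf] = override        # assumed: lists and scalars replace


# Stand-in for the generated schema's "properties" section.
properties = {"get_events_data": {"properties": {"event_types": {"type": "array"}}}}
apply_override(
    properties,
    "get_events_data.properties.event_types",
    {"title": "Event Types"},
)
print(properties["get_events_data"]["properties"]["event_types"])
```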

task-instance-defaults

Defaults applied to every task instance in the workflow. Currently supports only skipif. A task-level value overrides the default.

task-instance-defaults:
  skipif:
    conditions:
      - is_dry_run

workflow:
  - id: step_one
    task: extract
    # inherits skipif from defaults

  - id: step_two
    task: transform
    skipif:
      conditions:
        - custom_condition
    # overrides defaults with its own skipif
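
The override rule in the example above amounts to "use the task's own skipif if present, otherwise the default". A minimal sketch, assuming the task-level value replaces the default wholesale rather than merging condition lists; effective_skipif is a hypothetical helper name.

```python
def effective_skipif(task: dict, defaults: dict):
    """Return the skipif configuration that applies to `task`."""
    # A task-level skipif wins outright; otherwise fall back to the default.
    return task.get("skipif", defaults.get("skipif"))


defaults = {"skipif": {"conditions": ["is_dry_run"]}}
step_one = {"id": "step_one", "task": "extract"}
step_two = {
    "id": "step_two",
    "task": "transform",
    "skipif": {"conditions": ["custom_condition"]},
}

print(effective_skipif(step_one, defaults))  # inherits the default
print(effective_skipif(step_two, defaults))  # uses its own skipif
```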

workflow

The workflow list contains the tasks that make up the workflow. Each entry is either a task instance or a task group. Entries must be in topological order — every dependency must appear before its dependent.

Task Instances

A task instance is a single invocation of a registered task function.

- id: get_data
  name: "Fetch Raw Data"
  task: ecoscope_workflows_core.tasks.get_events
  partial:
    time_range: ${{ workflow.time_range.return }}
    event_types: ["immobility", "mortality"]

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | yes | | Unique identifier, used as a variable name in the compiled DAG. Valid Python identifier, max 32 chars, cannot be a Python keyword or builtin. |
| name | string | no | "" | Human-readable display name. |
| task | string | yes | | Registered task reference: either a short name (if globally unique) or a fully qualified dotted path (e.g. mypackage.tasks.extract). |
| partial | object | no | {} | Static keyword arguments bound to the task. See partial. |
| map | object | no | {} | Apply the task across an iterable. See map. |
| mapvalues | object | no | {} | Apply the task across key-value pairs. See mapvalues. |
| skipif | object | no | null | Conditional skip configuration. See skipif. |

Note

A task instance uses exactly one execution method: call (the default), map, or mapvalues. If both map and mapvalues are present, validation fails.

partial

Binds keyword arguments to a task. Keys must match the task function's parameter names.

- id: create_chart
  task: create_bar_chart
  partial:
    title: "Event Summary"
    data: ${{ workflow.get_events.return }}
    color: "#4a90d9"
    api_key: ${{ env.CHART_API_KEY }}

When the compiled workflow runs, these arguments are passed to the task function via its .partial() method — they are fixed for every invocation.

Variable references

Values in partial (and in map/mapvalues argvalues) can reference task outputs or environment variables using the ${{ ... }} syntax.

Task output — references the return value of a previously defined task:

${{ workflow.<task_id>.return }}

The referenced task must appear earlier in the workflow (topological ordering).

Environment variable — resolved at runtime from os.environ:

${{ env.<VAR_NAME> }}

Inline values — any YAML literal (numbers, strings, booleans, null, lists, dicts) can be used directly:

partial:
  count: 100
  label: "events"
  enabled: true
  options: null
  tags: ["a", "b", "c"]

Mixing references and literals — lists and dicts can freely combine variable references with inline values:

partial:
  sources:
    - ${{ workflow.source_a.return }}
    - ${{ workflow.source_b.return }}
    - "/static/fallback.csv"
  config:
    data: ${{ workflow.extract.return }}
    threshold: 0.95
    debug: false
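
As an illustration of how references and literals combine, the sketch below resolves a partial mapping against a dict of earlier task results and an environment mapping. It is a simplified interpreter written for this page, not how the compiled workflow is actually generated; resolve and the two regular expressions are hypothetical.

```python
import os
import re

WORKFLOW_REF = re.compile(r"^\$\{\{\s*workflow\.(\w+)\.return\s*\}\}$")
ENV_REF = re.compile(r"^\$\{\{\s*env\.(\w+)\s*\}\}$")


def resolve(value, results, environ=os.environ):
    """Replace variable references in `value` with concrete values.

    `results` maps task IDs to the return values of tasks that already ran.
    """
    if isinstance(value, str):
        if m := WORKFLOW_REF.match(value):
            return results[m.group(1)]
        if m := ENV_REF.match(value):
            return environ[m.group(1)]
        return value  # plain string literal
    if isinstance(value, list):
        return [resolve(v, results, environ) for v in value]
    if isinstance(value, dict):
        return {k: resolve(v, results, environ) for k, v in value.items()}
    return value  # numbers, booleans, null


partial = {
    "sources": ["${{ workflow.source_a.return }}", "/static/fallback.csv"],
    "config": {"data": "${{ workflow.extract.return }}", "threshold": 0.95},
}
print(resolve(partial, {"source_a": "a.csv", "extract": "table"}))
```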

map

Applies a task to each element of an iterable, producing a list of results.

- id: process_regions
  task: process_region
  partial:
    template: "default"
  map:
    argnames: region
    argvalues: ${{ workflow.get_regions.return }}

| Field | Type | Required | Description |
|---|---|---|---|
| argnames | string or list | yes | Parameter name(s) to bind each element to. |
| argvalues | reference(s) | yes | Variable reference(s) to the iterable(s). |

Both fields must be provided together, or both omitted.

Single argument:

map:
  argnames: item
  argvalues: ${{ workflow.get_items.return }}

Equivalent to: [task(item=x) for x in get_items()]

Multiple arguments:

map:
  argnames: [year, month]
  argvalues: ${{ workflow.get_date_pairs.return }}

Equivalent to: [task(year=y, month=m) for y, m in get_date_pairs()]
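
The two comprehensions above can be generalized into one helper that also carries the partial arguments along. This is a sketch of the described semantics only; run_map is a hypothetical function, and the example tasks are stand-ins.

```python
def run_map(task, argnames, argvalues, **partial):
    """Call `task` once per element of `argvalues`, binding it to `argnames`."""
    names = [argnames] if isinstance(argnames, str) else list(argnames)
    results = []
    for element in argvalues:
        # One name binds the element whole; several names unpack it.
        values = (element,) if len(names) == 1 else tuple(element)
        results.append(task(**partial, **dict(zip(names, values))))
    return results


def process_region(region, template):
    return f"{template}:{region}"


def format_month(year, month):
    return f"{year}-{month:02d}"


print(run_map(process_region, "region", ["north", "south"], template="default"))
print(run_map(format_month, ["year", "month"], [(2024, 1), (2024, 12)]))
```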

mapvalues

Applies a task to the values of key-value pairs, preserving the keys in the output.

- id: transform_by_group
  task: transform
  mapvalues:
    argnames: dataset
    argvalues: ${{ workflow.grouped_data.return }}

| Field | Type | Required | Description |
|---|---|---|---|
| argnames | string or list | yes | Parameter name to bind each value to. |
| argvalues | reference(s) | yes | Variable reference(s) to key-value pair iterable(s). |

Input: [("group_a", data_a), ("group_b", data_b)]
Output: [("group_a", result_a), ("group_b", result_b)]
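
In Python terms, the input/output shape above corresponds to a comprehension that keeps each key and replaces each value with the task's result. A minimal sketch with hypothetical names (run_mapvalues, count_rows), assuming a single argname:

```python
def run_mapvalues(task, argname, pairs, **partial):
    """Apply `task` to the value of each (key, value) pair, keeping the key."""
    return [(key, task(**partial, **{argname: value})) for key, value in pairs]


def count_rows(dataset):
    return len(dataset)


grouped = [("group_a", [1, 2]), ("group_b", [3])]
print(run_mapvalues(count_rows, "dataset", grouped))
```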

skipif

Conditionally skip a task based on one or more boolean condition functions.

- id: expensive_step
  task: run_analysis
  skipif:
    conditions:
      - should_skip_analysis
    unpack_depth: 1
  partial:
    data: ${{ workflow.extract.return }}

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| conditions | list | yes | | Registered task names that return a boolean. If any returns True, the task is skipped and a SkipSentinel value is returned. |
| unpack_depth | int | no | 1 | Controls unpacking depth of nested list-like arguments when evaluating conditions. |
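
The "any condition returns True" rule can be sketched as a thin wrapper around the task call. Several details here are assumptions made for illustration: the SkipSentinel class is a local stand-in, each condition is assumed to receive the list of the task's argument values, and unpack_depth is ignored entirely.

```python
class SkipSentinel:
    """Local stand-in for the sentinel value named in the table above."""


SKIP = SkipSentinel()


def call_unless_skipped(task, conditions, **kwargs):
    """Run `task` unless any condition function returns True."""
    args = list(kwargs.values())
    # Assumption: conditions inspect the task's argument values.
    if any(condition(args) for condition in conditions):
        return SKIP
    return task(**kwargs)


def any_is_empty(args):
    """Hypothetical condition: skip when any argument is an empty list."""
    return any(a == [] for a in args)


def run_analysis(data):
    return sum(data)


print(call_unless_skipped(run_analysis, [any_is_empty], data=[1, 2, 3]))
print(call_unless_skipped(run_analysis, [any_is_empty], data=[]) is SKIP)
```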

Task Groups

A task group is a logical container for related task instances. Groups are flattened during compilation — they affect documentation and UI presentation but not execution order.

- type: task-group
  title: "Data Extraction"
  description: "Fetch data from all configured sources"
  tasks:
    - id: fetch_events
      task: get_events
      partial:
        client: ${{ workflow.er_client.return }}
    - id: fetch_patrols
      task: get_patrols
      partial:
        client: ${{ workflow.er_client.return }}

| Field | Type | Required | Description |
|---|---|---|---|
| type | "task-group" | yes | Literal marker that identifies this entry as a group. |
| title | string | yes | Group title. |
| description | string | yes | Group description. |
| tasks | list | yes | List of task instances. |

Note

Task groups cannot be nested. Task IDs inside a group must still be globally unique across the entire workflow.
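
Since groups cannot be nested, flattening is a single-level operation. The sketch below shows one way to express it over the parsed workflow list; flatten is a hypothetical helper, not the compiler's own function.

```python
def flatten(workflow: list[dict]) -> list[dict]:
    """Expand task groups in place, preserving the original order."""
    flat = []
    for entry in workflow:
        if entry.get("type") == "task-group":
            flat.extend(entry["tasks"])  # one level only: groups cannot nest
        else:
            flat.append(entry)
    return flat


workflow = [
    {"id": "er_client", "task": "set_er_connection"},
    {
        "type": "task-group",
        "title": "Data Extraction",
        "description": "Fetch data from all configured sources",
        "tasks": [
            {"id": "fetch_events", "task": "get_events"},
            {"id": "fetch_patrols", "task": "get_patrols"},
        ],
    },
]
print([task["id"] for task in flatten(workflow)])
```

Checks such as global ID uniqueness are naturally run on the flattened list, which is why IDs inside a group share one namespace with the rest of the workflow.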


Complete Example

id: patrol_events_dashboard

requirements:
  - name: ecoscope-workflows-core
    version: ">=1.0"
    channel: "https://repo.prefix.dev/ecoscope-workflows"

rjsf-overrides:
  properties:
    get_events_data.properties.event_types:
      items:
        enum: ["immobility", "mortality"]
  uiSchema:
    er_client_name:
      ui:widget: "select"

task-instance-defaults:
  skipif:
    conditions:
      - is_dry_run

workflow:
  # -- setup --
  - id: workflow_details
    name: "Workflow Details"
    task: set_workflow_details
    partial:
      title: "Patrol Events Dashboard"

  - id: er_client
    name: "Data Source"
    task: set_er_connection

  - id: time_range
    name: "Time Range"
    task: set_time_range

  # -- data extraction --
  - type: task-group
    title: "Data Extraction"
    description: "Fetch patrol and event data from EarthRanger"
    tasks:
      - id: get_events
        name: "Get Events"
        task: get_events
        partial:
          client: ${{ workflow.er_client.return }}
          time_range: ${{ workflow.time_range.return }}
          event_types: ["immobility", "mortality"]

      - id: get_patrols
        name: "Get Patrols"
        task: get_patrols
        partial:
          client: ${{ workflow.er_client.return }}
          time_range: ${{ workflow.time_range.return }}

  # -- grouping --
  - id: groupers
    name: "Set Groupers"
    task: set_groupers

  # -- widgets --
  - id: event_count
    name: "Event Count Widget"
    task: create_single_value_widget_single_view
    partial:
      data: ${{ workflow.get_events.return }}
      label: "Total Events"

  - id: patrol_map
    name: "Patrol Map Widget"
    task: create_map_widget_single_view
    partial:
      data: ${{ workflow.get_patrols.return }}
      title: "Patrol Coverage"

  # -- dashboard assembly --
  - id: dashboard
    name: "Gather Dashboard"
    task: gather_dashboard
    partial:
      details: ${{ workflow.workflow_details.return }}
      time_range: ${{ workflow.time_range.return }}
      widgets:
        - ${{ workflow.event_count.return }}
        - ${{ workflow.patrol_map.return }}
      groupers: ${{ workflow.groupers.return }}