Squid Mesh - workflow automation runtime for Elixir applications

Up next: Saga Semantics

Quick update: out of professional respect for the work behind Oban and Oban Pro, and because Squid Mesh operates in an adjacent space, I’m going to stop using Oban directly as the execution backend.

The current integration uses Oban as the durable job substrate (we don’t use it for retries, exponential backoff, etc.). Going forward, Squid Mesh will provide its own narrow execution engine focused only on the runtime guarantees the project needs: durable step dispatch, delayed continuations, restart/redelivery behavior, cron activation, and test support.

That should hopefully mean a few weeks of development (to build only what I need at the moment) until I have something ready to be used.

Btw, saga compensation is now available on main.

You can now define compensation steps for workflow actions so failed runs can trigger cleanup or rollback behavior as part of the workflow definition. Docs and examples are on the branch as well.
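To make the idea concrete, here is a minimal sketch of saga-style compensation in plain Elixir. It does not use the Squid Mesh DSL; the `SagaSketch` module and the step map shape are assumptions made up for illustration. It runs steps in order and, on the first failure, runs the compensations of already completed steps in reverse completion order.

```elixir
defmodule SagaSketch do
  # Hypothetical step shape (not the Squid Mesh API):
  #   %{name: atom, run: (state -> {:ok, state} | {:error, reason}),
  #     compensate: (state -> state) | nil}
  def run(steps, state), do: do_run(steps, state, [])

  defp do_run([], state, _completed), do: {:ok, state}

  defp do_run([step | rest], state, completed) do
    case step.run.(state) do
      {:ok, new_state} ->
        # Keep completed steps most-recent-first so rollback order comes for free.
        do_run(rest, new_state, [step | completed])

      {:error, reason} ->
        {:error, reason, compensate(completed, state)}
    end
  end

  # Run compensations in reverse completion order, skipping steps without one.
  defp compensate(completed, state) do
    Enum.reduce(completed, state, fn
      %{compensate: nil}, acc -> acc
      %{compensate: undo}, acc -> undo.(acc)
    end)
  end
end

steps = [
  %{name: :reserve, run: fn s -> {:ok, [:reserved | s]} end, compensate: fn s -> [:released | s] end},
  %{name: :charge, run: fn s -> {:ok, [:charged | s]} end, compensate: fn s -> [:refunded | s] end},
  %{name: :ship, run: fn _s -> {:error, :carrier_down} end, compensate: nil}
]

{:error, :carrier_down, log} = SagaSketch.run(steps, [])
```

Here `log` records the charge being refunded before the reservation is released, which is the reverse of the order in which the steps completed.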

Squid Mesh 0.1.0-alpha.6 is out.

Up next, I’m going to detach the runtime from Oban so Squid Mesh can stay executor-agnostic in the short term. Oban will remain a supported integration, but I want the core workflow model to avoid assuming one specific job backend before introducing a new executor.

The goal is to keep the library easier to embed in different host apps while the runtime surface is still early and flexible.

After this, you should be able to implement the SquidMesh.Executor behaviour to use whatever executor you like (Oban, EctoJob, Exq, etc.), e.g.:

@behaviour SquidMesh.Executor

@impl true
def enqueue_step(_config, _run, _step, _opts)

@impl true
def enqueue_steps(_config, _run, _steps, _opts)

@impl true
def enqueue_compensation(_config, _run, _opts)

@impl true
def enqueue_cron(_config, _workflow, _trigger, _opts)
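For a self-contained picture of how such a behaviour could be implemented, here is a minimal sketch. `Sketch.Executor` is a locally defined stand-in, not the real `SquidMesh.Executor`: the callback names and arities follow the signatures above, but the argument shapes and the `:ok | {:error, term}` return type are assumptions. `Sketch.InMemoryExecutor` is a toy that records jobs in an Agent instead of talking to a job backend.

```elixir
defmodule Sketch.Executor do
  # Hypothetical stand-in for SquidMesh.Executor. Names and arities follow the
  # post; argument and return types are assumptions.
  @callback enqueue_step(config :: term(), run :: term(), step :: term(), opts :: keyword()) ::
              :ok | {:error, term()}
  @callback enqueue_steps(config :: term(), run :: term(), steps :: [term()], opts :: keyword()) ::
              :ok | {:error, term()}
  @callback enqueue_compensation(config :: term(), run :: term(), opts :: keyword()) ::
              :ok | {:error, term()}
  @callback enqueue_cron(config :: term(), workflow :: term(), trigger :: term(), opts :: keyword()) ::
              :ok | {:error, term()}
end

defmodule Sketch.InMemoryExecutor do
  # Toy executor: stores enqueued jobs in an Agent, newest first.
  @behaviour Sketch.Executor

  def start_link, do: Agent.start_link(fn -> [] end, name: __MODULE__)

  # Returns recorded jobs in the order they were enqueued.
  def jobs, do: Agent.get(__MODULE__, &Enum.reverse/1)

  @impl true
  def enqueue_step(_config, run, step, _opts), do: push({:step, run, step})

  @impl true
  def enqueue_steps(config, run, steps, opts) do
    Enum.each(steps, &enqueue_step(config, run, &1, opts))
  end

  @impl true
  def enqueue_compensation(_config, run, _opts), do: push({:compensation, run})

  @impl true
  def enqueue_cron(_config, workflow, trigger, _opts), do: push({:cron, workflow, trigger})

  defp push(job), do: Agent.update(__MODULE__, &[job | &1])
end
```

A real adapter would replace `push/1` with a call into the chosen backend, for example inserting an Oban job, and would map backend errors to whatever return contract the actual behaviour defines.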

PR is up: "Add pluggable executor boundary" (Pull Request #159 in ccarvalho-eng/squid_mesh on GitHub).

Squid Mesh is now moving under a new roadmap:

Squid Mesh is a Jido-native durable workflow layer with a friendly Elixir DSL for human-in-the-loop and agentic workflows.

The public workflow API stays focused on Squid Mesh concepts: workflows, steps, approvals, waits, retries, inspection, and explanation. Under the hood, the new core is being rebuilt around Jido signals, actions, agents, thread journals, checkpoints, and Jido.Storage, with Runic as the pure workflow planner and Spark powering the DSL/spec layer.

A few important boundaries:

  • Squid Mesh owns durable workflow semantics and inspection.
  • Jido provides the runtime primitives underneath.
  • The happy-path API should not require users to learn Jido internals.
  • Oban is not part of Squid Mesh core.
  • External executor integrations can come later, after the Jido-native core is proven.
  • The first milestone focuses on durable dispatch, journals/checkpoints, Runic planning, workflow/dispatch agents, storage, and projection-based inspection.

Roadmap milestone:

Umbrella issue:

Feedback is welcome, especially around the public step contract and where the boundary should sit between Squid Mesh-native APIs and Jido-aware advanced usage.

@dimitarvp FYI: Most of the suggestions you’ve made earlier are now implemented and some are kept in the roadmap.

Much appreciated. I still have not gotten around to using a persistent workflow library in my work (with saga compensation/undo and the rest), so I’ll re-review Squid Mesh in the next few weeks and let you know if I see any gaps or have any architectural remarks.

Just merged the library positioning guide: docs/positioning.md on main in ccarvalho-eng/squid_mesh on GitHub.

Good for a brief overview of capabilities and a comparison against similar libraries.

Squid Mesh 0.1.0-alpha.7 is out.

This release continues the work toward a stronger runtime core and clearer host-app boundary.

Highlights:

  • Added a pluggable executor boundary for step execution, delayed scheduling, redelivery, and
    cron activation
  • Added native SquidMesh.Step modules, while keeping raw Jido.Action support as an explicit interop path
  • Added durable dispatch protocol docs and runtime projection invariants
  • Added a Runic workflow planner boundary for graph and mapping facts
  • Added Jido storage journal support, durable rebuild fences, and rebuildable runtime agent
    checkpoints
  • Updated examples to use native Squid Mesh steps by default
  • Hardened planner mappings, dispatch projection validation, journal replay decoding, and agent
    replay recovery

Links:

This is still an alpha release. The runtime is suitable for evaluation, local development, and integration work, but it is not yet positioned as production-ready.

Adding a UI for Squid Mesh (read-only for now):

Once I finish the remaining long-running step recovery work, I’ll feel much more confident moving Squid Mesh from alpha toward beta.

The main architectural gap I still want to close is the runtime switchover to the Jido-native coordination path. The direction is to make workflow and dispatch agents rebuild from durable journal state, with claim fencing, leases, heartbeats, completion, failure, retry, and recovery all represented as durable facts.
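As an illustration of "rebuild from durable journal state", here is a minimal sketch that folds an append-only list of facts into per-step state. The fact tuples and the `JournalSketch` module are hypothetical; the actual Jido journal and checkpoint formats are not described in this thread.

```elixir
defmodule JournalSketch do
  @doc "Rebuilds per-step state by folding journal facts in append order."
  def rebuild(facts), do: Enum.reduce(facts, %{}, &apply_fact/2)

  # A claim starts a new attempt and records which worker holds the step.
  defp apply_fact({:claimed, step, worker}, state) do
    attempts = get_in(state, [step, :attempts]) || 0
    Map.put(state, step, %{status: :claimed, worker: worker, attempts: attempts + 1})
  end

  defp apply_fact({:completed, step}, state), do: set_status(state, step, :completed)
  defp apply_fact({:failed, step}, state), do: set_status(state, step, :failed)

  # Unknown facts are ignored so that journals with newer fact types still replay.
  defp apply_fact(_other, state), do: state

  defp set_status(state, step, status) do
    Map.update(state, step, %{status: status}, &Map.put(&1, :status, status))
  end
end

facts = [
  {:claimed, :charge, "w1"},
  {:failed, :charge},
  {:claimed, :charge, "w2"},
  {:completed, :charge},
  {:claimed, :ship, "w1"}
]

state = JournalSketch.rebuild(facts)
```

Because the state is a pure function of the facts, an agent that crashes can be restarted and brought back to the same view by replaying the journal, optionally starting from a checkpoint.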

For long-running steps specifically, the missing piece is heartbeat-backed recovery. I’m planning to lean on the model from Mike Hostetler’s Intent Ledger project (mikehostetler/intent_ledger on GitHub). Once Squid Mesh can safely recover claimed long-running work after worker crashes, restarts, or expired leases, the runtime story should be solid enough that I’d be comfortable calling the project beta rather than alpha.
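For readers unfamiliar with the pattern, here is a generic sketch of lease-based claiming with heartbeats and fencing tokens. It is not Intent Ledger's design and not Squid Mesh code; the `LeaseSketch` module, the 30-second lease, and the claim shape are all assumptions. Times are plain integers in milliseconds so the logic stays pure.

```elixir
defmodule LeaseSketch do
  # Hypothetical claim shape: %{worker: term, token: pos_integer, expires_at: integer}
  @lease_ms 30_000

  # Unclaimed work can be claimed by anyone and gets fencing token 1.
  def claim(nil, worker, now), do: {:ok, new_claim(worker, 1, now)}

  # An expired lease can be taken over; the token is bumped to fence the old holder.
  def claim(%{expires_at: expires_at, token: token}, worker, now) when now >= expires_at,
    do: {:ok, new_claim(worker, token + 1, now)}

  def claim(_live_claim, _worker, _now), do: {:error, :already_claimed}

  # A heartbeat extends the lease only for the current token and only before expiry.
  def heartbeat(%{token: token, expires_at: expires_at} = current, token, now)
      when now < expires_at,
      do: {:ok, %{current | expires_at: now + @lease_ms}}

  def heartbeat(_current, _token, _now), do: {:error, :stale}

  # Completion reported with an old token is rejected.
  def complete(%{token: token}, token), do: :ok
  def complete(_current, _token), do: {:error, :fenced}

  defp new_claim(worker, token, now),
    do: %{worker: worker, token: token, expires_at: now + @lease_ms}
end
```

The token is what makes recovery safe: if a worker stalls past its lease and another worker takes over, the first worker's late completion carries the old token and is refused.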

Still a few focused PRs away, but the pieces are starting to line up.

OMG this is awesome. I already feel like I am missing out though I know you are not quite done with the initial items we discussed (or at least this is what I extracted from the recent announcements).

Yeah. We’ll get there. Thanks for all the hints and feature suggestions. Hopefully the last 9 issues are what we need to call it a solid v1 wrap! They are listed under Issues in ccarvalho-eng/squid_mesh on GitHub.

Thank you for the great addition to the Elixir ecosystem.

I have a question: why did you opt to have it run inside the parent application instead of running independently? In my work, we mostly need durable workflows for processes that span multiple systems and teams. A workflow could be triggered through an API, a UI form, or even a timed event, and then it starts interacting with people or related systems based on defined flow rules.

By the way, do you plan to add SLA capability per step? Also, a capability to limit what each user sees in the flow would help, as sometimes you have external users who should not see all internal steps and communication.

Thanks, appreciate the thoughtful questions.

The main reason Squid Mesh is designed to run inside the host application is proximity to domain logic. Most workflow steps need to call existing contexts, use the app’s auth/authorization rules, share the same database transaction boundaries, emit domain events, and reuse local observability. If the workflow runtime runs as a separate service, you often end up rebuilding a second integration layer just so it can call back into the system that already owns the business rules.

That said, it does not have to be embedded in a large existing app. You can absolutely create a small dedicated Elixir app that embeds Squid Mesh and acts as the workflow host for a broader system. That is probably the right shape when a workflow spans multiple teams or systems: keep Squid Mesh close to the orchestration/domain code for that workflow, while integrations with other systems happen through APIs, messages, webhooks, etc. We also have a standalone harness today, but it is mostly for QA/smoke testing rather than the recommended production deployment shape.

On SLA per step: if you mean things like expected completion time, timeout/deadline, escalation, alerting, or “this approval must happen within 24h”, then yes, that fits the direction of the project. Some of that can already be modeled with retries, pauses/manual actions, and external monitoring, but first-class step-level deadlines/escalations would be a good capability.
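As a rough sketch of what a step-level deadline check could look like, the function below classifies a step as on track, at risk, or breached. This is not an existing Squid Mesh feature; the `DeadlineSketch` module, the status atoms, and the 80% warning threshold are assumptions.

```elixir
defmodule DeadlineSketch do
  # `started_at` and `now` are DateTime structs; `deadline_s` is the allowed
  # duration in seconds. `warn_ratio` is the fraction of the deadline after
  # which the step counts as at risk.
  def status(started_at, deadline_s, now, warn_ratio \\ 0.8) do
    elapsed = DateTime.diff(now, started_at, :second)

    cond do
      elapsed >= deadline_s -> :breached
      elapsed >= deadline_s * warn_ratio -> :at_risk
      true -> :on_track
    end
  end
end

# "This approval must happen within 24h", checked 20 hours in.
started = ~U[2024-01-01 00:00:00Z]
DeadlineSketch.status(started, 24 * 60 * 60, ~U[2024-01-01 20:00:00Z])
```

An external monitor or a cron-activated workflow could call a check like this and trigger escalation when a step moves to `:at_risk` or `:breached`.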

On permissions/visibility: yes, I think that is important. The runtime should be able to expose different views of the same run depending on the actor. For example, an external user may see only their task/status while internal operators see full step history, errors, audit events, and system communication. That likely belongs as an authorization/read-model layer rather than changing the underlying workflow history.
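A minimal sketch of that read-model idea: the stored run stays untouched, and a projection function decides what each kind of actor sees. The run shape, the `:visibility` field, and the `:operator`/`:external` roles are hypothetical and not part of the current Squid Mesh API.

```elixir
defmodule RunViewSketch do
  # Hypothetical run shape:
  #   %{id: ..., status: ..., steps: [%{name:, status:, visibility:, error:}]}

  # Operators see the full history, including errors.
  def view(run, :operator), do: run

  # External users see only public steps, reduced to name and status.
  def view(run, :external) do
    steps =
      for %{visibility: :public} = step <- run.steps do
        Map.take(step, [:name, :status])
      end

    %{id: run.id, status: run.status, steps: steps}
  end
end
```

Because the filtering happens at read time, the underlying workflow history remains complete for audits while each actor gets an appropriately narrow view.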

@dimitarvp FYI: this is the summary of features implemented after your suggestions:

Implemented:

  • Human-in-the-loop pause/manual unblock:
    :pause, :approval, unblock_run/2, approve_run/3, and reject_run/3.
  • Machine-readable explanation/introspection:
    SquidMesh.explain_run/2 returns reason, evidence, and next actions.
  • Saga compensation callbacks:
    compensate: SomeStep runs completed reversible steps in reverse completion order and persists recovery metadata.
  • Irreversible/non-compensatable markers:
    irreversible: true and compensatable: false affect recovery policy and block replay unless allow_irreversible: true.
  • Compensation vs undo distinction:
    recovery: :compensation | :undo exists on :error transitions and is surfaced in docs/audit history.
  • Accumulated state access:
    Step input can merge payload, run context, and completed dependency outputs, with input: and output: mapping.
  • Static fan-out/fan-in:
    Dependency workflows support after: […], multiple root steps, joins, and independent host-executor dispatch.
  • Local transaction grouping:
    transaction: :repo wraps a single custom step callback in the configured Ecto repo transaction.
  • Positioning against adjacent libraries:
    Docs compare Squid Mesh with Sage, Reactor, FlowStone, Runic, and Jido.
  • Squid Mesh-native step API:
    SquidMesh.Step is now the preferred authoring contract; raw Jido.Action remains an interop
    path.
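To illustrate the static fan-out/fan-in item, here is a small sketch of how "which steps can be dispatched now" can be derived from `after:`-style dependencies. It is plain Elixir rather than the Squid Mesh DSL or its Runic planner; the `ReadyStepsSketch` module and the dependency map shape are assumptions, and it ignores in-flight steps for brevity.

```elixir
defmodule ReadyStepsSketch do
  # `deps` maps each step to the steps it must run after; root steps map to [].
  # Returns the not-yet-completed steps whose dependencies are all completed.
  def ready(deps, completed) do
    done = MapSet.new(completed)

    deps
    |> Enum.filter(fn {step, after_steps} ->
      not MapSet.member?(done, step) and Enum.all?(after_steps, &MapSet.member?(done, &1))
    end)
    |> Enum.map(fn {step, _after_steps} -> step end)
    |> Enum.sort()
  end
end

deps = %{
  fetch_user: [],
  fetch_orders: [],
  build_report: [:fetch_user, :fetch_orders],
  send_email: [:build_report]
}

ReadyStepsSketch.ready(deps, [])
# → [:fetch_orders, :fetch_user]
```

The two root steps fan out and can be dispatched independently; `build_report` acts as the join and only becomes ready once both have completed.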

Remaining or partial:

  • Full compensation/undo/rollback semantics:
    Compensation exists, undo is represented as recovery routing metadata, but there is not yet a
    full automatic undo-chain engine with partial-success reporting.
  • Rich conditional/deferred continuation:
    Basic accumulated state exists, but KYC-style “defer this step and continue/poll later” behavior is still planned.
  • Dynamic step injection / graph expansion:
    Still planned, not implemented.
  • Transaction across fan-out/sub-steps:
    transaction: :repo only covers one local step callback, not multiple workflow steps or distributed work.
  • Long-running step recovery:
    Lease/heartbeat/fencing foundations exist in the Jido-native path, but heartbeat-backed live recovery is not fully wired yet.
  • Journal-backed inspection/explanation:
    Current inspection/explanation reads Postgres runtime tables; durable journal/checkpoint projections are still open work.
  • Full Jido-native runtime switchover:
    Protocols, journals, workflow agents, and dispatch agents exist as foundations, but the live
    runtime still uses the current Postgres plus host-executor path.
  • Exactly-once external effects:
    Explicitly out of scope; external systems still need idempotency keys or duplicate-safe behavior.
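On the last point, a minimal sketch of duplicate-safe behavior with idempotency keys: the key is derived from stable run and step identity, so a redelivered step presents the same key and the effect is skipped. The `IdempotencySketch` module and the key format are assumptions, and the `seen` set stands in for whatever record the external system keeps of processed keys.

```elixir
defmodule IdempotencySketch do
  # Derive the key from stable identity only, never from attempt numbers or
  # timestamps, so that retries reuse the same key.
  def key(run_id, step_name), do: "#{run_id}:#{step_name}"

  # Runs `effect` only if `key` has not been seen; returns the updated set.
  def perform_once(seen, key, effect) do
    if MapSet.member?(seen, key) do
      {:duplicate, seen}
    else
      {:ok, effect.(), MapSet.put(seen, key)}
    end
  end
end

key = IdempotencySketch.key("run-1", :charge_card)
{:ok, :charged, seen} = IdempotencySketch.perform_once(MapSet.new(), key, fn -> :charged end)

# A redelivery of the same step is detected and the effect is not repeated.
{:duplicate, _seen} = IdempotencySketch.perform_once(seen, key, fn -> :charged end)
```

In practice the check and the effect must be atomic on the external side (for example a unique constraint on the key), which is why this stays the responsibility of the system that performs the effect.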

I’ve been thinking that Squid Mesh is already a strong use case for Jido, but it probably also needs its own strong use case.

One idea I’m considering is Rift: an embeddable Phoenix/LiveView ops inbox for workflows that need human decisions.

The rough idea is that the host app defines case types in code. Users open cases through host-configured forms. Each case starts a Squid Mesh workflow run. Operators then review cases in a LiveView inbox, claim/assign them, approve/reject/cancel, and inspect runtime details through SquidSonar.

The goal would not be to build a generic help desk or a no-code workflow builder. The host app would still own domain logic, auth, users, tenants, teams, file storage, and the actual workflow modules. Rift would provide the human-facing ops surface around durable workflows.

Part of the motivation is dogfooding. A real app like this would put pressure on Squid Mesh in useful places: human-in-the-loop flows, approvals/rejections, cancellation, audit trails, side effects, failed side effects, stale operator actions, and runtime inspection. That seems like a better feedback loop than only adding examples inside the main repo.

I started a PR with a PLAN.md in dark-trench/rift on GitHub (LiveView ops inbox for human workflow decisions).

Curious if this direction resonates with people who have had to build internal approval queues, ops review flows, or human-in-the-loop admin workflows in Phoenix apps.

Dogfooding absolutely is the best way to test your library in the real space without waiting for your users to do so. Kudos.
