Oban.Pro.Workers.Workflow behaviour (Oban Pro v0.14.0)

Workflow workers compose together with arbitrary dependencies between jobs, allowing sequential, fan-out, and fan-in execution workflows. Workflows are fault tolerant, can be homogeneous or heterogeneous, and scale horizontally across all available nodes.

Workflow jobs aren't executed until all upstream dependencies have completed. This includes waiting on retries, scheduled jobs, or snoozing.

usage

Usage

Each worker within a workflow must use Oban.Pro.Workers.Workflow rather than Oban.Worker, otherwise jobs will execute without any dependency management.

Our example workflow will coordinate archiving an account. To properly archive the account we must perform the following workers:

  1. FinalReceipt — generate a final receipt and email it to the account owner
  2. EmailSubscriber — email each subscriber on the account
  3. BackupPost — up all important posts on the account to external storage
  4. DeleteAccount — delete the account from the database

We need step 1 to run before step 2 or 3, but both steps 2 and 3 can run in parallel. After all of the jobs in steps 2 and 3 finish we can finally delete the account from the database. For demonstration purposes we'll define one of the workers with a stubbed-in process/1 function:

defmodule MyApp.FinalReceipt do
  use Oban.Pro.Workers.Workflow, queue: :default

  @impl true
  def process(%Job{} = _job) do
    # Generate the final receipt.

    :ok
  end
end

Note that we define a process/1 callback instead of perform/1 because the perform/1 function coordinates the workflow. The process/1 function receives an Oban.Job struct, just like perform/1 and it should return similar values, i.e. :ok, {:ok, value}.

Assuming that you've defined the other workers, we can construct a workflow to enforce sequential execution:

alias MyApp.{BackupPost, DeleteAccount, EmailSubscriber, FinalReceipt}

def archive_account(acc_id) do
  FinalReceipt.new_workflow()
  |> FinalReceipt.add(:receipt, FinalReceipt.new(%{id: acc_id}))
  |> FinalReceipt.add(:email, EmailSubscriber.new(%{id: acc_id}), deps: [:receipt])
  |> FinalReceipt.add(:backup, BackupPost.new(%{id: acc_id}), deps: [:receipt])
  |> FinalReceipt.add(:delete, DeleteAccount.new(%{id: acc_id}), deps: [:backup, :receipt])
  |> Oban.insert_all()
end

Here we use new_workflow/1 to initialize a new workflow (later on we will look at how to customize workflow behaviour with various options). Jobs are then added to the workflow with add/3,4, where the first argument is an existing workflow, the second argument is unique name, the third is a job changeset, and the fourth an optional list of dependencies. Finally, the workflow passes directly to Oban.insert_all/1,2, which handles inserting each of the jobs into the database atomically.

All workers that use Workflow have new_workflow/1 and add/4 functions defined, so it doesn't matter which module you use to initialize or to add new jobs.

Dependency resolution guarantees that jobs execute in the order receipt -> email -> backup -> delete, even if one of the jobs fails and needs to retry. Any job that defines a dependency will wait for each upstream dependency to complete before it starts.

Recall how our original specification stated that there could be multiple email and backup jobs? The workflow we've built only handles one email or backup job at a time.

Let's modify the workflow to fan-out to multiple email and backup jobs and then fan-in to the final delete job:

def archive_account_workflow(acc_id) do
  subscribers = subscribers_for_account(acc_id)
  documents = documents_for_account(acc_id)

  email_deps = Enum.map(subscribers, &"email_#{&1.id}")
  backup_deps = Enum.map(documents, &"backup_#{&1.id}")
  delete_deps = email_deps ++ backup_deps

  FinalReceipt.new_workflow()
  |> FinalReceipt.add(:receipt, FinalReceipt.new(%{id: acc_id}))
  |> add_email_jobs(subscribers)
  |> add_backup_jobs(documents)
  |> DeleteAccount.add(:delete, DeleteAccount.new(%{id: acc_id}), deps: delete_deps)
end

defp add_email_jobs(workflow, subscribers) do
  Enum.reduce(subscribers, workflow, fn %{id: id, email: email}, acc ->
    EmailSubscriber.add(acc, "email_#{id}", EmailSubscriber.new(%{email: email}), deps: [:receipt])
  end)
end

defp add_backup_jobs(workflow, subscribers) do
  Enum.reduce(subscribers, workflow, fn %{id: id}, acc ->
    BackupPost.add(acc, "backup_#{id}", BackupPost.new(%{id: id}), deps: [:receipt])
  end)
end

Now the workflow will run all of the email and backup jobs we need, before deleting the account. To confirm the flow without inserting and executing jobs we can visualize it.

creating-workflows

Creating Workflows

Workflows use conservative defaults for safe, and relatively quick, dependency resolution. You can customize the safety checks, waiting times and resolution intensity by providing a few top-level options:

  • ignore_cancelled — regard cancelled dependencies as completed rather than halting the workflow. Defaults to false.

  • ignore_discarded — regard discarded dependencies as completed rather than halting the workflow. Defaults to false.

  • ignore_deleted — regard deleted (typically pruned) dependencies as completed rather than halting the workflow. Defaults to false.

  • waiting_delay — the number of milliseconds to wait between dependency checks. This value, combined with the waiting_limit, dictates how long a job will await upstream dependencies. Defaults to 1000ms.

  • waiting_limit — the number of times to retry dependency checks. This value, combined with waiting_delay, dictates how long a job will await upstream dependencies. Defaults to 10.

  • waiting_snooze — the number of seconds a job will snooze after awaiting executing upstream dependencies. If upstream dependencies are scheduled or retryable then the job snoozes until the latest dependency's scheduled_at timestamp.

The following example creates a workflow with all of the available options:

alias Oban.Pro.Workers.Workflow

workflow = Workflow.new(
  ignore_cancelled: true,
  ignore_deleted: true,
  ignore_discarded: true,
  waiting_delay: :timer.seconds(5),
  waiting_limit: 5,
  waiting_snooze: 10
)

Internally, the meta field stores options for each job. That makes it possible to set or override workflow options per-job. For example, configure a single job to ignore cancelled dependencies and another to snooze longer:

MyWorkflow.new_workflow()
|> MyWorkflow.add(:a, MyWorkflow.new(%{}))
|> MyWorkflow.add(:b, MyWorkflow.new(%{}, meta: %{ignore_cancelled: true}), deps: [:a])
|> MyWorkflow.add(:c, MyWorkflow.new(%{}, meta: %{waiting_snooze: 30}), deps: [:b])

Dependency resolution relies on jobs lingering in the database after execution. If your system prunes job dependencies then the workflow may never complete. To override this behaviour, set ignore_deleted: true on your workflows.

generating-workflow-ids

Generating Workflow IDs

By default workflow_id is a version 4 random UUID. This is more than sufficient to ensure that workflows are unique for any period of time. However, if you require more control you can override workflow_id generation at the worker level, or pass a value directly to the new_workflow/1 function.

To override the workflow_id for a particular workflow you override the gen_id/0 callback:

defmodule MyApp.Workflow do
  use Oban.Pro.Workers.Workflow

  # Generate a 24 character long random string instead
  @impl true
  def gen_id do
    24
    |> :crypto.strong_rand_bytes()
    |> Base.encode64()
  end
  ...
end

The gen_id/0 callback works for random/non-deterministic id generation. If you'd prefer to use a deterministic id instead you can pass the workflow_id in as an option to new_workflow/1:

MyApp.Workflow.new_workflow(workflow_id: "custom-id")

Using this technique you can verify the workflow_id in tests or append to the workflow manually after it was originally created.

appending-to-a-workflow

Appending to a Workflow

Sometimes all jobs aren't known when the workflow is created. In that case, you can add more jobs with optional dependency checking using append_workflow/2. An appended workflow starts with one or more jobs, which reuses the original workflow id, and optionally builds a set of dependencies for checking.

In this example we disable deps checking with check_deps: false:

defmodule MyApp.WorkflowWorker do
  use Oban.Pro.Workers.Workflow

  @impl true
  def process(%Job{} = job) do
    jobs =
      job
      |> append_workflow(check_deps: false)
      |> add(:d, WorkerD.new(%{}), deps: [:a])
      |> add(:e, WorkerE.new(%{}), deps: [:b])
      |> add(:f, WorkerF.new(%{}), deps: [:c])
      |> Oban.insert_all()

    {:ok, jobs}
  end
end

The new jobs specify deps on preexisting jobs named :a, :b, and :c, but there isn't any guarantee those jobs actually exist. That could lead to an incomplete workflow where the new jobs may never complete.

To be safe and check jobs while appending we'll fetch all of the previous jobs and feed them in:

defmodule MyApp.WorkflowWorker do
  use Oban.Pro.Workers.Workflow

  @impl true
  def process(%Job{} = job) do
    {:ok, jobs} =
      MyApp.Repo.transaction(fn ->
        job
        |> stream_workflow_jobs()
        |> Enum.to_list()
      end)

    jobs
    |> append_workflow()
    |> add(:d, WorkerD.new(%{}), deps: [:a])
    |> add(:e, WorkerE.new(%{}), deps: [:b])
    |> add(:f, WorkerF.new(%{}), deps: [:c])
    |> Oban.insert_all()

    :ok
  end
end

Now there isn't any risk of an incomplete workflow, at the expense of loading some extraneous jobs.

fetching-workflow-jobs

Fetching Workflow Jobs

The c:all_workflow_jobs/1,2 function simplifies loading all jobs in a workflow from within a worker. For example, to fetch all of the jobs in a workflow:

defmodule MyApp.Workflow do
  use Oban.Pro.Workers.Workflow

  @impl Workflow
  def process(%Job{} = job) do
    job
    |> all_workflow_jobs()
    |> do_things_with_jobs()

    :ok
  end
end

It's also possible to scope fetching to only dependencies of the current job:

deps = all_workflow_jobs(job, only_deps: true)

Or only a single explicit dependency:

[dep_job] = all_workflow_jobs(job, names: [:a])

For large workflows it's not efficient to load all jobs in memory at once. In that case, you can the c:stream_workflow_jobs/1,2 callback instead to fetch jobs lazily. For example, to stream all of the completed jobs in a workflow:

defmodule MyApp.Workflow do
  use Oban.Pro.Workers.Workflow

  @impl Workflow
  def process(%Job{} = job) do
    {:ok, workflow_jobs} =
      MyApp.Repo.transaction(fn ->
        job
        |> stream_workflow_jobs()
        |> Stream.filter(& &1.state == "completed")
        |> Enum.to_list()
      end)

    do_things_with_jobs(workflow_jobs)

    :ok
  end
end

Streaming is provided by Ecto's Repo.stream, and it must take place within a transaction. Using a stream lets you control the number of jobs loaded from the database, minimizing memory usage for large workflows.

visualizing-workflows

Visualizing Workflows

Workflows are a type of Directed Acyclic Graph, also known as a DAG. That means we can describe a workflow as a graph of jobs and dependencies, where execution flows between jobs. By converting the workflow into DOT notation, a standard graph description language, we can render visualizations!

Dot generation relies on libgraph, which is an optional dependency. You'll need to specify it as a dependency before generating dot output:

def deps do
  [{:libgraph, "~> 0.7"}]
end

Once you've installed libgraph, we can use to_dot/1 to convert a workflow. As with new_workflow and add, all workflow workers define a to_dot/1 function that takes a workflow and returns a dot formatted string. For example, calling to_dot/1 with the account archiving workflow from above:

FinalReceipt.to_dot(archive_account_workflow(123))

Generates the following dot output, where each vertex is a combination of the job's name in the workflow and its worker module:

strict digraph {
    "delete (MyApp.DeleteAccount)"
    "backup_1 (MyApp.BackupPost)"
    "backup_2 (MyApp.BackupPost)"
    "backup_3 (MyApp.BackupPost)"
    "receipt (MyApp.FinalReceipt)"
    "email_1 (MyApp.EmailSubscriber)"
    "email_2 (MyApp.EmailSubscriber)"
    "backup_1 (MyApp.BackupPost)" -> "delete (MyApp.DeleteAccount)" [weight=1]
    "backup_2 (MyApp.BackupPost)" -> "delete (MyApp.DeleteAccount)" [weight=1]
    "backup_3 (MyApp.BackupPost)" -> "delete (MyApp.DeleteAccount)" [weight=1]
    "receipt (MyApp.FinalReceipt)" -> "backup_1 (MyApp.BackupPost)" [weight=1]
    "receipt (MyApp.FinalReceipt)" -> "backup_2 (MyApp.BackupPost)" [weight=1]
    "receipt (MyApp.FinalReceipt)" -> "backup_3 (MyApp.BackupPost)" [weight=1]
    "receipt (MyApp.FinalReceipt)" -> "email_1 (MyApp.EmailSubscriber)" [weight=1]
    "receipt (MyApp.FinalReceipt)" -> "email_2 (MyApp.EmailSubscriber)" [weight=1]
    "email_1 (MyApp.EmailSubscriber)" -> "delete (MyApp.DeleteAccount)" [weight=1]
    "email_2 (MyApp.EmailSubscriber)" -> "delete (MyApp.DeleteAccount)" [weight=1]
}

Now we can take that dot output and render it using a tool like graphviz. The following example function accepts a workflow and renders it out as an SVG:

defmodule WorkflowRenderer do
  alias Oban.Pro.Workers.Workflow

  def render(workflow) do
    dot_path = "workflow.dot"
    svg_path = "workflow.svg"

    File.write!(dot_path, Workflow.to_dot(workflow))

    System.cmd("dot", ["-T", "svg", "-o", svg_path, dot_path])
  end
end

With graphviz installed, that will generate a SVG of the workflow:

delete (MyApp.DeleteAccount)backup_1 (MyApp.BackupPost)backup_2 (MyApp.BackupPost)backup_3 (MyApp.BackupPost)receipt (MyApp.FinalReceipt)email_1 (MyApp.EmailSubscriber)email_2 (MyApp.EmailSubscriber)

Looking at the visualized graph we can clearly see how the workflow starts with a single render job, fans-out to multiple email and backup jobs, and finally fans-in to the delete job—exactly as we planned!

Link to this section Summary

Callbacks

Add a named job to the workflow along with optional dependencies.

Get all jobs for a workflow, optionally filtered by upstream deps.

Instantiate a new workflow from an existing workflow job, jobs, or a stream.

Generate a unique string to identify the workflow.

Instantiate a new workflow struct with a unique workflow id.

Stream all jobs for a workflow.

Converts the given workflow to DOT format, which can then be converted to a number of other formats via Graphviz, e.g. dot -Tpng out.dot > out.png.

Functions

Initialize a new workflow.

Link to this section Types

Link to this type

add_option()

@type add_option() :: {:deps, [name()]}
Link to this type

append_option()

@type append_option() :: new_option() | {:check_deps, boolean()}
@type chan() :: Oban.Job.changeset()
Link to this type

fetch_opts()

@type fetch_opts() ::
  {:log, Logger.level()}
  | {:names, [atom()]}
  | {:only_deps, boolean()}
  | {:timeout, timeout()}
@type name() :: atom() | String.t()
Link to this type

new_option()

@type new_option() ::
  {:ignore_cancelled, boolean()}
  | {:ignore_deleted, boolean()}
  | {:ignore_discarded, boolean()}
  | {:waiting_delay, pos_integer()}
  | {:waiting_limit, pos_integer()}
  | {:waiting_snooze, pos_integer()}
  | {:workflow_id, String.t()}
@type t() :: %Oban.Pro.Workers.Workflow{
  changesets: [chan()],
  check_deps: boolean(),
  id: String.t(),
  names: MapSet.t(),
  opts: map()
}

Link to this section Callbacks

Link to this callback

add(flow, name, changeset, opts)

@callback add(flow :: t(), name :: name(), changeset :: chan(), opts :: [add_option()]) ::
  t()

Add a named job to the workflow along with optional dependencies.

examples

Examples

Add jobs to a workflow with dependencies:

MyApp.MyWorkflow.new_workflow()
|> MyApp.MyWorkflow.add(:a, MyApp.WorkerA.new(%{id: id}))
|> MyApp.MyWorkflow.add(:b, MyApp.WorkerB.new(%{id: id}), deps: [:a])
|> MyApp.MyWorkflow.add(:c, MyApp.WorkerC.new(%{id: id}), deps: [:a])
|> MyApp.MyWorkflow.add(:d, MyApp.WorkerC.new(%{id: id}), deps: [:b, :c])
Link to this callback

all_workflow_jobs(t, list)

@callback all_workflow_jobs(Oban.Job.t(), [fetch_opts()]) :: [Oban.Job.t()]

Get all jobs for a workflow, optionally filtered by upstream deps.

examples

Examples

Retrieve all workflow jobs:

@impl Workflow
def process(%Job{} = job) do
  job
  |> all_workflow_jobs()
  |> do_things_with_jobs()

  :ok
end

Retrieve only current job's deps:

workflow_jobs = all_workflow_jobs(job, only_deps: true)

Retrieve an explicit list of dependencies:

[job_a, job_b] = all_workflow_jobs(job, names: [:a, :b])
Link to this callback

append_workflow(jobs, list)

@callback append_workflow(jobs :: Oban.Job.t() | [Oban.Job.t()], [append_option()]) :: t()

Instantiate a new workflow from an existing workflow job, jobs, or a stream.

examples

Examples

Append to a workflow seeded with all other jobs in the workflow:

jobs
|> append_workflow()
|> add(:d, WorkerD.new(%{}), deps: [:a])
|> add(:e, WorkerE.new(%{}), deps: [:b])
|> Oban.insert_all()

Append to a workflow from a single job and bypass checking deps:

job
|> append_workflow(check_deps: false)
|> add(:d, WorkerD.new(%{}), deps: [:a])
|> add(:e, WorkerE.new(%{}), deps: [:b])
|> Oban.insert_all()
@callback gen_id() :: String.t()

Generate a unique string to identify the workflow.

Defaults to a 128bit UUID.

examples

Examples

Generate a workflow id using random bytes instead of a UUID:

@impl Workflow
def gen_id do
  24
  |> :crypto.strong_rand_bytes()
  |> Base.encode64()
end
Link to this callback

new_workflow(opts)

@callback new_workflow(opts :: [new_option()]) :: t()

Instantiate a new workflow struct with a unique workflow id.

examples

Examples

Create a simple workflow without any options:

MyApp.MyWorkflow.new_workflow()
|> MyWorkflow.add(:a, MyApp.WorkerA.new(%{}))
|> MyWorkflow.add(:b, MyApp.WorkerB.new(%{}))

Create a workflow with a static id and some options:

MyWorkflow.new_workflow(workflow_id: "workflow-id", ignore_cancelled: true)
Link to this callback

stream_workflow_jobs(t, list)

@callback stream_workflow_jobs(Oban.Job.t(), [fetch_opts()]) :: Enum.t()

Stream all jobs for a workflow.

examples

Examples

Stream with filtering to only preserve completed jobs:

@impl Workflow def process(%Job{} = job) do

{:ok, workflow_jobs} =
  MyApp.Repo.transaction(fn ->
    job
    |> stream_workflow_jobs()
    |> Stream.filter(& &1.state == "completed")
    |> Enum.to_list()
  end)

do_things_with_jobs(workflow_jobs)

:ok

end

@callback to_dot(flow :: t()) :: String.t()

Converts the given workflow to DOT format, which can then be converted to a number of other formats via Graphviz, e.g. dot -Tpng out.dot > out.png.

The default implementation relies on libgraph.

examples

Examples

Generate a DOT graph format from a workflow:

MyApp.MyWorkflow.to_dot(workflow)

Link to this section Functions

Link to this function

new(opts \\ [])

@spec new(opts :: [new_option()]) :: t()

Initialize a new workflow.