Oban.Pro.Plugins.DynamicLifeline (Oban Pro v1.5.0-rc.5)
The DynamicLifeline plugin uses producer records to periodically rescue orphaned jobs, i.e. jobs that are stuck in the executing state because the node was shut down before the job could finish. In addition, it performs the following maintenance tasks:
- Discard jobs left available with an attempt equal to max attempts
- Rescue stuck workflows with deleted dependencies or missed scheduling events
- Rescue stuck chains with deleted dependencies or missed scheduling events
Without DynamicLifeline, you'll need to manually rescue stuck jobs or perform maintenance.
Using the Plugin
To use the DynamicLifeline plugin, add the module to your list of Oban plugins in config.exs:
config :my_app, Oban,
  plugins: [Oban.Pro.Plugins.DynamicLifeline]
  ...
There isn't any configuration necessary. By default, the plugin rescues orphaned jobs every 1 minute. If necessary, you can override the rescue interval:
plugins: [{Oban.Pro.Plugins.DynamicLifeline, rescue_interval: :timer.minutes(5)}]
If your system is under high load or produces a multitude of orphans, you may wish to increase the query timeout beyond the 30s default:
plugins: [{Oban.Pro.Plugins.DynamicLifeline, timeout: :timer.minutes(1)}]
Note that rescuing orphans relies on producer records as used by the Smart engine.
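For reference, a minimal sketch of enabling the Smart engine alongside the plugin; this assumes the Oban Pro v1.x engine module name Oban.Pro.Engines.Smart and an application named :my_app:

# The Smart engine maintains the producer records that orphan rescue relies on
config :my_app, Oban,
  engine: Oban.Pro.Engines.Smart,
  plugins: [Oban.Pro.Plugins.DynamicLifeline]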
Identifying Rescued Jobs
Rescued jobs can be identified by a rescued value in meta. Each rescue increments the rescued count by one.
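For example, you could list jobs that have been rescued at least once with an Ecto query over the meta column. This is a minimal sketch, assuming a repo named MyApp.Repo; the selected fields are purely illustrative:

import Ecto.Query

# Jobs with a "rescued" key in meta have been rescued at least once
MyApp.Repo.all(
  from j in Oban.Job,
    where: not is_nil(j.meta["rescued"]),
    select: {j.id, j.queue, j.meta["rescued"]}
)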
Rescuing Exhausted Jobs
When a job's attempt matches its max_attempts, its retries are considered "exhausted". Normally, the DynamicLifeline plugin transitions exhausted jobs to the discarded state and they won't be retried again. It does this for a couple of reasons:
- To ensure at-most-once semantics. Suppose a long-running job interacted with a non-idempotent service and was shut down while waiting for a reply; you may not want that job to retry.
- To prevent infinitely crashing BEAM nodes. Poorly behaving jobs may crash the node (through NIFs, memory exhaustion, etc.). We don't want to repeatedly rescue and rerun a job that repeatedly crashes the entire node.
Discarding exhausted jobs may not always be desired. Use the retry_exhausted option if you'd prefer to retry exhausted jobs when they are rescued, rather than discarding them:
plugins: [{Oban.Pro.Plugins.DynamicLifeline, retry_exhausted: true}]
During rescues, with retry_exhausted: true, a job's max_attempts is incremented and it is moved back to the available state.
Instrumenting with Telemetry
The DynamicLifeline plugin adds the following metadata to the [:oban, :plugin, :stop] event:
- :rescued_jobs — a list of jobs transitioned back to available
- :discarded_jobs — a list of jobs transitioned to discarded
Note: jobs only include id, queue, and state fields.
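As an illustration, here is a minimal sketch of a handler attached to that event. The module name and handler id are hypothetical; the first clause matches on the plugin key in the event metadata so that only DynamicLifeline events are logged:

defmodule MyApp.LifelineLogger do
  require Logger

  def attach do
    :telemetry.attach(
      "my-app-lifeline-logger",
      [:oban, :plugin, :stop],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Only DynamicLifeline events carry :rescued_jobs and :discarded_jobs
  def handle_event(_event, _measure, %{plugin: Oban.Pro.Plugins.DynamicLifeline} = meta, _config) do
    for job <- meta.rescued_jobs do
      Logger.warning("rescued job #{job.id} back to available on queue #{job.queue}")
    end

    for job <- meta.discarded_jobs do
      Logger.warning("discarded exhausted job #{job.id} on queue #{job.queue}")
    end
  end

  def handle_event(_event, _measure, _meta, _config), do: :ok
end

Calling MyApp.LifelineLogger.attach() during application start wires up the handler.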
Summary
Types
option()
@type option() ::
        {:conf, Oban.Config.t()}
        | {:name, Oban.name()}
        | {:retry_exhausted, boolean()}
        | {:rescue_interval, timeout()}
        | {:timeout, timeout()}