Dynamic Lifeline Plugin
🌟 This plugin is available through Oban.Pro
The DynamicLifeline plugin uses producer records to periodically rescue orphaned jobs, i.e. jobs that are stuck in the executing state because the node was shut down before the job could finish.
Without the DynamicLifeline plugin you may need to manually rescue jobs stuck in the executing state.
Using and Configuring
To use the DynamicLifeline plugin, add the module to your list of Oban plugins in config.exs:
config :my_app, Oban,
  plugins: [Oban.Pro.Plugins.DynamicLifeline]
  ...
No configuration is necessary. By default, the plugin deletes outdated producer records and rescues orphaned jobs once a minute. If necessary, you can configure the rescue interval:
plugins: [{Oban.Pro.Plugins.DynamicLifeline, rescue_interval: :timer.minutes(5)}]
Note that rescuing orphans relies on producer records as used by the SmartEngine.
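As a rough sketch of how the plugin and the engine might be configured together, consider the fragment below. The repo module (MyApp.Repo), the queue list, and the engine module name (Oban.Pro.Queue.SmartEngine) are assumptions for illustration; the engine module name in particular may differ between Oban Pro versions, so check the documentation for the version you have installed.

```elixir
# config/config.exs
import Config

config :my_app, Oban,
  # Hypothetical repo module; substitute your application's Ecto repo.
  repo: MyApp.Repo,
  # Assumed engine module name; it may differ in your Oban Pro version.
  engine: Oban.Pro.Queue.SmartEngine,
  # Example queue list, not taken from the original text.
  queues: [default: 10],
  plugins: [
    # Rescue orphans every five minutes instead of the default one minute.
    {Oban.Pro.Plugins.DynamicLifeline, rescue_interval: :timer.minutes(5)}
  ]
```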
Rescuing Exhausted Jobs
When a job's attempt matches its max_attempts, its retries are considered "exhausted". Normally, the DynamicLifeline plugin transitions exhausted jobs to the discarded state and they won't be retried again. It does this for a couple of reasons:
- To ensure at-most-once semantics. Suppose a long-running job interacted with a non-idempotent service and was shut down while waiting for a reply; you may not want that job to retry.
- To prevent infinitely crashing BEAM nodes. Poorly behaving jobs may crash the node (through NIFs, memory exhaustion, etc.). We don't want to repeatedly rescue and rerun a job that repeatedly crashes the entire node.
Discarding exhausted jobs may not always be desired. Use the retry_exhausted option if you'd prefer to retry exhausted jobs when they are rescued, rather than discarding them:
plugins: [{Oban.Pro.Plugins.DynamicLifeline, retry_exhausted: true}]
During rescues, with retry_exhausted: true, a job's max_attempts is incremented and it is moved back to the available state.
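The rescue rules described in this section can be sketched in plain Elixir. This is only an illustration of the documented behavior, not Oban Pro's actual implementation: the RescueSketch module, the rescue_job/2 function, and the plain-map job representation are all hypothetical, and it assumes that a rescued job with attempts remaining is moved back to the available state.

```elixir
defmodule RescueSketch do
  # Hypothetical illustration of the documented rules; not Oban Pro code.
  # A "job" here is a plain map with :attempt and :max_attempts keys.
  def rescue_job(%{attempt: attempt, max_attempts: max_attempts} = job, opts \\ []) do
    retry_exhausted? = Keyword.get(opts, :retry_exhausted, false)

    cond do
      # Attempts remain: assumed to go back to available for another run.
      attempt < max_attempts ->
        Map.put(job, :state, "available")

      # Exhausted, but retries were requested: bump max_attempts and retry.
      retry_exhausted? ->
        Map.merge(job, %{state: "available", max_attempts: max_attempts + 1})

      # Exhausted with the default behavior: discard.
      true ->
        Map.put(job, :state, "discarded")
    end
  end
end
```

For example, calling RescueSketch.rescue_job(%{attempt: 3, max_attempts: 3}) yields a discarded job, while passing retry_exhausted: true as the second argument yields an available job with max_attempts set to 4.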
Implementation Notes
Some additional notes about how DynamicLifeline operates:
- Orphan rescuing is guaranteed to only rescue jobs that belong to dead queue processes or nodes.
- Only a single node will rescue orphans at any given time, which prevents potential deadlocks and churn.
Instrumenting with Telemetry
The DynamicLifeline plugin adds the following metadata to the [:oban, :plugin, :stop] event:
- :action, which is :rescue
- :deleted_count, the number of producers deleted
- :rescued_count, the number of jobs rescued
See the docs on Plugin Events for details.
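A minimal sketch of a handler that logs the metadata listed above might look like the following. The module name (MyApp.LifelineLogger), the handler id, and the log format are hypothetical. It also assumes that the event metadata carries a :plugin key holding the plugin module, and that the :telemetry package is available, as it is in any application that depends on Oban.

```elixir
defmodule MyApp.LifelineLogger do
  # Hypothetical module for illustration; names and log format are assumptions.
  require Logger

  # Attach once at application start, e.g. from Application.start/2.
  def attach do
    :telemetry.attach(
      "lifeline-logger",
      [:oban, :plugin, :stop],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Assumes the metadata includes the plugin module under :plugin.
  def handle_event(
        [:oban, :plugin, :stop],
        _measurements,
        %{plugin: Oban.Pro.Plugins.DynamicLifeline} = meta,
        _config
      ) do
    Logger.info(summary(meta))
  end

  # Ignore stop events emitted by other plugins.
  def handle_event(_event, _measurements, _meta, _config), do: :ok

  # Builds the log line from the documented metadata keys.
  def summary(meta) do
    rescued = Map.get(meta, :rescued_count, 0)
    deleted = Map.get(meta, :deleted_count, 0)
    "lifeline rescued=#{rescued} deleted=#{deleted}"
  end
end
```

Keeping the message construction in a separate summary/1 function makes it possible to check the formatting without attaching a real telemetry handler.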