Changelog for Oban Pro v1.4
π Streamlined Workflows
Workflows now use automatic scheduling to run workflow jobs in order, without any polling or
snoozing. Jobs with upstream dependencies are automatically marked as on_hold
and scheduled far
into the future. After the upstream dependency executes they're made available to run, with
consideration for retries, cancellations, discards, and deletion.
An optimized, debounced query system uses up to 10x fewer queries to execute slower workflows.
Queue concurrency limits don't impact workflow execution, and even complex workflows can quickly run in parallel with
global_limit: 1
and zero snoozing.Cancelled, deleted, and discarded dependencies are still handled according to their respective
ignore_*
policies.
All of the previous workflow "tuning" options like waiting_limit
and waiting_snooze
are gone,
as they're not needed to optimize workflow execution. Finally, older "in flight" workflows will
still run with the legacy polling mechanism to ensure backwards compatibility.
β€οΈβπ©Ήπ Upgrading Workflows in v1.4
Because workflow staging is now reactive, you should add an optimized partial index to improve the performance of workflow dependency queries:
defmodule MyApp.Repo.Migrations.AddWorkflowIndexes do use Ecto.Migration defdelegate change, to: Oban.Pro.Migrations.Workflow end
β²οΈ Job Execution Deadlines
Jobs that shouldn't run after some period of time can be marked with a deadline
. After the
deadline has passed the job will be pre-emptively cancelled on its next run, or optionally
during its next run if desired.
defmodule DeadlinedWorker do
use Oban.Pro.Worker, deadline: {1, :hour}
@impl true
def process(%Job{args: args}) do
# If this doesn't run within an hour, it's cancelled
end
end
Deadlines may be set at runtime as well:
DeadlinedWorker.new(args, deadline: {30, :minutes})
In either case, the deadline is always relative and computed at runtime. That also allows the deadline to consider schedulingβa job scheduled to run 1 hour from now with a 1 hour deadline will expire 2 hours in the future.
π§ Automatic Crontab Syncing
Synchronizing persisted entries manually required two deploys: one to flag it with deleted: true
and another to clean up the entry entirely. That extra step isn't ideal for applications that
don't insert or delete jobs at runtime.
To delete entries that are no longer listed in the crontab automatically set the sync_mode
option to :automatic
:
[
sync_mode: :automatic,
crontab: [
{"0 * * * *", MyApp.BasicJob},
{"0 0 * * *", MyApp.OtherJob}
]
]
To remove unwanted entries, simply delete them from the crontab:
crontab: [
{"0 * * * *", MyApp.BasicJob},
- {"0 0 * * *", MyApp.OtherJob}
]
With :automatic
sync, the entry for MyApp.OtherJob
will be deleted on the next deployment.
β€οΈβπ©Ήπ Upgrading DynamicCron in v1.4
Changes to
DynamicCron
require a migration to add the newinsertions
column. You must re-run theOban.Pro.Migrations.DynamicCron
migration when upgrading.defmodule MyApp.Repo.Migrations.UpdateObanCron do use Ecto.Migration defdelegate change, to: Oban.Pro.Migrations.DynamicCron end
v1.4.11 β 2024-08-07
Version constraints are loosened to allow Oban v2.18.
Bug Fixes
[Smart] Take a queue mutex with pending batch or workflow handlers.
Flush handlers for batches and workflows must be staggered to prevent overlapping transactions. Without a mutex to stagger fetching the transactions on different nodes can't see all completed jobs and handlers can misfire.
This is backported from a similar change in v1.5.
[Batch] Always cast
state
andcallback
values to text in batch.Without explicit typecasting in unioned queries, batch operations may cause an unknown type error in PostgreSQL 16.1 and above.
v1.4.10 β 2024-06-25
Bug Fixes
[Smart] Always cast
uniq_key
to an integer in unique dupes query.The raw uniq_key type is always an integer, but it was injected without any type information. That caused an
indeterminate_datatype
error on the latest Aurora versions.[Worker] Correct
scheduled_at
check for deadline workers.A typo in the
scheduled_at
check prevented correct deadlines for explicitly scheduled jobs.
v1.4.9 β 2024-06-05
Enhancements
[Workflow] Use microseconds in the
orig_scheduled_at
value stored inmeta
Original workflow scheduled timestamps are now stored as microseconds rather than seconds to match the precision of
scheduled_at
. There's fallback support for existing jobs with seconds for a smooth transition.
Bug Fixes
[Smart] Prevent tracked global counts from getting out of sync.
Races between update operations such as scaling or pausing, and job fetching with backoff due to errors, could cause incorrect tracked global counts. Those counts wouldn't self-correct after the initial ack, and eventually the queue would stop fetching new jobs.
[Smart] Omit workflow jobs with incomplete dependencies from retrying.
Using
retry_job
orretry_all_jobs
on an on-hold workflow job may effectively break the workflow. Now held jobs are omitted from retry calls, which are also the mechanism used by Oban Web for "Run now".[Smart] Delete all pending acks after draining jobs in tests.
While draining, acks were persisted and not cleaned up afterwards. That could lead to incorrect batch callbacks firing, which could insert jobs during the wrong test.
[DynamicCron] Prevent missing worker from crashing the plugin on insert.
A single missing worker would crash the plugin when it reloaded entries for insert. Now a missing worker is handled gracefully, allowing the plugin to keep running normally.
This was a regression from v1.3, caused by the switch from insert with uniqueness to
insert_all
.
v1.4.8 β 2024-05-30
Enhancements
[Batch] Heavily optimize Batch query performance.
Applying recent learnings from optimizing workflows, this switches Batch queries to specialized, index backed queries for batch callback checks. In a benchmark containing 10 batches of 350k jobs, the optimized queries performed 450x faster (from 90ms down to 0.2ms).
The batch callback queries are much faster even without an index, but applications that utilize batches should run the new migration.
defmodule MyApp.Repo.Migrations.AddBatchIndex do use Ecto.Migration defdelegate change, to: Oban.Pro.Migrations.Batch end
[Worker] Omit
nil
values when dumping structured args.Optional fields with a
nil
value were always written to the database. That prevented graceful transitions away from a structured field because the optional value would linger. For example, given the following schema:args_schema do field :foo, :id field :bar, :id end
Passing the args
%{foo: 123}
is stored as{"foo":123}
, without an empty"bar"
key.[Smart] Record the original
scheduled_at
timestamp on snooze or retry.Snoozing and rescheduling a job on error overwrites the
scheduled_at
timestamp, which makes it impossible to determine how long a job was truly waited before execution.Now snoozing or erroring will inject the original
scheduled_at
timestamp as unix seconds into the job's meta asorig_scheduled_at
.
Bug Fixes
[Workflow] Restore tuple type syntax to prevent double wrapped options for all functions that used the
new_option
type. This fixes a success typing Dialyzer warning.[Testing] Allow testing all batch callbacks.
The
handle_cancelled
andhandle_retryable
callbacks weren't listed as known callbacks and couldn't be used by theperform_callback
testing helper.[Batch] Safely handle jobs cancelled by the Basic engine.
The
Batcher
incorrectly assumed thatmeta
was available in all cancelled jobs. That's only the case for jobs cancelled by theSmart
engine, not theBasic
engine.
v1.4.7 β 2024-05-15
Enhancements
[Workflow] Add
workflow_name
option to label all jobs in a workflow with a common name.In addition to individual step names, now all jobs in a workflow can have a
workflow_name
that groups them together. For example, to label an ETL workflow:workflow = Workflow.new(workflow_name: "etl_pipeline")
[Workflow] Add
after_cancelled/2
callback to hook into workflow job cancellations.The new callback is triggered after a workflow job is cancelled due to upstream jobs being
cancelled
,deleted
, ordiscarded
. This callback is only called when a job is cancelled because of an upstream dependency. It is never called after normal job execution.For example, here's a trivial
after_cancelled
hook that logs a warning when a workflow job is cancelled:def MyApp.Workflow do use Oban.Pro.Workers.Workflow require Logger @impl true def after_cancelled(reason, job) do Logger.warn("Workflow job #{job.id} cancelled because a dependency was #{reason}") end
Bug Fixes
[Workflow] Respect the original scheduled time when ignoring cancellations.
Workflow jobs were made
available
rather thanscheduled
to the correct scheduled time whenignore_cancelled
orignore_discarded
was triggered.[Workflow] Record an
Oban.Pro.WorkflowError
error when cancelling dependent jobs.Previously, a generic
ErlangError
was recorded as the cancellation reason, which wasn't helpful for identifying cancelled dependents.[DynamicCron] Split migration into distinct
up/0
anddown/0
steps to for rollback.The migration couldn't be automatically rolled back due to the use of
alter table
. Now there are separateup
anddown
migrations that the documentation points to, though thechange
function exists for backward compatibility.[Worker] Indicate that the result passed to
after_process/3
may benil
after an error, in addition to the standard job results.
v1.4.6 β 2024-04-26
Enhancements
[Smart] Use a static pool of
ets
tables to track jobs for async acking.Producers no longer have their own
ets
tables for acking. Instead, there is a central, static pool of tables to reduce the total number ofets
tables, and it ensures that if a producer crashes it won't lose any job acks.There's a substantial memory savings from this change. With 1000 queues, this reduced the initial memory used by
ets
tables 24MB.
Bug Fixes
[Workflow] Prevent race condition causing stuck workflow jobs.
Two separate factors interplayed to cause stuck workflow jobs in queues with numerous concurrent workflows. Changing how handlers were stored, deduplicated, and applied improves the overall reliability of Batch and Workflow handlers.
[Workflow] Periodically rescue stuck workflows from the
DynamicLifeline
.The
DynamicLifeline
will now periodically rescue stuck workflows when upstream dependencies are deleted or downstream deps aren't staged properly. This ensures workflows can always be repaired and resumed without outside intervention.This also removes the database function and trigger because they're incompatible with partitioned tables. Moving a row from one partition to another invokes the delete trigger, which makes the trigger rescue mechanism completely incompatible with partitioned tables.
There's no harm in the trigger for existing single-table setups, and it doesn't need to be removed, but it won't be added by migrations moving forward.
[DynamicCron] Prevent query timeouts by selecting truncated insertions once.
The query duplicated selecting insertions rather than overwriting it. Now the query properly limits insertion reloading.
v1.4.5 β 2024-04-22
Bug Fixes
[Migrations] Remove use of
OR REPLACE
in workflow trigger migration.The OR REPLACE syntax wasn't added until PG14, which is newer than the minimum supported PG12.
v1.4.4 β 2024-04-20
Enhancements
[Workflow] Use a highly optimized partial index for workflow deps checks.
Between an optional partial index and an alternative query that avoids bitmap index scans, workflow deps checks are much faster, and even sub-millisecond in most cases.
To get the benefits of this query change you must run, or re-run, the
Oban.Pro.Migrations.Workflow
migration.[Smart] Limit total flush handlers (Batch, Workflow) per fetch transaction.
When flush handlers accumulate, and they're too slow, they can cause the producer's fetch transaction to fail. Now a fixed number of flush handlers are applied in each transaction, and fetching is scheduled again shortly afterwards to continue flushing if there are additional handlers.
[DynamicCron] Only select latest insertion timestamp when reloading crontab entries.
Reduce the total data pulled from the
oban_crons
table on reload by only selecting the most recent insertion timestamp. Applications with thousands of dynamic crons could experience timeouts from the extensive data transfer. In addition, the number of recorded insertions is halved from 720 to 360 by default.
Bug Fixes
[Workflow] Cascade workflow changes on lifeline job discard.
Workflow deps were left "on hold" when an upstream job exhausted retries and was discarded by the DynamicLifeline. Now discard events are handled and the appropriate workflow transition is applied after the plugin runs.
[DynamicPartitioner] Retain database
schema
when rolling backDynamicPartitioner
migration.The
CREATE SCHEMA
statement is conditional, and there's no guarantee that the schema wasn't already created before rolling it back.[DynamicCron] Drop
:guaranteed
override from job options when inserting cron jobs.A per-entry guaranteed override is supported for checking jobs, but naturally
Oban.Job
doesn't support it as an option. Now the:guaranteed
key is dropped along with:timezone
.[DynamicCron] Shift the timezone when comparing last insertion for guaranteed checks.
Apply per-entry timezones or a globally configured timezone when comparing a parsed time with the last insertion for guaranteed mode.
[DynamicCron] Clear recorded insertions on entry expression or timezone updates.
Expression and per-entry timezone changes now clear prior insertion timestamps to prevent mistaken scheduling.
[Testing] Always include base
:retryable
value indrain_jobs/2
summary output.The output of
drain_jobs
only had a:retryable
value when one or more jobs were retryable, not in all cases as the typespec implied.[Testing] Correct
:exhausted
check for summary output fromrun_workflow/2
.Exhausted jobs were incorrectly tracked as
discarded
when summarizingrun_workflow
output.[Chain] Safely run chained jobs with
:inline
testing mode.In
:inline
mode jobs don't hit the database. Without anid
chain queries cause anArgumentError
, plus,:inline
isn't meant to touch the database at all.
v1.4.3 β 2024-04-08
Enhancements
[Workflow] Use a database trigger function to handle deleted deps.
Querying the full
oban_jobs
table to detect abandonded workflow deps is too slow for sizable workflows. The only consistent intercept for deleted dependencies is the database, so this adds aOban.Pro.Migrations.Workflow
migration that creates a trigger specifically tailored to cleaning up deps after deleting a a workflow job.This approach is vastly faster and applied immediately, without waiting for a pruning event. However, it does require an optional migration to safely handle deleted workflow deps.
See the admonition above on upgrading workflows, or the section on handling deleted deps in the docs for details.
[Smart] Handle
Batch
callback andWorkflow
deps checks after async acking, rather than forcing synchronous acking with a separate debounce cycle.This change improves the overall throughput of queues with
Batch
andWorkflow
jobs, while also increasing the responsiveness of batch callback enqeuing and deps staging.Batch
andWorkflow
checks are still grouped to avoid duplicate work- Acking is always synchronous, there's no blocking job execution for debounced checks
- Queries for acking, fetching, and checking all run in a single transaction
- Debounce options are ignored and no longer documented
[Testing] Mark test processes when draining to simplify async acking checks.
Bug Fixes
[Smart] Track producer meta changes without fetched jobs.
An empty clause prevented tracking global changes for empty partitions, e.g. partitions without additional jobs to fetch. This was most noticeable for partitioned queues with a low limit and sparse jobs.
[Testing] Reload all jobs after draining in
run_workflow/1,2
.Appended jobs weren't included in the result from
run_workflow/1
, despite being inserted and executed. Now all jobs from the workflow are considered for summarization, or returned entirely without a summary.
v1.4.2 β 2024-03-26
[Workflow] Always resolve all workflow deps with single check.
Workflow deps checks with a mismatched number of deps and finished jobs could be made available erroneously.
[Workflow] Reimplement deleted workflow handling for accuracy, performance, and memory.
The workflow deps anti-join query was incorrect, causing it to return jobs that weren't actually part of on-hold workflows. The number of jobs returned grew with the total historic jobs, which could cause memory issues.
v1.4.1 β 2024-03-25
Bug Fixes
[Workflow] Fix workflow staging for jobs with multiple deps.
The final optimization to workflow deps querying introduced a bug that caused jobs with multiple deps to be made available after the first dep completed. This restores the original, correct query.
v1.4.0 β 2024-03-21
Enhancements
[DynamicCron] Add
:sync_mode
for automatic entry management.Most users expect that when they delete an entry from the crontab it won't keep running after the next deploy. A new
sync_mode
option allows selecting betweenautomatic
andmanual
entry management.In addition, this moves cron entry management into the evaluate handler. Inserting and deleting at runtime can't leverage leadership, because in rolling deployments the new nodes are never the leader.
[DynamicCron] Use recorded job insertion timestamps for guaranteed cron.
A new field on the
oban_crons
table records each time a cron job is inserted up to a configurable limit. That field is then used for guaranteed cron jobs, and optionally for historic inspection beyond a job's retention period.[DynamicCron] Stop adding a
unique
option when inserting jobs, regardless of the guaranteed option.There's no need for uniqueness checks now that insertions are tracked for each entry. Inserting without uniqueness is significantly faster.
[DynamicCron] Inject
cron
information into scheduled job meta.This change mirrors the addition of
cron: true
andcron_expr: expression
added to Oban's Cron in order to make cron jobs easier to identify and report on through tools like Sentry.[Worker] Add
:deadline
option for auto cancelling jobsJobs that shouldn't run after some period of time can be marked with a
deadline
. After the deadline has passed the job will be pre-emptively cancelled on its next run, or optionally during its next run if desired.[Workflow] Invert workflow execution to avoid bottlenecks caused by polling and snoozing.
Workflow jobs no longer poll or wait for upstream dependencies while executing. Instead, jobs with dependencies are "held" until they're made available by a facilitator function. This inverted flow makes fewer queries, doesn't clog queues with jobs that aren't ready, avoids snoozing, and is generally more efficient.
[Workflow] Expose functions and direct callback docs to function docs.
Most workflow functions aren't true callbacks, and shouldn't be overwritten. Now all callbacks point to the public function they wrap. Exposing workflow functions makes it easier to find and link to documentation.
Bug Fixes
[DynamicCron] Don't consider the node rebooted until it is leader
With rolling deploys it is frequent that a node isn't the leader the first time it evaluates. However, the
:rebooted
flag was set totrue
on the first run, which prevented reboots from being inserted when the node ultimately acquired leadership.[DynamicQueues] Accept streamline
partition
syntax forglobal_limit
andrate_limit
options.DynamicQueues didn't normalize the newer
partition
syntax before validation. This was an oversight, and a sign that validation had drifted between theProducer
andQueue
schemas. Now schemas use the same changesets to ensure compatibility.[Smart] Handle partitioning by
:worker
and:args
regardless of order.The docs implied partitioning by worker and args was possible, but there wasn't a clause that handled any order correctly.
[Smart] Explicitly cast transactional advisory lock prefix to integer.
Postgres 16.1 may throw an error that it's unable to determine the argument type while taking bulk unique locks.
[Smart] Preserve recorded values between retries when sync acking failed.
Acking a recorded value for a batch or workflow is synchronous, and a crash or timeout failure would lose the recorded value on subsequent attempts. Now the value is persisted between retries to ensure values are always recorded.
[Smart] Revert "Force materialized CTE in smart fetch query".
Forcing a materialized CTE in the fetch query was added for reliability, but it can cause performance regressions under heavy workloads.
[Testing] Use configured queues when ensuring all started.
Starting a supervised Oban in manual mode with tests specified would fail because in
:manual
testing mode the queues option is overridden to be empty.