Changelog for Oban Pro v1.3
This release is entirely dedicated to Smart engine optimizations, from slashing queue transactions to boosting bulk insert performance.
Rather than synchronously recording updates (acks) in a separate transaction after jobs execute, the Smart engine now bundles acks together to minimize transactions and reduce load on the database.
Async tracking, combined with the other enhancements detailed below, showed the following improvements over the previous Smart engine when executing 1,000 jobs with concurrency set to 20:
- Transactions reduced by 97% (1,501 to 51)
- Queries reduced by 94% (3,153 to 203)
That means less Ecto pool contention, fewer transactions, fewer queries, and fewer writes to the
oban_producers table! There are similar, albeit less flashy, improvements over the
engine as well.
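The percentages above follow directly from the raw counts. A quick check, with a module name that is purely illustrative:

```elixir
defmodule ReductionCheck do
  # Percentage reduction from `before` to `after_count`, rounded to one decimal.
  def percent(before, after_count) do
    Float.round((before - after_count) / before * 100, 1)
  end
end

ReductionCheck.percent(1_501, 51)   # transactions
ReductionCheck.percent(3_153, 203)  # queries
```

Rounded to whole numbers, the two results match the 97% and 94% figures listed.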
Notes and Implementation Details
Acks are stored centrally per queue and flushed with the next transaction using a lock-free mechanism that never drops an operation.
Acks are grouped and executed as a single query whenever possible. This is most visible in high throughput queues.
Acks are preserved across transactions to guarantee nothing is lost in the event of a rollback or an exception.
Acks are flushed on shutdown and when the queue is paused to ensure data remains as consistent as the previous synchronous version.
Acking is synchronous in testing mode, when draining jobs, and when explicitly enabled by a flag provided to the queue.
See the Smart engine's async tracking section for more details and instructions on how to selectively opt out of async mode.
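The idea of grouping acks into as few queries as possible can be sketched in plain Elixir. This is only an illustration, not the engine's implementation: the `AckGrouping` module and the `{job_id, state}` tuple shape are assumptions made for the example.

```elixir
defmodule AckGrouping do
  # Each ack is a hypothetical {job_id, new_state} tuple.
  # Returns one {state, ids} entry per state, which would translate to one
  # UPDATE per state instead of one UPDATE per job.
  def group(acks) do
    acks
    |> Enum.group_by(fn {_id, state} -> state end, fn {id, _state} -> id end)
    |> Enum.sort()
  end
end

AckGrouping.group([{1, :completed}, {2, :completed}, {3, :cancelled}])
```

Three acks collapse into two groups here. In a high throughput queue, where most jobs simply complete, the ratio would be far larger.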
[DynamicLifeline] Track rescues with a counter in meta.
Rescued jobs can now be identified by a
meta. Each rescue increments the
rescued count by one.
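A counter like this could be maintained along the following lines. The sketch assumes the counter lives under a "rescued" key in the job's meta map; the `RescueCounter` module is hypothetical.

```elixir
defmodule RescueCounter do
  # Increment the "rescued" counter, starting at 1 for a job's first rescue.
  # The "rescued" key name is an assumption for this sketch.
  def bump(meta) when is_map(meta) do
    Map.update(meta, "rescued", 1, &(&1 + 1))
  end
end

RescueCounter.bump(%{"rescued" => 2, "source" => "import"})
```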
[Smart] Skip taking unique advisory locks in testing mode.
Advisory locks are global and apply across transactions. That can break async tests with overlapping unique jobs because the lock is held in a concurrent, sandboxed test.
[Smart] Prevent stuck jobs with more reliable async ack management.
Unhandled transaction failures or timeouts while acking could cause uncommitted acks to be lost, leaving jobs stuck in an
executing state but unable to be rescued.
Now acks are pulled from the producer's ETS table all at once, without a time-based select. Successfully persisted acks are deleted from the table individually rather than by time.
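The "read everything, delete only what was persisted" approach can be sketched with a plain ETS table. The table layout, the `AckTable` module, and the `persist_fun` callback are assumptions for illustration and do not reflect the producer's actual table.

```elixir
defmodule AckTable do
  # Hypothetical layout: {ref, job_id, state}, keyed by a unique reference.
  def new, do: :ets.new(:acks, [:set, :public])

  def put(tab, job_id, state) do
    :ets.insert(tab, {make_ref(), job_id, state})
  end

  # Read every pending ack at once, attempt to persist them, then delete
  # only the entries that `persist_fun` reports as persisted. If persisting
  # raises, nothing is deleted and the acks remain for the next flush.
  def flush(tab, persist_fun) do
    pending = :ets.tab2list(tab)

    persisted =
      try do
        persist_fun.(pending)
      rescue
        _error -> []
      end

    Enum.each(persisted, fn {ref, _job_id, _state} -> :ets.delete(tab, ref) end)

    length(persisted)
  end
end
```

Because deletion is keyed on individual entries rather than a timestamp window, an ack written while a flush is in flight is not removed by accident.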
[Smart] Ensure uniqueness across args when no keys are specified regardless of insertion order.
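One way to make a comparison independent of insertion order is to normalize args before deriving a key from them. The sketch below is an assumption about how such a normalization might look for flat args, not the engine's mechanism; `ArgsKey` is a hypothetical module.

```elixir
defmodule ArgsKey do
  # Normalize args into a sorted list of string-keyed pairs before hashing,
  # so the key doesn't depend on the order the pairs were supplied in.
  def key(args) do
    args
    |> Enum.map(fn {k, v} -> {to_string(k), v} end)
    |> Enum.sort()
    |> :erlang.phash2()
  end
end

ArgsKey.key([{"b", 2}, {"a", 1}]) == ArgsKey.key([{"a", 1}, {"b", 2}])
```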
[Smart] Force materializing the CTE when fetching jobs.
The CTE used to prevent optimizations in the engine's fetch query is only referenced once, which may allow the Postgres optimizer to inline it. Inlining can negate the CTE "optimization fence", so we force the CTE to be materialized.
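For reference, PostgreSQL 12 and later accept an explicit AS MATERIALIZED marker on a CTE. The query below is a simplified stand-in that uses the standard oban_jobs columns. It is not the engine's real fetch query, and `FetchSketch` is a hypothetical module.

```elixir
defmodule FetchSketch do
  # Simplified illustration of a fetch query with a materialized CTE.
  # AS MATERIALIZED keeps the planner from inlining `subset`.
  def sql do
    """
    WITH subset AS MATERIALIZED (
      SELECT id
      FROM oban_jobs
      WHERE state = 'available' AND queue = $1
      ORDER BY priority, scheduled_at, id
      LIMIT $2
      FOR UPDATE SKIP LOCKED
    )
    UPDATE oban_jobs AS jobs
    SET state = 'executing', attempted_at = now()
    FROM subset
    WHERE jobs.id = subset.id
    RETURNING jobs.id
    """
  end
end
```

With an Ecto SQL repo, a string like this could be run through the repo's `query/2`, passing the queue name and the limit as parameters.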
date_partition? default lazily at runtime.
Date partitioning should be disabled in the test environment because the plugin doesn't run to pre-create partitions. However, using
Mix.env at compilation time wasn't reliable enough to prevent sub-partitioning by date in testing environments. This switches the check to runtime.
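The difference between a compile-time and a runtime default can be sketched with `Keyword.get_lazy/3`, which only evaluates its fallback when the function is called and the key is absent. The `PartitionOpts` module and its `env_fun` argument are hypothetical and exist only to keep the example self-contained.

```elixir
defmodule PartitionOpts do
  # Resolve the option when called rather than when the module is compiled.
  # An explicit :date_partition? value always wins over the default.
  def date_partition?(opts, env_fun \\ &current_env/0) do
    Keyword.get_lazy(opts, :date_partition?, fn -> env_fun.() != :test end)
  end

  # Mix isn't available in releases, so fall back to :prod there.
  defp current_env do
    if Code.ensure_loaded?(Mix), do: Mix.env(), else: :prod
  end
end

PartitionOpts.date_partition?([date_partition?: true], fn -> :test end)
```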
[DynamicPartitioner] Improve partitioned structure and indexes for performance.
The partitioner migration now creates fewer partitions to improve staging queries. To match, it creates more specific indexes for each table to prevent sequential scans during common queries.
All partitions gain a primary key index, and the compound index for "completed" states now omits fields that are only needed for incomplete states.
Indexes on existing partitioned tables should be updated as shown below. However, note that indexes can't be added to partitioned tables concurrently.
-- This will cascade down to all partitions
CREATE INDEX oban_jobs_pkey ON oban_jobs (id);

-- Add an index that includes the state for available, scheduled, and retryable.
-- Index names must be unique within a schema, so each table gets its own name.
CREATE INDEX oban_jobs_available_state_queue_priority_scheduled_at_id_index
  ON oban_jobs_available (state, queue, priority, scheduled_at, id);
CREATE INDEX oban_jobs_retryable_state_queue_priority_scheduled_at_id_index
  ON oban_jobs_retryable (state, queue, priority, scheduled_at, id);
CREATE INDEX oban_jobs_scheduled_state_queue_priority_scheduled_at_id_index
  ON oban_jobs_scheduled (state, queue, priority, scheduled_at, id);

-- Add a simpler index to completed states
CREATE INDEX oban_jobs_cancelled_queue_scheduled_at_id_index
  ON oban_jobs_cancelled (queue, scheduled_at, id);
CREATE INDEX oban_jobs_completed_queue_scheduled_at_id_index
  ON oban_jobs_completed (queue, scheduled_at, id);
CREATE INDEX oban_jobs_discarded_queue_scheduled_at_id_index
  ON oban_jobs_discarded (queue, scheduled_at, id);

-- Drop the previous index that lacked the state
DROP INDEX oban_jobs_queue_priority_scheduled_at_id_index;
[DynamicPartitioner] Allow using
There are situations where it's still useful to use
DynamicPruner to aggressively prune a subset of jobs more frequently than partitioning allows.
[Smart] Augment all frequent queries with a
state condition to aid partitioned queries.
Partitioned table queries require the state for partition pruning, especially for an
id only query. This changes the Smart engine's queries to run optimally with either a standard or partitioned table.
[DynamicPartitioner] Override configured
DynamicPartitioner backfills retained the configured prefix, which defaults to
public, without respecting the
new_prefix option. Now the prefix is overridden and correctly escaped before usage.
embed_one with required fields truly optional
The recursive nature of embedded structs made it impossible to have an optional
embed_one with required fields. Now the embedded fields are only validated when a value is provided.
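The validation rule described here, checking an embed's required fields only when the embed is present, can be sketched without any schema machinery. `OptionalEmbed` and its arguments are hypothetical and stand in for the real args schema handling.

```elixir
defmodule OptionalEmbed do
  # Validate the required fields of an embedded map only when the embed
  # itself is provided. A missing or nil embed is accepted as-is.
  def validate(args, embed_key, required_fields) do
    case Map.get(args, embed_key) do
      nil ->
        :ok

      %{} = embed ->
        case Enum.reject(required_fields, &Map.has_key?(embed, &1)) do
          [] -> :ok
          missing -> {:error, {:missing, missing}}
        end
    end
  end
end

OptionalEmbed.validate(%{"id" => 1}, "address", ["street"])
```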
Names may be either an atom or a string, and they're always coerced before querying anyhow.
This release depends on Oban v2.17.3 or greater
after_process/3 hook callback that includes execution results.
after_process/3 callback includes the job's return value as a third argument. That allows hooks to have immediate access to the job's return value without recording it and fetching it from the database.
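The shape of a three-argument hook might look like the sketch below. It is a plain module rather than a real worker so that it runs without Oban. The `:complete` state atom and the `{:ok, value}` result shape are assumptions, and the return values exist only to make the sketch easy to check.

```elixir
defmodule HookSketch do
  # Mirrors the three-argument hook shape: state, job, and the job's result.
  # The job is a plain map here instead of an Oban job struct.
  def after_process(:complete, job, {:ok, value}) do
    {:seen, job.id, value}
  end

  # Any other state or result is ignored in this sketch.
  def after_process(_state, _job, _result), do: :ignored
end

HookSketch.after_process(:complete, %{id: 7}, {:ok, "report.pdf"})
```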
[Testing] Ensure queues are started before returning from
This prevents race conditions between when
start_supervised_oban returns and when queues are fully booted, i.e. actively listening to pause/resume/scale signals.
by_state_timestamp option to DynamicPruner.
In rare situations where the
scheduled_at timestamp isn't accurate enough to identify prunable jobs, e.g. cancelling large swaths of jobs scheduled far into the future, the new
by_state_timestamp: true option can be used for increased accuracy.
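A configuration along these lines would enable the option. The `:my_app` application name and the seven day `max_age` are placeholders, and the surrounding plugin options are assumptions, so check them against the DynamicPruner documentation before use.

```elixir
import Config

# Placeholder app name and retention period; only by_state_timestamp: true
# comes from this changelog entry.
config :my_app, Oban,
  engine: Oban.Pro.Engines.Smart,
  plugins: [
    {Oban.Pro.Plugins.DynamicPruner,
     mode: {:max_age, 60 * 60 * 24 * 7}, by_state_timestamp: true}
  ]
```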
[DynamicQueues] Accept ack_async/refresh_interval in DynamicQueues.
DynamicQueues now allows the ack_async and
refresh_interval virtual fields for parity with standard queues.
[Smart] Serialize all meta updates through the producer.
This fixes all of the race conditions and outdated record issues that the mismatch between registry meta and the producer's own meta caused.
[Smart] Force synchronous acking for all Batch jobs.
Batches require accurate status counts to insert callbacks correctly. The slight delay from async jobs can cause incorrect counts in highly active queues with batches.
[Smart] Ensure global queues keep running with
Global queues that are marked with
ack_async: false must refresh the in-memory producer record between job fetching to keep the queue running. Otherwise, tracked jobs linger in the producer record despite successful acking.
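For context, a global queue that opts out of async acking might be configured roughly as follows. The `:my_app` application name, the queue names, and the limits are placeholders chosen for the example.

```elixir
import Config

# Placeholder app and queue names. The `exports` queue is globally limited
# and acks synchronously; `default` keeps async acking.
config :my_app, Oban,
  engine: Oban.Pro.Engines.Smart,
  queues: [
    default: 10,
    exports: [local_limit: 5, global_limit: 5, ack_async: false]
  ]
```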
[Smart] Prevent a race condition while pausing from stopping global queues.
Pausing a global queue while there are pending acks could trigger a write-after-read race condition that lost tracking changes. Eventually, leaked changes could prevent the queue from fetching new jobs because it looked like the global limit was met.
[Smart] Always split
completed ack queries for recorded jobs.
Jobs with different recorded output could mistakenly be written with a single query if they completed within a few
ms of each other. This changes the grouping mechanism to only bundle simple completions, never recorded completions.
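The bundling rule can be sketched as a split between simple and recorded completions. The ack map shape and the `CompletionBundler` module are assumptions for illustration only.

```elixir
defmodule CompletionBundler do
  # Acks are hypothetical maps: %{id: integer, recorded: term | nil}.
  # Simple completions collapse into one batch of ids, while every recorded
  # completion stays on its own because its output differs per job.
  def bundle(acks) do
    {recorded, simple} = Enum.split_with(acks, &(&1.recorded != nil))

    batches = if simple == [], do: [], else: [{:batch, Enum.map(simple, & &1.id)}]

    batches ++ Enum.map(recorded, &{:single, &1.id, &1.recorded})
  end
end

CompletionBundler.bundle([
  %{id: 1, recorded: nil},
  %{id: 2, recorded: "a"},
  %{id: 3, recorded: nil}
])
```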
[Smart] Default to synchronous acking when Oban is in a testing mode.
Acking should always be synchronous during tests to prevent flickering failures from race conditions. Previously, acking relied on a failed registry lookup to switch to synchronous mode, which wasn't accurate enough.
[Smart] Default to synchronous acking for
drain_jobs/2 and related test helpers.
Draining runs synchronously in the test process, but not in testing mode. This explicitly disables
ack_async when draining jobs.
[DynamicPartitioner] Only sub-partition by date in non-test environments.
To prevent testing errors after migration, the
discarded states are sub-partitioned by date only in
It's possible to enable date partitioning in other production-like environments with the new
[DynamicPartitioner] Rename existing
meta indexes to allow index recreation
When renaming the existing table to
meta indexes weren't renamed. That prevented creating those indexes on the new partitioned table, because Postgres detected that indexes with those names already existed and skipped their creation.
[Smart] Skip extra query to "touch" the producer when acking without global or rate limiting enabled.
This change reduces overall producer updates from 1 per job to 2 per minute for standard queues.
[Smart] Avoid refetching the local producer's data when fetching new jobs.
Async acking is centralized through the producer, which guarantees global and rate tracking data is up-to-date before fetching without an additional read.
[Smart] Optimize job insertion with fewer iterations.
Iterating through job changesets as a map/reduce with fewer conversions improves inserting 1k jobs by 10% while reducing overall memory by 9%.
[Smart] Efficiently count changesets during
Prevent duplicate iterations through changesets to count unique jobs. Iterating through them once to accumulate multiple counts improved insertion by 3% and reduced overall memory by 2%.
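Accumulating several counts in a single pass might look like the sketch below. Plain maps with a `:unique` flag stand in for job changesets, and `ChangesetCounts` is a hypothetical module.

```elixir
defmodule ChangesetCounts do
  # Entries are hypothetical maps with a boolean :unique flag standing in
  # for job changesets. One reduce accumulates both counts together.
  def count(entries) do
    Enum.reduce(entries, %{total: 0, unique: 0}, fn entry, acc ->
      acc = Map.update!(acc, :total, &(&1 + 1))
      if entry.unique, do: Map.update!(acc, :unique, &(&1 + 1)), else: acc
    end)
  end
end

ChangesetCounts.count([%{unique: true}, %{unique: false}, %{unique: true}])
```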
[Smart] Acking cancelled jobs is done with a single operation and limited to queues with global limiting.
[Smart] Always merge acked meta updates dynamically.
All meta-updating queries are dynamically merged with existing
meta. This prevents recorded jobs from clobbering other meta updates made while the job executed.
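The difference between merging and replacing can be shown with plain maps. This is an in-memory analogy only, since the actual merge happens in the database query, and `MetaMerge` is a hypothetical module.

```elixir
defmodule MetaMerge do
  # Merging keeps keys written while the job ran. Replacing the whole map
  # with `ack_meta` would drop them.
  def ack(existing_meta, ack_meta), do: Map.merge(existing_meta, ack_meta)
end

MetaMerge.ack(%{"progress" => 50}, %{"recorded" => true})
```

Keys present in both maps take the value from the ack, while keys that exist only in the stored meta survive.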
[Smart] Safely extract producer uuid from
attempted_by with more than two elements
[DynamicCron] Preserve stored opts such as
priority, etc., on reboot when no new opts are set.
[Relay] Skip attempting relay notifications when the associated Oban pid isn't alive.