Burgeoning Elixirists frequently ask, “Who needs background jobs in Elixir? Isn’t that what Task.start/1 is for?” Not quite. Let’s examine why a Task is the wrong level of abstraction for critical background work.
Erlang, and therefore Elixir, provides a legendary concurrency story through lightweight processes (spawn) and message passing (send). Those two functions are technically all you need to build actor-model-based concurrency. If you really wanted to, you could build an entire application purely with spawn and send. Presumably, it would be tedious, and you’d slowly reimplement an ad-hoc, informally-specified, bug-ridden version of half of OTP.
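To make those two primitives concrete, here is a minimal sketch of a bare process that answers a single message. The :ping and :pong message shapes are made up for illustration; nothing here is supervised or restarted if it crashes.

```elixir
# A bare process that waits for one message and replies to the sender.
parent = self()

pid =
  spawn(fn ->
    receive do
      {:ping, from} -> send(from, {:pong, self()})
    end
  end)

send(pid, {:ping, parent})

# Wait for the reply, giving up after one second.
reply =
  receive do
    {:pong, ^pid} -> :pong
  after
    1_000 -> :timeout
  end

IO.inspect(reply)
```

Every concern beyond this exchange (naming, restarts, timeouts, shutdown) would have to be written by hand.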
While Erlang provides basic concurrency primitives, decades of in-the-field experience have guided the creation of elegant concurrency abstractions such as GenServer for long-lived generic processes and Supervisor for maintaining trees of processes. So, rather than stitching systems together with spawn and send, applications are composed of standard, well-behaved GenServers and Supervisors.
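As a rough illustration of that composition, the sketch below runs a small GenServer under a Supervisor. The Counter module and its increment/0 function are hypothetical names chosen for this example.

```elixir
defmodule Counter do
  use GenServer

  # Client API
  def start_link(initial), do: GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  def increment, do: GenServer.call(__MODULE__, :increment)

  # Server callbacks
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}
end

# The supervisor starts Counter and restarts it if it crashes.
{:ok, _sup} = Supervisor.start_link([{Counter, 0}], strategy: :one_for_one)

first = Counter.increment()
IO.inspect(first)
```

The message loop, reply handling, and restart logic all come from OTP rather than from hand-rolled spawn and send calls.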
Elixir provides ergonomic abstractions that simplify advanced patterns based on OTP’s abstractions. For instance, Registry adds a process-aware wrapper around ETS tables, and Agent brings a GenServer tailored for state management. Then there is the Task module for one-off OTP-friendly processes.
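A short sketch of two of those abstractions in use, with an Agent holding state and Tasks running one-off work concurrently. The :visits key and the squaring work are placeholders for illustration.

```elixir
# Agent: a process whose only job is to hold state.
{:ok, agent} = Agent.start_link(fn -> %{} end)
Agent.update(agent, &Map.put(&1, :visits, 1))
visits = Agent.get(agent, &Map.get(&1, :visits))

# Task: run several one-off computations concurrently, then collect the results in order.
squares =
  [1, 2, 3]
  |> Enum.map(fn n -> Task.async(fn -> n * n end) end)
  |> Task.await_many()

IO.inspect({visits, squares})
```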
What Tasks Are and What They Aren’t
Tasks are a more powerful concurrency abstraction than bare spawned processes, making it simple to convert sequential code into concurrent code with varying guarantees. They’re ideal for operations like fetching several URLs or querying the database in parallel. Depending on how tasks are initialized, they have a spectrum of responsibility between best-effort and loosely supervised:
Best-effort (Task.start/1)—Tasks without process linking, supervision, concurrency controls, or shutdown guarantees.
Loosely supervised (Task.Supervisor.async/2)—Tasks with process linking, simple enumerability, hard concurrency limits, and configurable shutdown periods.
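The two ends of that spectrum might look like the following sketch. The max_children value of 2 and the trivial task bodies are arbitrary choices for the example.

```elixir
# Best-effort: fire and forget. The task is not linked to the caller or supervised.
{:ok, _pid} = Task.start(fn -> :ok end)

# Loosely supervised: tasks run under a Task.Supervisor with a cap on children.
{:ok, sup} = Task.Supervisor.start_link(max_children: 2)

# This task blocks until it receives :go, so it is observable while running.
task =
  Task.Supervisor.async(sup, fn ->
    receive do
      :go -> 21 * 2
    end
  end)

# Enumerability: the supervisor can list the tasks it is running.
running = Task.Supervisor.children(sup)

send(task.pid, :go)
result = Task.await(task)

IO.inspect(result)
```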
Supervised tasks improve observability, constrain resources, and provide shutdown guarantees. But tasks lack essential functionality for mission-critical work. Consider the following:
Enqueueing—How can I wait to execute a task when the supervisor hits a concurrency limit? How do I separate fast and slow tasks to prevent bottlenecks?
Scheduling—How do I run a task at a specific time in the future? What if too many scheduled jobs all need to start simultaneously? How do I reschedule them?
Retries—How do I restart tasks with transient failures? How do I delay and stagger retries with some backoff to prevent concurrent access problems?
Uniqueness—How do I prevent the same task from executing concurrently on the same node? What if I ran a task a few seconds ago, and the result is still usable?
Distribution—How do I distribute tasks evenly between every node in my cluster? What if I only need to run tasks on some nodes?
Instrumentation—How can I measure the run time for various tasks and integrate them with my other application metrics?
Runtime Visibility—Which function is each task currently running, and with what arguments? How long has it been running?
Historical Observability—What tasks are complete? When did they start and how long did they take?
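The enqueueing gap is easy to see in a small sketch. With a limit of one child (an arbitrary value for this example), a second task is rejected rather than held until a slot opens.

```elixir
{:ok, limited} = Task.Supervisor.start_link(max_children: 1)

# Occupy the only slot with a task that blocks until told to finish.
{:ok, busy} =
  Task.Supervisor.start_child(limited, fn ->
    receive do
      :finish -> :ok
    end
  end)

# The second task is rejected outright; nothing queues it for later.
rejected = Task.Supervisor.start_child(limited, fn -> :never_runs end)

IO.inspect(rejected)

# Release the first task.
send(busy, :finish)
```

Anything smarter than dropping the work, such as waiting, retrying with backoff, or routing to another queue, has to be built on top.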
That’s a lot of missing functionality, and there’s a more significant issue. Even once you’ve implemented a solution for all of those missing pieces (or at least the parts you need right now), an essential component is still missing.
Persistence is Crucial
What happens when your application inevitably restarts, whether intentionally or from cascading failures? To retain tasks between restarts, you need persistent storage.
There are abundant persistence options, from the Erlang-native RabbitMQ to the inescapable Redis. Any persistent store could work with enough effort. However, the best fit, in our opinion, is PostgreSQL (surely not a surprise, as Oban says “powered by modern PostgreSQL” right on the tin).
Aside from a well-earned reputation as a flexible, reliable, and highly performant relational database, PostgreSQL’s killer feature is that it’s probably already in your application.
Persisting tasks, or at least a task-like wrapper, in a database neatly solves many of the problems we identified earlier:
Enqueueing—With atomic operations, a SQL table can behave like a queue.
Scheduling—With timestamps, we can defer execution until a specific time.
Uniqueness—With persistence, we can query for duplicate tasks.
Distribution—With a central database, nodes can pull tasks when ready.
Historical Observability—With retention, we can look at completed tasks.
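To show how rows with timestamps give you scheduling and uniqueness, here is a toy in-memory model. It is not Oban’s schema or API: the JobTable module, its field names, and the worker names are all hypothetical, and a plain list stands in for the database table that would actually survive restarts.

```elixir
defmodule JobTable do
  # Hypothetical in-memory stand-in for a jobs table; rows are plain maps.

  # Uniqueness: skip the insert when an identical pending row already exists.
  def insert(rows, worker, args, scheduled_at) do
    duplicate? =
      Enum.any?(rows, &(&1.worker == worker and &1.args == args and &1.state == :available))

    if duplicate? do
      rows
    else
      rows ++ [%{worker: worker, args: args, scheduled_at: scheduled_at, state: :available}]
    end
  end

  # Scheduling: only rows whose timestamp has passed are ready to run.
  def ready(rows, now) do
    Enum.filter(rows, fn row ->
      row.state == :available and DateTime.compare(row.scheduled_at, now) != :gt
    end)
  end
end

now = ~U[2024-01-01 12:00:00Z]
later = DateTime.add(now, 3600, :second)

rows =
  []
  |> JobTable.insert("Mailer", %{id: 1}, now)
  |> JobTable.insert("Mailer", %{id: 1}, now)
  |> JobTable.insert("Report", %{id: 2}, later)

ready_now = JobTable.ready(rows, now)

IO.inspect(Enum.map(ready_now, & &1.worker))
```

In a real database the duplicate check and the fetch of ready rows would be queries, and the atomic operations mentioned above would keep two nodes from claiming the same row.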
Coordinating with a database is certainly slower than spawning a BEAM process, but the upside of persistence is immense. Tasks can be enqueued atomically within the same transaction as your other application code. More importantly, you’re assured that critical tasks won’t disappear unexpectedly during a routine application restart.
Once foundational persistence is sorted, the other layers can fall into place.
Picking Up Where Tasks Leave Off
You could rebuild GenServers, Supervisors, Agents, Tasks, or a Registry, but they already exist in Elixir as a springboard for you to build on.
As Elixir builds on top of OTP, Oban expands on those primitives (and some phenomenal packages) to formalize how well-behaved, observable, reliable, and persistent tasks should operate. In fact, Oban links processes through a Registry, manages queues with a DynamicSupervisor, and executes every job within a supervised Task!
Even in an environment with the subjectively best concurrency story of any runtime, you still need additional functionality for mission-critical tasks.
That’s where Oban starts.