Oban.Pro.Plugins.DynamicScaler (Oban Pro v1.5.0-rc.4)
The DynamicScaler examines queue throughput and issues commands to horizontally scale cloud infrastructure to optimize processing. With auto-scaling you can spin up additional nodes during high traffic events, and pare down to a single node during a lull. Beyond optimizing throughput, scaling may save money in environments with little to no usage at off-peak times, e.g. staging.
Horizontal scaling is applied at the node level, not the queue level, so you can distribute processing over more physical hardware.
- Predictive Scaling — The optimal scale is calculated by predicting the future size of a queue based on recent trends. Multiple samples are then used to prevent overreacting to changes in queue depth or throughput. You provide an acceptable range of nodes and auto-scaling takes care of the rest.
- Multi-Cloud — Cloud integration is provided by a simple, flexible behaviour that you implement for your specific environment and configure for each scaler.
- Queue Filtering — By default, all queues are considered for scale calculations. However, you can restrict calculations to one or more critical queues.
- Multiple Scalers — Some systems may restrict work to specific node types, e.g. generating exports or processing videos. Other hybrid systems may straddle multiple clouds. In either case, you can configure multiple independent scalers driven by distinct queues.
- Non-Linear — An optional `step` parameter allows you to conservatively scale up or down one node at a time, or optimize for responsiveness and jump from the min to the max in a single scaling period.
- Prevent Thrashing — A `cooldown` period skips scaling when there was recent scale activity to prevent unnecessarily scaling nodes up or down. Nodes may take several minutes to start within an environment, so the default `cooldown` period is 2 minutes.
Using and Configuring
There are ample hosting platforms, aka "clouds", out there and we can't support them all! Before you can begin dynamic scaling you'll need to implement a `Cloud` module for your environment. Don't worry, we have copy-and-paste examples for some popular platforms and a guide to walk through implementations for your environment.
With a cloud module in hand you're ready to add the `DynamicScaler` plugin to your Oban config:
config :my_app, Oban,
  plugins: [
    {Oban.Pro.Plugins.DynamicScaler, ...}
  ]
Then, add a scaler with a `range` to define the minimum and maximum nodes, and your `cloud` strategy:
{DynamicScaler, scalers: [range: 1..5, cloud: MyApp.Cloud]}
Now, every minute, `DynamicScaler` will calculate the optimal number of nodes for your queue's throughput and issue scaling commands accordingly.
Configuring Scalers
Scalers have options beyond `:cloud` and `:range` for more advanced systems or to constrain resource usage. Here's a breakdown of all options, followed by specific examples of each.
- `:cloud` — A `module` or `{module, options}` tuple that interacts with the external cloud during scale events. Required.
- `:range` — The range of compute units to scale between. For example, `1..3` declares a minimum of 1 node and a maximum of 3. The minimum must be 0 or more, and the maximum must be 1 or at least match the minimum. Required.
- `:cooldown` — The minimum time between scaling events. Defaults to 120 seconds.
- `:lookback` — The historic time to check queues. Defaults to 60 seconds.
- `:queues` — Either `:all` or a list of queues to consider when measuring throughput and backlog.
- `:step` — Either `:none` or the maximum nodes to scale up or down at once. Defaults to `:none`.
Scaler Examples
Filter throughput queries to the `:media` queue:
scalers: [queues: :media, range: 1..3, cloud: MyApp.Cloud]
Filter throughput queries to both `:audio` and `:video` queues:
scalers: [queues: [:audio, :video], range: 1..3, cloud: MyApp.Cloud]
Configure scalers driven by different queues (note, queues may not overlap):
scalers: [
  [queues: :audio, range: 0..2, cloud: {MyApp.Cloud, asg: "my-audio-asg"}],
  [queues: :video, range: 0..5, cloud: {MyApp.Cloud, asg: "my-video-asg"}]
]
Limit scaling to one node up or down at a time:
scalers: [range: 1..3, step: 1, cloud: MyApp.Cloud]
Wait at least 5 minutes (300 seconds) between scaling events:
scalers: [range: 1..3, cloud: MyApp.Cloud, cooldown: 300]
Increase the period used to calculate historic throughput to 90 seconds:
scalers: [range: 1..3, cloud: MyApp.Cloud, lookback: 90]
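These options compose within a single scaler. As an illustrative sketch (the queue name and `MyApp.Cloud` module are placeholders, not prescribed values), a conservative scaler for a critical queue might combine filtering, stepping, and a longer cooldown:

```elixir
scalers: [queues: :exports, range: 0..4, step: 1, cooldown: 300, cloud: MyApp.Cloud]
```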
Scaling Down to Zero Nodes
It's possible to scale down to zero nodes in staging environments or production applications with periods of downtime. However, it is only viable for multi-node setups with dedicated worker nodes and another instance type that isn't controlled by `DynamicScaler`. Without a separate "web" node, or something that is always running, you run the risk of scaling down without the ability to scale back up.
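As a sketch of that setup, the always-running "web" node hosts the plugin while a dedicated worker group scales to zero (the queue name, cloud module, and `asg` value here are assumptions for illustration):

```elixir
# Runs on the always-on "web" node, which DynamicScaler never scales,
# so it can always scale the worker group back up from zero.
{DynamicScaler, scalers: [queues: :batch, range: 0..3, cloud: {MyApp.Cloud, asg: "staging-workers"}]}
```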
Cloud Modules
There are a lot of hosting platforms, aka "clouds", out there and we can't support them all! Even with optional dependencies, it would be a mess of libraries that may not agree with your application decisions. Instead, the `Oban.Pro.Cloud` behaviour defines two simple callbacks, and integrating with platforms typically takes a single HTTP query or library call.
The following links contain gists of full implementations for popular cloud platforms. Feel free to copy-and-paste to use them as-is or as the basis for your own cloud modules.
Let us know if an integration for your platform is missing (which is rather likely) and you'd like assistance. Otherwise, follow the guide below to write your own integration!
Writing Cloud Modules
Cloud callback modules must define an `init/1` function to prepare configuration at runtime, and a `scale/2` callback called with the desired number of nodes and the prepared configuration.
The following example demonstrates a complete callback module for scaling EC2 Auto Scaling Groups on AWS using the SetDesiredCapacity action. It assumes you're using the ex_aws package with the proper credentials.
defmodule MyApp.ASG do
  @behaviour Oban.Pro.Cloud

  @impl Oban.Pro.Cloud
  def init(opts), do: Map.new(opts)

  @impl Oban.Pro.Cloud
  def scale(desired, conf) do
    params = %{
      "Action" => "SetDesiredCapacity",
      "AutoScalingGroupName" => conf.asg,
      "DesiredCapacity" => desired,
      "Version" => "2011-01-01"
    }

    query = %ExAws.Operation.Query{
      path: "",
      params: params,
      service: :autoscaling,
      action: :set_desired_capacity
    }

    with {:ok, _} <- ExAws.request(query), do: {:ok, conf}
  end
end
You'd then use your cloud module as a scaler option:
{DynamicScaler, scalers: [range: 1..3, cloud: {MyApp.ASG, asg: "my-asg-name"}]}
Clouds can also pull from the application or system environment to build configuration. If your module pulls from the environment exclusively, then you can pass the module name rather than a tuple:
{DynamicScaler, scalers: [range: 1..3, cloud: MyApp.ASG]}
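For example, the cloud module's `init/1` could read defaults from the application environment and merge in any per-scaler options. This is a minimal sketch; the `:my_app` config key and fallback behavior are assumptions:

```elixir
@impl Oban.Pro.Cloud
def init(opts) do
  # Assumed config key: defaults set via `config :my_app, MyApp.ASG, asg: "..."`
  :my_app
  |> Application.get_env(MyApp.ASG, [])
  |> Keyword.merge(opts)
  |> Map.new()
end
```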
Optimizing Throughput Queries
While the scaler's throughput queries are optimized for a standard load, high throughput queues, or systems that retain a large volume of jobs, may benefit from an additional index that aids calculating throughput. Use the following migration to add an index if you find that scaling queries are too slow or timing out:
@disable_ddl_transaction true
@disable_migration_lock true

def change do
  create_if_not_exists index(
    :oban_jobs,
    [:state, :queue, :attempted_at, :attempted_by],
    concurrently: true,
    where: "attempted_at IS NOT NULL",
    prefix: "public"
  )
end
Alternatively, you can change the timeout used for scaler inspection queries:
{DynamicScaler, timeout: :timer.seconds(15), scalers: ...}
Instrumenting with Telemetry
The `DynamicScaler` plugin adds the following metadata to the `[:oban, :plugin, :stop]` event:

- `:scaler` — details of the active scaler config with recent scaling values
- `:error` — the value returned from `scale/2` when scaling fails

When multiple scalers are configured, one event is emitted for each scaler.
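As a sketch, you could attach a telemetry handler that logs each scaling event. The module name, handler id, and log format here are arbitrary choices; matching on the `:plugin` metadata key is assumed to distinguish `DynamicScaler` events from other plugins:

```elixir
defmodule MyApp.ScalerLogger do
  require Logger

  def attach do
    :telemetry.attach("scaler-logger", [:oban, :plugin, :stop], &__MODULE__.handle_event/4, nil)
  end

  # Log only events emitted by the DynamicScaler plugin
  def handle_event(_event, _measure, %{plugin: Oban.Pro.Plugins.DynamicScaler} = meta, _config) do
    Logger.info("DynamicScaler ran: #{inspect(meta[:scaler])}")
  end

  def handle_event(_event, _measure, _meta, _config), do: :ok
end
```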