Applications experience variable load. You can handle increased load by upgrading your service to have more resources or by scaling it to have more instances.
Render supports the following scaling methods:
- Manual scaling: Scale your service to a specific number of instances.
- Autoscaling: Automatically scale your service based on target CPU and memory utilization.
Manual scaling is the simplest way to scale out your service. Your service will consistently have the desired number of instances. Open the Scaling tab in your service dashboard, enter the desired number of instances in the Manual Scaling section, and click Save Changes. Your service will immediately start to scale.
With autoscaling, Render scales your service up and down based on average CPU and/or memory utilization so you don’t have to predict or overprovision services to meet peak load.
By default, autoscaling is disabled. You can enable autoscaling by opening the Scaling tab in your service dashboard and toggling the autoscaling switch on. After autoscaling is enabled, you can toggle and specify the target CPU and memory utilization separately. You can also set the minimum and maximum number of instances for your service, and Render will ensure your instance count stays within this range even if your average CPU/memory utilization is higher or lower than the target. This can be helpful for controlling costs, or if additional instances take a long time to start up.
Once autoscaling is enabled, Render periodically monitors the current average CPU and memory utilization across all instances of your service, and compares the values with the specified target utilization. We calculate the desired number of instances by multiplying the current number of instances by the ratio of the current utilization to the target utilization, and rounding the result up to the next whole number. If both CPU and memory targets are specified, we calculate the desired number of instances for each target and apply the larger of the two.
For example, for a service with 2 instances, if the current CPU utilization is 80% and the target CPU utilization is 60%, Render will scale this service up to 3 instances, since ceil[2 * 80%/60%] = 3.
If this service also has a memory utilization target of 40% and the current memory utilization is 80%, the desired number of instances based on memory will be 4, since ceil[2 * 80%/40%] = 4. Since this is greater than the instance count based on the CPU target (3), Render will scale this service to 4 instances.
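The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described here, not Render's actual implementation: the function names, the tuple format for targets, and the default minimum and maximum of 1 and 10 are hypothetical, and clamping is shown as a simple min/max on the result.

```python
import math
from typing import Optional, Tuple

# Hypothetical representation: (current utilization %, target utilization %)
Target = Tuple[float, float]


def desired_for_target(current_instances: int, current_util: float, target_util: float) -> int:
    """Instance count for a single metric: current * (current / target), rounded up."""
    if target_util <= 0:
        raise ValueError("target utilization must be positive")
    return math.ceil(current_instances * current_util / target_util)


def desired_instances(
    current_instances: int,
    cpu: Optional[Target] = None,
    memory: Optional[Target] = None,
    min_instances: int = 1,   # assumed default, for illustration only
    max_instances: int = 10,  # assumed default, for illustration only
) -> int:
    """Take the largest per-metric result, then keep it inside the configured range."""
    enabled = [t for t in (cpu, memory) if t is not None]
    candidates = [
        desired_for_target(current_instances, util, target) for util, target in enabled
    ]
    # With no target enabled, keep the current instance count.
    wanted = max(candidates) if candidates else current_instances
    return max(min_instances, min(max_instances, wanted))


print(desired_instances(2, cpu=(80, 60)))                   # CPU only
print(desired_instances(2, cpu=(80, 60), memory=(80, 40)))  # CPU and memory
```

With the numbers from the example, the first call yields 3 and the second yields 4. Passing `max_instances=3` to the second call would cap the result at 3, which mirrors how the configured range overrides the utilization-based count.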
We have different time windows for scaling up and down. The scale up window is short, so that services are quickly able to respond to traffic bursts. The scale down window, on the other hand, is much longer to prevent the number of instances from fluctuating up and down too much in a short amount of time.
Autoscaling events are created when autoscaling configuration changes or when scaling up/down happens. These can be found in the Events tab in your service dashboard.
You can view the number of instances and the average CPU and memory utilization over time in the Scaling tab for your Render service.
Your service is billed per second for the number of instances actually running, multiplied by the plan rate. There is no extra cost to enable autoscaling. The following examples illustrate how this works.
If you have 2 instances running at all times (manual scaling), you’ll be billed 2x your plan’s rate. If your service is on the Starter plan for the whole month, you’ll be billed $7 × 2 = $14.
Suppose you have a Starter service with autoscaling enabled and, every day, your service scales to 2 instances for 6 hours and back to 1 instance for the remaining 18 hours. The average number of instances is (2 × 6 + 1 × 18) / 24 = 1.25, and you’ll be billed $7 × 1.25 = $8.75.
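The billing arithmetic in these examples can be reproduced with a short sketch. It assumes the same daily pattern repeats for the whole month and uses the $7 Starter rate from the examples. The function names and the `(instances, hours)` input format are hypothetical, and actual invoices are based on metered instance hours rather than an estimate like this.

```python
from typing import Sequence, Tuple


def average_instances(daily_pattern: Sequence[Tuple[int, float]]) -> float:
    """Time-weighted average instance count for a list of (instances, hours) entries."""
    total_hours = sum(hours for _, hours in daily_pattern)
    if total_hours <= 0:
        raise ValueError("daily pattern must cover a positive number of hours")
    instance_hours = sum(count * hours for count, hours in daily_pattern)
    return instance_hours / total_hours


def estimated_monthly_bill(plan_rate: float, daily_pattern: Sequence[Tuple[int, float]]) -> float:
    """Estimate: monthly plan rate multiplied by the average instance count."""
    return plan_rate * average_instances(daily_pattern)


# Manual scaling: 2 instances around the clock.
print(estimated_monthly_bill(7, [(2, 24)]))
# Autoscaling: 2 instances for 6 hours, 1 instance for the other 18.
print(estimated_monthly_bill(7, [(2, 6), (1, 18)]))
```

The two calls correspond to the $14 and $8.75 figures above.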
You can find the exact number of instance hours for your service for the month on the billing page and in the monthly invoices.
- Services with disks can only have a single instance and cannot be manually or automatically scaled. Consider moving persistent state out of services that you want to scale.
- For scaled web services, a load balancer in front of the service instances distributes incoming network requests evenly across them.