Does Render autoscale on every plan?

No. Manual scaling is available on all workspaces. Autoscaling requires a Pro workspace or higher.

Can I autoscale a service with a persistent disk?

No. A service with an attached persistent disk cannot scale to multiple instances. Use external storage or a managed datastore if you need both persistence and horizontal scaling.

What status code should my health check return?

Return any 2xx or 3xx status when the instance is ready to serve traffic. Return a 4xx or 5xx status, such as 503, while dependencies are still initializing.

How quickly does Render scale up during a spike?

Render scales up immediately when averaged utilization exceeds your target. New instances still need time to boot and pass health checks before they receive traffic.

Why did my service scale up but users still saw errors?

Existing instances may have saturated before new ones became ready. Lower your utilization targets, raise minInstances, or reduce startup time so capacity arrives earlier.

Can I set both CPU and memory autoscaling targets?

Yes. Render calculates a recommended instance count for each enabled target and applies whichever result is larger.

Do I need an external connection pooler when scaling web services?

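You need some form of pooling if every instance opens its own database connections, because the total grows with instance count. It does not have to be external: Render Postgres connection pooling helps prevent connection exhaustion as instance count grows.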


How Render handles traffic spikes

April 01, 2026

How traffic spikes affect your web service

A traffic spike happens when incoming HTTP requests suddenly exceed what your current web service instances can handle. Render already routes public traffic through a load balancer and can run multiple instances of the same service, but surviving a spike still depends on how you configure scaling, health checks, and downstream dependencies.

Render handles load balancing for you. You decide whether to scale manually, enable autoscaling, and how aggressively to add capacity before existing instances saturate.

What happens during a spike

When traffic increases, requests reach Render's load balancer first. The load balancer distributes HTTP and HTTPS traffic across healthy instances of your web service.

If you enable autoscaling, Render periodically calculates average CPU and/or memory utilization across all running instances. When utilization exceeds your target, Render provisions additional instances. Render scales up immediately when load rises. It waits a few minutes before scaling down so brief spikes do not cause unnecessary churn.
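To make the calculation above concrete, here is a minimal sketch that assumes a proportional rule: the recommended count is the current count multiplied by the ratio of average utilization to target, rounded up. Treat this as an illustration rather than Render's exact formula, which is covered in the scaling documentation. The function names and sample numbers are hypothetical.

```javascript
// Sketch only: assumes a proportional scaling rule. Check Render's scaling
// docs for the exact formula. All names here are hypothetical.
function average(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// perInstanceUtilization: one utilization percentage per running instance.
// targetPercent: the configured target (for example, targetCPUPercent).
function recommendInstances(perInstanceUtilization, targetPercent) {
  const current = perInstanceUtilization.length;
  const avg = average(perInstanceUtilization);
  return Math.ceil(current * (avg / targetPercent));
}

// Two instances averaging 90% CPU against a 60% target.
console.log(recommendInstances([85, 95], 60)); // 3
```

Because the input is an average across instances, one hot instance next to an idle one may not trigger a scale-up under this model.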

Autoscaling reacts to metrics, so new capacity is not instantaneous. While new instances boot, existing instances continue serving traffic. If they saturate completely, clients may see errors such as 502 Bad Gateway or 503 Service Unavailable until additional instances pass their readiness checks.

New instances do not receive traffic until they pass their health checks. By default, Render probes each instance with a TCP connection check, so gating happens even if you configure nothing. For application-level readiness, configure a health check path: Render then sends HTTP GET requests to that endpoint, and a healthy instance responds with any 2xx or 3xx status code. During startup, the same endpoint can return 503 so Render does not route production traffic too early.
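The readiness behavior described above could look something like the following sketch, which uses Node's built-in http module. The `ready` flag, `markReady`, `initDependencies`, and the `START_SERVER` switch are hypothetical names for illustration, and the fallback port is an assumption. The `/health` path matches the healthCheckPath you configure.

```javascript
const http = require('http');

// Hypothetical readiness flag. Flip it once startup work (database
// connections, cache warmup, and so on) has finished.
let ready = false;
function markReady() {
  ready = true;
}

// Responds on /health with 503 until markReady() has been called, then 200.
function handleRequest(req, res) {
  if (req.url === '/health') {
    const status = ready ? 200 : 503;
    res.writeHead(status, { 'Content-Type': 'text/plain' });
    res.end(ready ? 'ok' : 'starting');
    return;
  }
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('hello');
}

// Stand-in for real startup work; replace with your own initialization.
function initDependencies() {
  return new Promise((resolve) => setTimeout(resolve, 2000));
}

const server = http.createServer(handleRequest);

// Only listen when explicitly asked, so the file can be loaded without
// binding a port. START_SERVER is a hypothetical flag for this sketch.
if (process.env.START_SERVER === '1') {
  server.listen(process.env.PORT || 10000);
  initDependencies().then(markReady);
}
```

The server starts listening before its dependencies are ready, so the health check can answer with 503 during startup instead of refusing the connection.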

Configure autoscaling on Render

You can scale a web service in two ways:

Method	Who can use it	Behavior
Manual scaling	All Render workspaces	You set a fixed instance count (1 to 100).
Autoscaling	Pro workspaces and higher	Render adjusts instance count between your minimum and maximum based on CPU and/or memory targets.

Configure either option from your service's Scaling page in the Render Dashboard or in a render.yaml Blueprint file.

In Blueprint, autoscaling uses a scaling block:

yaml

services:
  - type: web
    name: api
    runtime: node
    plan: standard
    healthCheckPath: /health
    scaling:
      minInstances: 2
      maxInstances: 5
      targetCPUPercent: 60
      # Optional: targetMemoryPercent: 60

Set minInstances to at least 2 in production if you want redundancy in place before a spike hits. Set targetCPUPercent and/or targetMemoryPercent conservatively so new instances have time to boot before existing ones saturate. If you enable both CPU and memory targets, Render calculates a scale recommendation for each and uses the larger result.
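The "larger result wins" behavior can be sketched as follows. This is an illustration under assumptions, not Render's implementation: it assumes each target produces a proportional recommendation and that the final count is clamped to minInstances and maxInstances. The `scaling` object mirrors the Blueprint fields; `recommend`, `desiredInstances`, and the `metrics` shape are hypothetical.

```javascript
// Sketch only: assumes each target yields a proportional recommendation
// (current * utilization / target, rounded up). Names are hypothetical.
function recommend(current, utilizationPercent, targetPercent) {
  return Math.ceil(current * (utilizationPercent / targetPercent));
}

// scaling mirrors the Blueprint fields; a missing target is treated as disabled.
function desiredInstances(current, metrics, scaling) {
  const candidates = [];
  if (scaling.targetCPUPercent !== undefined) {
    candidates.push(recommend(current, metrics.cpuPercent, scaling.targetCPUPercent));
  }
  if (scaling.targetMemoryPercent !== undefined) {
    candidates.push(recommend(current, metrics.memoryPercent, scaling.targetMemoryPercent));
  }
  // The larger recommendation wins when both targets are enabled.
  const largest = Math.max(...candidates);
  // Stay within the configured bounds.
  return Math.min(scaling.maxInstances, Math.max(scaling.minInstances, largest));
}

const scaling = {
  minInstances: 2,
  maxInstances: 5,
  targetCPUPercent: 60,
  targetMemoryPercent: 60,
};

// CPU alone would recommend 2 instances; memory recommends 3.
console.log(desiredInstances(2, { cpuPercent: 45, memoryPercent: 90 }, scaling)); // 3
```

In this model a memory-bound service scales out even when CPU is well under its target, which is why enabling both targets can raise your instance count more often than either one alone.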

Keep these platform limits in mind:

Services with an attached persistent disk cannot scale to multiple instances.
Each scaled instance uses the same instance type and is billed for compute usage prorated by the second.
You can scale up to 100 instances per service.
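If you generate or review scaling settings in code, a small pre-deploy check can catch values outside the limits this article describes: at most 100 instances, targets between 1 and 90, and no multi-instance scaling with a persistent disk. The sketch below is a hypothetical helper, not part of any Render tooling; the function name, option name, and error strings are assumptions.

```javascript
// Sketch only: checks a scaling config against the limits described in this
// article. The function name and error strings are hypothetical.
function validateScaling(scaling, options = {}) {
  const { minInstances, maxInstances, targetCPUPercent, targetMemoryPercent } = scaling;
  const errors = [];

  if (!Number.isInteger(minInstances) || minInstances < 1) {
    errors.push('minInstances must be an integer of at least 1');
  }
  if (!Number.isInteger(maxInstances) || maxInstances > 100) {
    errors.push('maxInstances must be an integer no greater than 100');
  }
  if (minInstances > maxInstances) {
    errors.push('minInstances must not exceed maxInstances');
  }

  // Targets are optional here; only values that are present are range-checked.
  const targets = { targetCPUPercent, targetMemoryPercent };
  for (const [name, value] of Object.entries(targets)) {
    if (value !== undefined && (value < 1 || value > 90)) {
      errors.push(`${name} must be between 1 and 90`);
    }
  }

  if (options.hasPersistentDisk && maxInstances > 1) {
    errors.push('a service with a persistent disk cannot run multiple instances');
  }
  return errors;
}

console.log(validateScaling({ minInstances: 2, maxInstances: 5, targetCPUPercent: 60 })); // []
console.log(validateScaling({ minInstances: 2, maxInstances: 5, targetCPUPercent: 95 }).length); // 1
```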

Test your scaling configuration

Because autoscaling responds to averaged metrics over time, load testing is the most reliable way to validate your thresholds. Boot time matters: a Node.js app that needs 45 seconds to start needs more headroom than a Go binary that starts in seconds.
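A back-of-the-envelope model can show why boot time changes the headroom you need. The sketch below assumes utilization grows linearly during a spike and that scale-up triggers the moment utilization crosses the target. Both are simplifications: real traffic is rarely linear, and metric averaging and health checks add further delay. The function names and growth rates are hypothetical.

```javascript
// Sketch only: a linear model of utilization growth during a spike.
// Treat the output as a rough guide, not a prediction.
function utilizationWhenReady(targetPercent, growthPerSecond, bootSeconds) {
  // Scale-up is assumed to start when utilization crosses the target.
  // Existing instances keep absorbing growth until new ones finish booting.
  return targetPercent + growthPerSecond * bootSeconds;
}

function saturatesBeforeReady(targetPercent, growthPerSecond, bootSeconds) {
  return utilizationWhenReady(targetPercent, growthPerSecond, bootSeconds) >= 100;
}

// Utilization climbing 1 percentage point per second:
console.log(saturatesBeforeReady(75, 1, 45)); // true  (75 + 45 = 120)
console.log(saturatesBeforeReady(75, 1, 3));  // false (75 + 3 = 78)
console.log(saturatesBeforeReady(50, 1, 45)); // false (50 + 45 = 95)
```

Under these assumptions, the slow-booting service needs a lower target than the fast-booting one to survive the same ramp. A load test tells you whether the model holds for your workload.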

This minimal Node.js handler shows one way to generate sustained CPU load in a test environment:

javascript

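const crypto = require('crypto');

// Warning: teaching example only. Do not expose in production.
// Express-style handler: assumes res.status() and res.send() are available.
exports.cpuSpike = (req, res) => {
  const start = Date.now();
  // Hash synchronously for about 5 seconds. This blocks the event loop, so
  // the instance cannot serve other requests until the loop exits.
  while (Date.now() - start < 5000) {
    crypto.pbkdf2Sync('secret', 'salt', 10000, 64, 'sha512');
  }
  res.status(200).send('Spike complete');
};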

Remove or protect endpoints like this before production. Prefer dedicated load-testing tools such as k6 or Artillery to simulate realistic concurrent traffic instead of a single hot loop.

Common mistakes to avoid

Targets set too high. Render caps autoscaling targets at 90% (the valid range is 1–90). Even at the top of that range, autoscaling may trigger only after instances are already struggling. Many teams start around 60% to 75% so new instances have time to boot before existing ones saturate.

Slow startup times. Autoscaling adds instances, but it cannot make a slow app start faster. Use a readiness-focused health check and reduce startup work where you can.

Treating autoscaling as a memory-leak fix. If RAM usage climbs because of a leak, autoscaling may keep adding instances until it hits maxInstances and the service still fails.

Forgetting database connection limits. Horizontal scaling multiplies open connections to Render Postgres. Enable integrated connection pooling or another pooler so scale-up events do not exhaust database connections.
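For the connection-limit mistake in particular, the arithmetic is worth doing before you raise maxInstances. The sketch below computes the worst-case connection count at full scale-out and the largest per-instance pool that still fits. The pool size, reserved count, and connection limit are hypothetical numbers; check your database plan for the real limit.

```javascript
// Sketch only: worst-case connection count at full scale-out.
// All numbers below are hypothetical; names are for illustration.
function connectionBudget({ maxInstances, poolSizePerInstance, dbConnectionLimit, reserved = 0 }) {
  const worstCase = maxInstances * poolSizePerInstance;
  // Keep some connections free for migrations, admin tools, and so on.
  const available = dbConnectionLimit - reserved;
  return { worstCase, available, fits: worstCase <= available };
}

// Largest per-instance pool size that still fits at maxInstances.
function maxPoolSize({ maxInstances, dbConnectionLimit, reserved = 0 }) {
  return Math.floor((dbConnectionLimit - reserved) / maxInstances);
}

console.log(connectionBudget({
  maxInstances: 5,
  poolSizePerInstance: 25,
  dbConnectionLimit: 100,
  reserved: 5,
}));
// { worstCase: 125, available: 95, fits: false }

console.log(maxPoolSize({ maxInstances: 5, dbConnectionLimit: 100, reserved: 5 })); // 19
```

If the worst case does not fit, either shrink the per-instance pool or put a pooler between your instances and the database.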

Next steps

Scaling Render services: manual scaling, autoscaling formulas, and billing for scaled services.
Health checks: verify readiness before Render routes traffic to new instances.
Connection pooling for Render Postgres: multiplex connections as instance count grows.
Blueprint specification: define scaling, healthCheckPath, and related service settings in render.yaml.
