How Render handles traffic spikes
How traffic spikes affect your web service
A traffic spike happens when incoming HTTP requests suddenly exceed what your current web service instances can handle. Render already routes public traffic through a load balancer and can run multiple instances of the same service, but surviving a spike still depends on how you configure scaling, health checks, and downstream dependencies.
Render handles load balancing for you. You decide whether to scale manually or enable autoscaling, and how aggressively to add capacity before existing instances saturate.
What happens during a spike
When traffic increases, requests reach Render's load balancer first. The load balancer distributes HTTP and HTTPS traffic across healthy instances of your web service.
If you enable autoscaling, Render periodically calculates average CPU and/or memory utilization across all running instances. When utilization exceeds your target, Render provisions additional instances. Render begins scaling up as soon as utilization exceeds the target, but it waits a few minutes before scaling down so brief spikes do not cause unnecessary churn.
Autoscaling reacts to metrics, so new capacity is not instantaneous. While new instances boot, existing instances continue serving traffic. If they saturate completely, clients may see errors such as 502 Bad Gateway or 503 Service Unavailable until additional instances pass their health checks.
New instances do not receive traffic until they pass their health checks. By default, Render probes each instance with a TCP connection check, so gating happens even if you configure nothing. For application-level readiness, configure a health check path: Render then sends HTTP GET requests to that endpoint, and a healthy instance responds with any 2xx or 3xx status code. During startup, the same endpoint can return 503 so Render does not route production traffic too early.
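The readiness behavior described above can be sketched in Node.js as follows. This is a minimal illustration, not Render's reference implementation: the /healthz path, the markReady and warmUp helpers, and the choice to start the server only when a PORT environment variable is set are all assumptions made for this example.

```javascript
const http = require("node:http");

// Hypothetical readiness flag: false while the app is still starting.
let ready = false;

// Flip the flag once startup work has finished.
function markReady() {
  ready = true;
}

// Status code for the health check endpoint: 503 until ready, then 200.
function healthStatus(isReady) {
  return isReady ? 200 : 503;
}

function handleRequest(req, res) {
  if (req.url === "/healthz") {
    const status = healthStatus(ready);
    res.writeHead(status, { "Content-Type": "text/plain" });
    res.end(status === 200 ? "ok" : "starting");
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("hello");
}

// Placeholder for real startup work (loading config, warming caches, and so on).
async function warmUp() {
  await new Promise((resolve) => setTimeout(resolve, 100));
  markReady();
}

// Assumption for this sketch: only listen when PORT is set in the environment.
if (process.env.PORT) {
  http.createServer(handleRequest).listen(Number(process.env.PORT), warmUp);
}
```

With a health check path of /healthz configured, an instance running this sketch would answer 503 until warmUp completes and 200 afterward.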
Configure autoscaling on Render
You can scale a web service in two ways:
| Method | Who can use it | Behavior |
|---|---|---|
| Manual scaling | All Render workspaces | You set a fixed instance count (1 to 100). |
| Autoscaling | Professional workspaces and higher | Render adjusts instance count between your minimum and maximum based on CPU and/or memory targets. |
Configure either option from your service's Scaling page in the Render Dashboard or in a render.yaml Blueprint file.
In Blueprint, autoscaling uses a scaling block:
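A minimal sketch of such a block, assuming a hypothetical Node.js web service named my-web-service. The service name, plan, commands, health check path, instance counts, and target percentages are illustrative values, not recommendations:

```yaml
services:
  - type: web
    name: my-web-service        # hypothetical service name
    runtime: node
    plan: standard              # illustrative instance type
    buildCommand: npm install
    startCommand: node server.js
    healthCheckPath: /healthz   # hypothetical readiness endpoint
    scaling:
      minInstances: 2           # keep redundancy before a spike hits
      maxInstances: 6           # upper bound on instance count (and cost)
      targetCPUPercent: 60      # illustrative target; valid range is 1-90
      targetMemoryPercent: 70   # optional second target
```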
Set minInstances to at least 2 in production if you want redundancy across multiple running instances before a spike hits. Set targetCPUPercent and/or targetMemoryPercent conservatively so new instances have time to boot before existing ones saturate. If you enable both CPU and memory targets, Render calculates a scale recommendation for each and uses the larger result.
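To build intuition for how two targets combine, the sketch below assumes a proportional formula of the kind horizontal autoscalers commonly use: desired instances = ceil(current instances × current utilization / target utilization), computed per metric, with the larger result clamped to the configured minimum and maximum. The function names are hypothetical, and Render's scaling documentation remains the authority for the exact calculation.

```javascript
// Proportional recommendation for a single metric (assumed formula, see above).
function recommendForMetric(currentInstances, utilizationPercent, targetPercent) {
  return Math.ceil((currentInstances * utilizationPercent) / targetPercent);
}

// Combine the CPU and memory recommendations, use the larger one,
// then clamp the result to the configured minInstances/maxInstances.
function recommendInstances(state) {
  const recommendations = [];
  if (state.targetCPUPercent !== undefined) {
    recommendations.push(
      recommendForMetric(state.currentInstances, state.cpuPercent, state.targetCPUPercent)
    );
  }
  if (state.targetMemoryPercent !== undefined) {
    recommendations.push(
      recommendForMetric(state.currentInstances, state.memoryPercent, state.targetMemoryPercent)
    );
  }
  // With no targets configured there is nothing to recommend.
  if (recommendations.length === 0) return state.currentInstances;
  const desired = Math.max(...recommendations);
  return Math.min(state.maxInstances, Math.max(state.minInstances, desired));
}

// Example: 2 instances at 90% CPU (target 60) and 50% memory (target 70).
// CPU recommends ceil(2 * 90 / 60) = 3, memory recommends ceil(2 * 50 / 70) = 2.
const example = recommendInstances({
  currentInstances: 2,
  cpuPercent: 90,
  memoryPercent: 50,
  targetCPUPercent: 60,
  targetMemoryPercent: 70,
  minInstances: 2,
  maxInstances: 6,
});
console.log(example); // 3
```

Under this assumed formula, a lower target produces a larger recommendation at the same utilization, which is why conservative targets buy boot time.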
Keep these platform limits in mind:
- Services with an attached persistent disk cannot scale to multiple instances.
- Each scaled instance uses the same instance type and is billed for compute usage prorated by the second.
- You can scale up to 100 instances per service.
Test your scaling configuration
Because autoscaling responds to averaged metrics over time, load testing is the most reliable way to validate your thresholds. Boot time matters: a Node.js app that takes 45 seconds to start needs more headroom, and therefore a lower utilization target, than a Go binary that starts in seconds.
This minimal Node.js handler shows one way to generate sustained CPU load in a test environment:
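The sketch below is one possible version, using only Node's built-in http module. The /burn path, the ms query parameter, the 250 ms default, the 5000 ms cap, and the choice to start the server only when PORT is set are all assumptions made for this example.

```javascript
const http = require("node:http");

const DEFAULT_BURN_MS = 250; // used when no valid `ms` is supplied
const MAX_BURN_MS = 5000;    // cap so one request cannot block the process for long

// Parse the hypothetical `ms` query parameter and clamp it to [0, MAX_BURN_MS].
function parseBurnMs(raw) {
  if (raw === null || raw === "") return DEFAULT_BURN_MS;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return DEFAULT_BURN_MS;
  return Math.min(Math.max(Math.floor(parsed), 0), MAX_BURN_MS);
}

// Spin in a tight loop for roughly `ms` milliseconds to keep one CPU core busy.
function burnCpu(ms) {
  const end = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < end) {
    Math.sqrt(iterations);
    iterations += 1;
  }
  return iterations;
}

function handleRequest(req, res) {
  const url = new URL(req.url, "http://localhost");
  if (url.pathname !== "/burn") {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("not found");
    return;
  }
  const ms = parseBurnMs(url.searchParams.get("ms"));
  const iterations = burnCpu(ms);
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ ms, iterations }));
}

// Assumption for this sketch: only listen when PORT is set in the environment.
if (process.env.PORT) {
  http.createServer(handleRequest).listen(Number(process.env.PORT));
}
```

For a local run, something like `PORT=3000 node burn.js` followed by `curl "http://localhost:3000/burn?ms=1000"` should keep one core busy for about a second per request. Because the loop blocks Node's event loop, each instance handles these requests one at a time.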
Remove or protect endpoints like this before production. Prefer dedicated load-testing tools such as k6 or Artillery to simulate realistic concurrent traffic instead of a single hot loop.
Common mistakes to avoid
Targets set too high. Render caps autoscaling targets at 90% (the valid range is 1–90). Even at the top of that range, autoscaling may trigger only after instances are already struggling. Many teams start around 60% to 75% so new instances have time to boot before existing ones saturate.
Slow startup times. Autoscaling adds instances, but it cannot make a slow app start faster. Use a readiness-focused health check and reduce startup work where you can.
Treating autoscaling as a memory-leak fix. If RAM usage climbs because of a leak, autoscaling may keep adding instances until it hits maxInstances and the service still fails.
Forgetting database connection limits. Horizontal scaling multiplies open connections to Render Postgres. Enable integrated connection pooling or another pooler so scale-up events do not exhaust database connections.
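A rough way to check connection headroom is to multiply the per-instance pool size by the instance count and compare the result with the database's connection limit. The sketch below does that arithmetic. The function names and the numbers in the example are hypothetical, and the real limit depends on your Render Postgres instance type.

```javascript
// Worst-case connections opened by the web service at a given instance count,
// assuming every instance fills its client-side pool.
function totalConnections(instances, poolSizePerInstance) {
  return instances * poolSizePerInstance;
}

// Largest instance count that stays within the database connection limit
// after reserving connections for migrations, consoles, and other clients.
function maxSafeInstances(dbConnectionLimit, poolSizePerInstance, reservedConnections = 0) {
  const available = dbConnectionLimit - reservedConnections;
  if (available <= 0 || poolSizePerInstance <= 0) return 0;
  return Math.floor(available / poolSizePerInstance);
}

// Hypothetical numbers: a 100-connection limit, 10 connections per instance,
// and 10 connections held back for other clients.
const safeInstances = maxSafeInstances(100, 10, 10);
console.log(safeInstances);            // 9
console.log(totalConnections(12, 10)); // 120, which would exceed the limit of 100
```

If maxInstances is higher than the value this kind of estimate produces, either shrink the per-instance pool or put a connection pooler between the service and the database.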
Next steps
- Scaling Render services: manual scaling, autoscaling formulas, and billing for scaled services.
- Health checks: verify readiness before Render routes traffic to new instances.
- Connection pooling for Render Postgres: multiplex connections as instance count grows.
- Blueprint specification: define scaling, healthCheckPath, and related service settings in render.yaml.