Best Practices for Maximizing Uptime
To help keep your Render services healthy and responsive, we recommend the following best practices. Many of these apply to services on any deployment platform—not just Render!
Server hardware isn’t perfect, and neither are the data centers that orchestrate that hardware. When you scale your service to multiple instances, Render runs those instances on different nodes. This means that if a particular instance (or an entire node) goes down, at least one instance of your service stays up and running.
When issues like these occur, Render also automatically moves affected services to new instances, but this can take a few minutes. If you run multiple instances, your service remains up during the automatic transition.
Sometimes a service instance gets into an unresponsive state and needs a quick restart. Render can detect this situation with health checks.
You define an HTTP endpoint path in your service that returns a 2xx response whenever the service is functioning normally, and Render sends periodic requests to that path. If those requests fail several times in a row for a particular instance, Render restarts it.
Health checks also protect you from bad deploys: if a new deploy fails its health check, Render keeps the previous deploy running.
All inbound requests to Render web services pass through Cloudflare for DDoS protection.
Cloudflare assigns a unique ID to each request and sets it as the value of the CF-Ray HTTP header. Render includes this header in the request it sends along to your service.
Whenever your web service receives an incoming request, it should include the value of the CF-Ray header in all logs generated for that request, including a log entry written as soon as the request is received.
The CF-Ray ID for each request helps you trace the execution of your individual requests, and it also helps Render’s support team diagnose any issues that might occur earlier in this request flow.
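One way to attach the CF-Ray value to every log line for a request is a small logging adapter. The sketch below uses Python's standard logging module. The helper names (ray_id_from_headers, request_logger), the logger name "app", and the "-" fallback for a missing header are hypothetical choices for this example, not part of any Render or Cloudflare API.

```python
import logging

logger = logging.getLogger("app")  # hypothetical logger name


def ray_id_from_headers(headers) -> str:
    """Return the CF-Ray header value (case-insensitive lookup), or "-" if absent."""
    for name, value in headers.items():
        if name.lower() == "cf-ray":
            return value
    return "-"


class RayAdapter(logging.LoggerAdapter):
    """Prefix every message with the request's CF-Ray ID."""

    def process(self, msg, kwargs):
        return f"[cf-ray={self.extra['cf_ray']}] {msg}", kwargs


def request_logger(headers) -> RayAdapter:
    """Build a per-request logger from the incoming request headers."""
    return RayAdapter(logger, {"cf_ray": ray_id_from_headers(headers)})
```

In a request handler, you might create the adapter first and log immediately, for example log = request_logger(request_headers) followed by log.info("request received"), then pass log to any code that handles the same request.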
An external monitoring probe is similar to a health check, but it sends periodic HTTP requests to your web service from outside Render. This more closely simulates traffic from your service’s users.
We recommend creating your probe with a third-party monitoring provider, such as Heii On-Call or Better Stack. In case of an incident, your provider will send a notification that includes the CF-Ray ID returned by Cloudflare. This is handy for debugging in combination with your service’s logs, or for sharing with Render support.
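To make concrete what such a probe collects, here is a rough sketch of a single probe request using Python's standard library. It is not a substitute for a monitoring provider, which also handles scheduling, alerting, and running from outside your own infrastructure. The function names and the shape of the returned dictionary are assumptions for illustration.

```python
import urllib.error
import urllib.request


def summarize(status, headers) -> dict:
    """Collect the fields an alert would typically carry."""
    return {
        "ok": 200 <= status < 300,
        "status": status,
        "cf_ray": headers.get("CF-Ray"),
    }


def probe(url: str, timeout: float = 10.0) -> dict:
    """Send one GET request and report the status and CF-Ray ID, if any."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return summarize(resp.status, resp.headers)
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry headers, including CF-Ray.
        return summarize(err.code, err.headers)
    except OSError as err:
        # DNS failures, refused connections, and timeouts end up here.
        return {"ok": False, "status": None, "cf_ray": None, "error": str(err)}
```

A scheduler would call probe() at a fixed interval and send a notification whenever "ok" is false, including the cf_ray value so it can be matched against the service's logs.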
If you maintain long-lived connections to your service (such as over WebSocket), make sure to implement retry logic for those connections. Render routes each connection to a particular instance of your service running on a particular machine, and Render might replace an instance at any time as part of a deploy or standard maintenance.
Replacing an instance this way is a zero-downtime event, but terminating the old instance necessarily terminates all connections to it. By implementing retry logic, you can quickly restore your long-lived connection to a running instance.
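Retry logic for a long-lived connection is often written as a reconnect loop with exponential backoff and jitter. The sketch below is one such loop in Python. It assumes a caller-supplied connect_and_serve() function (hypothetical) that opens the connection, runs until the connection ends, returns normally on a clean shutdown, and raises ConnectionError when the connection is dropped. The delay values are illustrative defaults, not recommendations from Render.

```python
import random
import time


def backoff_delays(base=0.5, cap=30.0, factor=2.0):
    """Yield exponentially growing delays in seconds, never exceeding `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def run_with_reconnect(connect_and_serve, max_attempts=None,
                       sleep=time.sleep, jitter=random.random):
    """Run connect_and_serve(), reconnecting whenever the connection drops.

    Returns the number of reconnects performed before a clean shutdown.
    Re-raises the last ConnectionError once `max_attempts` drops have occurred.
    """
    reconnects = 0
    delays = backoff_delays()
    while True:
        try:
            connect_and_serve()
            return reconnects
        except ConnectionError:
            reconnects += 1
            if max_attempts is not None and reconnects >= max_attempts:
                raise
            # Scale each delay by a factor in [0.5, 1.0) so that many clients
            # dropped at the same moment do not all reconnect at once.
            sleep(next(delays) * (0.5 + jitter() / 2))
```

This version keeps increasing the delay across drops. A fuller implementation would likely reset the backoff after a connection has stayed up for some time, so that a routine instance replacement hours later triggers an immediate retry.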
Database hiccups happen, and you can resolve them much faster when you’ve prepared ahead of time. Make sure you’ve thoroughly tested your data backup and recovery procedure, so you can fix your service as quickly as possible whenever the time comes.