Render Tutorials
When deploys go wrong

Runtime errors


This is the family of failures your users see. The deploy went green and the service is live, but requests are coming back wrong: the browser shows a 502 Bad Gateway, the API returns 500 Internal Server Error for some routes, or a Django admin user complains they can’t log in.

The HTTP status code is your first signal. Render’s troubleshooting docs lay out the most common causes per code - this step expands them with the diagnostic moves and code-level fixes.

The HTTP code → cause map

flowchart TB
  code{"HTTP code<br/>from the browser<br/>or curl"}
  c400["400 Bad Request"]
  c404["404 Not Found"]
  c500["500 Internal Server Error"]
  c502["502 Bad Gateway"]
  c503["503 Service Unavailable"]

  code --> c400 --> sec1["Section 1:<br/>Django ALLOWED_HOSTS"]
  code --> c404 --> sec2["Section 2:<br/>Routing, redirects,<br/>missing files"]
  code --> c500 --> sec3["Section 3:<br/>Uncaught exceptions<br/>+ DB connections"]
  code --> c502 --> sec4["Section 4:<br/>Render edge can't reach<br/>your process"]
  code --> c503 --> sec5["Section 5:<br/>No instances available<br/>(overload, scaling)"]

The first move on any runtime error is to distinguish between Render’s edge proxy and your app. The codes themselves tell you which:

  • 400, 404, 500 come from your application (or, on a static site, from Render’s static file server). Render’s edge just forwarded them.
  • 502 comes from Render’s edge - it tried to talk to your process and couldn’t. The cause is usually your process, not the platform.
  • 503 can be either; check the logs.

1. 400 Bad Request (often Django ALLOWED_HOSTS)

What you’ll see

In the browser:

Bad Request (400)

In the runtime logs:

Django log
DisallowedHost: Invalid HTTP_HOST header: 'api.example.com'.
You may need to add 'api.example.com' to ALLOWED_HOSTS.

What it means

A request reached your Django app with a Host header it doesn’t recognise. Django (and a few other frameworks) gatekeep on Host for security; you have to add your domains explicitly.

The fix

settings.py
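import os
# Read from env so each environment configures itself,
# e.g. ALLOWED_HOSTS="api.example.com,www.example.com".
# Strip whitespace and drop empty entries ("".split(",") would give [""]).
ALLOWED_HOSTS = [h.strip() for h in os.environ.get("ALLOWED_HOSTS", "").split(",") if h.strip()]
# Also allow the hostname Render provides for the service
RENDER_EXTERNAL_HOSTNAME = os.environ.get("RENDER_EXTERNAL_HOSTNAME")
if RENDER_EXTERNAL_HOSTNAME:
    ALLOWED_HOSTS.append(RENDER_EXTERNAL_HOSTNAME)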

Render injects RENDER_EXTERNAL_HOSTNAME automatically (e.g. myapp.onrender.com). When you add a custom domain, add it to ALLOWED_HOSTS too - either via the env var or directly.
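If you’d like to unit-test that logic instead of finding mistakes in production, you can pull it into a pure function. A minimal sketch; build_allowed_hosts is a hypothetical helper name, and it takes the environment as a plain mapping so a test can pass a dict instead of touching os.environ:

```python
from typing import Mapping


def build_allowed_hosts(environ: Mapping[str, str]) -> list[str]:
    """Build Django's ALLOWED_HOSTS from environment variables.

    Hypothetical helper: reads a comma-separated ALLOWED_HOSTS value and
    appends Render's RENDER_EXTERNAL_HOSTNAME when it is set.
    """
    # Split on commas, trim whitespace, and drop empty entries
    # (an unset variable would otherwise yield [""]).
    hosts = [h.strip() for h in environ.get("ALLOWED_HOSTS", "").split(",") if h.strip()]

    render_host = environ.get("RENDER_EXTERNAL_HOSTNAME")
    if render_host and render_host not in hosts:
        hosts.append(render_host)
    return hosts
```

In settings.py that becomes ALLOWED_HOSTS = build_allowed_hosts(os.environ). Setting ALLOWED_HOSTS=api.example.com on the service then covers the custom domain from the log above without a code change.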

2. 404 Not Found

What you’ll see

curl
curl -i https://myapp.onrender.com/api/users
HTTP/2 404
content-type: text/html

What it means and the four common causes

| Cause | Diagnosis | Fix |
| --- | --- | --- |
| SPA without rewrite rule | The site loads at /, but any deep link 404s | Add a rewrite rule from /* to /index.html in the service’s Redirects/Rewrites section |
| Wrong route in code | curl against the path returns 404 because the route (or the URL) has a typo | Fix the route definition |
| Missing static file | An image or CSS file 404s, but only on Render | Filesystem casing bug (Linux is case-sensitive); see step 04 |
| Persistent file missing | A file your app wrote yesterday is gone | Your service has no persistent disk, so its filesystem is ephemeral. Add a persistent disk or use object storage |

The SPA case is the most common in practice. A typical React/Vue SPA on a Render static site needs:

render.yaml: SPA rewrite
services:
  - type: web
    name: web
    runtime: static
    buildCommand: npm run build
    staticPublishPath: ./dist
    routes:
      - type: rewrite
        source: /*
        destination: /index.html

Without this, hitting /dashboard directly (e.g. via a refresh) hits the file system, finds no dashboard.html, and 404s. With it, the static server returns index.html for unknown paths and the SPA router takes over.
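The fallback behaviour can be modelled in a few lines. This is a simplified sketch, not Render’s implementation: it assumes the server serves any path that matches a published file and applies the /* → /index.html rewrite only when nothing matches. resolve is a hypothetical function name.

```python
def resolve(path: str, published_files: set[str], spa_rewrite: bool) -> tuple[int, str]:
    """Return (status, file served) for a request path on a static site.

    Simplified model: exact file matches win; otherwise the SPA rewrite
    (if configured) serves /index.html, else the request 404s.
    """
    # Treat "/" as a request for the site's index page.
    candidate = "/index.html" if path == "/" else path
    if candidate in published_files:
        return 200, candidate
    if spa_rewrite:
        # The SPA's client-side router then reads the URL and renders the right view.
        return 200, "/index.html"
    return 404, ""
```

Note that in this model a genuinely missing asset (say /assets/missing.js) also gets index.html once the rewrite is on, which is why a mistyped script URL can show up as a parse error in the browser rather than a 404.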

3. 500 Internal Server Error

What you’ll see

curl
curl -i https://myapp.onrender.com/api/users
HTTP/2 500
{"error": "Internal Server Error"}

What it means

Your app threw an unhandled exception while handling the request. The interesting information is in the logs, not the response.

Diagnosis

Grab the matching log line:

Terminal
SRV=srv-xxxxx
# Filter for errors in the last 10 minutes
render logs -r "$SRV" --start 10m --level error --limit 20

The first match is almost always your stack trace.

Common 500 patterns

Database connection issues:

Runtime log: Postgres SSL
psycopg2.OperationalError: SSL connection has been closed unexpectedly

This usually means one of:

  • The connection has been idle long enough that the database closed it but the pool didn’t notice.
  • The pool is exhausted and a new connection failed.
  • The DB and the service are in different regions (cross-region adds latency that breaks short timeouts).

Fix: add a connection pool, use pool_pre_ping=True in SQLAlchemy (or your stack’s equivalent), and make sure sslmode=require is in the URL if you’re using the external Postgres URL. (Internal URL doesn’t need it.) The Postgres tutorial step 06 goes deeper.
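For the sslmode=require part, here is a minimal standard-library sketch that adds the parameter to a Postgres URL only when it isn’t already there. with_sslmode is a hypothetical helper, and the example URLs in the tests are made up:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_sslmode(database_url: str, mode: str = "require") -> str:
    """Return database_url with sslmode set, unless the URL already sets one.

    Intended for the *external* Postgres URL; internal URLs don't need it.
    """
    parts = urlsplit(database_url)
    query = dict(parse_qsl(parts.query))
    query.setdefault("sslmode", mode)  # keep an explicit sslmode if present
    return urlunsplit(parts._replace(query=urlencode(query)))
```

You would then hand the result to SQLAlchemy together with pre-ping, e.g. create_engine(with_sslmode(os.environ["DATABASE_URL"]), pool_pre_ping=True), where DATABASE_URL is an assumed variable name. pool_pre_ping makes the pool test each connection on checkout and replace ones the server has already closed, which addresses the idle-connection case above.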

Uncaught exception in a request handler:

Node runtime log
TypeError: Cannot read properties of undefined (reading 'email')
at /opt/render/project/src/routes/users.js:42:23

Fix the bug. There’s nothing platform-specific here - but two Render-flavored notes:

  • An exception that escapes the request cycle entirely (for example, one thrown inside a timer callback, or an unhandled promise rejection) will crash the whole instance, not just the one request. Use your framework’s error middleware to catch errors inside the request.
  • Render restarts crashed processes automatically, so a sufficiently bad bug shows up as both failed requests (500s, or 502s for requests in flight when the process dies) and restart events in the runtime log.
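On the Python side the picture is a little different: a WSGI server such as gunicorn will usually turn an exception that escapes your app into a generic 500 by itself, and frameworks like Django and Flask already convert handler exceptions into 500 responses. Error middleware there is less about survival and more about logging the traceback once and returning a consistent body. A minimal sketch for a bare WSGI app (CatchAllMiddleware is a hypothetical name; it only covers exceptions raised while the wrapped app is called, not ones raised later while a streaming response is iterated):

```python
import logging
import sys

logger = logging.getLogger("app.errors")


class CatchAllMiddleware:
    """WSGI middleware: log any exception from the wrapped app and return a JSON 500."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception:
            # logger.exception records the full traceback for the runtime logs.
            logger.exception("Unhandled error on %s %s",
                             environ.get("REQUEST_METHOD"), environ.get("PATH_INFO"))
            body = b'{"error": "Internal Server Error"}'
            # Passing exc_info lets the server re-raise if headers were already sent (PEP 3333).
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
                sys.exc_info(),
            )
            return [body]
```

Wrap once at startup, e.g. app = CatchAllMiddleware(app), and point gunicorn at the wrapped object.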

Resource exhaustion:

Runtime log: worker and pool exhaustion
[CRITICAL] WORKER TIMEOUT (pid:42)
psycopg2.pool.PoolError: connection pool exhausted

The service is overwhelmed. The fix depends on which resource:

| Symptom | Fix |
| --- | --- |
| All workers timing out | Slow upstream (DB, third-party API). Add timeouts, retries with backoff, or cache the upstream |
| Connection pool exhausted | Too many concurrent requests for your pool size. Increase the pool, add PgBouncer, or upgrade your Postgres plan |
| CPU pegged at 100% | You need a bigger instance or autoscaling. See the scaling docs |
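For the slow-upstream row, “retries with backoff” can look like the following. It is a generic sketch under stated assumptions: call_with_retries is a hypothetical helper, the wrapped call is expected to enforce its own timeout (for example the timeout= argument most HTTP clients accept), and only the exception types you list are retried. The sleep parameter exists so tests can skip real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.2,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying the listed exceptions with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller (and your logs) see the error
            # Base delay doubles each attempt (0.2s, 0.4s, ...); random jitter
            # keeps many workers from retrying in lockstep.
            delay = base_delay_s * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Keep the total retry budget well under your worker timeout, or the worker may still be killed mid-retry.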

4. 502 Bad Gateway

The most “platform-feeling” error, and the one most often misdiagnosed.

What you’ll see

curl
curl -i https://myapp.onrender.com/
HTTP/2 502
< Bad Gateway

The runtime logs at the same time:

Runtime log: nothing obvious
(possibly nothing, or:)
[ERROR] Connection reset by peer

What it means

Render’s edge proxy tried to forward your request to a service instance and couldn’t get a response. This is almost always one of:

| Cause | Tell |
| --- | --- |
| Service isn’t bound to 0.0.0.0:$PORT (regression from step 05) | Persistent 502s on every request, even right after deploy |
| New custom domain not propagated yet | 502s on a newly added domain that clear up on their own within an hour |
| Node keepAliveTimeout too short | Intermittent 502s under low traffic |
| Worker killed mid-request (SIGTERM/OOM) | One request 502s while others succeed |
| Service has zero healthy instances | A deploy is in progress, or you’re scaling from 0 |

The Node keep-alive 502

A subtle but common one worth its own callout. Render’s edge keeps HTTP/1.1 connections alive to your instance for reuse. If your Node server closes them faster than the edge does, the edge tries to reuse a closed connection - and the user sees a 502.

server.js: tune keep-alive timeouts
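// `app` is your Express (or similar) app; Render sets PORT for web services.
const port = process.env.PORT || 10000;
const server = app.listen(port, "0.0.0.0");
// Render's edge waits up to 75s. Match or exceed that.
server.keepAliveTimeout = 120 * 1000; // 120 seconds
server.headersTimeout = 120 * 1000;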

Without this, you’ll see intermittent 502s that don’t correlate with any error in your logs. Node’s defaults (keepAliveTimeout is 5 seconds) are too aggressive for Render’s edge timing.
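A simplified model makes the timing argument concrete. It treats each side as closing a connection after a fixed idle period; the 75-second edge figure comes from the comment in the server.js snippet, and the function names are hypothetical. A reused connection fails when it has been idle long enough for Node to close it, but not long enough for the edge to drop it:

```python
def reuse_hits_closed_socket(idle_s: float, edge_idle_s: float, app_keepalive_s: float) -> bool:
    """True if the edge would reuse a connection the app has already closed."""
    edge_would_reuse = idle_s < edge_idle_s     # edge still considers it alive
    app_has_closed = idle_s >= app_keepalive_s  # app's keep-alive timer already fired
    return edge_would_reuse and app_has_closed


def failure_window_s(edge_idle_s: float, app_keepalive_s: float) -> float:
    """Length of the idle-time window in which reuse fails (0 means safe)."""
    return max(0.0, edge_idle_s - app_keepalive_s)


EDGE_IDLE_S = 75.0  # figure taken from the comment in server.js

# Node's default keepAliveTimeout is 5 seconds.
print(failure_window_s(EDGE_IDLE_S, 5.0))    # 70.0
print(failure_window_s(EDGE_IDLE_S, 120.0))  # 0.0
```

Under heavy traffic, connections rarely sit idle long enough to land in that window; under low traffic they often do, which matches the “intermittent under low traffic” tell.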

The gunicorn WORKER TIMEOUT 502

Python services using gunicorn have an analogous issue: a worker that stays busy on one request for longer than --timeout seconds (30 by default) is killed, and its in-flight request becomes a 502.

Runtime log
[CRITICAL] WORKER TIMEOUT (pid:42)
[ERROR] Worker (pid:42) was sent SIGKILL!
[INFO] Booting worker with pid: 89

Bump the timeout for long requests:

startCommand
gunicorn myapp:app --bind 0.0.0.0:$PORT --workers 4 --timeout 120

But also ask: should this request really take 120 seconds? If yes, move it to a background worker (synchronous web requests over 30 seconds are a UX problem regardless of platform).
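If you’d rather keep these settings out of the start command, gunicorn also reads a Python config file passed with -c. A minimal sketch, assuming a file named gunicorn.conf.py at the repo root and a start command changed to gunicorn -c gunicorn.conf.py myapp:app. The 10000 fallback matches Render’s default PORT, and the worker count is just the value from the command above:

```python
# gunicorn.conf.py: settings equivalent to the startCommand flags above.
import os

# Bind to all interfaces on the port Render assigns (falls back to 10000 locally).
bind = f"0.0.0.0:{os.environ.get('PORT', '10000')}"

# Same worker count as --workers 4; override per environment with WEB_CONCURRENCY.
workers = int(os.environ.get("WEB_CONCURRENCY", "4"))

# Workers silent for longer than this many seconds are killed and restarted
# (same as --timeout 120; gunicorn's default is 30).
timeout = 120

# Seconds workers get to finish in-flight requests after a graceful shutdown
# signal such as SIGTERM during a deploy (gunicorn's default is also 30).
graceful_timeout = 30
```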

5. 503 Service Unavailable

Less common, but a clear signal.

What you’ll see

HTTP/2 503
< Service Unavailable

What it means

Render’s edge has no healthy instances to route to. Either:

  • A deploy is in progress and the old version is gone before the new version is live.
  • All instances are over capacity and refusing new connections.
  • Autoscaling can’t keep up with the spike.

The fix

For “during deploy” 503s, ensure zero-downtime deploys are working: the new instance must pass its health check before the old one is torn down. If you don’t have a healthCheckPath configured, you don’t get zero-downtime deploys. Set one (see step 06).
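A health check endpoint can be very small. A minimal WSGI sketch, assuming a hypothetical /healthz path that you would then set as the service’s healthCheckPath. Keep the handler cheap because it is called frequently, and only add dependency checks (such as a database ping) if you want an unhealthy dependency to keep a new instance out of rotation:

```python
import json


class HealthCheckMiddleware:
    """Answer requests for the health check path with 200 before they reach the wrapped WSGI app."""

    def __init__(self, app, path: str = "/healthz"):
        self.app = app
        self.path = path

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") == self.path:
            body = json.dumps({"status": "ok"}).encode("utf-8")
            start_response("200 OK", [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Every other path goes to the real application.
        return self.app(environ, start_response)
```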

For “overloaded” 503s, scale. The scaling docs cover autoscaling targets and manual instance counts.

Telling platform errors from application errors

The single most useful skill for runtime triage: distinguishing what’s your problem from what’s a platform issue.

| Signal | “It’s your app” | “It’s the platform” |
| --- | --- | --- |
| HTTP code | 400, 404, 500 | 502 (sometimes), 503 |
| Logs at the time | Stack trace in your code | No application output, just Render’s ==> lines |
| Affected services | Just yours | Multiple services across the workspace |
| Affected users | Specific cohorts or actions | Everyone |
| Status page | Green | Incidents listed |

When in doubt, check status.render.com before opening a support ticket. A real platform issue is usually acknowledged there within minutes, often with running updates.

A worked example: intermittent 502s

A user reports: “Every now and then our API returns 502, but it’s not consistent. The deploys are green, the service is showing live, but maybe 1 in 50 requests fails.”

Walking the method:

  1. Reproduce: non-deterministic, but consistently 1-2%. Not “every request”, not “no requests”. That’s a clue.
  2. Locate the surface: runtime - service is live, errors are mid-request.
  3. Read the first error: the logs show no stack trace at the time of the 502, just an occasional ECONNRESET.
  4. Hypothesis: Node’s keep-alive timeout is shorter than Render’s edge, so the edge sometimes tries to reuse a closed connection.
  5. Smallest fix: add server.keepAliveTimeout = 120000 and server.headersTimeout = 120000 to the listener, redeploy.
  6. Result: 502s drop to 0.

The key clue was step 1: 1-in-50 is not “every request” (which would be a port binding issue) and not “no requests” (which would be normal). The intermittency pointed at a timing issue, and the absence of stack traces ruled out a regular bug. Pattern recognition saved a 30-minute investigation.
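To put a number on step 1 (“consistently 1-2%”) instead of eyeballing it, you can sample the endpoint and count status codes. A standard-library sketch; the URL is the placeholder used throughout this step, the helper names are hypothetical, and the pause between requests is there because the keep-alive failure mode only appears once connections have had time to go idle:

```python
import time
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def fetch_status(url: str, timeout_s: float = 10.0) -> int:
    """Return the HTTP status for a GET, including error statuses like 502."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the status code is on the exception.
        # Network-level failures (URLError) are not caught and will propagate.
        return err.code


def status_histogram(fetch: Callable[[], int], n: int, pause_s: float = 0.0) -> Counter:
    """Call fetch n times and count how often each status code comes back."""
    counts: Counter = Counter()
    for _ in range(n):
        counts[fetch()] += 1
        if pause_s:
            time.sleep(pause_s)
    return counts


# Example (makes real requests, roughly 10 minutes; run it against your own service):
# hist = status_histogram(lambda: fetch_status("https://myapp.onrender.com/api/users"), n=60, pause_s=10)
# print(hist, f"502 rate: {hist[502] / 60:.1%}")
```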


What you learned

  • **400** is usually an app-level allowlist (Django `ALLOWED_HOSTS`); add the hostname
  • **404** breaks down into: SPA without rewrite, wrong route, case-sensitive filename, ephemeral filesystem with no disk
  • **500** is your application throwing: grab the stack trace from the runtime logs; common causes are dropped DB SSL connections, uncaught exceptions, and pool exhaustion
  • **502** is Render's edge unable to reach your process. For Node, fix `keepAliveTimeout` / `headersTimeout` to match the edge. For Python, check gunicorn worker timeouts
  • **503** means no healthy instances - either mid-deploy without a health check, or overloaded
  • Check the status page before assuming a platform bug. Failures across multiple services point to the platform; failures in a single service point to your code