Runtime errors — When deploys go wrong

This is the family of failures the user sees. The deploy went green, the service is live, but requests are coming back wrong. The browser shows a 502 Bad Gateway, the API returns 500 Internal Server Error for some routes, or a Django admin user complains they can’t log in.

The HTTP status code is your first signal. Render’s troubleshooting docs lay out the most common causes per code - this step expands them with the diagnostic moves and code-level fixes.

The HTTP code → cause map

flowchart TB
  code{"HTTP code<br/>from the browser<br/>or curl"}
  c400["400 Bad Request"]
  c404["404 Not Found"]
  c500["500 Internal Server Error"]
  c502["502 Bad Gateway"]
  c503["503 Service Unavailable"]

  code --> c400 --> sec1["Section 1:<br/>Django ALLOWED_HOSTS"]
  code --> c404 --> sec2["Section 2:<br/>Routing, redirects,<br/>missing files"]
  code --> c500 --> sec3["Section 3:<br/>Uncaught exceptions<br/>+ DB connections"]
  code --> c502 --> sec4["Section 4:<br/>Render edge can't reach<br/>your process"]
  code --> c503 --> sec5["Section 5:<br/>No instances available<br/>(overload, scaling)"]

The first move on any runtime error is to work out whether the response came from Render’s edge proxy or from your app. The codes themselves tell you which:

400, 404, 500 come from your application. Render’s edge just forwarded them.
502 comes from Render’s edge - it tried to talk to your process and couldn’t. The cause is usually your process, not the platform.
503 can be either; check the logs.
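
As a rough illustration, the rule of thumb above can be written as a small lookup. This is only a sketch of the triage heuristic described here; the function name `likely_origin` and its return labels are made up for this example and are not part of any Render tooling.

```python
def likely_origin(status_code: int) -> str:
    """Return a first guess at where an HTTP error response originated."""
    if status_code in (400, 404, 500):
        # Generated by your application; the edge only forwarded it.
        return "application"
    if status_code == 502:
        # Generated by the edge because it could not reach your process.
        return "edge"
    if status_code == 503:
        # Ambiguous: check the logs before deciding.
        return "either"
    return "unmapped"


for code in (400, 502, 503):
    print(code, "->", likely_origin(code))
```

Treat the result as a starting hypothesis only: a 502 labelled "edge" is still usually caused by your process, as noted above.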

1. `400 Bad Request` (often Django ALLOWED_HOSTS)

What you’ll see

In the browser:

Bad Request (400)

In the runtime logs:

DisallowedHost: Invalid HTTP_HOST header: 'api.example.com'.
You may need to add 'api.example.com' to ALLOWED_HOSTS.

What it means

A request reached your Django app with a Host header it doesn’t recognise. Django (and a few other frameworks) gatekeep on Host for security; you have to add your domains explicitly.

The fix

import os

# Read from env so each environment configures itself.
# Strip whitespace and drop empty entries, so an unset variable gives [] rather than [""].
ALLOWED_HOSTS = [
    host.strip()
    for host in os.environ.get("ALLOWED_HOSTS", "").split(",")
    if host.strip()
]

# On Render, also allow the hostname the platform provides
if "RENDER_EXTERNAL_HOSTNAME" in os.environ:
    ALLOWED_HOSTS.append(os.environ["RENDER_EXTERNAL_HOSTNAME"])

Render injects RENDER_EXTERNAL_HOSTNAME automatically (e.g. myapp.onrender.com). When you add a custom domain, add it to ALLOWED_HOSTS too - either via the env var or directly.

2. `404 Not Found`

What you’ll see

curl -i https://myapp.onrender.com/api/users
HTTP/2 404
content-type: text/html

What it means and the four common causes

Cause	Diagnosis	Fix
SPA without rewrite rule	Static site loads `/` but any deep link 404s	Add a rewrite rule from `/*` to `/index.html` in the service’s Redirects/Rewrites section
Wrong route in code	curl with a typo’d path returns 404	Fix the route definition
Missing static file	Image or CSS 404s but only on Render	Filesystem casing bug (Linux is case-sensitive) - see step 04
Persistent file missing	A file your app wrote yesterday is gone	Your service has no persistent disk; the filesystem is ephemeral. Add a persistent disk or use object storage

The SPA case is the most common in practice. A typical React/Vue SPA on a Render static site needs:

services:
  - type: web
    name: web
    runtime: static
    buildCommand: npm run build
    staticPublishPath: ./dist
    routes:
      - type: rewrite
        source: /*
        destination: /index.html

Without this, hitting /dashboard directly (e.g. via a refresh) hits the file system, finds no dashboard.html, and 404s. With it, the static server returns index.html for unknown paths and the SPA router takes over.
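
To make the lookup order concrete, here is a minimal sketch of how a static file server might resolve a path with and without the rewrite. It is an illustration of the behaviour described above, not Render’s actual implementation; `resolve_static` and its parameters are hypothetical names, and the sketch ignores details such as path traversal and directory indexes.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_static(
    publish_dir: Path, request_path: str, spa_rewrite: bool
) -> Tuple[int, Optional[Path]]:
    """Return (status, file) the way a simple static server might."""
    # "/" maps to index.html; everything else maps to a file under publish_dir.
    relative = request_path.lstrip("/") or "index.html"
    candidate = publish_dir / relative
    if candidate.is_file():
        return 200, candidate
    if spa_rewrite:
        # The /* -> /index.html rewrite: let the SPA router handle the path.
        return 200, publish_dir / "index.html"
    return 404, None
```

With a publish directory that contains only `index.html`, a request for `/dashboard` resolves to a 404 when `spa_rewrite` is false and to `index.html` when it is true.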

3. `500 Internal Server Error`

What you’ll see

curl -i https://myapp.onrender.com/api/users
HTTP/2 500
{"error": "Internal Server Error"}

What it means

Your app threw an unhandled exception while responding to the request. The interesting information is in the logs, not the response.

Diagnosis

Grab the matching log line:

SRV=srv-xxxxx
# Filter for errors in the last 10 minutes
render logs -r "$SRV" --start 10m --level error --limit 20

The first match is almost always your stack trace.

Common 500 patterns

Database connection issues:

psycopg2.OperationalError: SSL connection has been closed unexpectedly

This usually means one of:

The connection has been idle long enough that the database closed it but the pool didn’t notice.
The pool is exhausted and a new connection failed.
The DB and the service are in different regions (cross-region adds latency that breaks short timeouts).

Fix: add a connection pool, use pool_pre_ping=True in SQLAlchemy (or your stack’s equivalent), and make sure sslmode=require is in the URL if you’re using the external Postgres URL. (Internal URL doesn’t need it.) The Postgres tutorial step 06 goes deeper.
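
One way to make the sslmode part of that fix hard to forget is to normalise the URL in code before handing it to your driver. The helper below is a sketch using only the standard library; `ensure_sslmode` is a hypothetical name and the hostname in the example is a placeholder, not a real Render address.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def ensure_sslmode(database_url: str, mode: str = "require") -> str:
    """Return database_url with sslmode set, leaving an existing value alone."""
    parts = urlsplit(database_url)
    query = dict(parse_qsl(parts.query))
    # Only add sslmode when the URL does not already specify one.
    query.setdefault("sslmode", mode)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder credentials and host, for illustration only.
print(ensure_sslmode("postgresql://user:secret@db.example.com:5432/mydb"))
```

The result can then be passed to whatever creates your connections, for example SQLAlchemy’s `create_engine(url, pool_pre_ping=True)` if that is your stack.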

Uncaught exception in a request handler:

TypeError: Cannot read properties of undefined (reading 'email')
    at /opt/render/project/src/routes/users.js:42:23

Fix the bug. There’s nothing platform-specific here - but two Render-flavored notes:

An uncaught exception that escapes the request handler and reaches the process level will crash the whole instance, not just the one request. Use your framework’s error middleware to catch it first.
Render restarts crashed processes automatically, so a sufficiently-bad bug shows up as both 500s on individual requests and restart events in the runtime log.
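
For a Python service, the “error middleware” idea can be sketched as a plain WSGI wrapper. Most frameworks ship their own equivalent, so treat this as an illustration of the pattern rather than something to copy over a framework’s built-in handling; `CatchAllErrors` and the logger name are hypothetical.

```python
import logging
import sys
import traceback

logger = logging.getLogger("app.errors")


class CatchAllErrors:
    """WSGI middleware that turns unhandled exceptions into logged 500s."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception:
            # Log the full trace so it is visible in the runtime logs.
            logger.error(
                "Unhandled exception on %s\n%s",
                environ.get("PATH_INFO", "?"),
                traceback.format_exc(),
            )
            body = b'{"error": "Internal Server Error"}'
            headers = [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ]
            # Passing exc_info lets the server replace headers the app may
            # have set already, as long as none have been sent yet.
            start_response("500 Internal Server Error", headers, sys.exc_info())
            return [body]
```

This only catches exceptions raised while the wrapped app is called; errors raised later, while a streamed response is being iterated, would need separate handling.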

Resource exhaustion:

[CRITICAL] WORKER TIMEOUT (pid:42)
psycopg2.pool.PoolError: connection pool exhausted

The service is overwhelmed. The fix depends on which resource:

Symptom	Fix
All workers timing out	Slow upstream (DB, third-party API). Add timeouts, retries with backoff, or cache the upstream
Connection pool exhausted	Too many concurrent requests for your pool size. Increase pool, add PgBouncer, or upgrade Postgres plan
CPU pegged at 100%	Need bigger instance or autoscaling. See scaling docs
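
The “retries with backoff” advice in the first row can be sketched in a few lines. This is a minimal, generic version with capped exponential backoff and jitter; `call_with_backoff` and its defaults are illustrative choices, not values recommended by Render.

```python
import random
import time


def call_with_backoff(func, attempts=3, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call func(), retrying on any exception with capped exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of retries: surface the original error.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter spreads retries out so workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Pair retries with a per-call timeout on the upstream request itself; retrying a call that can hang indefinitely only ties up the worker for longer.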

4. `502 Bad Gateway`

The most “platform-feeling” error, and the one most often misdiagnosed.

What you’ll see

curl -i https://myapp.onrender.com/
HTTP/2 502
< Bad Gateway

The runtime logs at the same time:

(possibly nothing, or:)
[ERROR] Connection reset by peer

What it means

Render’s edge proxy tried to forward your request to a service instance and couldn’t get a response. This is almost always one of:

Cause	Tell
Service isn’t bound to `0.0.0.0:$PORT` (regression from step 05)	Persistent 502s on every request, even right after deploy
New custom domain not propagated yet	502s on a new domain that resolve within an hour
Node `keepAliveTimeout` too short	Intermittent 502s under low traffic, especially on Node
Worker killed mid-request (SIGTERM/OOM)	502 on one request, others succeed
Service has zero healthy instances	Deploy is in progress, or you’re scaling from 0

The Node keep-alive 502

A subtle but common one worth its own callout. Render’s edge keeps HTTP/1.1 connections alive to your instance for reuse. If your Node server closes them faster than the edge does, the edge tries to reuse a closed connection - and the user sees a 502.

const server = app.listen(port, "0.0.0.0");
// Render's edge waits up to 75s. Match or exceed that.
server.keepAliveTimeout = 120 * 1000;   // 120 seconds
server.headersTimeout    = 120 * 1000;

Without this, you’ll see intermittent 502s that don’t correlate with any error in your logs. Node’s default keepAliveTimeout is 5 seconds, which is too short for Render’s edge timing.

The gunicorn `WORKER TIMEOUT` 502

Python services using gunicorn have an analogous issue: a worker that stays silent for longer than --timeout seconds (30 by default) is killed and restarted, and any request it was handling becomes a 502.

[CRITICAL] WORKER TIMEOUT (pid:42)
[ERROR] Worker (pid:42) was sent SIGKILL!
[INFO] Booting worker with pid: 89

Bump the timeout for long requests:

gunicorn myapp:app --bind 0.0.0.0:$PORT --workers 4 --timeout 120

But also ask: should this request really take 120 seconds? If the work genuinely needs that long, move it to a background worker and have the web request return quickly (synchronous web requests over 30 seconds are a UX problem regardless of platform).
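
If you prefer to keep these settings in version control rather than in the start command, gunicorn can also read them from a Python config file. The sketch below mirrors the command above; the fallback port of 10000 is an assumption for local runs, and on Render the PORT variable is expected to be set for you.

```python
# gunicorn.conf.py: the same settings as the command line above
import os

# Bind on all interfaces; the fallback port is an assumption for local runs.
bind = f"0.0.0.0:{os.environ.get('PORT', '10000')}"
workers = 4
# Seconds a worker may stay silent before gunicorn kills and restarts it.
timeout = 120
```

The start command then becomes `gunicorn -c gunicorn.conf.py myapp:app`.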

5. `503 Service Unavailable`

Less common, but a clear signal.

What you’ll see

HTTP/2 503
< Service Unavailable

What it means

Render’s edge has no healthy instances to route to. Either:

A deploy is in progress and the old version is gone before the new version is live.
All instances are over capacity and refusing new connections.
Autoscaling can’t keep up with the spike.

The fix

For “during deploy” 503s, ensure zero-downtime deploys are working - the new instance must pass its health check before the old one is torn down. Without a healthCheckPath configured, Render has no application-level signal that the new instance is actually ready to serve, so traffic can switch over too early. Set one (see step 06).
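
A health check endpoint does not need to be elaborate. As a sketch, the wrapper below adds one to any WSGI app; `with_health_check`, the `/healthz` path, and the `is_ready` callback are illustrative names, and what “ready” means (database reachable, caches warm) is up to your service.

```python
def with_health_check(app, path="/healthz", is_ready=lambda: True):
    """Wrap a WSGI app so that `path` reports whether the service is ready."""

    def wrapped(environ, start_response):
        if environ.get("PATH_INFO") != path:
            # Not the health check: hand the request to the real app.
            return app(environ, start_response)
        ready = is_ready()
        status = "200 OK" if ready else "503 Service Unavailable"
        body = b"ok" if ready else b"not ready"
        start_response(status, [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return wrapped
```

Point healthCheckPath at the same path. Keep the readiness check cheap, since it will be called repeatedly.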

For “overloaded” 503s, scale. The scaling docs cover autoscaling targets and manual instance counts.

Telling platform errors from application errors

The single most useful skill for runtime triage: distinguishing what’s your problem from what’s a platform issue.

Signal	“It’s your app”	“It’s the platform”
HTTP code	400, 404, 500	502 (sometimes), 503
Logs at the time	Stack trace in your code	No application output, just Render’s `==>` lines
Affected services	Just yours	Multiple services across the workspace
Affected users	Specific cohorts or actions	Everyone
Status page	Green	Incidents listed

When in doubt, check status.render.com before opening a support ticket. A real platform issue will be acknowledged there within minutes, often with running updates.

A worked example: intermittent 502s

A user reports: “Every now and then our API returns 502, but it’s not consistent. The deploys are green, the service is showing live, but maybe 1 in 50 requests fails.”

Walking the method:

1. Reproduce: non-deterministic, but consistently 1-2%. Not “every request”, not “no requests”. That’s a clue.
2. Locate the surface: runtime - service is live, errors are mid-request.
3. Read the first error: logs show no stack trace at the time of the 502. Just ECONNRESET occasionally.
4. Hypothesis: Node’s keep-alive timeout is shorter than Render’s edge, so the edge sometimes tries to reuse a closed connection.
5. Smallest fix: add server.keepAliveTimeout = 120000 and server.headersTimeout = 120000 to the listener, redeploy.
6. Result: 502s drop to 0.

The key clue was step 1: 1-in-50 is not “every request” (which would be a port binding issue) and not “no requests” (which would be normal). The intermittency pointed at a timing issue, and the absence of stack traces ruled out a regular bug. Pattern recognition saved a 30-minute investigation.
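
The “reproduce” step is easier to reason about with a number attached. As a sketch, given a list of status codes collected from repeated requests (however you gather them), the helper below reports the 502 rate and which of the three patterns it matches; `classify_502_pattern` and its labels are made up for this example.

```python
def classify_502_pattern(status_codes):
    """Return (failure_rate, label) for a sample of HTTP status codes."""
    total = len(status_codes)
    if total == 0:
        raise ValueError("need at least one status code")
    failures = sum(1 for code in status_codes if code == 502)
    if failures == 0:
        label = "none"          # nothing to investigate in this sample
    elif failures == total:
        label = "persistent"    # think port binding or a dead process
    else:
        label = "intermittent"  # think timing: keep-alive, worker timeouts
    return failures / total, label


sample = [200] * 49 + [502]
print(classify_502_pattern(sample))
```

A small sample can easily miss a 1-in-50 failure, so collect a few hundred requests before concluding that the rate is zero.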

A Node service is intermittently returning 502s under low traffic. The runtime logs show no stack traces around the failures, just an occasional `ECONNRESET`. The service is bound correctly to `0.0.0.0:$PORT`. What's the most likely fix?

Switch the service to a Docker runtime
Set `server.keepAliveTimeout = 120000` and `server.headersTimeout = 120000`, then redeploy
Add a health check at `/healthz`
Upgrade to a larger instance size

What you learned

**400** is usually an app-level allowlist (Django `ALLOWED_HOSTS`); add the hostname
**404** breaks down into: SPA without rewrite, wrong route, case-sensitive filename, ephemeral filesystem with no disk
**500** is your application throwing - grab the stack trace from the runtime logs; common causes are DB SSL, uncaught exceptions, and pool exhaustion
**502** is Render's edge unable to reach your process. For Node, fix `keepAliveTimeout` / `headersTimeout` to match the edge. For Python, check gunicorn worker timeouts
**503** means no healthy instances - either mid-deploy without a health check, or overloaded
Check the status page before assuming a platform bug. Multiple-service-wide failures correlate with platform; single-service correlate with code
