This is the family of failures the user sees. The deploy went green, the service is live, but requests are coming back wrong. Browser shows a 502 Bad Gateway, the API returns 500 Internal Server Error for some routes, or a Django admin user complains they can’t log in.
The HTTP status code is your first signal. Render’s troubleshooting docs lay out the most common causes per code - this step expands them with the diagnostic moves and code-level fixes.
The HTTP code → cause map
flowchart TB
code{"HTTP code<br/>from the browser<br/>or curl"}
c400["400 Bad Request"]
c404["404 Not Found"]
c500["500 Internal Server Error"]
c502["502 Bad Gateway"]
c503["503 Service Unavailable"]
code --> c400 --> sec1["Section 1:<br/>Django ALLOWED_HOSTS"]
code --> c404 --> sec2["Section 2:<br/>Routing, redirects,<br/>missing files"]
code --> c500 --> sec3["Section 3:<br/>Uncaught exceptions<br/>+ DB connections"]
code --> c502 --> sec4["Section 4:<br/>Render edge can't reach<br/>your process"]
code --> c503 --> sec5["Section 5:<br/>No instances available<br/>(overload, scaling)"]
The first move on any runtime error is to distinguish between Render’s edge proxy and your app. The codes themselves tell you which:
- 400, 404, 500 come from your application. Render’s edge just forwarded them.
- 502 comes from Render’s edge - it tried to talk to your process and couldn’t. The cause is usually your process, not the platform.
- 503 can be either; check the logs.
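The rule of thumb above can be written as a small lookup, which is handy inside a smoke-test script. This is a minimal sketch: the function name `likely_origin` and its return labels are illustrative, not part of any Render API, and the result is a first guess rather than a diagnosis.

```python
def likely_origin(status: int) -> str:
    """Return a first guess at who produced an HTTP status code.

    Encodes only the rule of thumb from this step; treat the answer
    as a starting point for triage.
    """
    if status in (400, 404, 500):
        return "app"      # generated by your code, forwarded by the edge
    if status == 502:
        return "edge"     # the edge could not get a response from your process
    if status == 503:
        return "either"   # check the logs to decide
    return "unknown"


for code in (400, 404, 500, 502, 503):
    print(code, likely_origin(code))
```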
1. 400 Bad Request (often Django ALLOWED_HOSTS)
What you’ll see
In the browser:
    Bad Request (400)

In the runtime logs:

    DisallowedHost: Invalid HTTP_HOST header: 'api.example.com'. You may need to add 'api.example.com' to ALLOWED_HOSTS.

What it means
A request reached your Django app with a Host header it doesn’t recognise. Django (and a few other frameworks) gatekeep on Host for security; you have to add your domains explicitly.
The fix
    import os

    # Read from env so each environment configures itself.
    # Empty entries are dropped so an unset variable yields [] rather than [""].
    ALLOWED_HOSTS = [h.strip() for h in os.environ.get("ALLOWED_HOSTS", "").split(",") if h.strip()]

    # Or, for Render specifically, use the Render-provided env var
    if "RENDER_EXTERNAL_HOSTNAME" in os.environ:
        ALLOWED_HOSTS.append(os.environ["RENDER_EXTERNAL_HOSTNAME"])

Render injects RENDER_EXTERNAL_HOSTNAME automatically (e.g. myapp.onrender.com). When you add a custom domain, add it to ALLOWED_HOSTS too - either via the env var or directly.
2. 404 Not Found
What you’ll see
    curl -i https://myapp.onrender.com/api/users
    HTTP/2 404
    content-type: text/html

What it means and the four common causes
| Cause | Diagnosis | Fix |
|---|---|---|
| SPA without rewrite rule | Static site loads / but any deep link 404s | Add a rewrite rule from /* to /index.html in the service’s Redirects/Rewrites section |
| Wrong route in code | curl with a typo’d path returns 404 | Fix the route definition |
| Missing static file | Image or CSS 404s but only on Render | Filesystem casing bug (Linux is case-sensitive) - see step 04 |
| Persistent file missing | A file your app wrote yesterday is gone | Your service has no persistent disk; the filesystem is ephemeral. Add a persistent disk or use object storage |
The SPA case is the most common in practice. A typical React/Vue SPA on a Render static site needs:
    services:
      - type: web
        name: web
        runtime: static
        buildCommand: npm run build
        staticPublishPath: ./dist
        routes:
          - type: rewrite
            source: /*
            destination: /index.html

Without this, hitting /dashboard directly (e.g. via a refresh) hits the file system, finds no dashboard.html, and 404s. With it, the static server returns index.html for unknown paths and the SPA router takes over.
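To make the rewrite behavior concrete, here is a minimal sketch of the lookup a static server performs with and without the rule. It is an illustration only, not Render's implementation: the function name `resolve` is hypothetical, and the sketch assumes a file that exists on disk takes precedence over the rewrite.

```python
import tempfile
from pathlib import Path


def resolve(publish_dir: Path, request_path: str, spa_rewrite: bool) -> tuple[int, str]:
    """Return (status, file served) for a request against a static directory."""
    relative = request_path.lstrip("/") or "index.html"
    if (publish_dir / relative).is_file():
        return 200, relative           # a real file always wins
    if spa_rewrite:
        return 200, "index.html"       # unknown path: hand off to the SPA router
    return 404, ""                     # no file and no rewrite rule


with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp)
    (dist / "index.html").write_text("<div id='root'></div>")
    (dist / "app.js").write_text("console.log('spa')")

    print(resolve(dist, "/", spa_rewrite=False))
    print(resolve(dist, "/dashboard", spa_rewrite=False))
    print(resolve(dist, "/dashboard", spa_rewrite=True))
    print(resolve(dist, "/app.js", spa_rewrite=True))
```

The second call is the deep-link refresh that 404s; the third is the same request once the rewrite rule is in place.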
3. 500 Internal Server Error
What you’ll see
    curl -i https://myapp.onrender.com/api/users
    HTTP/2 500
    {"error": "Internal Server Error"}

What it means
Your app threw an unhandled exception responding to the request. The interesting information is in the logs, not the response.
Diagnosis
Grab the matching log line:
    SRV=srv-xxxxx
    # Filter for errors in the last 10 minutes
    render logs -r "$SRV" --start 10m --level error --limit 20

The first match is almost always your stack trace.
Common 500 patterns
Database connection issues:
    psycopg2.OperationalError: SSL connection has been closed unexpectedly

This usually means one of:
- The connection has been idle long enough that the database closed it but the pool didn’t notice.
- The pool is exhausted and a new connection failed.
- The DB and the service are in different regions (cross-region adds latency that breaks short timeouts).
Fix: add a connection pool, use pool_pre_ping=True in SQLAlchemy (or your stack’s equivalent), and make sure sslmode=require is in the URL if you’re using the external Postgres URL. (Internal URL doesn’t need it.) The Postgres tutorial step 06 goes deeper.
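To show what a pre-ping does, here is a minimal sketch of a pool that tests each connection before lending it out. It uses the standard library's `sqlite3` as a stand-in for Postgres so it runs anywhere; the class `PrePingPool` is hypothetical and much simpler than a production pool (no locking, no size limit on new connections).

```python
import sqlite3
from collections import deque


class PrePingPool:
    """Tiny illustrative pool that tests each connection before lending it."""

    def __init__(self, connect, size: int = 2):
        self._connect = connect          # zero-argument factory for new connections
        self._idle = deque(connect() for _ in range(size))

    def _is_alive(self, conn) -> bool:
        try:
            conn.execute("SELECT 1")     # the "ping"
            return True
        except sqlite3.Error:
            return False

    def acquire(self):
        while self._idle:
            conn = self._idle.popleft()
            if self._is_alive(conn):
                return conn
            # Stale connection: drop it and try the next idle one.
        return self._connect()           # nothing usable left, open a fresh one

    def release(self, conn) -> None:
        self._idle.append(conn)


pool = PrePingPool(lambda: sqlite3.connect(":memory:"))
conn = pool.acquire()
conn.close()                 # simulate the database dropping an idle connection
pool.release(conn)           # the pool does not notice at release time

# The stale connection is skipped when its turn comes; every lend still works.
for _ in range(3):
    c = pool.acquire()
    print(c.execute("SELECT 1").fetchone())
    pool.release(c)
```

The cost is one cheap round trip per checkout, which is usually a fair trade for never handing a dead connection to a request handler.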
Uncaught exception in a request handler:
    TypeError: Cannot read properties of undefined (reading 'email')
        at /opt/render/project/src/routes/users.js:42:23

Fix the bug. There’s nothing platform-specific here - but two Render-flavored notes:
- An exception that escapes the request handler and goes unhandled at the process level will crash the whole instance, not just the one request. Use your framework’s error middleware to catch exceptions per request.
- Render restarts crashed processes automatically, so a sufficiently-bad bug shows up as both 500s on individual requests and restart events in the runtime log.
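The trace above comes from a Node service, but the error-middleware idea is the same in any stack. Here is a minimal sketch for a Python WSGI app; `catch_errors` and `broken_app` are hypothetical names, and most frameworks ship an equivalent hook you should prefer in practice.

```python
import json
import logging
import traceback

logger = logging.getLogger("app.errors")


def catch_errors(app):
    """Wrap a WSGI app so unhandled exceptions become logged 500 responses.

    Only covers exceptions raised before the app returns its response
    iterable; errors raised while streaming the body are not caught here.
    """

    def wrapped(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # Log the full stack trace; the client only gets a generic body.
            logger.error("Unhandled error on %s\n%s",
                         environ.get("PATH_INFO", "?"), traceback.format_exc())
            body = json.dumps({"error": "Internal Server Error"}).encode()
            start_response("500 Internal Server Error", [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ])
            return [body]

    return wrapped


def broken_app(environ, start_response):
    user = None
    return [user["email"].encode()]   # raises TypeError, much like the trace above


app = catch_errors(broken_app)
```

The request fails with a 500 and a logged stack trace, while the process keeps serving other requests.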
Resource exhaustion:
    [CRITICAL] WORKER TIMEOUT (pid:42)
    psycopg2.pool.PoolError: connection pool exhausted

The service is overwhelmed. The fix depends on which resource:
| Symptom | Fix |
|---|---|
| All workers timing out | Slow upstream (DB, third-party API). Add timeouts, retries with backoff, or cache the upstream |
| Connection pool exhausted | Too many concurrent requests for your pool size. Increase pool, add PgBouncer, or upgrade Postgres plan |
| CPU pegged at 100% | Need bigger instance or autoscaling. See scaling docs |
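For the "retries with backoff" fix in the table, a minimal sketch might look like the following. The helper name `call_with_backoff`, its defaults, and the choice of which exceptions to retry are all assumptions; tune them to the upstream you are calling, and only retry operations that are safe to repeat.

```python
import random
import time


def call_with_backoff(fn, attempts=4, base_delay=0.2, max_delay=5.0,
                      retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))   # jitter spreads out retries


# Demo: an upstream that fails twice, then succeeds.
calls = {"n": 0}


def flaky_upstream():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timed out")
    return "ok"


print(call_with_backoff(flaky_upstream, sleep=lambda s: None))
```

Keep the total retry budget well under your worker timeout, otherwise the retries themselves become the reason workers time out.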
4. 502 Bad Gateway
The most “platform-feeling” error, and the one most often misdiagnosed.
What you’ll see
    curl -i https://myapp.onrender.com/
    HTTP/2 502
    < Bad Gateway

The runtime logs at the same time:

    (possibly nothing, or:)
    [ERROR] Connection reset by peer

What it means
Render’s edge proxy tried to forward your request to a service instance and couldn’t get a response. This is almost always one of:
| Cause | Tell |
|---|---|
| Service isn’t bound to 0.0.0.0:$PORT (regression from step 05) | Persistent 502s on every request, even right after deploy |
| New custom domain not propagated yet | 502s on a new domain that resolve within an hour |
| Node keepAliveTimeout too short | Intermittent 502s under low traffic, especially on Node |
| Worker killed mid-request (SIGTERM/OOM) | 502 on one request, others succeed |
| Service has zero healthy instances | Deploy is in progress, or you’re scaling from 0 |
The Node keep-alive 502
A subtle but common one worth its own callout. Render’s edge keeps HTTP/1.1 connections alive to your instance for reuse. If your Node server closes them faster than the edge does, the edge tries to reuse a closed connection - and the user sees a 502.
    const server = app.listen(port, "0.0.0.0");
    // Render's edge waits up to 75s. Match or exceed that.
    server.keepAliveTimeout = 120 * 1000; // 120 seconds
    server.headersTimeout = 120 * 1000;

Without this, you’ll see intermittent 502s that don’t correlate with any error in your logs. The Node defaults are too aggressive for Render’s edge timing.
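The timing argument can be worked through as simple arithmetic. The sketch below is a hypothetical helper, not a measurement tool: it takes the 75-second edge figure quoted above and Node's default `keepAliveTimeout` of 5 seconds, and returns the range of idle durations in which the edge may still reuse a socket the server has already closed.

```python
from typing import Optional, Tuple


def stale_reuse_window(server_keepalive_s: float,
                       edge_idle_s: float) -> Optional[Tuple[float, float]]:
    """Idle durations (seconds) during which the edge may reuse a socket
    the server has already closed. None means there is no such window."""
    if server_keepalive_s >= edge_idle_s:
        return None
    return (server_keepalive_s, edge_idle_s)


EDGE_IDLE_S = 75          # figure quoted above for Render's edge
NODE_DEFAULT_S = 5        # Node's default server.keepAliveTimeout (5000 ms)

print(stale_reuse_window(NODE_DEFAULT_S, EDGE_IDLE_S))   # default settings
print(stale_reuse_window(120, EDGE_IDLE_S))              # after the fix
```

With the defaults, any connection idle for between 5 and 75 seconds is a candidate for a 502, which is why the symptom shows up mostly under low traffic.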
The gunicorn WORKER TIMEOUT 502
Python services using gunicorn have an analogous issue: workers are killed after --timeout seconds, and any in-flight request becomes a 502.
    [CRITICAL] WORKER TIMEOUT (pid:42)
    [ERROR] Worker (pid:42) was sent SIGKILL!
    [INFO] Booting worker with pid: 89

Bump the timeout for long requests:

    gunicorn myapp:app --bind 0.0.0.0:$PORT --workers 4 --timeout 120

But also ask: should this request really take 120 seconds? If yes, move it to a background worker (synchronous web requests over 30 seconds are a UX problem regardless of platform).
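The same settings can live in a gunicorn config file, which is plain Python, instead of on the command line. This is a sketch under assumptions: the fallback port `10000` is only there for local runs, and the `graceful_timeout` value is an illustrative choice rather than a recommendation from the source.

```python
# gunicorn.conf.py, loaded with: gunicorn -c gunicorn.conf.py myapp:app
import os

# Render supplies PORT; the fallback is only for local runs (assumption).
bind = f"0.0.0.0:{os.environ.get('PORT', '10000')}"

# gunicorn also reads WEB_CONCURRENCY itself; the explicit default mirrors --workers 4.
workers = int(os.environ.get("WEB_CONCURRENCY", "4"))

# Seconds a worker may stay silent before the master kills and restarts it.
timeout = 120

# Seconds workers get to finish in-flight requests after a restart signal.
graceful_timeout = 30
```

Keeping the values in a file makes a timeout bump show up in code review instead of hiding in the start command.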
5. 503 Service Unavailable
Less common, but a clear signal.
What you’ll see
    HTTP/2 503
    < Service Unavailable

What it means
Render’s edge has no healthy instances to route to. Either:
- A deploy is in progress and the old version is gone before the new version is live.
- All instances are over capacity and refusing new connections.
- Autoscaling can’t keep up with the spike.
The fix
For “during deploy” 503s, ensure zero-downtime deploys are working - the new instance must pass its health check before the old one is torn down. If you don’t have a healthCheckPath configured, you don’t get zero-downtime. Set one (see step 06).
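A health check endpoint can be very small. The sketch below is a plain WSGI app under stated assumptions: the path `/healthz` is a hypothetical choice (use whatever you set as healthCheckPath), and `check_database` is a placeholder that uses `sqlite3` so the example runs without a real database. Whether a health check should touch dependencies at all is a design decision; a shallow check that always returns 200 is also common.

```python
import sqlite3


def check_database() -> bool:
    """Placeholder dependency check; swap in a real query against your DB."""
    try:
        with sqlite3.connect(":memory:") as conn:
            conn.execute("SELECT 1")
        return True
    except sqlite3.Error:
        return False


def health_app(environ, start_response, check=check_database):
    """WSGI app: 200 on /healthz when dependencies respond, 503 otherwise."""
    if environ.get("PATH_INFO") != "/healthz":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    if check():
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    start_response("503 Service Unavailable", [("Content-Type", "text/plain")])
    return [b"unhealthy"]
```

Keep the handler fast and free of side effects, since it is called repeatedly for the lifetime of the instance.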
For “overloaded” 503s, scale. The scaling docs cover autoscaling targets and manual instance counts.
Telling platform errors from application errors
The single most useful skill for runtime triage: distinguishing what’s your problem from what’s a platform issue.
| Signal | "It’s your app" | "It’s the platform" |
|---|---|---|
| HTTP code | 400, 404, 500 | 502 (sometimes), 503 |
| Logs at the time | Stack trace in your code | No application output, just Render’s ==> lines |
| Affected services | Just yours | Multiple services across the workspace |
| Affected users | Specific cohorts or actions | Everyone |
| Status page | Green | Incidents listed |
When in doubt, check status.render.com before opening a support ticket. A real platform issue will be acknowledged there within minutes, often with running updates.
A worked example: intermittent 502s
A user reports: “Every now and then our API returns 502, but it’s not consistent. The deploys are green, the service is showing live, but maybe 1 in 50 requests fails.”
Walking the method:
- Reproduce: non-deterministic, but consistently 1-2%. Not “every request”, not “no requests”. That’s a clue.
- Locate the surface: runtime - service is live, errors are mid-request.
- Read the first error: logs show no stack trace at the time of the 502. Just `ECONNRESET` occasionally.
- Hypothesis: Node’s keep-alive timeout is shorter than Render’s edge, so the edge sometimes tries to reuse a closed connection.
- Smallest fix: add `server.keepAliveTimeout = 120000` and `server.headersTimeout = 120000` to the listener, redeploy.
- Result: 502s drop to 0.
The key clue was step 1: 1-in-50 is not “every request” (which would be a port binding issue) and not “no requests” (which would be normal). The intermittency pointed at a timing issue, and the absence of stack traces ruled out a regular bug. Pattern recognition saved a 30-minute investigation.
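Putting a number on "1 in 50" is worth a few lines of script, both to confirm the report and to verify the fix afterwards. The sketch below is one way to do it with the standard library; the helper names are hypothetical, and the demo uses a stand-in fetcher so it runs offline.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for one GET, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code          # urllib raises on 4xx/5xx; keep the code


def sample_statuses(fetch, n: int) -> Counter:
    """Call fetch() n times and tally the status codes it returns."""
    return Counter(fetch() for _ in range(n))


def failure_rate(tally: Counter, status: int = 502) -> float:
    total = sum(tally.values())
    return tally[status] / total if total else 0.0


# Offline demo with a stand-in fetcher: every 50th call returns 502.
_calls = iter(range(1, 201))


def fake_fetch() -> int:
    return 502 if next(_calls) % 50 == 0 else 200


tally = sample_statuses(fake_fetch, 200)
print(tally, failure_rate(tally))
```

Against a real service, pass something like `lambda: fetch_status("https://myapp.onrender.com/api/users")` as the fetcher, run it before and after the fix, and compare the rates.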
What you learned
- **400** is usually an app-level allowlist (Django `ALLOWED_HOSTS`); add the hostname
- **404** breaks down into: SPA without rewrite, wrong route, case-sensitive filename, ephemeral filesystem with no disk
- **500** is your application throwing - grab the stack trace from the runtime logs; common causes are DB SSL, uncaught exceptions, and pool exhaustion
- **502** is Render's edge unable to reach your process. For Node, fix `keepAliveTimeout` / `headersTimeout` to match the edge. For Python, check gunicorn worker timeouts
- **503** means no healthy instances - either mid-deploy without a health check, or overloaded
- Check the status page before assuming a platform bug. Multiple-service-wide failures correlate with platform; single-service correlate with code