High availability and read replicas — Postgres on Render: a deep dive

Two related but distinct features: high availability keeps your primary running through plan changes and infrastructure events, and read replicas spread read traffic across multiple Postgres instances. Each has its own set of requirements and gotchas.

This step covers both, with extra attention on the readReplicas Blueprint behavior that’s burned more than one team.

High availability

HA on Render Postgres means Render runs a standby instance in a separate zone (same region, geographically separated by tens of kilometers) that asynchronously replicates the primary. If the primary becomes unavailable for 30 seconds, Render automatically fails over to the standby. The whole flip takes a few seconds, after which the standby becomes the new primary at the same URL.

Requirements

flowchart LR
  plan["Database instance type:<br/>Pro or Accelerated"]
  pg["PG version: 13+"]
  ha["HA available"]

  plan --> ha
  pg --> ha

Two requirements:

Requirement	Notes
Database instance type: Pro or Accelerated	Or, on legacy instance types: Pro or higher
PostgreSQL major version: 13 or newer	HA relies on replication features in these versions

If you provisioned on a smaller plan and want HA, change the instance type first (a brief unavailability window - itself the kind of event HA helps with later).

What HA actually buys you

The trade is more nuanced than “no downtime ever”:

Instance type changes: with HA, your database is unavailable for only a few seconds. Without HA, expect a few minutes.
Automatic failover on primary issues: 30 seconds of primary downtime triggers failover. The flip itself takes a few seconds.
Same connection URL after failover: clients reconnect to the same hostname; the new primary is reachable there. Make sure your client code includes retry logic so it survives the brief gap.
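
The retry logic can be as small as a wrapper around each query. What follows is a minimal sketch, not a Render-provided API: `withRetry`, `attempts`, and `baseDelayMs` are hypothetical names and values, and it assumes every thrown error is worth retrying. A real app would inspect error codes and only retry operations that are safe to repeat.

```javascript
// Hypothetical helper: retries an async operation with exponential backoff.
// `attempts` and `baseDelayMs` are illustrative defaults, not Render recommendations.
async function withRetry(operation, { attempts = 5, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt === attempts - 1) break; // out of attempts, give up
      // Wait baseDelayMs, then 2x, then 4x, ... before trying again.
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Usage would look like `withRetry(() => pool.query("SELECT 1"))`, where `pool` is a `pg` Pool. Tune the attempt count and delays to the length of outage you want to ride out.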

What HA doesn’t prevent or guarantee:

Logical errors (DROP TABLE, bad migrations). The standby has the same data - replication doesn’t filter mistakes.
Connection limit exhaustion. The cap belongs to the primary; failing over doesn’t raise it.
Application bugs. HA is about availability of the database, not correctness of what’s in it.
Zero data loss on automatic failover. Replication is asynchronous, so an automatic failover can lose the most recent writes - typically no more than a few seconds’ worth. Manual failovers from a healthy primary almost never lose anything.

The latency cost

HA adds ~1 ms of round-trip latency to every query. Render runs a small proxy in front of the database to detect connectivity issues and trigger failovers, and that proxy hop is on every connection. For most apps it’s invisible. For latency-critical workloads, measure before turning it on.
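
One way to measure is to time a trivial query many times before and after enabling HA and compare the medians. A rough sketch, assuming a hypothetical `medianLatencyMs` helper and a Node runtime; it times whatever async function you pass in, so the numbers include your own network path:

```javascript
// Hypothetical helper: runs an async operation `samples` times and returns
// the median duration in milliseconds.
async function medianLatencyMs(operation, samples = 50) {
  if (samples < 1) throw new Error("samples must be at least 1");
  const durations = [];
  for (let i = 0; i < samples; i++) {
    const start = process.hrtime.bigint();
    await operation();
    const elapsedNs = process.hrtime.bigint() - start;
    durations.push(Number(elapsedNs) / 1e6); // nanoseconds -> milliseconds
  }
  durations.sort((a, b) => a - b);
  const mid = Math.floor(durations.length / 2);
  return durations.length % 2 === 1
    ? durations[mid]
    : (durations[mid - 1] + durations[mid]) / 2;
}
```

With a `pg` Pool, the call would be `medianLatencyMs(() => pool.query("SELECT 1"))`. Run it from the same service and region your app uses, otherwise the comparison tells you little.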

Toggling HA

databases:
  - name: app-db
    plan: pro-4gb
    postgresMajorVersion: "16"
    highAvailability: true

Set highAvailability in the Blueprint or toggle it in the Render Dashboard. HA can be turned on or off at any time, so you can change your mind later. It only applies on a qualifying plan and major version, though, so make sure you’ve got those right first.

Legacy → current-generation: a one-way door

flowchart LR
  legacy["Legacy plan"]
  current["Current-generation plan"]
  back["Legacy plan"]

  legacy -->|"upgrade<br/>(one-way)"| current
  current -.->|"not allowed"| back

This usually doesn’t bite anyone - current-generation plans are better in every measurable way - but it’s worth knowing before you click the button.

Read replicas

Read replicas are read-only Postgres instances that mirror the primary. Routing read-heavy queries to them spreads load and keeps the primary free for writes and latency-sensitive reads.

Requirements

Two thresholds before you can add a replica:

Requirement	Notes
Database storage: at least 10 GB	Smaller databases can’t have replicas. Resize storage up before adding the first replica
Instance type: Basic-1gb or higher	On legacy instance types: Standard or higher

Each replica has the same instance type and storage as the primary, and is billed accordingly. A primary on a pro-4gb plan with 100 GB of disk gets replicas on the same plan with the same disk - plan the cost up front.

When replicas help

Workload shape	Replicas help?
Heavily read-dominated app (e.g. blog, marketing site backend)	Yes - most reads can hit a replica
Analytics workload (long, expensive read queries)	Yes - keep them off the primary
Write-heavy with rare reads	Not really - replicas don’t speed up writes
Tight read-after-write consistency needs	Not really - replicas have replication lag

The lag is usually milliseconds, sometimes seconds under load - not zero. If your app reads a row immediately after writing it and expects to see the new version, route that read to the primary, not a replica.
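
One common way to handle this is to pin a user’s reads to the primary for a short window after they write. The sketch below is illustrative only: `createRouter` and `pinMs` are hypothetical, the pools are any objects with a `query` method (such as `pg` Pools), and the in-memory map only works within a single process.

```javascript
// Hypothetical router: after a write for a given key (e.g. a user ID), reads
// for that key go to the primary for `pinMs` milliseconds, then fall back to
// the replica. `pinMs` is an assumed value; pick it from the lag you observe.
function createRouter(primary, replica, { pinMs = 2000, now = Date.now } = {}) {
  const lastWriteAt = new Map(); // key -> timestamp of the most recent write

  return {
    write(key, sql, params) {
      lastWriteAt.set(key, now());
      return primary.query(sql, params);
    },
    read(key, sql, params) {
      const wroteAt = lastWriteAt.get(key);
      const pinned = wroteAt !== undefined && now() - wroteAt < pinMs;
      return (pinned ? primary : replica).query(sql, params);
    },
  };
}
```

If you run several app instances, the pin would need to live somewhere shared (a cookie or a cache), and a long-running process should evict old keys from the map.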

Up to five replicas, declared in YAML

databases:
  - name: app-db
    plan: pro-4gb
    postgresMajorVersion: "16"
    readReplicas:
      - name: app-db-replica-1
      - name: app-db-replica-2

Up to 5 replicas per primary. Each replica gets its own host and connection string - wire them into services with fromDatabase referencing the replica’s name:

services:
  - type: web
    name: api
    runtime: node
    plan: starter
    envVars:
      - key: DATABASE_URL
        fromDatabase:
          name: app-db
          property: connectionString
      - key: DATABASE_REPLICA_URL
        fromDatabase:
          name: app-db-replica-1
          property: connectionString

In your app, route read-only queries to DATABASE_REPLICA_URL and writes (plus consistency-critical reads) to DATABASE_URL. Most ORMs support this through built-in replica routing or a separate read connection.

The destructive empty list footgun

This is the single most expensive mistake people make with readReplicas. The list is authoritative - not additive.

databases:
  - name: app-db
    plan: pro-4gb
    readReplicas: []     # ← destroys ALL replicas on next sync

This bites teams in three common scenarios:

You inherit a Blueprint and remove the readReplicas block thinking you’re “leaving it unchanged.” Removing it changes the desired state to empty.
You add an envVarGroups block and mis-indent it, leaving readReplicas with no entries of its own.
You rename a replica during a refactor. The old name disappears and the new name appears, so Render destroys the old replica and creates a new one. The disruption is brief but real.

Treating `readReplicas` as desired state

The mental model that makes this safe: readReplicas is exactly the set of replicas you want to exist after sync. Always include every replica you want to keep, every time.

flowchart LR
  blueprint["readReplicas in YAML<br/>(desired state)"]
  reconcile["Render reconciles"]
  live["Live replicas<br/>(actual state)"]

  blueprint --> reconcile
  reconcile -->|"add to YAML → create"| live
  reconcile -.->|"remove from YAML → destroy"| live

A safer change pattern when you’re modifying replicas:

Read the live state first: check the Render Dashboard, or inspect the database with render psql, to confirm which replicas exist.
Edit the Blueprint to match live state, then add your changes: don’t trust the Blueprint as-is. Match it to reality first, then make your edit on top.
Run `render blueprints validate`: it catches the obvious typos.
Diff the plan before applying: the Render Dashboard shows you the proposed changes. Read every “destroy” line carefully.
Sync only after you’ve checked the diff: if anything says “destroy replica”, make sure that’s actually what you want.
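
To make the desired-state model concrete, here is a small sketch of the comparison described above. It models the behavior this section describes, not Render’s actual implementation, and `planReplicaChanges` is a hypothetical name.

```javascript
// Hypothetical helper: compares replica names declared in the Blueprint with
// the names that currently exist, and reports what a sync would do.
function planReplicaChanges(desiredNames, liveNames) {
  const desired = new Set(desiredNames);
  const live = new Set(liveNames);
  return {
    create: [...desired].filter((name) => !live.has(name)),
    destroy: [...live].filter((name) => !desired.has(name)),
    keep: [...desired].filter((name) => live.has(name)),
  };
}

// A rename shows up as one destroy plus one create, not as an in-place change.
console.log(
  planReplicaChanges(
    ["app-db-replica-1", "app-db-read-2"], // names in the edited Blueprint
    ["app-db-replica-1", "app-db-replica-2"], // names that exist today
  ),
);
```

Passing an empty array as the first argument puts every live replica in `destroy`, which is exactly the footgun.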

A linter rule worth adding to your CI: fail if readReplicas: [] appears in render.yaml. There’s no good reason to write an explicit empty list - if you don’t want replicas, just omit the field.
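
A minimal version of that check might look like the sketch below. `findEmptyReadReplicas` is a hypothetical name, and this is a plain text scan rather than a YAML parser, so it only catches the inline `[]` form.

```javascript
// Hypothetical CI check: returns the 1-based line numbers where `readReplicas`
// is written as an explicit empty list (optionally followed by a comment).
function findEmptyReadReplicas(yamlText) {
  const pattern = /^\s*readReplicas:\s*\[\s*\]\s*(#.*)?$/;
  return yamlText
    .split(/\r?\n/)
    .map((line, index) => (pattern.test(line) ? index + 1 : null))
    .filter((lineNumber) => lineNumber !== null);
}
```

In CI you would read `render.yaml` with `fs.readFileSync`, pass the text to the function, and exit non-zero if it returns any line numbers.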

Routing reads in the app

Once you have replicas, the application has to know about them. Three common patterns:

Pattern A: Two connection strings, manual routing

import { Pool } from "pg";

const writePool = new Pool({ connectionString: process.env.DATABASE_URL });
const readPool = new Pool({ connectionString: process.env.DATABASE_REPLICA_URL });

async function getUser(id) {
  const { rows } = await readPool.query("SELECT * FROM users WHERE id = $1", [id]);
  return rows[0];
}

async function createUser(data) {
  const { rows } = await writePool.query(/* ... */);
  return rows[0];
}

This is the most explicit option and works everywhere, but you have to remember to use the right pool for every query, which is easy to get wrong.

Pattern B: ORM-managed routing

Most ORMs have built-in support - Django (multiple databases with a database router), Rails (reading / writing roles), Prisma (the @prisma/extension-read-replicas client extension). Once configured, they route typical read-only queries to a replica and let you opt out for consistency-critical reads.

Pattern C: Round-robin proxy

For multiple replicas, you can run a small TCP proxy (HAProxy, for example) that round-robins read connections across them. Most teams don’t need this until they have several replicas.
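
If a separate proxy feels like too much, the same rotation can be approximated inside the app. To be clear, this is client-side round-robin, not a proxy, and `createRoundRobin` is a hypothetical helper; it assumes you have already built one pool per replica connection string.

```javascript
// Hypothetical helper: hands out replica pools in rotation so reads spread
// evenly across them. It does no health checking of its own.
function createRoundRobin(pools) {
  if (pools.length === 0) {
    throw new Error("at least one replica pool is required");
  }
  let next = 0;
  return function pick() {
    const pool = pools[next];
    next = (next + 1) % pools.length;
    return pool;
  };
}
```

A read would then go through `pick().query(...)`. Unlike a proxy, this won’t skip a replica that is down, so pair it with retries or health checks if that matters to you.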

A complete HA + replicas Blueprint

databases:
  - name: app-db
    plan: pro-4gb
    region: oregon
    postgresMajorVersion: "16"
    diskSizeGB: 100
    highAvailability: true
    readReplicas:
      - name: app-db-replica-1
      - name: app-db-replica-2

services:
  - type: web
    name: api
    runtime: node
    plan: standard
    buildCommand: npm ci && npm run build
    startCommand: npm start
    healthCheckPath: /health
    envVars:
      - key: DATABASE_URL
        fromDatabase:
          name: app-db
          property: connectionString
      - key: DATABASE_REPLICA_URL
        fromDatabase:
          name: app-db-replica-1
          property: connectionString

That’s a production-grade shape: HA primary, two replicas, a web service that knows how to route reads.

Your Blueprint's `readReplicas` list contains `app-db-replica-1` and `app-db-replica-2`, and both replicas are running. A teammate's PR refactors the YAML and accidentally leaves `readReplicas:` with no entries (an empty list). The PR passes `render blueprints validate`. What happens after merge?

Nothing - Render ignores empty lists and keeps the existing replicas
Render warns you in the Blueprint sync UI but doesn't act unless you confirm
Both replicas are destroyed on the next sync - `readReplicas` is authoritative desired state, and an empty list means 'no replicas'
The Blueprint fails to apply because it conflicts with the existing replicas

What you learned

HA requires a Pro or Accelerated database instance type and PostgreSQL 13+. The standby runs in a separate zone in the same region, takes over after 30 s of primary unavailability, and adds ~1 ms of latency
HA doesn't protect against logical errors, application bugs, or connection limit exhaustion - it's about *availability* of the database
Read replicas need at least 10 GB of storage and a Basic-1gb (or higher) instance type. Each replica matches the primary's instance type and storage, billed accordingly
Up to 5 read replicas per primary; each gets its own connection string. Route reads to replicas, writes (and consistency-critical reads) to the primary
`readReplicas` in a Blueprint is *authoritative* desired state. An empty list destroys all replicas. Treat it like a `terraform apply`, not a patch
Add a CI rule: fail any Blueprint that contains `readReplicas: []`. There's no good reason to write that - omit the field instead