Render Tutorials
Intermediate · 60 min · 7 steps

ETL on Workflows, Part 2: Productionize and scale it

Take the sharded customer-data pipeline from Part 1 and make it production-ready with retries, idempotency, structured logs, a chaos drill, and a benchmarked scale-up.

Prerequisites

  • Part 1 of the series (ETL on Workflows, Part 1: Build a sharded pipeline), or a working sharded Workflow you've built yourself
  • Render account with Workflows enabled
  • Render CLI 2.11.0+
  • Python 3.11+ or Node 20+
  • Comfortable cloning a public repo and using the Render Dashboard

Steps

  1. What you'll build: 5 min
     Part 2 of ETL on Workflows. Take the sharded pipeline from Part 1 and make it production-ready.
  2. Tour the repo and run one shard locally: 8 min
     Clone the reference repo, install just the workflow you need, generate a small dataset, and verify the tasks register on the local dev server.
  3. Understand the fan-out pattern: 10 min
     Read the existing task code, see how 10 subtasks run in parallel, and trigger your first end-to-end run with a small SDK client script.
  4. Deploy the workflow to Render: 8 min
     Push your fork, create the Workflow service in the Render Dashboard, and trigger the deployed task from the same SDK client you used locally.
  5. Harden the tasks (retries, idempotency, structured logs): 10 min
     Add retry policies, make each shard idempotent, and emit structured per-shard timing logs so the chaos drill in step 6 has something to observe.
  6. Chaos drill (break a shard, prove recovery): 10 min
     Inject a controlled failure into one shard, watch the retry timeline in the Render Dashboard, and verify the final output has no duplicates and no missing customers.
  7. Scale up and benchmark: 9 min
     Regenerate the dataset at 1M rows, run the workflow at default size, then bump shards and instance plan and record your own before/after numbers.
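
To make the ideas in steps 5 and 6 concrete before you start, here is a minimal sketch in plain Python. It does not use the Render Workflows SDK and is not the reference repo's code. Every name in it (`log_event`, `with_retries`, `run_shard`, `run_pipeline`, the `customer_id` field, the in-memory `sink` dict) is a hypothetical stand-in, and the shards run sequentially here rather than in parallel. It shows three things: a retry loop with exponential backoff, writes keyed by customer ID so a rerun overwrites instead of duplicating, and one JSON log line per shard with timing.

```python
import json
import time


def log_event(**fields):
    """Emit one JSON object per line so logs can be filtered by shard."""
    line = json.dumps(fields, sort_keys=True)
    print(line)
    return line


def with_retries(fn, max_attempts=3, base_delay=0.01):
    """Call fn(attempt) until it succeeds, backing off exponentially."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(attempt)
        except RuntimeError as exc:
            log_event(event="attempt_failed", attempt=attempt, error=str(exc))
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * 2 ** (attempt - 1))


def run_shard(shard_id, rows, sink, fail_on_attempts=()):
    """Process one shard. Writes are keyed upserts, so reruns cannot duplicate."""

    def attempt_once(attempt):
        started = time.perf_counter()
        for i, row in enumerate(rows):
            # Chaos drill stand-in: fail halfway through, after some rows
            # have already been written, to exercise the idempotent rerun.
            if attempt in fail_on_attempts and i == len(rows) // 2:
                raise RuntimeError(f"injected failure in shard {shard_id}")
            sink[row["customer_id"]] = {**row, "shard": shard_id}
        log_event(
            event="shard_done",
            shard=shard_id,
            attempt=attempt,
            rows=len(rows),
            seconds=round(time.perf_counter() - started, 4),
        )
        return len(rows)

    return with_retries(attempt_once)


def run_pipeline(customers, num_shards=10, broken_shard=3):
    """Split customers into shards and run each one (sequentially in this sketch)."""
    sink = {}
    shards = [customers[i::num_shards] for i in range(num_shards)]
    for shard_id, rows in enumerate(shards):
        fail = (1,) if shard_id == broken_shard else ()
        run_shard(shard_id, rows, sink, fail_on_attempts=fail)
    return sink


customers = [{"customer_id": n, "name": f"customer-{n}"} for n in range(100)]
result = run_pipeline(customers)

# Recovery check in the spirit of step 6: every customer present exactly once.
assert sorted(result) == [c["customer_id"] for c in customers]
```

The final assertion is the same kind of check step 6 asks for: after a shard fails partway and is retried, the output contains every customer and no duplicates. In the real tutorial, the retry policy and parallel fan-out are handled by the Workflow service rather than a hand-written loop.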