Intermediate · ⏱ 60 min · 7 steps

ETL on Workflows, Part 2: Productionize and scale it

Take the sharded customer-data pipeline from Part 1 and make it production-ready with retries, idempotency, structured logs, a chaos drill, and a benchmarked scale-up.

#workflows #python #typescript #etl #etl-series

Prerequisites

Part 1 of the series (ETL on Workflows, Part 1: Build a sharded pipeline), or a working sharded Workflow you've built yourself
Render account with Workflows enabled
Render CLI 2.11.0+
Python 3.11+ or Node 20+
Comfortable cloning a public repo and using the Render Dashboard

Steps

01 What you'll build, 5 min: Part 2 of ETL on Workflows. Take the sharded pipeline from Part 1 and make it production-ready.
02 Tour the repo and run one shard locally, 8 min: Clone the reference repo, install only the workflow you need, generate a small dataset, and verify that the tasks register on the local dev server.
03 Understand the fan-out pattern, 10 min: Read the existing task code, see how 10 subtasks run in parallel, and trigger your first end-to-end run with a small SDK client script.
04 Deploy the workflow to Render, 8 min: Push your fork, create the Workflow service in the Render Dashboard, and trigger the deployed task from the same SDK client you used locally.
05 Harden the tasks (retries, idempotency, structured logs), 10 min: Add retry policies, make each shard idempotent, and emit structured per-shard timing logs so the chaos drill in step 6 has something to observe.
06 Chaos drill (break a shard, prove recovery), 10 min: Inject a controlled failure into one shard, watch the retry timeline in the Render Dashboard, and verify that the final output has no duplicates and no missing customers.
07 Scale up and benchmark, 9 min: Regenerate the dataset at 1M rows, run the workflow at its default size, then increase the shard count and instance plan and record your own before/after numbers.
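
To make steps 5 and 6 concrete before you start, here is a minimal sketch of the three hardening ideas (retries, idempotent writes, structured per-shard logs) together with an injected failure. It uses only the Python standard library and does not call the Render Workflows SDK; every name in it (`process_shard`, `run_with_retries`, `run_pipeline`, the `fail_once` set, the in-memory `sink`) is hypothetical and stands in for whatever the reference repo actually defines.

```python
import json
import time


def log(event: str, **fields) -> None:
    # Structured log: one JSON object per line, easy to filter by shard.
    print(json.dumps({"event": event, **fields}, sort_keys=True))


def process_shard(shard_id: int, rows: list[dict], sink: dict, fail_once: set) -> None:
    start = time.perf_counter()
    for i, row in enumerate(rows):
        # Chaos injection (assumption): fail halfway through, once, for chosen shards.
        if shard_id in fail_once and i == len(rows) // 2:
            fail_once.discard(shard_id)
            raise RuntimeError("injected failure")
        # Idempotent write: an upsert keyed by customer_id, so replaying
        # a half-written shard overwrites rows instead of duplicating them.
        sink[row["customer_id"]] = {**row, "shard": shard_id}
    elapsed_ms = round((time.perf_counter() - start) * 1000, 3)
    log("shard_done", shard=shard_id, rows=len(rows), ms=elapsed_ms)


def run_with_retries(shard_id, rows, sink, fail_once, max_attempts=3, base_delay=0.01) -> int:
    # Retry with exponential backoff; returns the attempt number that succeeded.
    for attempt in range(1, max_attempts + 1):
        try:
            process_shard(shard_id, rows, sink, fail_once)
            return attempt
        except RuntimeError as exc:
            log("shard_failed", shard=shard_id, attempt=attempt, error=str(exc))
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


def run_pipeline(num_customers=100, num_shards=10, fail_shards=(3,)):
    customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(num_customers)]
    # Shard by customer_id modulo the shard count (an assumed sharding rule).
    shards = {
        s: [c for c in customers if c["customer_id"] % num_shards == s]
        for s in range(num_shards)
    }
    sink: dict = {}
    fail_once = set(fail_shards)
    attempts = {s: run_with_retries(s, rows, sink, fail_once) for s, rows in shards.items()}
    return sink, attempts
```

Calling `run_pipeline()` runs the shards sequentially for simplicity, whereas the tutorial's workflow fans them out in parallel. The point of the sketch is the recovery property: the shard that fails mid-write is retried, and because the sink is keyed by `customer_id`, the replay leaves exactly one record per customer.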