Render Tutorials
Intermediate · 90 min · 8 steps

ETL on Workflows, Part 1: Build a sharded pipeline

Design and build a sharded customer-data pipeline from scratch with hash routing, fan-out, and aggregation. Part 2 productionizes and scales the same pipeline.
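Hash routing only works if every worker computes the same shard for a given customer, in every process and on every run. The minimal Python sketch below shows one way to get that property. The name `shard_for` is hypothetical and may not match the helper the tutorial builds. The sketch uses a SHA-256 digest instead of Python's built-in `hash()`, because `hash()` on strings is salted per process (unless `PYTHONHASHSEED` is fixed), so two workers could disagree about which shard owns a customer.

```python
import hashlib


def shard_for(customer_id: str, num_shards: int) -> int:
    """Return a stable shard index in [0, num_shards) for customer_id (hypothetical helper)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # SHA-256 yields the same digest in every process and on every run;
    # the built-in hash() on str is salted per process, so it is not safe here.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo N.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for cid in ["cust-0001", "cust-0002", "cust-0003"]:
        print(cid, "->", shard_for(cid, 4))
```

Two design consequences are worth keeping in mind. Changing `num_shards` remaps most customers to different shards, so shard assignments are only comparable across runs with the same fan-out. Also, if the sources format IDs differently (case, whitespace, prefixes), you would need to normalize them before hashing. Otherwise, the same customer can land on two shards.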

Prerequisites

  • Completed the Render Workflows quickstart, or equivalent SDK familiarity
  • Render CLI 2.11.0+
  • Python 3.11+ or Node 20+
  • Comfortable in a terminal and with a Python or Node project
  • A Render account with Workflows enabled (for the deploy step at the end)

Steps

  1. What you'll build (5 min): Design the sharded customer-data merge pipeline that you'll harden and scale in Part 2.
  2. Scaffold with `render workflows init` and strip the examples (10 min): Get a clean project that registers zero tasks, ready to receive your own.
  3. Drop in sample data (10 min): Pull the data generator from the reference repo, generate 1K-row CSVs for the four sources, and look at what the merge will have to handle.
  4. Write the sharding helper (8 min): Write a small module with one function that hashes a `customer_id` into a stable shard index. It is deterministic, source-agnostic, and has no external dependencies.
  5. Write the shard worker, `process_shard` (15 min): `process_shard` takes a `shard_id`, loads all four CSVs, filters them to the customers in its shard, merges records across sources, enriches each profile, and returns the enriched profiles.
  6. Write the orchestrator, `merge_customer_data` (12 min): `merge_customer_data` spawns N `process_shard` subtasks in parallel, awaits all of them, and aggregates their results into a stats summary.
  7. Trigger and verify locally (8 min): Write a small SDK client script, run the pipeline end-to-end against 1K rows, and confirm the aggregated output.
  8. Deploy to Render (15 min): Push your code to GitHub, create the Workflow service in the Render Dashboard, and run the same trigger script against the deployed slug.
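The worker/orchestrator split in the steps above can be sketched locally with plain `asyncio`. This is a hypothetical, in-memory stand-in, not the Render Workflows SDK. It assumes each source has already been loaded into a list of dicts keyed by `customer_id`, whereas the tutorial's worker loads the CSVs itself. In the deployed pipeline, spawning `process_shard` subtasks takes the place of `asyncio.gather`. The merge rule (later sources overwrite earlier fields), the `sources`/`source_count` enrichment fields, and the stats keys are all illustrative placeholders.

```python
import asyncio
import hashlib
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def shard_for(customer_id: str, num_shards: int) -> int:
    # Stable across processes, unlike the built-in hash() on str
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


async def process_shard(shard_id: int, num_shards: int,
                        sources: dict[str, list[Row]]) -> list[Row]:
    """Merge and enrich the profiles of customers routed to this shard."""
    profiles: dict[str, Row] = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            customer_id = row["customer_id"]
            if shard_for(customer_id, num_shards) != shard_id:
                continue  # another shard owns this customer
            # Hypothetical merge rule: later sources overwrite earlier fields
            profiles[customer_id].update(row)
            profiles[customer_id].setdefault("sources", []).append(source_name)
    # Hypothetical enrichment: record how many sources contributed
    return [
        {**profile, "source_count": len(profile["sources"])}
        for _, profile in sorted(profiles.items())
    ]


async def merge_customer_data(sources: dict[str, list[Row]],
                              num_shards: int = 4) -> dict[str, Any]:
    """Fan out one worker per shard, await them all, then aggregate."""
    # Local stand-in for spawning process_shard subtasks in parallel
    results = await asyncio.gather(
        *(process_shard(shard_id, num_shards, sources) for shard_id in range(num_shards))
    )
    per_shard = [len(profiles) for profiles in results]
    return {
        "num_shards": num_shards,
        "total_profiles": sum(per_shard),
        "per_shard": per_shard,
        "multi_source": sum(p["source_count"] > 1 for profiles in results for p in profiles),
    }


if __name__ == "__main__":
    demo = {
        "crm": [{"customer_id": f"c{i}", "name": f"Customer {i}"} for i in range(10)],
        "billing": [{"customer_id": f"c{i}", "plan": "pro"} for i in range(0, 10, 2)],
    }
    print(asyncio.run(merge_customer_data(demo, num_shards=3)))
```

Every shard applies the same hash and skips customers it does not own, so the shards partition the customer set. Each profile is produced exactly once, which lets the orchestrator sum per-shard counts without deduplicating.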