ETL on Workflows, Part 1: Build a sharded pipeline

Deploy to Render

⏱ 15 min

In this step you’ll deploy the pipeline to Render and run it remotely with the same trigger script you wrote in step 7. The script itself doesn’t change; only environment variables do.

Push to GitHub

The 1K-row sample_data/ directory is small (~250 KB) and the deployed Workflow needs to read it, so commit it with the rest of the project:

Terminal
$ git init && git add -A && git commit -m 'Sharded customer-merge ETL'
$ gh repo create customer-merge --public --source=. --remote=origin --push
Created repository <your-user>/customer-merge

Create the Workflow service

  1. In the Render Dashboard, click New then Workflow.
  2. Connect your customer-merge repo.
  3. Set the Root Directory to workflows and the build/start commands per the table below.
  4. Set DATA_DIR=../sample_data as a service environment variable. That’s where the deployed code reads the four CSVs from, relative to the workflows root.
  5. Click Deploy Workflow and wait for the first build to finish.
  6. Copy the workflow slug from the service page. You’ll need it for the trigger script.

Python

Field           Value
Name            customer-merge-py
Language        Python 3
Root Directory  workflows
Build Command   pip install -r requirements.txt
Start Command   python main.py

TypeScript

Field           Value
Name            customer-merge-ts
Language        Node
Root Directory  workflows
Build Command   npm install && npm run build
Start Command   npm start
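
Step 4 relies on a relative path, so it can help to see how such a path resolves. The sketch below is not the tutorial's actual loader: find_csvs is a hypothetical helper, and it assumes the service process starts with the Root Directory (workflows) as its working directory, so that ../sample_data points at the repo's sample_data/ folder.

```python
import os
from pathlib import Path
from typing import Optional


def find_csvs(data_dir: Optional[str] = None) -> list[Path]:
    """Return the CSV files under DATA_DIR (hypothetical helper, not from the tutorial)."""
    # Fall back to the DATA_DIR env var, then to the value used in step 4.
    raw = data_dir or os.environ.get("DATA_DIR", "../sample_data")
    # A relative path resolves against the current working directory.
    # Assumption: that directory is the service's Root Directory (workflows).
    base = Path(raw).resolve()
    if not base.is_dir():
        raise FileNotFoundError(f"DATA_DIR does not exist: {base}")
    return sorted(base.glob("*.csv"))
```

If a helper like this raises FileNotFoundError in a deployed run, the DATA_DIR value and the Root Directory setting are the first things to check.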

Trigger the deployed workflow

Flip RENDER_USE_LOCAL_DEV off, set RENDER_API_KEY and the workflow slug, and run the same trigger script:

Terminal (Python)
$ export RENDER_API_KEY=<your-render-api-key>
$ export WORKFLOW_SLUG=customer-merge-py/merge_customer_data
$ unset RENDER_USE_LOCAL_DEV && python trigger.py
Generated 1000 profiles across 10 shards
Avg health score: 52.7
Churn distribution: {'LOW': 412, 'MEDIUM': 487, 'HIGH': 101}
Sample profile keys: [...]
OK

Terminal (TypeScript)
$ export RENDER_API_KEY=<your-render-api-key>
$ export WORKFLOW_SLUG=customer-merge-ts/merge_customer_data
$ unset RENDER_USE_LOCAL_DEV && npx tsx trigger.ts
Generated 1000 profiles across 10 shards
Avg health score: 52.7
Churn distribution: {"LOW":412,"MEDIUM":487,"HIGH":101}
Sample profile keys: ...
OK
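
Because a remote run depends on three environment variables being in the right state, a small preflight check can catch a mistake before the trigger script calls Render. This is a minimal sketch, not part of the tutorial's trigger script: check_remote_env is a hypothetical helper, and treating any non-empty RENDER_USE_LOCAL_DEV value as "local mode on" is an assumption about how the variable is read.

```python
import os
from collections.abc import Mapping


def check_remote_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems that would block a remote run (empty if none)."""
    problems = []
    # Assumption: any non-empty value keeps the script in local-dev mode.
    if env.get("RENDER_USE_LOCAL_DEV"):
        problems.append("RENDER_USE_LOCAL_DEV is still set; unset it to run remotely")
    if not env.get("RENDER_API_KEY"):
        problems.append("RENDER_API_KEY is not set")
    # The slug in the commands above has the shape <service>/<task>.
    service, sep, task = env.get("WORKFLOW_SLUG", "").partition("/")
    if not (service and sep and task):
        problems.append("WORKFLOW_SLUG should look like <service>/<task>")
    return problems


# Example with an explicit mapping instead of the real environment.
example = {
    "RENDER_API_KEY": "placeholder-key",
    "WORKFLOW_SLUG": "customer-merge-py/merge_customer_data",
}
print(check_remote_env(example))
```

Calling check_remote_env() with no arguments inspects the real environment, so it could run at the top of a trigger script before any API call.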

Open the Runs tab on your Workflow service in the Render Dashboard. A healthy run shows one parent row for merge_customer_data and ten child rows for process_shard (one per shard), all green. Click into any subtask to see its stdout: the Shard X: Loading CSV files... and Shard X: Generated NNN enriched profiles lines from the task’s print() calls.
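
Beyond the green rows, the printed summary can be checked for internal consistency: the churn buckets should add up to the number of profiles generated. The sketch below does that arithmetic on the sample output shown earlier; churn_matches_total is a hypothetical helper, not something the tutorial's code defines.

```python
def churn_matches_total(total_profiles: int, churn: dict[str, int]) -> bool:
    """True when the churn buckets account for every generated profile."""
    return sum(churn.values()) == total_profiles


# Values copied from the sample run output: 412 + 487 + 101 = 1000.
sample_churn = {"LOW": 412, "MEDIUM": 487, "HIGH": 101}
print(churn_matches_total(1000, sample_churn))
```

A mismatch here would suggest that a shard dropped or duplicated rows even though every subtask reported success.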

You want to pass a `pandas.DataFrame` of pre-filtered customers from your trigger script to `merge_customer_data`. The local run fails with a serialization error. What's the most direct fix?
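
One plausible fix is to convert the DataFrame to plain rows before passing it and rebuild it inside the task if needed. The sketch below assumes task arguments must be JSON-serializable, which the tutorial text does not state explicitly; dataframe_to_rows is a hypothetical helper, and pandas is needed only by the caller that builds the DataFrame.

```python
from __future__ import annotations

import json
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # only needed for the type hint, not at runtime
    import pandas as pd


def dataframe_to_rows(df: "pd.DataFrame") -> list[dict[str, Any]]:
    """Convert a DataFrame into a list of JSON-native dicts, one per row."""
    # Round-tripping through to_json turns NumPy scalars into plain Python
    # values and NaN into None; date_format="iso" writes timestamps as
    # ISO 8601 strings instead of epoch milliseconds.
    return json.loads(df.to_json(orient="records", date_format="iso"))


# Usage in a trigger script (requires pandas):
#   import pandas as pd
#   df = pd.DataFrame({"customer_id": [1, 2], "plan": ["pro", "free"]})
#   rows = dataframe_to_rows(df)
#   # pass `rows` as the task argument instead of `df`
```

Inside the task, pd.DataFrame(rows) rebuilds a DataFrame from the list if the rest of the code expects one.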

What’s next

You have a working sharded ETL on Render. The next tutorial picks up exactly where this one leaves off and turns it into something you’d run in production.

Part 2: Productionize an ETL pipeline with Render Workflows adds retries with exponential backoff, idempotency keys so re-runs are safe, structured per-shard logs, a chaos drill that proves recovery, and a benchmarked scale-up to 1M+ rows.

What you learned

  • Workflow services are created in the Render Dashboard. Blueprints don’t cover them yet.
  • The trigger script you wrote in step 7 works locally and remotely. Only env vars change.
  • The Runs tab shows one parent task plus one row per shard subtask. All green is the success signal.
  • You now have a working pipeline. Part 2 makes it production-safe.