Deploy to Render — ETL on Workflows, Part 1: Build a sharded pipeline

In this step you’ll deploy the pipeline to Render and run it remotely with the same trigger script you wrote in step 7. The script itself doesn’t change; you only change a few environment variables.

Push to GitHub

The 1K-row sample_data/ directory is small (~250 KB) and the deployed Workflow needs to read it, so commit it with the rest of the project:


$ git init && git add -A && git commit -m 'Sharded customer-merge ETL'
$ gh repo create customer-merge --public --source=. --remote=origin --push
Created repository <your-user>/customer-merge

Create the Workflow service

1. In the Render Dashboard, click New, then Workflow.
2. Connect your customer-merge repo.
3. Set the Root Directory to workflows, and set the build and start commands from the table for your language below.
4. Set DATA_DIR=../sample_data as a service environment variable. The deployed code reads the four CSVs from this path, which is relative to the workflows root directory.
5. Click Deploy Workflow and wait for the first build to finish.
6. Copy the workflow slug from the service page. You’ll need it for the trigger script.

Python

Field	Value
Name	`customer-merge-py`
Language	Python 3
Root Directory	`workflows`
Build Command	`pip install -r requirements.txt`
Start Command	`python main.py`

TypeScript

Field	Value
Name	`customer-merge-ts`
Language	Node
Root Directory	`workflows`
Build Command	`npm install && npm run build`
Start Command	`npm start`
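Setting DATA_DIR=../sample_data only works if the code resolves relative paths against the right directory. Here is a minimal sketch of that lookup, assuming main.py sits at the workflows root. `resolve_data_dir` and `list_csvs` are hypothetical helpers for illustration, not functions from the tutorial’s code or a Render API.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(base_dir: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the absolute data directory named by DATA_DIR.

    Relative values such as "../sample_data" are joined onto base_dir, which this
    sketch assumes is the service's Root Directory ("workflows").
    """
    env = os.environ if env is None else env
    raw = env.get("DATA_DIR")
    if not raw:
        # Fail fast with a readable message instead of a later FileNotFoundError.
        raise RuntimeError("DATA_DIR is not set")
    path = Path(raw)
    if not path.is_absolute():
        path = base_dir / path
    return path.resolve()


def list_csvs(data_dir: Path) -> list[Path]:
    """List the CSV files directly inside data_dir, sorted by name."""
    return sorted(data_dir.glob("*.csv"))
```

In main.py you might pass `Path(__file__).resolve().parent` as `base_dir` rather than `Path.cwd()`, so the result doesn’t depend on which directory the start command happens to run from.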

Trigger the deployed workflow

Unset RENDER_USE_LOCAL_DEV, set RENDER_API_KEY, set WORKFLOW_SLUG to the slug you copied followed by the task name (/merge_customer_data), and run the same trigger script:

Python

$ export RENDER_API_KEY=<your-render-api-key>
$ export WORKFLOW_SLUG=customer-merge-py/merge_customer_data
$ unset RENDER_USE_LOCAL_DEV && python trigger.py
Generated 1000 profiles across 10 shards
Avg health score: 52.7
Churn distribution: {'LOW': 412, 'MEDIUM': 487, 'HIGH': 101}
Sample profile keys: [...]
OK

TypeScript

$ export RENDER_API_KEY=<your-render-api-key>
$ export WORKFLOW_SLUG=customer-merge-ts/merge_customer_data
$ unset RENDER_USE_LOCAL_DEV && npx tsx trigger.ts
Generated 1000 profiles across 10 shards
Avg health score: 52.7
Churn distribution: {"LOW":412,"MEDIUM":487,"HIGH":101}
Sample profile keys: ...
OK
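Because the switch from local to remote is driven entirely by environment variables, a small preflight check can catch a missing key or a malformed slug before anything is sent. This is a sketch under stated assumptions: `check_trigger_env` and `Target` are hypothetical names, it treats RENDER_USE_LOCAL_DEV values of 1, true, or yes as local mode (your step 7 script may read the variable differently), and it only inspects variables; it does not call Render.

```python
import os
from typing import Mapping, NamedTuple, Optional

# Values treated as "on" for RENDER_USE_LOCAL_DEV (an assumption of this sketch).
_TRUTHY = {"1", "true", "yes"}


class Target(NamedTuple):
    mode: str               # "local" or "remote"
    service: Optional[str]  # e.g. "customer-merge-py"; None in local mode
    task: Optional[str]     # e.g. "merge_customer_data"; None in local mode


def check_trigger_env(env: Optional[Mapping[str, str]] = None) -> Target:
    """Decide where the trigger script will send its run, failing fast on bad config."""
    env = os.environ if env is None else env
    if env.get("RENDER_USE_LOCAL_DEV", "").strip().lower() in _TRUTHY:
        return Target("local", None, None)

    # Remote runs need both an API key and a slug; report every missing name at once.
    missing = [name for name in ("RENDER_API_KEY", "WORKFLOW_SLUG") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))

    # The slugs in this tutorial have the form <service-slug>/<task-name>.
    service, sep, task = env["WORKFLOW_SLUG"].partition("/")
    if not (sep and service and task):
        raise RuntimeError("WORKFLOW_SLUG should look like <service-slug>/<task-name>")
    return Target("remote", service, task)
```

The check never echoes RENDER_API_KEY, which keeps the key out of terminal scrollback and CI logs.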

Open the Runs tab on your Workflow service in the Render Dashboard. A healthy run shows one parent row for merge_customer_data and ten child rows for process_shard (one per shard), all green. Click into any subtask to see its stdout, including the "Shard X: Loading CSV files..." and "Shard X: Generated NNN enriched profiles" lines from the task’s print() calls.
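The one-parent, ten-children shape in the Runs tab is a fan-out/fan-in: the parent splits the rows, each process_shard run handles one slice, and the parent combines the results. The sketch below reproduces that shape locally with plain Python and no Render SDK. `split_into_shards`, `summarize_shard`, and `merge_summaries` are hypothetical stand-ins, and contiguous, near-equal chunks are an assumed splitting strategy, not necessarily the one the tutorial uses.

```python
from collections import Counter
from typing import Any


def split_into_shards(rows: list[Any], num_shards: int) -> list[list[Any]]:
    """Split rows into num_shards contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(rows), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)  # the first `extra` shards get one more row
        shards.append(rows[start:start + size])
        start += size
    return shards


def summarize_shard(shard: list[dict]) -> dict:
    """Hypothetical stand-in for one shard subtask: summarize its profiles."""
    return {
        "count": len(shard),
        "health_sum": sum(r["health"] for r in shard),
        "churn": Counter(r["churn"] for r in shard),
    }


def merge_summaries(results: list[dict]) -> dict:
    """Fan-in: combine per-shard summaries into one overall summary."""
    total = sum(r["count"] for r in results)
    churn: Counter = Counter()
    for r in results:
        churn.update(r["churn"])
    avg = sum(r["health_sum"] for r in results) / total if total else 0.0
    return {"count": total, "avg_health": avg, "churn": dict(churn)}
```

Each shard returns a sum and a count rather than its own average, so the overall average stays correct even when shards have different sizes.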

You want to pass a `pandas.DataFrame` of pre-filtered customers from your trigger script to `merge_customer_data`. The local run fails with a serialization error. What’s the most direct fix?

Wrap the DataFrame in a custom class that adds a `__json__` method
Convert the DataFrame to a list of dicts (`df.to_dict('records')`) before passing it as an argument
Switch to passing it via an environment variable
Render Workflows doesn’t support filtering inputs; do the filter inside the orchestrator
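The question hinges on what can cross the boundary between the trigger script and a task. Assuming task arguments are JSON-encoded (the serialization error suggests this, but this page does not say so), a quick local check is to round-trip the value through `json` before triggering. `check_json_safe` and the sample records are hypothetical; the list-of-dicts shape is what pandas’ `df.to_dict('records')` produces.

```python
import json
from typing import Any


def check_json_safe(value: Any) -> Any:
    """Round-trip value through JSON; json.dumps raises TypeError if it can't be encoded.

    Sketch assumption: task arguments are serialized as JSON, so anything that
    fails here would also fail when passed to merge_customer_data.
    """
    return json.loads(json.dumps(value))


# Illustrative records in the shape df.to_dict('records') returns: one plain dict per row.
customers = [
    {"customer_id": "c-001", "plan": "pro", "seats": 12},
    {"customer_id": "c-002", "plan": "free", "seats": 1},
]

safe_customers = check_json_safe(customers)
```

Datetime columns come back from `to_dict('records')` as pandas `Timestamp` objects, which `json` can’t encode, so convert them (for example with `.isoformat()`) before passing the records.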

What’s next

You have a working sharded ETL on Render. The next tutorial picks up exactly where this one leaves off and turns it into something you’d run in production.

Part 2, "Productionize an ETL pipeline with Render Workflows", adds retries with exponential backoff, idempotency keys so re-runs are safe, structured per-shard logs, a chaos drill that proves recovery, and a benchmarked scale-up to 1M+ rows.

What you learned

Workflow services are created in the Render Dashboard. Blueprints don't cover them yet
The trigger script you wrote in step 7 works locally and remotely. Only env vars change
The Runs tab shows one parent task plus one row per shard subtask. All green is the success signal
You now have a working pipeline. Part 2 makes it production-safe