ETL on Workflows, Part 2: Productionize and scale it

Tour the repo and run one shard locally

⏱ 8 min

In this step you’ll clone render-examples/data-processor-workflow, install the workflow service for your chosen language, and run it against a tiny sample dataset so the rest of the tutorial has a working baseline.

What’s in the repo

The python/workflows/ and typescript/workflows/ folders contain the task code you’ll edit. The scripts/ folder generates sample CSVs, and sample_data/ is what the workflow reads. You can ignore frontend/, */api/, and render.yaml for now. The render.yaml Blueprint covers the demo web app and API, not the Workflow service you’ll create manually in step 4.

Clone and install

Terminal (Python)
$ git clone https://github.com/render-examples/data-processor-workflow.git
Cloning into 'data-processor-workflow'...
$ cd data-processor-workflow/python/workflows
$ python -m venv .venv && source .venv/bin/activate
$ pip install -r requirements.txt
Successfully installed render_sdk ...

Terminal (TypeScript)
$ git clone https://github.com/render-examples/data-processor-workflow.git
Cloning into 'data-processor-workflow'...
$ cd data-processor-workflow/typescript/workflows
$ npm install
added N packages in Ns
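
If you took the Python path, it can help to confirm that the virtual environment is actually active before you move on. The snippet below is a minimal sketch that relies only on the standard library; the helper name in_virtualenv is made up for illustration and is not part of the repo.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv created with `python -m venv`, sys.prefix points at the
    # venv directory while sys.base_prefix still points at the base install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If it reports that no virtual environment is active, re-run the source .venv/bin/activate step in the same shell before installing.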

Generate a tiny sample dataset

Start with 1K rows so installs, local runs, and retries stay fast. You’ll regenerate the same four CSVs at 1M rows in step 7.

Terminal
$ cd ../../scripts
$ python generate_data.py --rows 1000
Wrote sample_data/crm.csv (1000 rows)
Wrote sample_data/billing.csv (1000 rows)
Wrote sample_data/product.csv (1000 rows)
Wrote sample_data/support.csv (1000 rows)
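
To double-check the generated files before starting the server, a short script along these lines may be useful. It is a sketch under assumptions: the four file names come from the output above, but the location of sample_data/ relative to where you run the script and the presence of a single header row in each CSV are guesses, and the helper names are hypothetical.

```python
import csv
from pathlib import Path

# File names taken from the generator output shown above.
EXPECTED_FILES = ("crm.csv", "billing.csv", "product.csv", "support.csv")


def count_data_rows(path: Path) -> int:
    """Count rows in a CSV, excluding one header line (assumed present)."""
    with path.open(newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        return sum(1 for _ in reader)


def check_sample_data(data_dir: Path, expected_rows: int) -> dict[str, int]:
    """Return {filename: row_count} for each problem file (-1 means missing)."""
    problems = {}
    for name in EXPECTED_FILES:
        path = data_dir / name
        if not path.exists():
            problems[name] = -1
            continue
        rows = count_data_rows(path)
        if rows != expected_rows:
            problems[name] = rows
    return problems


if __name__ == "__main__":
    # Assumption: sample_data/ is reachable from the current directory.
    problems = check_sample_data(Path("sample_data"), 1000)
    print(problems or "all four files have the expected row count")
```

An empty result means all four files exist with the expected count. The same check could be reused later with a larger expected_rows value.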

Start the local task server

Terminal (Python)
$ cd ../python/workflows
$ render workflows dev -- python main.py
Local workflow server listening on :8120

Terminal (TypeScript)
$ cd ../typescript/workflows
$ render workflows dev -- npx tsx src/main.ts
Local workflow server listening on :8120

In a second terminal, list the registered tasks:

Terminal
$ render workflows tasks list --local
merge_customer_data
process_shard
...

If the list is empty, check three things: the dev server is still running, the start command points at the workflow entry file, and (in Python) app.start() is called inside the file’s `if __name__ == "__main__":` block.
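
The `__main__` guard mentioned above can be illustrated with a small, self-contained sketch. StubApp is a hypothetical stand-in written for this example and is not the render_sdk API; the task body and its shard_id parameter are likewise made up. Only the placement of app.start() is the point.

```python
class StubApp:
    """Hypothetical stand-in for a workflow app object; not the render_sdk API."""

    def __init__(self):
        self.tasks = {}
        self.started = False

    def task(self, fn):
        # Register the function under its own name and return it unchanged.
        self.tasks[fn.__name__] = fn
        return fn

    def start(self):
        # A real server would block here; the stub only records the call.
        self.started = True


app = StubApp()


@app.task
def process_shard(shard_id: int) -> str:
    # Placeholder body; the real task in the repo does the actual work.
    return f"processed shard {shard_id}"


if __name__ == "__main__":
    # Runs only when the file is executed directly (python main.py),
    # not when another module imports it.
    app.start()
    print("registered tasks:", sorted(app.tasks))
```

With this layout, importing the module registers the tasks without starting anything, while running the file directly starts the app.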

What you learned

  • The reference repo ships the sharded ETL workflow in both Python and TypeScript
  • Only the workflows/ folders matter for this tutorial; the frontend and API are out of scope
  • 1K-row sample data keeps the feedback loop fast while you wire things up
  • `render workflows dev` runs your tasks locally and registers them with the local server