Scale up and benchmark — ETL on Workflows, Part 2: Productionize and scale it

In this step you’ll adjust two knobs, shard count and instance plan, and measure the wall-clock impact yourself. By the end you’ll have a defensible answer to “how do I make this faster?” for your own ETL.

Regenerate the data at 1M rows

Terminal

$ cd scripts
$ python generate_data.py --rows 1000000
Wrote sample_data/crm.csv (1000000 rows)
Wrote sample_data/billing.csv (1000000 rows)
Wrote sample_data/product.csv (1000000 rows)
Wrote sample_data/support.csv (1000000 rows)

Your deployed Workflow service only sees files that are in the repo or mounted into the service. For this tutorial, commit the regenerated sample_data/ directory and push it before the benchmark. For a real ETL, put the dataset in object storage or a database and change load_csv(filename) to read from that durable source. Do not assume a local sample_data/ directory on your laptop exists on Render.
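One way the change to load_csv could look, as a minimal sketch. It assumes load_csv(filename) returns a list of row dicts, and it reads from a hypothetical DATA_BASE_URL environment variable that points at wherever the CSVs live (for example, an object-storage bucket reachable over HTTPS). Neither assumption comes from the tutorial code, so adjust both to match your project.

```python
import csv
import io
import os
import urllib.request


def load_csv(filename: str, base_url: str = "") -> list:
    """Read a CSV into a list of row dicts.

    Assumptions: the tutorial's real load_csv may return a different shape,
    and DATA_BASE_URL is a hypothetical variable name, not a Render setting.
    """
    base_url = base_url or os.environ.get("DATA_BASE_URL", "")
    if base_url:
        # Durable source: any URL scheme urllib understands (https://, file://).
        url = base_url.rstrip("/") + "/" + filename
        with urllib.request.urlopen(url) as response:
            text = response.read().decode("utf-8")
    else:
        # Local-development fallback only. This path does not exist on a
        # deployed service unless the files are committed or mounted.
        path = os.path.join("sample_data", filename)
        with open(path, encoding="utf-8") as f:
            text = f.read()
    return list(csv.DictReader(io.StringIO(text)))
```

This version reads the whole file into memory, which is fine for a sketch; for much larger inputs you would likely stream rows instead.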

Baseline run

Run the default config first: 10 shards on the `standard` plan. Trigger the same script, record the wall-clock time, and keep the run ID so you can compare logs after the scale-up.

Terminal

$ python trigger.py --report-elapsed
Run started: <run-id>
Merged 4000000 source rows into 1000000 enriched profiles in 32.6s

Bump shard count and instance plan

Before

- NUM_SHARDS = 10

- @app.task
  async def merge_customer_data() -> dict:
      ...

After

+ NUM_SHARDS = 25

+ @app.task(plan="pro")
  async def merge_customer_data() -> dict:
+     ...
+
+ @app.task(
+     retry=Retry(max_retries=3, wait_duration_ms=2000),
+     plan="pro",
+ )
+ def process_shard(shard_id: int) -> dict:
      ...
Before

- const NUM_SHARDS = 10;

  const mergeCustomerData = task(
-   { name: "merge_customer_data" },
    async function mergeCustomerData() {
      // ...
    }
  );

After

+ const NUM_SHARDS = 25;

  const mergeCustomerData = task(
+   { name: "merge_customer_data", plan: "pro" },
    async function mergeCustomerData() {
      // ...
    }
+ );
+
+ const processShard = task(
+   {
+     name: "process_shard",
+     retry: { maxRetries: 3, waitDurationMs: 2000 },
+     plan: "pro",
+   },
+   function processShard(shardId: number) { /* ... */ }
  );
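To see why raising NUM_SHARDS shrinks the work each process_shard call has to do, here is a minimal sketch of hash-based shard assignment. The tutorial's actual partitioning logic isn't shown in this step, so shard_for, shard_sizes, and the choice of crc32 are assumptions for illustration only.

```python
import zlib
from collections import Counter


def shard_for(customer_id: str, num_shards: int) -> int:
    """Map a key to a shard in [0, num_shards) with a stable hash.

    crc32 is used because Python's built-in hash() is salted per process
    for strings, so separate task instances would not agree on it.
    """
    return zlib.crc32(customer_id.encode("utf-8")) % num_shards


def shard_sizes(num_keys: int, num_shards: int) -> Counter:
    """Count how many synthetic keys land on each shard."""
    return Counter(shard_for(f"cust-{i}", num_shards) for i in range(num_keys))


for num_shards in (10, 25):
    sizes = shard_sizes(100_000, num_shards)
    print(num_shards, "shards -> largest shard holds", max(sizes.values()), "keys")
```

The run time of a fan-out is set by its slowest shard, so the size of the largest shard is the number to watch as you change the shard count.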

Terminal

$ git add -A && git commit -m 'scale up' && git push
$ # wait for the deploy, then:
$ python trigger.py --report-elapsed
Run started: <run-id>
Merged 4000000 source rows into 1000000 enriched profiles in 11.4s

Fill in your own numbers

Record both runs in the table below so you have something to point at the next time someone asks “is this worth scaling?”

| Config | Records | Wall-clock | Cost/run (rough) |
| --- | --- | --- | --- |
| 10 shards, `standard` (baseline) | 1M | your time | your number |
| 25 shards, `pro` (scaled) | 1M | your time | your number |
| Speedup | n/a | ratio | n/a |
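The Speedup row is just the baseline time divided by the scaled time. The sketch below uses the sample timings from the terminal output in this step (32.6s and 11.4s) purely as placeholders; substitute your own measurements. The helper names are made up for this example.

```python
def speedup(baseline_s: float, scaled_s: float) -> float:
    """How many times faster the scaled run is than the baseline."""
    return baseline_s / scaled_s


def rows_per_second(rows: int, elapsed_s: float) -> float:
    """Source rows processed per second of wall-clock time."""
    return rows / elapsed_s


SOURCE_ROWS = 4_000_000            # 4 files x 1M rows, from the sample output
baseline_s, scaled_s = 32.6, 11.4  # placeholders: use your own timings

print(f"speedup:  {speedup(baseline_s, scaled_s):.2f}x")                     # 2.86x
print(f"baseline: {rows_per_second(SOURCE_ROWS, baseline_s):,.0f} rows/s")   # 122,699
print(f"scaled:   {rows_per_second(SOURCE_ROWS, scaled_s):,.0f} rows/s")     # 350,877
```

Because this step changes shard count and plan together, the ratio describes their combined effect. Rerun with only one knob changed if you need to attribute the gain.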

Your 10-shard baseline takes 30s. You bump to 50 shards on the same `standard` plan and the run only drops to 22s. What's the most likely bottleneck?

Render Workflows caps fan-out at 20 subtasks, so 30 of the shards never ran
CSV loading happens once in the orchestrator before fan-out, so reading 4 files is now the dominant cost. Shard count can't help that
The `standard` plan caches results, so subsequent shards return instantly
Adding shards always slows things down because hash collisions increase
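One way to reason about numbers like these is to fit a simple two-part model, wall-clock = serial + parallel_work / shards, to the two measurements. This is a rough sketch under that assumption (real runs also include scheduling overhead and stragglers), and fit_serial_parallel is a hypothetical helper, not part of the tutorial code.

```python
def fit_serial_parallel(n1: int, t1: float, n2: int, t2: float) -> tuple:
    """Solve t = serial + work / n for (serial, work) from two runs."""
    work = (t1 - t2) / (1 / n1 - 1 / n2)
    serial = t1 - work / n1
    return serial, work


# 10 shards took 30s, 50 shards took 22s.
serial, work = fit_serial_parallel(10, 30.0, 50, 22.0)
print(f"serial part: {serial:.1f}s, parallel work: {work:.1f} shard-seconds")
# prints: serial part: 20.0s, parallel work: 100.0 shard-seconds
```

Under this model, about 20s of the 30s baseline does not shrink with shard count, so no number of shards would bring the run below roughly 20s. Whatever runs once before or after the fan-out is the place to look next.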

What’s next

If you want to go back to the SDK basics, use the Render Workflows quickstart. Keep Workflows limits nearby as you push shard count, payload size, and run duration further.

What you learned

Shard count scales parallelism; instance plan scales per-shard throughput
Past a point you stop being shard-bound. Profile to find the next bottleneck
Always pair a scale-up with the chaos drill from step 6. Bigger fan-out means more chances to hit a flake
Your own before/after numbers are the most credible benchmark you'll have
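To act on the "profile to find the next bottleneck" takeaway, a minimal sketch of per-stage timing is shown below. The stage names and the sleep calls are stand-ins, and timed is a hypothetical helper; in your orchestrator you would wrap the real CSV load, fan-out, and merge steps.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated wall-clock seconds


@contextmanager
def timed(stage: str):
    """Record the wall-clock duration of a block under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


with timed("load_csv"):
    time.sleep(0.02)  # stand-in for reading the source files
with timed("fan_out"):
    time.sleep(0.01)  # stand-in for awaiting the shard subtasks

slowest = max(timings, key=timings.get)
print(f"slowest stage: {slowest} ({timings[slowest]:.3f}s)")
```

Logging these per-stage numbers alongside the run ID makes it easier to tell whether a scale-up moved the stage that actually dominates.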