In this step you’ll write the trigger script you’ll use for the rest of the series, run the pipeline against your 1,000-row sample, and confirm that `merge_customer_data` returns the expected stats summary.
Write the trigger script
Put this file in the repo root, one level up from `workflows/`:
Python (`trigger.py`):

```python
import os

from render_sdk import Render

render = Render()

# Defaults to the local-dev slug; set WORKFLOW_SLUG to target the deployed Workflow.
slug = os.getenv("WORKFLOW_SLUG", "local/merge_customer_data")
result = render.workflows.run_task(slug, [])
data = result.results

print(f"Generated {data['profiles_generated']} profiles across {data['shards_processed']} shards")
print(f"Avg health score: {data['statistics']['avg_health_score']}")
print(f"Churn distribution: {data['statistics']['churn_distribution']}")
print(f"Sample profile keys: {sorted((data['sample_profile'] or {}).keys())}")

# The sample has 1,000 rows, so expect exactly 1,000 merged profiles.
assert data["profiles_generated"] == 1000, f"expected 1000 profiles, got {data['profiles_generated']}"
print("OK")
```

TypeScript (`trigger.ts`):

```typescript
import { Render } from "@renderinc/sdk";

const render = new Render();

// Defaults to the local-dev slug; set WORKFLOW_SLUG to target the deployed Workflow.
const slug = process.env.WORKFLOW_SLUG ?? "local/merge_customer_data";
const started = await render.workflows.startTask(slug, []);
const finished = await started.get();
const data = finished.results as {
  profiles_generated: number;
  shards_processed: number;
  sample_profile: Record<string, unknown> | null;
  statistics: { avg_health_score: number; churn_distribution: Record<string, number> };
};

console.log(`Generated ${data.profiles_generated} profiles across ${data.shards_processed} shards`);
console.log(`Avg health score: ${data.statistics.avg_health_score}`);
console.log(`Churn distribution: ${JSON.stringify(data.statistics.churn_distribution)}`);
console.log(`Sample profile keys: ${Object.keys(data.sample_profile ?? {}).sort().join(", ")}`);

// The sample has 1,000 rows, so expect exactly 1,000 merged profiles.
if (data.profiles_generated !== 1000) {
  throw new Error(`expected 1000 profiles, got ${data.profiles_generated}`);
}
console.log("OK");
```

The script defaults to the local-dev slug (`local/merge_customer_data`). On Render, `WORKFLOW_SLUG` switches it to the deployed slug, so the same file works both locally and remotely. You’ll change two env vars in step 8 and run it again.
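If you want to test the slug switch without calling the SDK, or catch a run where the slug still points at local dev but `RENDER_USE_LOCAL_DEV` isn’t set, the environment logic can be pulled into pure functions. This is a minimal sketch, not part of the SDK: `resolve_target` and `target_warning` are hypothetical helpers, and both the rule that local-dev slugs start with `local/` and the parsing of `RENDER_USE_LOCAL_DEV` are assumptions based on the default slug above.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional, Tuple

# Default slug used by the trigger script when WORKFLOW_SLUG is unset.
LOCAL_SLUG = "local/merge_customer_data"


def resolve_target(env: Mapping[str, str]) -> Tuple[str, bool]:
    """Return (slug, use_local_dev) for the given environment mapping."""
    slug = env.get("WORKFLOW_SLUG", LOCAL_SLUG)
    # Assumption: only "true" (any case) enables local dev; the SDK's own parsing may differ.
    use_local_dev = env.get("RENDER_USE_LOCAL_DEV", "").lower() == "true"
    return slug, use_local_dev


def target_warning(env: Mapping[str, str]) -> Optional[str]:
    """Flag a likely mismatch between the slug and the target.

    Assumption: local-dev slugs start with "local/", as the default does.
    """
    slug, use_local_dev = resolve_target(env)
    if slug.startswith("local/") and not use_local_dev:
        return f"{slug} looks like a local-dev slug, but RENDER_USE_LOCAL_DEV is not 'true'"
    return None


if __name__ == "__main__":
    print(resolve_target(os.environ))
    print(target_warning(os.environ) or "target looks consistent")
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the functions, keeps them testable with plain dicts.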
Run it end to end
Make sure the dev server from step 6 is still running. In a second terminal:
Python:

```
$ RENDER_USE_LOCAL_DEV=true python trigger.py
Generated 1000 profiles across 10 shards
Avg health score: 52.7
Churn distribution: {'LOW': 412, 'MEDIUM': 487, 'HIGH': 101}
Sample profile keys: ['account_status', 'avg_resolution_hrs', 'churn_risk', 'company_name', 'csat_score', 'customer_id', 'deal_stage', 'deal_value', 'email', 'employee_count', 'expansion_potential', 'features_used', 'health_score', 'industry', 'last_active', 'last_contact', 'last_payment', 'last_ticket_date', 'mrr', 'nps_score', 'open_tickets', 'payment_status', 'plan', 'sales_owner', 'signup_date', 'subscription_start', 'total_sessions', 'total_tickets', 'usage_pct']
OK
```

TypeScript:

```
$ RENDER_USE_LOCAL_DEV=true npx tsx trigger.ts
Generated 1000 profiles across 10 shards
Avg health score: 52.7
Churn distribution: {"LOW":412,"MEDIUM":487,"HIGH":101}
Sample profile keys: account_status, avg_resolution_hrs, churn_risk, ...
OK
```
Three signals confirm the pipeline works:
- `profiles_generated == 1000` matches the input row count: no customers were dropped and none were duplicated.
- `shards_processed == 10` means all ten subtasks completed.
- The sample profile has fields from all four sources (`industry` from CRM, `mrr` from Billing, `total_sessions` from Product, `nps_score` from Support), plus the three enrichment fields. The merge worked.
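If you’d rather see every failed check at once instead of stopping at the first `assert`, the three signals can be folded into one helper. This is a minimal sketch: `check_run` and `SOURCE_FIELDS` are hypothetical names, and the source check only looks for the four representative fields listed above, not the enrichment fields.

```python
from __future__ import annotations

from typing import Any, Dict, List

# One representative field per source, as listed in the tutorial text.
SOURCE_FIELDS = {
    "CRM": "industry",
    "Billing": "mrr",
    "Product": "total_sessions",
    "Support": "nps_score",
}


def check_run(data: Dict[str, Any], expected_rows: int, expected_shards: int = 10) -> List[str]:
    """Return a list of failed checks; an empty list means all three signals pass."""
    failures: List[str] = []

    # Signal 1: one profile per input row, nothing dropped or duplicated.
    if data.get("profiles_generated") != expected_rows:
        failures.append(f"expected {expected_rows} profiles, got {data.get('profiles_generated')}")

    # Signal 2: every shard subtask completed.
    if data.get("shards_processed") != expected_shards:
        failures.append(f"expected {expected_shards} shards, got {data.get('shards_processed')}")

    # Signal 3: the sample profile carries a field from each source.
    sample = data.get("sample_profile") or {}
    for source, field in SOURCE_FIELDS.items():
        if field not in sample:
            failures.append(f"sample profile is missing {field!r} ({source})")

    return failures
```

In `trigger.py` you could call `check_run(data, expected_rows=1000)` after `data = result.results` and exit non-zero when the returned list is non-empty.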
The aggregated output deliberately omits the full profile list. Returning it would risk exceeding the orchestrator’s 4 MB return-payload limit, and the full list isn’t needed for verification at this size. In a real pipeline, the orchestrator would write profiles to S3, Postgres, or another sink before returning the stats summary.
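The summary-only pattern can be sketched as a function that reduces the merged profiles to the stats the trigger script prints, plus a size guard. This is a minimal sketch, not the tutorial’s orchestrator code: `summarize`, `check_return_size`, and the JSON-based size estimate are assumptions, and it expects each profile to carry the `health_score` and `churn_risk` keys seen in the sample output.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Any, Dict, List

# Conservative reading of the tutorial's "4 MB" orchestrator return limit.
RETURN_LIMIT_BYTES = 4_000_000


def payload_bytes(obj: Any) -> int:
    """Approximate the serialized size of a return value as UTF-8 JSON."""
    return len(json.dumps(obj).encode("utf-8"))


def summarize(profiles: List[Dict[str, Any]], shards: int) -> Dict[str, Any]:
    """Build a stats-only return value with the keys the trigger script reads."""
    scores = [p["health_score"] for p in profiles]
    return {
        "profiles_generated": len(profiles),
        "shards_processed": shards,
        "sample_profile": profiles[0] if profiles else None,
        "statistics": {
            # Rounding to one decimal is a guess based on the sample output.
            "avg_health_score": round(sum(scores) / len(scores), 1) if scores else 0.0,
            "churn_distribution": dict(Counter(p["churn_risk"] for p in profiles)),
        },
    }


def check_return_size(result: Dict[str, Any]) -> None:
    """Fail early instead of hitting the platform limit at return time."""
    size = payload_bytes(result)
    if size > RETURN_LIMIT_BYTES:
        raise ValueError(f"return payload is {size} bytes, over {RETURN_LIMIT_BYTES}")
```

Writing the profiles to the sink would happen before `summarize`; only its small, fixed-shape return value would leave the orchestrator, whatever the input size.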
Most first-run failures fall into three buckets:
- `DATA_DIR` points at the wrong path. The default is `../sample_data` relative to `workflows/`. If you ran `generate_data.py` from a different location, set `DATA_DIR` explicitly or move the CSVs.
- `RENDER_USE_LOCAL_DEV` is not set. Without it, the SDK tries to call Render’s API and fails with an auth error.
- The dev server isn’t running, or it crashed on a missing import. Restart it in the foreground (no `&`) so you can see the traceback.
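For the first bucket, a quick preflight check can show where `DATA_DIR` actually resolves before you restart anything. This is a minimal sketch, assuming relative `DATA_DIR` values are resolved from `workflows/` the same way the default is; `resolve_data_dir` and `data_dir_problems` are hypothetical helpers, and the check only looks for some `.csv` file because the expected file names aren’t listed here.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import List, Mapping

# Default stated in the tutorial, relative to workflows/.
DEFAULT_DATA_DIR = "../sample_data"


def resolve_data_dir(env: Mapping[str, str], workflows_dir: Path) -> Path:
    """Resolve DATA_DIR; relative values are assumed to be relative to workflows/."""
    raw = Path(env.get("DATA_DIR", DEFAULT_DATA_DIR))
    return raw if raw.is_absolute() else (workflows_dir / raw).resolve()


def data_dir_problems(env: Mapping[str, str], workflows_dir: Path) -> List[str]:
    """Return human-readable problems with DATA_DIR; empty means it looks usable."""
    data_dir = resolve_data_dir(env, workflows_dir)
    if not data_dir.is_dir():
        return [f"DATA_DIR resolves to {data_dir}, which is not a directory"]
    if not any(data_dir.glob("*.csv")):
        return [f"no .csv files in {data_dir}; did generate_data.py write them elsewhere?"]
    return []


if __name__ == "__main__":
    # Assumes you run this from the repo root, next to workflows/.
    problems = data_dir_problems(os.environ, Path.cwd() / "workflows")
    print("\n".join(problems) or "DATA_DIR looks fine")
```

An empty result only means the directory exists and holds CSVs; it doesn’t check their contents.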
What you learned
- The trigger script is a single file that works both locally and against the deployed Workflow
- `RENDER_USE_LOCAL_DEV=true` targets the local dev server. Without it, the SDK talks to Render
- Three signals confirm the run: total profile count, shard count, and sample-profile field shape
- The aggregated output is intentionally a stats summary, not the raw profiles, to stay inside the 4 MB return limit