This guest post is kindly contributed by LlamaIndex, who help teams automate document processing with agent-powered OCR.
Document processing at scale is hard. A single slow PDF can block your server and degrade unrelated requests. Parsing, classifying, and extracting structured data from documents needs to be reliable, retryable, and non-blocking.
This post walks through a reference architecture for a document processing pipeline that pairs LlamaParse for document intelligence with Render Workflows for scalable, distributed task execution.
A monolithic problem
The most basic approach to document processing usually looks like this:
- A client uploads one or more files to a server.
- That same server also handles processing the uploaded file(s).
- After processing completes, the server persists the results and returns them to the client.
This approach works for small workloads, but it hits a wall at any meaningful scale: a single massive file can block threads, trigger parsing failures, or time out requests. Worse still, a single failure can mean re-running the entire job from scratch.
To help our application scale with the work we give it, we can separate its two primary concerns into a proper pipeline:
- Restrict our server's scope to receiving uploads and streaming progress to clients.
- Spin up isolated, retryable workflow tasks to perform individual processing steps with LlamaParse and LlamaCloud.
Architecture overview
Our scalable pipeline consists of three services deployed on Render:
- Web service: This is our server. It accepts file uploads or URL downloads, streams real-time progress via Server-Sent Events, and exposes search and RAG endpoints.
- Workflow: This is our orchestration layer. It defines and executes five discrete tasks, each with a specific instance type, timeout, and retry policy.
- Postgres database: This stores the results of our document processing.
Whenever a user uploads a document, our web service reads the bytes and dispatches them to the first workflow task run. From there, everything executes asynchronously:

Example code for this project is available on GitHub.
Dispatch with Render Workflows
Our workflow service defines five tasks that each handle a different step of our document processing pipeline: upload_to_llamacloud, classify_document, parse_document, extract_fields, and store_results.
In our workflow code, we define each task as a TypeScript function with the Render SDK and configure its resource plan, timeout, and retry policy:
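The project's actual snippet isn't reproduced here, so the sketch below uses a stand-in `defineTask` helper rather than the real Render SDK API; it illustrates the pattern of pairing each task's handler with its own plan, timeout, and retry policy:

```typescript
// Illustrative only: defineTask and its option names are stand-ins for the
// real Render SDK API, shown to make the per-task configuration concrete.
type RetryPolicy = { maxAttempts: number; backoff: "exponential" | "fixed" };
type TaskOptions = { plan: string; timeoutSeconds: number; retry: RetryPolicy };
type Task<I, O> = { name: string; options: TaskOptions; run: (input: I) => Promise<O> };

function defineTask<I, O>(
  name: string,
  options: TaskOptions,
  run: (input: I) => Promise<O>,
): Task<I, O> {
  return { name, options, run };
}

// Parsing gets a longer timeout than classification, since large PDFs
// dominate its runtime; failed runs retry with exponential backoff.
export const parseDocument = defineTask(
  "parse_document",
  { plan: "standard", timeoutSeconds: 600, retry: { maxAttempts: 3, backoff: "exponential" } },
  async (input: { fileId: string }) => {
    // ...call LlamaParse here and return markdown/text for downstream tasks...
    return { markdown: "", text: "" };
  },
);
```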
In our web service, we again use the Render SDK to dispatch our workflow tasks and poll for results:
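The dispatch-and-poll pattern can be sketched as follows; the Render SDK's real method names differ, so the SDK calls are abstracted behind an injected status function:

```typescript
// Generic poll loop: keep checking a run's status until it reaches a
// terminal state. The status source is injected so the web service can
// pass in the Render SDK's "get task run" call.
type RunStatus = "pending" | "running" | "succeeded" | "failed";

async function pollUntilDone(
  getStatus: () => Promise<RunStatus>,
  intervalMs = 2000,
): Promise<RunStatus> {
  for (;;) {
    const status = await getStatus();
    if (status === "succeeded" || status === "failed") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// In the web service (method names below are illustrative, not the real SDK):
//   const run = await render.workflows.createTaskRun("upload_to_llamacloud", { fileUrl });
//   const final = await pollUntilDone(() => render.workflows.getTaskRunStatus(run.id));
```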
Each task run executes in its own instance, which means parsing a large PDF gets its own isolated environment. If a run fails, its retry policy automatically handles exponential backoff, with no manual logic required.
Document intelligence with LlamaCloud
Each workflow task delegates document intelligence to LlamaCloud, using shared client configuration for authentication:
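A minimal version of that shared configuration might look like this; the base URL matches LlamaCloud's public API host, but the helper names are ours, not part of any SDK:

```typescript
// Shared LlamaCloud configuration: the API key comes from the environment,
// and every task builds the same auth headers from it.
const LLAMA_CLOUD_BASE_URL =
  process.env.LLAMA_CLOUD_BASE_URL ?? "https://api.cloud.llamaindex.ai";

function buildAuthHeaders(apiKey: string): Record<string, string> {
  return {
    Authorization: `Bearer ${apiKey}`,
    Accept: "application/json",
  };
}

// Usage inside a task:
//   const headers = buildAuthHeaders(process.env.LLAMA_CLOUD_API_KEY!);
//   const res = await fetch(`${LLAMA_CLOUD_BASE_URL}/...`, { headers });
```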
Classification
The classify_document task sends the uploaded file to LlamaCloud Classify, comparing it against a set of document type rules:
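The shapes below are hypothetical stand-ins for the Classify request and response, but they show the idea: the task submits the file plus a rule set, gets back per-type scores, and keeps the highest-confidence match:

```typescript
// Hypothetical shapes for a Classify call; the real API's field names may differ.
type ClassificationResult = { type: string; confidence: number; reasoning: string };

// Rules describe the document types we accept (see the list below).
const documentTypeRules = [
  { type: "invoice", description: "A bill listing line items, totals, and payment terms" },
  { type: "contract", description: "A legal agreement between two or more parties" },
  { type: "resume", description: "A summary of a person's work history and skills" },
];

// Keep the candidate with the highest confidence score.
function pickBestMatch(results: ClassificationResult[]): ClassificationResult | undefined {
  return results.reduce<ClassificationResult | undefined>(
    (best, r) => (best === undefined || r.confidence > best.confidence ? r : best),
    undefined,
  );
}
```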
Rules describe document types such as invoices, contracts, resumes, financial statements, and more. LlamaCloud returns the best match, a confidence score, and human-readable reasoning.
Parsing with LlamaParse
The parse_document task uses LlamaParse's agentic tier, which handles 130+ file formats and returns clean markdown and plain text:
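Eliding the LlamaParse call itself (whose options select the agentic tier in the real code), the task's post-processing can be sketched as joining the per-page markdown the service returns into one document:

```typescript
// Assumed response shape: LlamaParse returns results per page; the task
// stitches them into a single markdown document for downstream steps.
type ParsedPage = { page: number; md: string };

function joinMarkdown(pages: ParsedPage[]): string {
  return pages
    .slice() // don't mutate the caller's array
    .sort((a, b) => a.page - b.page)
    .map((p) => p.md)
    .join("\n\n");
}
```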
The agentic tier handles complex layouts (tables and charts, multi-column text, images, etc.) and returns structured markdown ready for downstream processing.
Structured extraction with LlamaExtract
Once the document type is known, the extract_fields task runs LlamaExtract against a predefined JSON Schema, or generates one on the fly for unknown types:
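A predefined schema for the invoice case might look like the following; field names mirror those mentioned below, though the project's actual schema may differ:

```typescript
// JSON Schema handed to the extraction step for documents classified as
// invoices. Unknown types skip this and let the service generate a schema.
const invoiceSchema = {
  type: "object",
  properties: {
    invoice_number: { type: "string" },
    vendor_name: { type: "string" },
    line_items: {
      type: "array",
      items: {
        type: "object",
        properties: {
          description: { type: "string" },
          quantity: { type: "number" },
          unit_price: { type: "number" },
        },
      },
    },
    total_amount: { type: "number" },
  },
  required: ["invoice_number", "vendor_name", "total_amount"],
};
```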
For an invoice, this yields structured fields like invoice_number, vendor_name, line_items, and total_amount. For unknown document types, LlamaExtract generates an appropriate schema automatically using a prompt.
Storage and semantic search
Finally, the store_results task writes results to Postgres and optionally indexes the parsed text into a LlamaCloud-managed pipeline:
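The storage step can be sketched with a minimal client interface standing in for pg's Pool, so the query shape is visible without wiring up a real database; table and column names are illustrative:

```typescript
// Minimal stand-in for a Postgres client (pg's Pool satisfies this shape).
interface DbClient {
  query(text: string, values: unknown[]): Promise<unknown>;
}

async function storeResults(
  db: DbClient,
  doc: { id: string; docType: string; markdown: string; fields: object },
): Promise<void> {
  // Parameterized upsert: re-running the task for the same document
  // overwrites its extracted fields instead of inserting a duplicate.
  await db.query(
    `INSERT INTO documents (id, doc_type, markdown, extracted_fields)
     VALUES ($1, $2, $3, $4)
     ON CONFLICT (id) DO UPDATE SET extracted_fields = EXCLUDED.extracted_fields`,
    [doc.id, doc.docType, doc.markdown, JSON.stringify(doc.fields)],
  );
}
```

Making the write idempotent matters here: a retried `store_results` run must not duplicate rows.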
Once indexed, documents are searchable with embeddings, hybrid retrieval, and reranking, all managed by LlamaCloud. The web service exposes /search and /ask endpoints backed by this pipeline.
Live progress streaming
While tasks run, the web service streams real-time status to the frontend via Server-Sent Events:
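The SSE wire format is simple: each event is an `event:`/`data:` pair terminated by a blank line. A small helper (ours, not from any framework) formats one progress event:

```typescript
// Format one Server-Sent Event announcing a pipeline stage transition.
function formatSseEvent(stage: string, status: "started" | "completed"): string {
  return `event: progress\ndata: ${JSON.stringify({ stage, status })}\n\n`;
}

// In an Express-style handler (illustrative):
//   res.setHeader("Content-Type", "text/event-stream");
//   res.write(formatSseEvent("parsing", "completed"));
```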
Users see each stage complete in real time (uploading, classifying, parsing, extracting, storing) without any polling from the client. Here's the full pipeline:

Deploy it yourself on Render
To run this architecture yourself, follow the deployment guide in the project README. At a high level, you'll:
- Deploy the web service and Postgres database via the repository Blueprint
- Create the workflow service manually in the Render Dashboard (build with `npm install && npm run build`, then start with `node dist/tasks/index.js`)
- Set environment variables for the web and workflow services to connect to the Postgres database and LlamaCloud
Key takeaways
- Render Workflows gives each stage of our processing pipeline its own compute plan, timeout, and retry policy: no need to manage queues, workers, or infrastructure.
- The web service stays thin. All document intelligence calls run in isolated workflow tasks, keeping the HTTP layer free for uploads and SSE streaming.
- LlamaParse handles the hard part of document parsing (tables, complex layouts, scanned PDFs) across 130+ formats.
- LlamaCloud Classify and Extract layer document type detection and structured field extraction on top of raw parsing.
- LlamaCloud-managed pipelines make parsed documents instantly searchable with embeddings and reranking.