Guest
May 01, 2026

Building Document Pipelines That Actually Scale

Clelia Astra Bertelli

This guest post is kindly contributed by LlamaIndex, whose tools help teams automate document processing with agent-powered OCR.

Document processing at scale is hard. A single slow PDF can block your server and degrade unrelated requests. Parsing, classifying, and extracting structured data from documents needs to be reliable, retryable, and non-blocking.

This post walks through a reference architecture for a document processing pipeline that pairs LlamaParse for document intelligence with Render Workflows for scalable, distributed task execution.

A monolithic problem

The most basic approach to document processing usually looks like this:

  1. A client uploads one or more files to a server.
  2. That same server also handles processing the uploaded file(s).
  3. After processing completes, the server persists the results and returns them to the client.

This approach works for handling small workloads, but it hits a wall at any meaningful scale: a single massive file can block threads, trigger parsing failures, or time out requests. To make matters worse, just one failure can mean re-running an entire job from scratch.

To help our application scale with the work we give it, we can separate its two primary concerns into a proper pipeline:

  • Restrict our server's scope to receiving uploads and streaming progress to clients.
  • Spin up isolated, retryable workflow tasks to perform individual processing steps with LlamaParse and LlamaCloud.

Architecture overview

Our scalable pipeline consists of three services deployed on Render:

  • Web service: This is our server. It accepts file uploads or URL downloads, streams real-time progress via Server-Sent Events, and exposes search and RAG endpoints.
  • Workflow: This is our orchestration layer. It defines and executes five discrete tasks, each with a specific instance type, timeout, and retry policy.
  • Postgres database: This stores the results of our document processing.

Whenever a user uploads a document, our web service reads the bytes and dispatches them to the first workflow task run. From there, everything executes asynchronously:

Diagram showing the processing pipeline architecture

Example code for this project is available on GitHub.

Dispatch with Render Workflows

Our workflow service defines five tasks that each handle a different step of our document processing pipeline: upload_to_llamacloud, classify_document, parse_document, extract_fields, and store_results.

In our workflow code, we define each task as a TypeScript function with the Render SDK and configure its resource plan, timeout, and retry policy:
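To illustrate the idea, here is a minimal sketch of per-task configuration as a plain object; the `TaskConfig` shape, plan names, and specific timeout values are assumptions, not the actual Render SDK API:

```typescript
// Illustrative only: the real Render SDK task API may differ.
type RetryPolicy = { maxAttempts: number; backoff: "exponential" };

interface TaskConfig {
  plan: string;           // instance type for this task's runs
  timeoutSeconds: number; // hard cap before the run is failed and retried
  retry: RetryPolicy;
}

// One entry per pipeline stage; heavier stages get bigger plans and timeouts.
const taskConfigs: Record<string, TaskConfig> = {
  upload_to_llamacloud: { plan: "starter", timeoutSeconds: 120, retry: { maxAttempts: 3, backoff: "exponential" } },
  classify_document: { plan: "starter", timeoutSeconds: 120, retry: { maxAttempts: 3, backoff: "exponential" } },
  parse_document: { plan: "standard", timeoutSeconds: 900, retry: { maxAttempts: 3, backoff: "exponential" } },
  extract_fields: { plan: "standard", timeoutSeconds: 300, retry: { maxAttempts: 3, backoff: "exponential" } },
  store_results: { plan: "starter", timeoutSeconds: 60, retry: { maxAttempts: 3, backoff: "exponential" } },
};
```

Giving the parse step the longest timeout reflects that a single large PDF dominates the pipeline's runtime.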

In our web service, we again use the Render SDK to dispatch our workflow tasks and poll for results:
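A minimal sketch of the dispatch-and-poll pattern follows; `fetchRun` stands in for the actual Render SDK call, whose name and signature are assumptions here:

```typescript
type RunStatus = "pending" | "running" | "succeeded" | "failed";
interface TaskRun { id: string; status: RunStatus; output?: unknown }

// Poll a run until it reaches a terminal state. In the real service the
// fetcher would wrap the Render SDK; here it's injected for clarity.
async function pollUntilDone(
  fetchRun: (id: string) => Promise<TaskRun>,
  runId: string,
  intervalMs = 2000,
): Promise<TaskRun> {
  for (;;) {
    const run = await fetchRun(runId);
    if (run.status === "succeeded" || run.status === "failed") return run;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Injecting the fetcher keeps the loop testable without touching the network.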

Each task run executes in its own instance, which means parsing a large PDF gets its own isolated environment. If a run fails, its retry policy automatically handles exponential backoff, with no manual logic required.

Document intelligence with LlamaCloud

Each workflow task delegates document intelligence to LlamaCloud, using shared client configuration for authentication:
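A sketch of that shared configuration, read once from the environment; the variable names and default base URL are assumptions:

```typescript
interface LlamaCloudConfig { apiKey: string; baseUrl: string }

// Fail fast at startup if the API key is missing, rather than inside a task run.
function loadLlamaCloudConfig(
  env: Record<string, string | undefined> = process.env,
): LlamaCloudConfig {
  const apiKey = env.LLAMA_CLOUD_API_KEY;
  if (!apiKey) throw new Error("LLAMA_CLOUD_API_KEY is not set");
  return {
    apiKey,
    baseUrl: env.LLAMA_CLOUD_BASE_URL ?? "https://api.cloud.llamaindex.ai",
  };
}
```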

Classification

The classify_document task sends the uploaded file to LlamaCloud Classify, comparing it against a set of document type rules:
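The rule and result shapes below are simplified stand-ins for the real Classify API, and the confidence threshold is an assumption:

```typescript
interface ClassifyRule { type: string; description: string }
interface ClassifyResult { type: string; confidence: number; reasoning: string }

// Natural-language rules describing each document type we expect to see.
const classifyRules: ClassifyRule[] = [
  { type: "invoice", description: "A bill with line items, totals, and payment terms" },
  { type: "contract", description: "A legal agreement between two or more parties" },
  { type: "resume", description: "A summary of a person's work history and skills" },
  { type: "financial_statement", description: "A balance sheet, income, or cash flow report" },
];

// Treat low-confidence matches as unknown so extraction can fall back to
// schema generation instead of forcing a bad fit.
function resolveDocType(result: ClassifyResult, threshold = 0.7): string {
  return result.confidence >= threshold ? result.type : "unknown";
}
```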

Rules describe document types such as invoices, contracts, resumes, financial statements, and more. LlamaCloud returns the best match, a confidence score, and human-readable reasoning.

Parsing with LlamaParse

The parse_document task uses LlamaParse's agentic tier, which handles 130+ file formats and returns clean markdown and plain text:
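LlamaParse returns per-page results; here is a sketch of stitching those pages into a single markdown document (the page shape is simplified, not the real response type):

```typescript
interface ParsedPage { page: number; md: string; text: string }

// Join non-empty pages with a horizontal rule so downstream chunking can
// still see page boundaries.
function joinMarkdown(pages: ParsedPage[]): string {
  return pages
    .filter((p) => p.md.trim().length > 0)
    .map((p) => p.md.trim())
    .join("\n\n---\n\n");
}
```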

The agentic tier handles complex layouts (tables and charts, multi-column text, images, etc.) and returns structured markdown ready for downstream processing.

Structured extraction with LlamaExtract

Once the document type is known, the extract_fields task runs LlamaExtract against a predefined JSON Schema, or generates one on the fly for unknown types:
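A sketch of the schema lookup: the invoice schema mirrors the fields named in this post, and returning `null` stands in for letting LlamaExtract infer a schema on the fly (the registry itself is an assumption):

```typescript
// Predefined JSON Schema for a known document type.
const invoiceSchema = {
  type: "object",
  properties: {
    invoice_number: { type: "string" },
    vendor_name: { type: "string" },
    line_items: { type: "array", items: { type: "object" } },
    total_amount: { type: "number" },
  },
} as const;

const schemas: Record<string, object> = { invoice: invoiceSchema };

// Known types use a predefined schema; `null` signals that LlamaExtract
// should generate one automatically for this document.
function schemaFor(docType: string): object | null {
  return schemas[docType] ?? null;
}
```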

For an invoice, this yields structured fields like invoice_number, vendor_name, line_items, and total_amount. For unknown document types, LlamaExtract generates an appropriate schema automatically using a prompt.

Finally, the store_results task writes results to Postgres and optionally indexes the parsed text into a LlamaCloud-managed pipeline:
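A sketch of the write path: the table and column names are assumptions, and building the statement separately keeps it testable without a live database (in the real task this would run through a `pg` Pool):

```typescript
interface DocumentResult {
  filename: string;
  docType: string;
  markdown: string;
  fields: Record<string, unknown>;
}

// Parameterized insert; extracted fields are serialized into a JSON column.
function insertStatement(r: DocumentResult): { sql: string; params: unknown[] } {
  return {
    sql: "INSERT INTO documents (filename, doc_type, markdown, fields) VALUES ($1, $2, $3, $4) RETURNING id",
    params: [r.filename, r.docType, r.markdown, JSON.stringify(r.fields)],
  };
}
```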

Once indexed, documents are searchable with embeddings, hybrid retrieval, and reranking, all managed by LlamaCloud. The web service exposes /search and /ask endpoints backed by this pipeline.

Live progress streaming

While tasks run, the web service streams real-time status to the frontend via Server-Sent Events:
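SSE frames are plain text, which makes the progress protocol easy to sketch; the event name and payload shape here are assumptions:

```typescript
type Stage = "uploading" | "classifying" | "parsing" | "extracting" | "storing";

// One frame per status update: an `event:` line, a `data:` line with a JSON
// payload, and a blank line terminating the frame.
function sseFrame(stage: Stage, detail: Record<string, unknown> = {}): string {
  return `event: progress\ndata: ${JSON.stringify({ stage, ...detail })}\n\n`;
}
```

On the browser side, an `EventSource` listener for the `progress` event receives each frame as it is written, with no client-side polling.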

Users see each stage complete in real time (uploading, classifying, parsing, extracting, storing) without any polling from the client. Here's the full pipeline:

Timeline showing the full document processing pipeline

Deploy it yourself on Render

To run this architecture yourself, follow the deployment guide in the project README. At a high level, you'll:

  • Deploy the web service and Postgres database via the repository Blueprint
  • Create the workflow service manually in the Render Dashboard (npm install && npm run build, then node dist/tasks/index.js)
  • Set environment variables for the web and workflow services to connect to the Postgres database and LlamaCloud


Key takeaways

  • Render Workflows gives each stage of our processing pipeline its own compute plan, timeout, and retry policy: no need to manage queues, workers, or infrastructure.
  • The web service stays thin. All document intelligence calls run in isolated workflow tasks, keeping the HTTP layer free for uploads and SSE streaming.
  • LlamaParse handles the hard part of document parsing (tables, complex layouts, scanned PDFs) across 130+ formats.
  • LlamaCloud Classify and Extract layer document type detection and structured field extraction on top of raw parsing.
  • LlamaCloud-managed pipelines make parsed documents instantly searchable with embeddings and reranking.