Guest
May 01, 2026

Building Document Pipelines That Actually Scale

Clelia Astra Bertelli

This guest post is kindly contributed by LlamaIndex, whose tools help teams automate document processing with agent-powered OCR.

Document processing at scale is hard. A single slow PDF can block your server and degrade unrelated requests. Parsing, classifying, and extracting structured data from documents needs to be reliable, retryable, and non-blocking.

This post walks through a reference architecture for a document processing pipeline that pairs LlamaParse for document intelligence with Render Workflows for scalable, distributed task execution.

A monolithic problem

The most basic approach to document processing usually looks like this:

  1. A client uploads one or more files to a server.
  2. That same server also handles processing the uploaded file(s).
  3. After processing completes, the server persists the results and returns them to the client.

This approach works for handling small workloads, but it hits a wall at any meaningful scale: a single massive file can block threads, trigger parsing failures, or time out requests. To make matters worse, just one failure can mean re-running an entire job from scratch.

To help our application scale with the work we give it, we can separate its two primary concerns into a proper pipeline:

  • Restrict our server's scope to receiving uploads and streaming progress to clients.
  • Spin up isolated, retryable workflow tasks to perform individual processing steps with LlamaParse and LlamaCloud.

Architecture overview

Our scalable pipeline consists of three services deployed on Render:

  • Web service: This is our server. It accepts file uploads or URL downloads, streams real-time progress via Server-Sent Events, and exposes search and RAG endpoints.
  • Workflow: This is our orchestration layer. It defines and executes five discrete tasks, each with a specific instance type, timeout, and retry policy.
  • Postgres database: This stores the results of our document processing.

Whenever a user uploads a document, our web service reads the bytes and dispatches them to the first workflow task run. From there, everything executes asynchronously:

Diagram showing the processing pipeline architecture

Example code for this project is available on GitHub.

Dispatch with Render Workflows

Our workflow service defines five tasks that each handle a different step of our document processing pipeline: upload_to_llamacloud, classify_document, parse_document, extract_fields, and store_results.

In our workflow code, we define each task as a TypeScript function with the Render SDK and configure its resource plan, timeout, and retry policy:
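To illustrate the idea, here is a minimal sketch of per-task configuration as a plain object; the `TaskConfig` shape, plan names, and specific timeout values are assumptions, not the actual Render SDK API:

```typescript
// Illustrative only: the real Render SDK task API may differ.
type RetryPolicy = { maxAttempts: number; backoff: "exponential" };

interface TaskConfig {
  plan: string;           // instance type for this task's runs
  timeoutSeconds: number; // hard cap before the run is failed and retried
  retry: RetryPolicy;
}

// One entry per pipeline stage; heavier stages get bigger plans and timeouts.
const taskConfigs: Record<string, TaskConfig> = {
  upload_to_llamacloud: { plan: "starter", timeoutSeconds: 120, retry: { maxAttempts: 3, backoff: "exponential" } },
  classify_document: { plan: "starter", timeoutSeconds: 120, retry: { maxAttempts: 3, backoff: "exponential" } },
  parse_document: { plan: "standard", timeoutSeconds: 900, retry: { maxAttempts: 3, backoff: "exponential" } },
  extract_fields: { plan: "standard", timeoutSeconds: 300, retry: { maxAttempts: 3, backoff: "exponential" } },
  store_results: { plan: "starter", timeoutSeconds: 60, retry: { maxAttempts: 3, backoff: "exponential" } },
};
```

Giving the parse step the longest timeout reflects that a single large PDF dominates the pipeline's runtime.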

In our web service, we again use the Render SDK to dispatch our workflow tasks and poll for results:
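A minimal sketch of the dispatch-and-poll pattern follows; `fetchRun` stands in for the actual Render SDK call, whose name and signature are assumptions here:

```typescript
type RunStatus = "pending" | "running" | "succeeded" | "failed";
interface TaskRun { id: string; status: RunStatus; output?: unknown }

// Poll a run until it reaches a terminal state. In the real service the
// fetcher would wrap the Render SDK; here it's injected for clarity.
async function pollUntilDone(
  fetchRun: (id: string) => Promise<TaskRun>,
  runId: string,
  intervalMs = 2000,
): Promise<TaskRun> {
  for (;;) {
    const run = await fetchRun(runId);
    if (run.status === "succeeded" || run.status === "failed") return run;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Injecting the fetcher keeps the loop testable without touching the network.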

Each task run executes in its own instance, which means parsing a large PDF gets its own isolated environment. If a run fails, its retry policy automatically handles exponential backoff, with no manual logic required.

Document intelligence with LlamaCloud

Each workflow task delegates document intelligence to LlamaCloud, using shared client configuration for authentication:
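A sketch of that shared configuration, read once from the environment; the variable names and default base URL are assumptions:

```typescript
interface LlamaCloudConfig { apiKey: string; baseUrl: string }

// Fail fast at startup if the API key is missing, rather than inside a task run.
function loadLlamaCloudConfig(
  env: Record<string, string | undefined> = process.env,
): LlamaCloudConfig {
  const apiKey = env.LLAMA_CLOUD_API_KEY;
  if (!apiKey) throw new Error("LLAMA_CLOUD_API_KEY is not set");
  return {
    apiKey,
    baseUrl: env.LLAMA_CLOUD_BASE_URL ?? "https://api.cloud.llamaindex.ai",
  };
}
```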

Classification

The classify_document task sends the uploaded file to LlamaCloud Classify, comparing it against a set of document type rules:
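The rule and result shapes below are simplified stand-ins for the real Classify API, and the confidence threshold is an assumption:

```typescript
interface ClassifyRule { type: string; description: string }
interface ClassifyResult { type: string; confidence: number; reasoning: string }

// Natural-language rules describing each document type we expect to see.
const classifyRules: ClassifyRule[] = [
  { type: "invoice", description: "A bill with line items, totals, and payment terms" },
  { type: "contract", description: "A legal agreement between two or more parties" },
  { type: "resume", description: "A summary of a person's work history and skills" },
  { type: "financial_statement", description: "A balance sheet, income, or cash flow report" },
];

// Treat low-confidence matches as unknown so extraction can fall back to
// schema generation instead of forcing a bad fit.
function resolveDocType(result: ClassifyResult, threshold = 0.7): string {
  return result.confidence >= threshold ? result.type : "unknown";
}
```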

Rules describe document types such as invoices, contracts, resumes, financial statements, and more. LlamaCloud returns the best match, a confidence score, and human-readable reasoning.

Parsing with LlamaParse

The parse_document task uses LlamaParse's agentic tier, which handles 130+ file formats and returns clean markdown and plain text:
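LlamaParse returns per-page results; here is a sketch of stitching those pages into a single markdown document (the page shape is simplified, not the real response type):

```typescript
interface ParsedPage { page: number; md: string; text: string }

// Join non-empty pages with a horizontal rule so downstream chunking can
// still see page boundaries.
function joinMarkdown(pages: ParsedPage[]): string {
  return pages
    .filter((p) => p.md.trim().length > 0)
    .map((p) => p.md.trim())
    .join("\n\n---\n\n");
}
```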

The agentic tier handles complex layouts (tables and charts, multi-column text, images, etc.) and returns structured markdown ready for downstream processing.

Structured extraction with LlamaExtract

Once the document type is known, the extract_fields task runs LlamaExtract against a predefined JSON Schema, or generates one on the fly for unknown types:
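A sketch of the schema lookup: the invoice schema mirrors the fields named in this post, and returning `null` stands in for letting LlamaExtract infer a schema on the fly (the registry itself is an assumption):

```typescript
// Predefined JSON Schema for a known document type.
const invoiceSchema = {
  type: "object",
  properties: {
    invoice_number: { type: "string" },
    vendor_name: { type: "string" },
    line_items: { type: "array", items: { type: "object" } },
    total_amount: { type: "number" },
  },
} as const;

const schemas: Record<string, object> = { invoice: invoiceSchema };

// Known types use a predefined schema; `null` signals that LlamaExtract
// should generate one automatically for this document.
function schemaFor(docType: string): object | null {
  return schemas[docType] ?? null;
}
```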

For an invoice, this yields structured fields like invoice_number, vendor_name, line_items, and total_amount. For unknown document types, LlamaExtract generates an appropriate schema automatically using a prompt.

Finally, the store_results task writes results to Postgres and optionally indexes the parsed text into a LlamaCloud-managed pipeline:
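A sketch of the write path: the table and column names are assumptions, and building the statement separately keeps it testable without a live database (in the real task this would run through a `pg` Pool):

```typescript
interface DocumentResult {
  filename: string;
  docType: string;
  markdown: string;
  fields: Record<string, unknown>;
}

// Parameterized insert; extracted fields are serialized into a JSON column.
function insertStatement(r: DocumentResult): { sql: string; params: unknown[] } {
  return {
    sql: "INSERT INTO documents (filename, doc_type, markdown, fields) VALUES ($1, $2, $3, $4) RETURNING id",
    params: [r.filename, r.docType, r.markdown, JSON.stringify(r.fields)],
  };
}
```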

Once indexed, documents are searchable with embeddings, hybrid retrieval, and reranking, all managed by LlamaCloud. The web service exposes /search and /ask endpoints backed by this pipeline.

Live progress streaming

While tasks run, the web service streams real-time status to the frontend via Server-Sent Events:
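SSE frames are plain text, which makes the progress protocol easy to sketch; the event name and payload shape here are assumptions:

```typescript
type Stage = "uploading" | "classifying" | "parsing" | "extracting" | "storing";

// One frame per status update: an `event:` line, a `data:` line with a JSON
// payload, and a blank line terminating the frame.
function sseFrame(stage: Stage, detail: Record<string, unknown> = {}): string {
  return `event: progress\ndata: ${JSON.stringify({ stage, ...detail })}\n\n`;
}
```

On the browser side, an `EventSource` listener for the `progress` event receives each frame as it is written, with no client-side polling.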

Users see each stage complete in real time (uploading, classifying, parsing, extracting, storing) without any polling from the client. Here's the full pipeline:

Timeline showing the full document processing pipeline

Deploy it yourself on Render

To run this architecture yourself, follow the deployment guide in the project README. At a high level, you'll:

  • Deploy the web service and Postgres database via the repository Blueprint
  • Create the workflow service manually in the Render Dashboard (npm install && npm run build, then node dist/tasks/index.js)
  • Set environment variables for the web and workflow services to connect to the Postgres database and LlamaCloud


Key takeaways

  • Render Workflows gives each stage of our processing pipeline its own compute plan, timeout, and retry policy: no need to manage queues, workers, or infrastructure.
  • The web service stays thin. All document intelligence calls run in isolated workflow tasks, keeping the HTTP layer free for uploads and SSE streaming.
  • LlamaParse handles the hard part of document parsing (tables, complex layouts, scanned PDFs) across 130+ formats.
  • LlamaCloud Classify and Extract layer document type detection and structured field extraction on top of raw parsing.
  • LlamaCloud-managed pipelines make parsed documents instantly searchable with embeddings and reranking.