Document Stack

Batch PDF Generation

Generate hundreds or thousands of PDFs efficiently using batch processing patterns.

Overview

Batch generation is essential when you need to produce many documents at once — monthly invoices, event badges, shipping labels, or certificates. This guide covers patterns for efficient, reliable batch processing.

Basic Pattern

The simplest approach: iterate over your data set and generate one PDF per item.

Sequential Batch (Node.js)
import { DocumentStack } from "@document-stack/sdk-node";

const client = new DocumentStack({ apiKey: process.env.DS_API_KEY! });

const customers = await db.query("SELECT * FROM customers WHERE invoice_due = true");

for (const customer of customers) {
    const pdf = await client.generate({
        templateId: "tmpl_invoice",
        data: {
            name: customer.name,
            amount: customer.balance,
            dueDate: customer.dueDate,
        },
    });
    console.log(`Generated invoice for ${customer.name}: ${pdf.url}`);
}

Rate Limits

Sequential processing is the safest but slowest approach. For larger batches, use concurrent processing with rate limit awareness. See Rate Limits.

Concurrent Processing

Process multiple PDFs in parallel using a concurrency limiter:

Concurrent Batch
async function batchGenerate(items: Array<{ id: string; [key: string]: unknown }>, concurrency = 5) {
    const results: Array<{ id: string; url: string; error?: string }> = [];

    for (let i = 0; i < items.length; i += concurrency) {
        const batch = items.slice(i, i + concurrency);

        const batchResults = await Promise.allSettled(
            batch.map(async (item) => {
                const pdf = await client.generate({
                    templateId: "tmpl_invoice",
                    data: item,
                });
                return { id: item.id, url: pdf.url };
            })
        );

        batchResults.forEach((result, index) => {
            if (result.status === "fulfilled") {
                results.push(result.value);
            } else {
                // allSettled preserves order, so the index maps back to the failed item
                results.push({ id: batch[index].id, url: "", error: String(result.reason) });
            }
        });

        // Brief pause between batches
        await new Promise((r) => setTimeout(r, 200));
    }

    return results;
}

Queue-Based Processing

For production workloads with thousands of documents, use a job queue:

BullMQ Example
import { Queue, Worker } from "bullmq";

const pdfQueue = new Queue("pdf-generation");

// Enqueue jobs
for (const customer of customers) {
    await pdfQueue.add("generate", {
        templateId: "tmpl_invoice",
        data: { name: customer.name, amount: customer.balance },
        customerId: customer.id,
    });
}

// Worker processes jobs
const worker = new Worker("pdf-generation", async (job) => {
    const pdf = await client.generate({
        templateId: job.data.templateId,
        data: job.data.data,
    });

    // Store the result
    await db.update("invoices")
        .set({ pdfUrl: pdf.url })
        .where({ customerId: job.data.customerId });

    return pdf.url;
}, { concurrency: 10 });
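BullMQ can also handle retries and deduplication itself via per-job options passed as the third argument to `queue.add`. Here is a sketch using a hypothetical helper (`invoiceJobOptions` is not part of BullMQ; `jobId`, `attempts`, and `backoff` are real BullMQ job options, but the values shown are illustrative):

```typescript
// Hypothetical helper that builds BullMQ job options for one invoice.
function invoiceJobOptions(customerId: string) {
    return {
        // Stable jobId: adding a job whose id already exists is a no-op,
        // which makes re-enqueueing the same customer safe (idempotency).
        jobId: `invoice-${customerId}`,
        // Let BullMQ retry failed jobs with exponential backoff (1s, 2s, 4s).
        attempts: 3,
        backoff: { type: "exponential" as const, delay: 1000 },
        // Drop completed jobs from Redis to keep the queue small.
        removeOnComplete: true,
    };
}
```

Usage: `await pdfQueue.add("generate", payload, invoiceJobOptions(customer.id));`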

Error Handling

  • Retry failed items — Track failures and retry with exponential backoff
  • Dead letter queue — Move persistently failing items to a separate queue for manual review
  • Idempotency — Design so re-processing the same item is safe
  • Progress tracking — Log progress so you can resume if the process is interrupted
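The retry-with-backoff pattern above can be sketched as a small generic helper (the function name, attempt count, and delays are illustrative, not part of the SDK):

```typescript
// Hypothetical helper: retry an async operation with exponential backoff.
async function withRetry<T>(
    fn: () => Promise<T>,
    attempts = 3,
    baseDelayMs = 100
): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt < attempts; attempt++) {
        try {
            return await fn();
        } catch (err) {
            lastError = err;
            if (attempt < attempts - 1) {
                // Wait baseDelayMs, then 2x, 4x, ... between attempts
                await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
            }
        }
    }
    throw lastError;
}
```

Wrapping each `client.generate(...)` call in `withRetry` lets transient failures (network blips, rate-limit rejections) resolve themselves before an item is routed to a dead letter queue.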

Python Example

Async Batch (Python)
import asyncio
import os

from document_stack import AsyncDocumentStack

client = AsyncDocumentStack(api_key=os.environ["DS_API_KEY"])

async def batch_generate(items, concurrency=5):
    semaphore = asyncio.Semaphore(concurrency)

    async def process(item):
        async with semaphore:
            pdf = await client.generate(
                template_id="tmpl_invoice",
                data=item,
            )
            return {"id": item["id"], "url": pdf.url}

    tasks = [process(item) for item in items]
    # return_exceptions=True keeps one failure from cancelling the batch;
    # callers should check each entry with isinstance(result, Exception)
    return await asyncio.gather(*tasks, return_exceptions=True)

Best Practices

  • Start with low concurrency (3–5) and increase based on your rate limits
  • Monitor X-RateLimit-Remaining headers and throttle dynamically
  • Use a job queue for batches over 100 items
  • Cache template data to avoid redundant lookups
  • Store PDF URLs in your database for retrieval later
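As a sketch of dynamic throttling, a per-request pause can be derived from the remaining rate-limit budget. The thresholds and delays below are arbitrary starting points, and the sketch assumes the two values are read from `X-RateLimit-Remaining` and `X-RateLimit-Limit` response headers:

```typescript
// Map the remaining rate-limit budget to a per-request pause (ms).
function throttleDelayMs(remaining: number, limit: number): number {
    const budget = remaining / limit;
    if (budget > 0.5) return 0;    // plenty of headroom: full speed
    if (budget > 0.2) return 250;  // getting low: brief pause per request
    return 1000;                   // nearly exhausted: back off hard
}
```

Call it after each response, e.g. `await new Promise((r) => setTimeout(r, throttleDelayMs(remaining, limit)))`.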

Next Steps