LLM Integration v4.8.0¶
Kreuzberg integrates with 146 LLM providers (including local inference engines) via liter-llm for three capabilities: VLM OCR, structured extraction, and provider-hosted embeddings.
Feature gate
Requires the llm Cargo feature. Not included in the default feature set.
VLM OCR¶
Use vision-language models as an OCR backend. The document page is rendered as an image and sent to the VLM, which returns the extracted text.
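Conceptually, the request sent to the VLM is an ordinary vision chat payload: the rendered page image is base64-encoded and attached alongside the OCR prompt. A minimal sketch of that shape (`build_vlm_request` and the exact payload layout are illustrative, not Kreuzberg's internal API):

```python
import base64


def build_vlm_request(image_bytes: bytes, prompt: str) -> dict:
    """Build an OpenAI-style vision chat payload from rendered page bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "openai/gpt-4o-mini",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        # The page image travels inline as a data URL
                        "image_url": {"url": f"data:image/png;base64,{encoded}"},
                    },
                ],
            }
        ],
    }
```

The VLM's text reply becomes the OCR result for that page.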
When to Use¶
- Low-quality scanned documents where traditional OCR struggles
- Handwritten text recognition
- Arabic, Farsi, and other scripts with poor Tesseract/PaddleOCR support
- Complex layouts where traditional OCR fails (mixed tables, forms, diagrams)
- When you need higher accuracy and can accept higher latency and API costs
Configuration¶
import asyncio

from kreuzberg import extract_file, ExtractionConfig, OcrConfig, LlmConfig


async def main() -> None:
    config = ExtractionConfig(
        force_ocr=True,
        ocr=OcrConfig(
            backend="vlm",
            vlm_config=LlmConfig(model="openai/gpt-4o-mini"),
        ),
    )
    result = await extract_file("scan.pdf", config=config)
    print(result.content)


asyncio.run(main())
use kreuzberg::{extract_file, ExtractionConfig, OcrConfig, LlmConfig};

let config = ExtractionConfig {
    force_ocr: true,
    ocr: Some(OcrConfig {
        backend: "vlm".to_string(),
        vlm_config: Some(LlmConfig {
            model: "openai/gpt-4o-mini".to_string(),
            ..Default::default()
        }),
        ..Default::default()
    }),
    ..Default::default()
};
let result = extract_file("scan.pdf", None, &config).await?;
Custom VLM Prompt¶
Override the default prompt template for VLM OCR:
from kreuzberg import ExtractionConfig, OcrConfig, LlmConfig

config = ExtractionConfig(
    force_ocr=True,
    ocr=OcrConfig(
        backend="vlm",
        vlm_config=LlmConfig(model="openai/gpt-4o-mini"),
        vlm_prompt="Extract all text from this document image. Preserve formatting.",
    ),
)
Supported Providers¶
Any liter-llm vision-capable provider works as a VLM OCR backend:
| Provider | Example Model |
|---|---|
| OpenAI | `openai/gpt-4o`, `openai/gpt-4o-mini` |
| Anthropic | `anthropic/claude-sonnet-4-20250514` |
| Google | `google/gemini-2.0-flash` |
| Groq | `groq/llama-3.2-90b-vision-preview` |
| Ollama (local) | `ollama/llama3.2-vision` |
| LM Studio (local) | `lmstudio/llava-1.5` |
| vLLM (local) | `vllm/llava-next` |
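All of these identifiers follow liter-llm's `provider/model` convention, so routing reduces to splitting on the first slash. A small sketch (`split_model_id` is a hypothetical helper, not a Kreuzberg API):

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split a liter-llm model identifier into (provider, model_name)."""
    provider, _, name = model.partition("/")
    if not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


split_model_id("openai/gpt-4o-mini")  # → ("openai", "gpt-4o-mini")
```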
Structured Extraction¶
Extract structured JSON data from documents by providing a JSON schema. The document is first extracted as text, then sent to an LLM with the schema to produce conforming output.
Basic Usage¶
import asyncio

from kreuzberg import extract_file, ExtractionConfig, StructuredExtractionConfig, LlmConfig


async def main() -> None:
    config = ExtractionConfig(
        structured_extraction=StructuredExtractionConfig(
            schema={
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "authors": {"type": "array", "items": {"type": "string"}},
                    "date": {"type": "string"},
                },
                "required": ["title", "authors", "date"],
                "additionalProperties": False,
            },
            llm=LlmConfig(model="openai/gpt-4o-mini"),
            strict=True,
        ),
    )
    result = await extract_file("paper.pdf", config=config)
    print(result.structured_output)
    # {"title": "...", "authors": ["..."], "date": "..."}


asyncio.run(main())
import { extractFileSync } from '@kreuzberg/node';

const config = {
  structuredExtraction: {
    schema: {
      type: 'object',
      properties: {
        title: { type: 'string' },
        authors: { type: 'array', items: { type: 'string' } },
        date: { type: 'string' },
      },
      required: ['title', 'authors', 'date'],
      additionalProperties: false,
    },
    llm: {
      model: 'openai/gpt-4o-mini',
    },
    strict: true,
  },
};

const result = extractFileSync('paper.pdf', null, config);
console.log(result.structuredOutput);
use kreuzberg::{
    extract_file, ExtractionConfig, LlmConfig, StructuredExtractionConfig,
};
use serde_json::json;

#[tokio::main]
async fn main() -> kreuzberg::Result<()> {
    let config = ExtractionConfig {
        structured_extraction: Some(StructuredExtractionConfig {
            schema: json!({
                "type": "object",
                "properties": {
                    "title": { "type": "string" },
                    "authors": { "type": "array", "items": { "type": "string" } },
                    "date": { "type": "string" }
                },
                "required": ["title", "authors", "date"],
                "additionalProperties": false
            }),
            llm: LlmConfig {
                model: "openai/gpt-4o-mini".to_string(),
                ..Default::default()
            },
            strict: true,
            ..Default::default()
        }),
        ..Default::default()
    };
    let result = extract_file("paper.pdf", None, &config).await?;
    if let Some(structured) = &result.structured_output {
        println!("{}", structured);
    }
    Ok(())
}
[structured_extraction]
schema_name = "paper_metadata"
strict = true
[structured_extraction.schema]
type = "object"
[structured_extraction.schema.properties.title]
type = "string"
[structured_extraction.schema.properties.date]
type = "string"
[structured_extraction.llm]
model = "openai/gpt-4o-mini"
Custom Prompts (Jinja2)¶
Override the default extraction prompt with a Jinja2 template:
from kreuzberg import ExtractionConfig, StructuredExtractionConfig, LlmConfig

config = ExtractionConfig(
    structured_extraction=StructuredExtractionConfig(
        schema={"type": "object", "properties": {"title": {"type": "string"}}},
        llm=LlmConfig(model="openai/gpt-4o-mini"),
        prompt=(
            "Analyze this document and extract key metadata.\n\n"
            "Document:\n{{ content }}\n\n"
            "Schema: {{ schema }}"
        ),
    ),
)
Available template variables:
| Variable | Description |
|---|---|
| `{{ content }}` | The extracted document text |
| `{{ schema }}` | The JSON schema as a formatted string |
| `{{ schema_name }}` | The schema name (default: `"extraction"`) |
| `{{ schema_description }}` | The schema description (may be empty) |
Cross-Provider Compatibility¶
Structured extraction handles provider differences automatically:
- OpenAI: Full strict mode with `additionalProperties` enforcement
- Anthropic/Gemini: `additionalProperties` automatically stripped (not supported by these providers)
- All providers: Markdown code fence wrapping in responses is automatically handled
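The code-fence handling in the last point amounts to unwrapping a reply like ` ```json ... ``` ` before parsing it. A minimal sketch of that cleanup (`strip_code_fence` is illustrative, not Kreuzberg's internal function):

```python
import re

# Matches a reply wrapped in a Markdown fence, optionally tagged ```json
_FENCE = re.compile(r"^```(?:json)?\s*\n(.*)\n```\s*$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Remove a wrapping Markdown code fence, if present, from an LLM reply."""
    match = _FENCE.match(text.strip())
    return match.group(1) if match else text.strip()


strip_code_fence('```json\n{"title": "Example"}\n```')  # → '{"title": "Example"}'
```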
Strict Mode¶
When `strict=True`, the LLM is instructed to produce output that exactly matches the schema. This enables OpenAI's structured output mode and validates the response against the schema.
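To illustrate what that validation entails, here is a minimal top-level check for a flat object schema (`check_strict` is a hypothetical helper; real validation also covers nested objects and value types, e.g. via a JSON Schema library):

```python
def check_strict(payload: dict, schema: dict) -> list[str]:
    """Collect top-level violations of a flat object schema (illustrative only)."""
    errors = []
    props = schema.get("properties", {})
    # Every field listed in "required" must be present
    for key in schema.get("required", []):
        if key not in payload:
            errors.append(f"missing required field: {key}")
    # With additionalProperties=false, unknown fields are rejected
    if schema.get("additionalProperties") is False:
        for key in payload:
            if key not in props:
                errors.append(f"unexpected field: {key}")
    return errors
```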
VLM Embeddings¶
Use provider-hosted embedding models instead of local ONNX models. Useful when you want to match the embedding model used by your vector database or when local ONNX models are not available.
Configuration¶
import asyncio

from kreuzberg import embed, EmbeddingConfig, EmbeddingModelType, LlmConfig


async def main() -> None:
    config = EmbeddingConfig(
        model=EmbeddingModelType.llm(
            LlmConfig(model="openai/text-embedding-3-small")
        ),
        normalize=True,
    )
    embeddings = await embed(["Hello world"], config=config)
    print(len(embeddings[0]))  # 1536


asyncio.run(main())
use kreuzberg::{embed_texts, EmbeddingConfig, EmbeddingModelType, LlmConfig};

let config = EmbeddingConfig {
    model: EmbeddingModelType::Llm {
        llm: LlmConfig {
            model: "openai/text-embedding-3-small".to_string(),
            ..Default::default()
        },
    },
    normalize: true,
    ..Default::default()
};
let embeddings = embed_texts(&["Hello world"], &config)?;
Available Models¶
| Model | Dimensions | Provider |
|---|---|---|
| `openai/text-embedding-3-small` | 1536 | OpenAI |
| `openai/text-embedding-3-large` | 3072 | OpenAI |
| `mistral/mistral-embed` | 1024 | Mistral |
| Any liter-llm embedding-capable provider | Varies | Various |
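With `normalize=True` (as in the configuration above), the returned vectors are scaled to unit length, so cosine similarity reduces to a plain dot product. A pure-Python sketch of what that normalization means:

```python
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


normalize([3.0, 4.0])  # → [0.6, 0.8]
```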
Local LLM Support¶
v4.8.0
Kreuzberg supports local LLM inference engines via liter-llm's built-in provider routing. No API key required — just point to your local server.
Supported Local Engines¶
| Engine | Prefix | Default URL | Install |
|---|---|---|---|
| Ollama | `ollama/` | `http://localhost:11434/v1` | `brew install ollama` |
| LM Studio | `lmstudio/` | `http://localhost:1234/v1` | Desktop app |
| vLLM | `vllm/` | `http://localhost:8000/v1` | `pip install vllm` |
| llama.cpp | `llamacpp/` | `http://localhost:8080/v1` | Build from source |
| LocalAI | `localai/` | `http://localhost:8080/v1` | Docker |
| llamafile | `llamafile/` | `http://localhost:8080/v1` | Single binary |
Example: Ollama¶
# Start Ollama and pull a model
ollama pull llama3.2-vision
# Use it for VLM OCR (no API key needed)
kreuzberg extract scan.pdf --force-ocr true \
--vlm-model ollama/llama3.2-vision
# Use it for structured extraction
kreuzberg extract-structured doc.pdf \
--schema schema.json \
--model ollama/llama3.2
# Use it for embeddings
kreuzberg embed --provider llm \
--model ollama/all-minilm \
--text "Hello world"
import asyncio

from kreuzberg import extract_file, ExtractionConfig, StructuredExtractionConfig, LlmConfig


async def main() -> None:
    config = ExtractionConfig(
        structured_extraction=StructuredExtractionConfig(
            schema={"type": "object", "properties": {"title": {"type": "string"}}},
            llm=LlmConfig(model="ollama/llama3.2"),  # No api_key needed
        ),
    )
    result = await extract_file("doc.pdf", config=config)


asyncio.run(main())
Custom Base URL
If your local server runs on a non-default port, use `base_url`:
LlmConfig(model="ollama/llama3.2", base_url="http://localhost:11435/v1")
API Key Configuration¶
API keys can be set via (in order of precedence):

1. `api_key` field in `LlmConfig` — highest priority, per-request
2. Provider standard env vars (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, etc.)
3. Kreuzberg-specific env var (`KREUZBERG_LLM_API_KEY`) — used as a fallback for any provider
Local providers skip API key lookup
Local inference engines (Ollama, LM Studio, vLLM, llama.cpp, LocalAI, llamafile) do not require an API key. If you use a local provider prefix (for example, ollama/), the API key fields are ignored.
from kreuzberg import LlmConfig
# Explicit API key
config = LlmConfig(model="openai/gpt-4o", api_key="sk-...")
# Custom base URL (e.g., Azure OpenAI, local proxy)
config = LlmConfig(
model="openai/gpt-4o",
base_url="https://my-proxy.example.com/v1",
)
LlmConfig Reference¶
| Field | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | required | Provider/model in liter-llm format (for example, `"openai/gpt-4o"`) |
| `api_key` | `str \| None` | `None` | API key (falls back to env vars) |
| `base_url` | `str \| None` | `None` | Custom endpoint URL |
| `timeout_secs` | `int \| None` | `60` | Request timeout in seconds |
| `max_retries` | `int \| None` | `3` | Maximum retry attempts |
| `temperature` | `float \| None` | `None` | Sampling temperature |
| `max_tokens` | `int \| None` | `None` | Maximum tokens to generate |
REST API¶
Structured Extraction¶
POST /extract-structured — multipart form with file + schema + model configuration.
curl -X POST http://localhost:4000/extract-structured \
-F "file=@invoice.pdf" \
-F 'schema={"type":"object","properties":{"vendor":{"type":"string"},"total":{"type":"number"}}}' \
-F "model=openai/gpt-4o-mini" \
-F "strict=true"
MCP Tools¶
When running Kreuzberg as an MCP server, LLM features are available as tools:
- `extract_structured` — extract structured data from a document using a JSON schema
- `embed_text` — extended with a `model` parameter for LLM-hosted embeddings
Related¶
- OCR — OCR backends including VLM OCR
- Configuration Reference — full field reference for all config types
- Advanced Features — chunking, language detection, local embeddings
- API Server — REST API endpoints