LLM Integration v4.8.0¶
Kreuzberg integrates with 146 LLM providers (including local inference engines) via liter-llm for three capabilities: VLM OCR, structured extraction, and provider-hosted embeddings.
Feature gate
Requires the llm Cargo feature. Not included in the default feature set.
VLM OCR¶
Use vision-language models as an OCR backend. The document page is rendered as an image and sent to the VLM, which returns the extracted text.
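Conceptually, the request sent to the VLM is an ordinary vision chat payload: the rendered page image is base64-encoded and attached alongside the OCR prompt. A minimal sketch of that shape (`build_vlm_request` and the exact payload layout are illustrative, not Kreuzberg's internal API):

```python
import base64


def build_vlm_request(image_bytes: bytes, prompt: str) -> dict:
    """Build an OpenAI-style vision chat payload from rendered page bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "openai/gpt-4o-mini",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        # The page image travels inline as a data URL
                        "image_url": {"url": f"data:image/png;base64,{encoded}"},
                    },
                ],
            }
        ],
    }
```

The VLM's text reply becomes the OCR result for that page.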
When to Use¶
- Low-quality scanned documents where traditional OCR struggles
- Handwritten text recognition
- Arabic, Farsi, and other scripts with poor Tesseract/PaddleOCR support
- Complex layouts where traditional OCR fails (mixed tables, forms, diagrams)
- When you need higher accuracy and can accept higher latency and API costs
Configuration¶
import asyncio

from kreuzberg import extract_file, ExtractionConfig, OcrConfig, LlmConfig


async def main() -> None:
    config = ExtractionConfig(
        force_ocr=True,
        ocr=OcrConfig(
            backend="vlm",
            vlm_config=LlmConfig(model="openai/gpt-4o-mini"),
        ),
    )
    result = await extract_file("scan.pdf", config=config)
    print(result.content)


asyncio.run(main())
use kreuzberg::{extract_file, ExtractionConfig, OcrConfig, LlmConfig};

let config = ExtractionConfig {
    force_ocr: true,
    ocr: Some(OcrConfig {
        backend: "vlm".to_string(),
        vlm_config: Some(LlmConfig {
            model: "openai/gpt-4o-mini".to_string(),
            ..Default::default()
        }),
        ..Default::default()
    }),
    ..Default::default()
};
let result = extract_file("scan.pdf", None, &config).await?;
Custom VLM Prompt¶
Override the default prompt template for VLM OCR:
from kreuzberg import ExtractionConfig, OcrConfig, LlmConfig

config = ExtractionConfig(
    force_ocr=True,
    ocr=OcrConfig(
        backend="vlm",
        vlm_config=LlmConfig(model="openai/gpt-4o-mini"),
        vlm_prompt="Extract all text from this document image. Preserve formatting.",
    ),
)
Supported Providers¶
Any liter-llm vision-capable provider works as a VLM OCR backend:
| Provider | Example Model |
|---|---|
| OpenAI | `openai/gpt-4o`, `openai/gpt-4o-mini` |
| Anthropic | `anthropic/claude-sonnet-4-20250514` |
| Google | `google/gemini-2.0-flash` |
| Groq | `groq/llama-3.2-90b-vision-preview` |
| Ollama (local) | `ollama/llama3.2-vision` |
| LM Studio (local) | `lmstudio/llava-1.5` |
| vLLM (local) | `vllm/llava-next` |
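All of these identifiers follow liter-llm's `provider/model` convention, so routing reduces to splitting on the first slash. A small sketch (`split_model_id` is a hypothetical helper, not a Kreuzberg API):

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split a liter-llm model identifier into (provider, model_name)."""
    provider, _, name = model.partition("/")
    if not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


split_model_id("openai/gpt-4o-mini")  # → ("openai", "gpt-4o-mini")
```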
Structured Extraction¶
Extract structured JSON data from documents by providing a JSON schema. The document is first extracted as text, then sent to an LLM with the schema to produce conforming output.
Basic Usage¶
import asyncio

from kreuzberg import extract_file, ExtractionConfig, StructuredExtractionConfig, LlmConfig


async def main() -> None:
    config = ExtractionConfig(
        structured_extraction=StructuredExtractionConfig(
            schema={
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "authors": {"type": "array", "items": {"type": "string"}},
                    "date": {"type": "string"},
                },
                "required": ["title", "authors", "date"],
                "additionalProperties": False,
            },
            llm=LlmConfig(model="openai/gpt-4o-mini"),
            strict=True,
        ),
    )
    result = await extract_file("paper.pdf", config=config)
    print(result.structured_output)
    # {"title": "...", "authors": ["..."], "date": "..."}


asyncio.run(main())
import { extractFileSync } from '@kreuzberg/node';

const config = {
  structuredExtraction: {
    schema: {
      type: 'object',
      properties: {
        title: { type: 'string' },
        authors: { type: 'array', items: { type: 'string' } },
        date: { type: 'string' },
      },
      required: ['title', 'authors', 'date'],
      additionalProperties: false,
    },
    llm: {
      model: 'openai/gpt-4o-mini',
    },
    strict: true,
  },
};

const result = extractFileSync('paper.pdf', null, config);
console.log(result.structuredOutput);
use kreuzberg::{
    extract_file, ExtractionConfig, LlmConfig, StructuredExtractionConfig,
};
use serde_json::json;

#[tokio::main]
async fn main() -> kreuzberg::Result<()> {
    let config = ExtractionConfig {
        structured_extraction: Some(StructuredExtractionConfig {
            schema: json!({
                "type": "object",
                "properties": {
                    "title": { "type": "string" },
                    "authors": { "type": "array", "items": { "type": "string" } },
                    "date": { "type": "string" }
                },
                "required": ["title", "authors", "date"],
                "additionalProperties": false
            }),
            llm: LlmConfig {
                model: "openai/gpt-4o-mini".to_string(),
                ..Default::default()
            },
            strict: true,
            ..Default::default()
        }),
        ..Default::default()
    };
    let result = extract_file("paper.pdf", None, &config).await?;
    if let Some(structured) = &result.structured_output {
        println!("{}", structured);
    }
    Ok(())
}
[structured_extraction]
schema_name = "paper_metadata"
strict = true
[structured_extraction.schema]
type = "object"
[structured_extraction.schema.properties.title]
type = "string"
[structured_extraction.schema.properties.date]
type = "string"
[structured_extraction.llm]
model = "openai/gpt-4o-mini"
Custom Prompts (Jinja2)¶
Override the default extraction prompt with a Jinja2 template:
from kreuzberg import ExtractionConfig, StructuredExtractionConfig, LlmConfig

config = ExtractionConfig(
    structured_extraction=StructuredExtractionConfig(
        schema={"type": "object", "properties": {"title": {"type": "string"}}},
        llm=LlmConfig(model="openai/gpt-4o-mini"),
        prompt=(
            "Analyze this document and extract key metadata.\n\n"
            "Document:\n{{ content }}\n\n"
            "Schema: {{ schema }}"
        ),
    ),
)
Available template variables:
| Variable | Description |
|---|---|
| `{{ content }}` | The extracted document text |
| `{{ schema }}` | The JSON schema as a formatted string |
| `{{ schema_name }}` | The schema name (default: `"extraction"`) |
| `{{ schema_description }}` | The schema description (may be empty) |
Cross-Provider Compatibility¶
Structured extraction handles provider differences automatically:
- OpenAI: Full strict mode with `additionalProperties` enforcement
- Anthropic/Gemini: `additionalProperties` automatically stripped (not supported by these providers)
- All providers: Markdown code fence wrapping in responses is automatically handled
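The code-fence handling in the last point amounts to unwrapping a reply like ` ```json ... ``` ` before parsing it. A minimal sketch of that cleanup (`strip_code_fence` is illustrative, not Kreuzberg's internal function):

```python
import re

# Matches a reply wrapped in a Markdown fence, optionally tagged ```json
_FENCE = re.compile(r"^```(?:json)?\s*\n(.*)\n```\s*$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Remove a wrapping Markdown code fence, if present, from an LLM reply."""
    match = _FENCE.match(text.strip())
    return match.group(1) if match else text.strip()


strip_code_fence('```json\n{"title": "Example"}\n```')  # → '{"title": "Example"}'
```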
Strict Mode¶
When `strict=True`, the LLM is instructed to produce output that exactly matches the schema. This enables OpenAI's structured output mode and validates the response against the schema.
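To illustrate what that validation entails, here is a minimal top-level check for a flat object schema (`check_strict` is a hypothetical helper; real validation also covers nested objects and value types, e.g. via a JSON Schema library):

```python
def check_strict(payload: dict, schema: dict) -> list[str]:
    """Collect top-level violations of a flat object schema (illustrative only)."""
    errors = []
    props = schema.get("properties", {})
    # Every field listed in "required" must be present
    for key in schema.get("required", []):
        if key not in payload:
            errors.append(f"missing required field: {key}")
    # With additionalProperties=false, unknown fields are rejected
    if schema.get("additionalProperties") is False:
        for key in payload:
            if key not in props:
                errors.append(f"unexpected field: {key}")
    return errors
```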
VLM Embeddings¶
Use provider-hosted embedding models instead of local ONNX models. Useful when you want to match the embedding model used by your vector database or when local ONNX models are not available.
Configuration¶
import asyncio

from kreuzberg import embed, EmbeddingConfig, EmbeddingModelType, LlmConfig


async def main() -> None:
    config = EmbeddingConfig(
        model=EmbeddingModelType.llm(
            LlmConfig(model="openai/text-embedding-3-small")
        ),
        normalize=True,
    )
    embeddings = await embed(["Hello world"], config=config)
    print(len(embeddings[0]))  # 1536


asyncio.run(main())
use kreuzberg::{embed_texts, EmbeddingConfig, EmbeddingModelType, LlmConfig};

let config = EmbeddingConfig {
    model: EmbeddingModelType::Llm {
        llm: LlmConfig {
            model: "openai/text-embedding-3-small".to_string(),
            ..Default::default()
        },
    },
    normalize: true,
    ..Default::default()
};
let embeddings = embed_texts(&["Hello world"], &config)?;
Available Models¶
| Model | Dimensions | Provider |
|---|---|---|
| `openai/text-embedding-3-small` | 1536 | OpenAI |
| `openai/text-embedding-3-large` | 3072 | OpenAI |
| `mistral/mistral-embed` | 1024 | Mistral |
| Any liter-llm embedding-capable provider | Varies | Various |
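With `normalize=True` (as in the configuration above), the returned vectors are scaled to unit length, so cosine similarity reduces to a plain dot product. A pure-Python sketch of what that normalization means:

```python
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


normalize([3.0, 4.0])  # → [0.6, 0.8]
```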
Local LLM Support¶
v4.8.0
Kreuzberg supports local LLM inference engines via liter-llm's built-in provider routing. No API key required — just point to your local server.
Supported Local Engines¶
| Engine | Prefix | Default URL | Install |
|---|---|---|---|
| Ollama | `ollama/` | `http://localhost:11434/v1` | `brew install ollama` |
| LM Studio | `lmstudio/` | `http://localhost:1234/v1` | Desktop app |
| vLLM | `vllm/` | `http://localhost:8000/v1` | `pip install vllm` |
| llama.cpp | `llamacpp/` | `http://localhost:8080/v1` | Build from source |
| LocalAI | `localai/` | `http://localhost:8080/v1` | Docker |
| llamafile | `llamafile/` | `http://localhost:8080/v1` | Single binary |
Example: Ollama¶
# Start Ollama and pull a model
ollama pull llama3.2-vision
# Use it for VLM OCR (no API key needed)
kreuzberg extract scan.pdf --force-ocr true \
--vlm-model ollama/llama3.2-vision
# Use it for structured extraction
kreuzberg extract-structured doc.pdf \
--schema schema.json \
--model ollama/llama3.2
# Use it for embeddings
kreuzberg embed --provider llm \
--model ollama/all-minilm \
--text "Hello world"
import asyncio

from kreuzberg import extract_file, ExtractionConfig, StructuredExtractionConfig, LlmConfig


async def main() -> None:
    config = ExtractionConfig(
        structured_extraction=StructuredExtractionConfig(
            schema={"type": "object", "properties": {"title": {"type": "string"}}},
            llm=LlmConfig(model="ollama/llama3.2"),  # No api_key needed
        ),
    )
    result = await extract_file("doc.pdf", config=config)


asyncio.run(main())
Custom Base URL
If your local server runs on a non-default port, use `base_url`:
LlmConfig(model="ollama/llama3.2", base_url="http://localhost:11435/v1")
API Key Configuration¶
API keys can be set via (in order of precedence):

1. `api_key` field in `LlmConfig` — highest priority, per-request
2. Provider standard env vars (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, etc.)
3. Kreuzberg-specific env var (`KREUZBERG_LLM_API_KEY`) — used as a fallback for any provider
Local providers skip API key lookup
Local inference engines (Ollama, LM Studio, vLLM, llama.cpp, LocalAI, llamafile) do not require an API key. If you use a local provider prefix (for example, ollama/), the API key fields are ignored.
from kreuzberg import LlmConfig
# Explicit API key
config = LlmConfig(model="openai/gpt-4o", api_key="sk-...")
# Custom base URL (e.g., Azure OpenAI, local proxy)
config = LlmConfig(
model="openai/gpt-4o",
base_url="https://my-proxy.example.com/v1",
)
LlmConfig Reference¶
| Field | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | required | Provider/model in liter-llm format (for example, `"openai/gpt-4o"`) |
| `api_key` | `str \| None` | `None` | API key (falls back to env vars) |
| `base_url` | `str \| None` | `None` | Custom endpoint URL |
| `timeout_secs` | `int \| None` | `60` | Request timeout in seconds |
| `max_retries` | `int \| None` | `3` | Maximum retry attempts |
| `temperature` | `float \| None` | `None` | Sampling temperature |
| `max_tokens` | `int \| None` | `None` | Maximum tokens to generate |
REST API¶
Structured Extraction¶
POST /extract-structured — multipart form with file + schema + model configuration.
curl -X POST http://localhost:4000/extract-structured \
-F "file=@invoice.pdf" \
-F 'schema={"type":"object","properties":{"vendor":{"type":"string"},"total":{"type":"number"}}}' \
-F "model=openai/gpt-4o-mini" \
-F "strict=true"
MCP Tools¶
When running Kreuzberg as an MCP server, LLM features are available as tools:
- `extract_structured` — extract structured data from a document using a JSON schema
- `embed_text` — extended with a `model` parameter for LLM-hosted embeddings
Related¶
- OCR — OCR backends including VLM OCR
- Configuration Reference — full field reference for all config types
- Advanced Features — chunking, language detection, local embeddings
- API Server — REST API endpoints