Kreuzberg
Kreuzberg is a document intelligence platform with a high-performance Rust core and 12 native language bindings, including Python, TypeScript/Node.js, Rust, Go, C#, Ruby, and Elixir. You can use it as an SDK, CLI, Docker image, REST API server, or MCP tool to extract text, tables, and metadata from 91+ file formats (PDF, Office, images, HTML, XML, archives, email, and more), with optional OCR and post-processing pipelines.
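As a rough illustration of SDK use, the sketch below wraps a single-file extraction in Python. It assumes the Python package exposes a synchronous `extract_file_sync` function whose result carries the text in a `content` attribute; check the Python API Reference for the exact names in your version. The `extract_text` helper and its `extractor` parameter are hypothetical, added only so the call can be swapped out or exercised without the library installed.

```python
from typing import Any, Callable, Optional


def extract_text(path: str, extractor: Optional[Callable[[str], Any]] = None) -> str:
    """Return the extracted text of one document.

    `extractor` is a hypothetical hook for illustration. When it is omitted,
    the function falls back to kreuzberg's synchronous entry point.
    """
    if extractor is None:
        # Imported lazily so this module loads even without kreuzberg installed.
        # Assumption: the package exports `extract_file_sync`.
        from kreuzberg import extract_file_sync

        extractor = extract_file_sync

    result = extractor(path)
    # Assumption: the result object exposes the extracted text as `.content`.
    return result.content
```

With `kreuzberg` installed, `extract_text("report.pdf")` would return the document's text under the assumptions above. Passing a stand-in `extractor` keeps the helper usable in unit tests.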
Why Kreuzberg
- High Performance: Rust core with native PDFium, SIMD optimizations, and full parallelism. Process thousands of documents per minute without a GPU.
- 91+ File Formats: PDF, DOCX, XLSX, PPTX, images, HTML, XML, emails, archives, and academic formats, all handled through one API.
- Multi-Engine OCR: Tesseract and PaddleOCR work across all language bindings. EasyOCR is available for Python only.
- 12 Language Bindings: Native bindings for Python, TypeScript, Rust, Go, Java, C#, Ruby, PHP, Elixir, R, C, and WebAssembly.
- Code Intelligence: Extract functions, classes, imports, symbols, and docstrings from 248 programming languages. Results are returned in the `code_intelligence` field with semantic chunking.
- Plugin System: Register custom extractors, OCR backends, post-processors, and validators. Plugin authoring is primarily supported in Python; all bindings can consume registered plugins.
- Flexible Deployment: Use as a library, CLI tool, REST API server, MCP server, or Docker container. Pick what fits your stack.
Language Support
Precompiled binaries are available for Linux (x86_64 and aarch64), macOS (Apple Silicon), and Windows (x64).
| Language | Package | Docs |
|---|---|---|
| Python | pip install kreuzberg | API Reference |
| TypeScript (Native) | npm install @kreuzberg/node | API Reference |
| TypeScript (WASM) | npm install @kreuzberg/wasm | API Reference |
| Rust | cargo add kreuzberg | API Reference |
| Go | go get .../kreuzberg/packages/go/v4 | API Reference |
| Java | Maven Central dev.kreuzberg:kreuzberg | API Reference |
| C# | dotnet add package Kreuzberg | API Reference |
| Ruby | gem install kreuzberg | API Reference |
| PHP | composer require kreuzberg/kreuzberg | API Reference |
| Elixir | {:kreuzberg, "~> 4.0"} | API Reference |
| R | r-universe kreuzberg | API Reference |
| C (FFI) | Shared library + header | API Reference |
| CLI | brew install kreuzberg-dev/tap/kreuzberg | CLI Guide |
| Docker | ghcr.io/kreuzberg-dev/kreuzberg | Docker Guide |
Choosing Between TypeScript Packages
- @kreuzberg/node: Use for Node.js servers and CLI tools. Runs the native core at full speed.
- @kreuzberg/wasm: Use for browsers, Cloudflare Workers, Deno, Bun, and serverless environments. Portable across platforms, at roughly 60-80% of native speed.
Explore the Docs
- Getting Started: Install Kreuzberg and extract your first document in minutes.
- Guides: Configuration, OCR setup, Docker deployment, plugins, and more.
- Concepts: Architecture, extraction pipeline, MIME detection, and performance.
- API Reference: Complete API docs for every language binding, types, and errors.
- CLI & Servers: Command-line tool, REST API server, and MCP server for AI agents.
- Migration: Upgrade from v3 to v4, or migrate from Unstructured.
Getting Help
- Bugs & feature requests — Open an issue on GitHub
- Community chat — Join the Discord
- Contributing — Read the contributor guide