Kreuzberg

Document intelligence with a Rust core and native bindings for 16 languages. Extract text, tables, and metadata from 90+ formats with optional OCR. Use it as an SDK, CLI, REST API, MCP server, or Docker image.


Why Kreuzberg

  • High Performance

Rust core with native PDFium, SIMD optimizations, and full parallelism. Process thousands of documents per minute without a GPU.

  • 90+ File Formats

PDF, DOCX, XLSX, PPTX, images, HTML, XML, emails, archives, academic formats — one API handles them all.

  • Multi-Engine OCR

Tesseract and PaddleOCR work across all language bindings. EasyOCR is available for Python only.

  • 16 Language Bindings

Native bindings for Python, TypeScript, Rust, Go, Java, Kotlin, C#, Ruby, PHP, Elixir, R, Dart, Swift, Zig, C, and WebAssembly.

  • Code Intelligence

Extract functions, classes, imports, symbols, and docstrings from 300+ programming languages. Results are returned in the code_intelligence field, with semantic chunking.

  • Plugin System

Register custom extractors, OCR backends, post-processors, and validators. Plugin authoring is primarily supported in Python; all bindings can consume registered plugins.

  • Flexible Deployment

Use as a library, CLI tool, REST API server, MCP server, or Docker container. Pick what fits your stack.

  • Redaction & Anonymisation

Strip PII before extracted content leaves the pipeline. The pattern engine covers emails, phone numbers, IBANs, SSNs, and credit card numbers; pair it with NER for names, organisations, and locations.

  • Named-Entity Recognition

Detect people, organisations, locations, dates, money, and caller-supplied zero-shot labels in extracted text. Runs on an ONNX (gline-rs) or LLM backend.
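To illustrate the idea behind pattern-based redaction, here is a minimal standalone sketch in Python. It does not use Kreuzberg's API and is not its implementation: the PATTERNS table, the redact and luhn_valid helpers, and the regular expressions are all assumptions made for this example, and they cover only a subset of the listed categories (no phone numbers).

```python
import re

# Hypothetical, simplified patterns for illustration only.
# Order matters: IBANs are replaced before the card pattern runs,
# so their digit groups cannot be mistaken for card numbers.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IBAN": re.compile(
        r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
    ),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    """Replace each match with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        if label == "CARD":
            # Only redact digit runs that pass the Luhn check,
            # which filters out many non-card numbers.
            text = pattern.sub(
                lambda m: "[CARD]" if luhn_valid(m.group()) else m.group(),
                text,
            )
        else:
            text = pattern.sub(f"[{label}]", text)
    return text


sample = (
    "Contact jane.doe@example.com, SSN 123-45-6789, "
    "card 4111 1111 1111 1111, IBAN DE89 3704 0044 0532 0130 00."
)
print(redact(sample))
```

Names, organisations, and locations have no fixed textual shape, which is why a pattern pass like this is usually paired with NER.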

See all features


Language Support

| Language | Package | Docs |
|---|---|---|
| Python | pip install kreuzberg | API Reference |
| TypeScript (Native) | npm install @kreuzberg/node | API Reference |
| TypeScript (WASM) | npm install @kreuzberg/wasm | API Reference |
| Rust | cargo add kreuzberg | API Reference |
| Go | go get github.com/kreuzberg-dev/kreuzberg/v5 | API Reference |
| Java | Maven Central dev.kreuzberg:kreuzberg | API Reference |
| Kotlin | Maven Central dev.kreuzberg:kreuzberg-kotlin | API Reference |
| C# | dotnet add package Kreuzberg | API Reference |
| Ruby | gem install kreuzberg | API Reference |
| PHP | composer require kreuzberg/kreuzberg | API Reference |
| Elixir | {:kreuzberg, "~> 5.0.0-rc.1"} | API Reference |
| R | r-universe kreuzberg | API Reference |
| Dart / Flutter | dart pub add kreuzberg | API Reference |
| Swift | Swift Package Manager | API Reference |
| Zig | zig fetch --save from GitHub | API Reference |
| C (FFI) | Shared library + header | API Reference |
| CLI | brew install kreuzberg-dev/tap/kreuzberg | CLI Guide |
| Docker | ghcr.io/kreuzberg-dev/kreuzberg | Docker Guide |

Choosing Between TypeScript Packages

@kreuzberg/node: use for Node.js servers and CLI tools. Runs at full native speed (the 100% baseline).

@kreuzberg/wasm: use for browsers, Cloudflare Workers, Deno, Bun, and serverless environments. Runs at roughly 60-80% of native speed in exchange for cross-platform portability.
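To make the speed trade-off concrete, the sketch below turns a native throughput measurement into the range the WASM build would be expected to reach. It assumes the 60-80% figure applies uniformly to your workload, and both the wasm_throughput_range helper and the 1,000 documents per minute input are hypothetical values chosen for illustration, not benchmark results.

```python
from typing import Tuple


def wasm_throughput_range(native_docs_per_min: float) -> Tuple[float, float]:
    """Scale a native throughput figure by the quoted 60-80% range."""
    low = native_docs_per_min * 60 / 100
    high = native_docs_per_min * 80 / 100
    return low, high


# Hypothetical native measurement, for illustration only.
low, high = wasm_throughput_range(1000)
print(f"Expected WASM throughput: {low:.0f}-{high:.0f} docs/min")
```

If the lower bound still meets your latency or volume target, the portability of the WASM package may be worth the cost; otherwise prefer the native package where the runtime allows it.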


Explore the Docs

  • Getting Started

Install Kreuzberg and extract your first document in minutes.

Quick Start

  • Guides

Configuration, OCR setup, Docker deployment, plugins, and more.

All Guides

  • Concepts

Architecture, extraction pipeline, MIME detection, and performance.

Architecture

  • API Reference

Complete API docs for every language binding, types, and errors.

References

  • CLI & Servers

Command-line tool, REST API server, and MCP server for AI agents.

CLI Usage

  • Migration

Migrate from Unstructured or other document extraction libraries.

Migration Guides


Getting Help
