Skip to content

Kreuzberg Documentation

Kreuzberg is a document intelligence platform with a high‑performance Rust core and native bindings for Python, TypeScript/Node.js, C#, Ruby, Go, Elixir, and Rust itself. Use it as an SDK, CLI, Docker image, REST API server, or MCP tool to extract text, tables, and metadata from 75+ file formats (PDF, Office, images, HTML, XML, archives, email, and more) with optional OCR and post-processing pipelines.

What You Can Do

  • Single API across languages – Binding idioms follow each ecosystem, but features (extraction, OCR, chunking, embeddings, plugins) map 1:1.
  • Structured extraction – Convert PDFs, Office docs, images, emails, HTML, XML, and archives into clean Markdown/JSON, preserving tables and metadata.
  • Multi-engine OCR – Built-in Tesseract and PaddleOCR support in all bindings, with EasyOCR extension for Python.
  • Plugin ecosystem – Register post-processors, validators, OCR backends, and run them from any binding or via the CLI/API server.
  • Deployment flexibility – Ship as a library, run the CLI, or host the API server/MCP adapter inside containers.
  • AI coding assistant support – Ships with an Agent Skill for Claude Code, Codex, Gemini CLI, Cursor, and other AI tools.

Documentation Map

  • Getting Started – First extraction in each language.
  • Installation – Dependency matrix for Rust, Python, Ruby, Node.js, CLI, and Docker users.
  • Guides – How to configure extraction, OCR, advanced features, plugins, and Docker/API deployments.
  • Concepts – Architecture, extraction pipeline, MIME detection, plugin runtime, and performance strategies.
  • Features directory – Exhaustive capability list per format/binding plus OCR and chunking options.
  • Reference – API references for all supported languages, configuration schema, supported formats, types, and errors.
  • CLI – Command syntax, flags, exit codes, and automation tips.
  • API Server – Running the REST service and integrating with MCP.
  • AI Coding Assistants – Agent Skill for Claude Code, Codex, Gemini CLI, Cursor, and more.
  • Migration and Changelog – Track breaking changes and release history.

Supported Platforms

Binding / Interface Package Use Case Docs
Python pip install kreuzberg Server-side, data processing Python API Reference
TypeScript/Node.js (Native) npm install @kreuzberg/node Node.js servers, command-line tools, native performance TypeScript API Reference
WebAssembly (WASM) npm install @kreuzberg/wasm Browsers, Cloudflare Workers, Deno, serverless WASM API Reference
Java dev.kreuzberg:kreuzberg (Maven) Server-side Java, FFM API Java API Reference
C# dotnet add package Kreuzberg .NET applications, Windows servers C# API Reference
PHP kreuzberg/kreuzberg (Composer) PHP applications, ext-ffi PHP API Reference
Ruby gem install kreuzberg Server-side, Rails applications Ruby API Reference
Go go get github.com/kreuzberg-dev/kreuzberg/packages/go/v4@latest Server-side, systems tools Go API Reference
Elixir {:kreuzberg, "~> 4.0"} BEAM applications, Phoenix apps Elixir API Reference
Rust cargo add kreuzberg System libraries, performance-critical Rust API Reference
CLI brew install kreuzberg-dev/tap/kreuzberg or cargo install kreuzberg-cli Terminal automation, scripting CLI Usage
API Server / MCP Docker image ghcr.io/kreuzberg-dev/kreuzberg:core Containerized services, MCP integration API Server Guide

Choosing Between TypeScript Packages

Kreuzberg provides two distinct TypeScript packages optimized for different runtimes:

Native TypeScript/Node.js (@kreuzberg/node)

Use @kreuzberg/node if you're targeting:

  • Node.js servers and applications
  • Command-line tools and scripts
  • Environments requiring maximum performance (near-native speeds)
  • Server-side batch processing and data pipelines

Native bindings compile to C++ N-API and deliver the best performance across all platforms.

Terminal
npm install @kreuzberg/node

WebAssembly (@kreuzberg/wasm)

Use @kreuzberg/wasm if you're targeting:

  • Web browsers (Chrome, Firefox, Safari, Edge)
  • Cloudflare Workers and other edge computing platforms
  • Deno and other JavaScript runtimes
  • Serverless environments (AWS Lambda, Vercel, etc.)
  • In-browser document processing without server dependencies

WASM bindings run entirely in WebAssembly and work in any JavaScript runtime with WASM support. See Performance for tradeoffs.

Terminal
npm install @kreuzberg/wasm

Performance Comparison

Binding Speed Relative to Native Memory Platform Support Use Case
Native (@kreuzberg/node) 100% (baseline) Efficient Node.js only Server-side, high-performance
WASM (@kreuzberg/wasm) 60-80% Higher Browsers, Workers, Deno, Bun In-browser, edge, serverless

WASM provides broad platform compatibility at the cost of performance. For server-side Node.js applications, always use native @kreuzberg/node.

Getting Help

Happy extracting!