Skip to content

Docker Deployment v4.0.0

Official Docker images built on the Rust core with Debian 13 (Trixie). Each image supports three execution modes: API server (default), CLI tool, and MCP server.

Quick Start

Pull and Run

Bash
# Start API server (default mode)
docker run -p 8000:8000 ghcr.io/kreuzberg-dev/kreuzberg:latest

# Test the API
curl -F "files=@document.pdf" http://localhost:8000/extract
Bash
# Extract a single file
docker run -v $(pwd):/data ghcr.io/kreuzberg-dev/kreuzberg:latest \
  extract /data/document.pdf

# Batch process multiple files
docker run -v $(pwd):/data ghcr.io/kreuzberg-dev/kreuzberg:latest \
  batch /data/*.pdf --output-format json

# Detect MIME type
docker run -v $(pwd):/data ghcr.io/kreuzberg-dev/kreuzberg:latest \
  detect /data/unknown-file.bin
Bash
# Start MCP server
docker run ghcr.io/kreuzberg-dev/kreuzberg:latest mcp

Pull Image

Bash
docker pull ghcr.io/kreuzberg-dev/kreuzberg:core
Bash
docker pull ghcr.io/kreuzberg-dev/kreuzberg:latest

Image Variants

Core Full
Image ghcr.io/kreuzberg-dev/kreuzberg:latest ghcr.io/kreuzberg-dev/kreuzberg:latest
Size ~1.0–1.3 GB ~1.5–2.1 GB
Tesseract OCR 12 languages 12 languages
Modern Office DOCX, PPTX, XLSX DOCX, PPTX, XLSX
Legacy Office DOC, PPT, XLS (native OLE/CFB) DOC, PPT, XLS (native OLE/CFB)
Startup ~1s ~1s

Core is optimized for production deployments where image size matters. Full adds legacy format support for complete document intelligence pipelines.

Both images include: Tesseract OCR (eng, spa, fra, deu, ita, por, chi-sim, chi-tra, jpn, ara, rus, hin), pdfium, images, HTML, email, archives.

Execution Modes

API Server (Default)

Terminal
docker run -p 8000:8000 ghcr.io/kreuzberg-dev/kreuzberg:latest

# Custom port and CORS
docker run -p 9000:9000 \
  -e KREUZBERG_CORS_ORIGINS="https://myapp.com" \
  ghcr.io/kreuzberg-dev/kreuzberg:latest \
  serve --host 0.0.0.0 --port 9000

# With config file
docker run -p 8000:8000 \
  -v $(pwd)/kreuzberg.toml:/config/kreuzberg.toml \
  ghcr.io/kreuzberg-dev/kreuzberg:latest \
  serve --config /config/kreuzberg.toml

See API Server Guide for endpoint documentation.

CLI Mode

Terminal
# Extract a file
docker run -v $(pwd):/data ghcr.io/kreuzberg-dev/kreuzberg:latest \
  extract /data/document.pdf

# Extract with OCR
docker run -v $(pwd):/data ghcr.io/kreuzberg-dev/kreuzberg:latest \
  extract /data/scanned.pdf --ocr true

# Batch processing
docker run -v $(pwd):/data ghcr.io/kreuzberg-dev/kreuzberg:latest \
  batch /data/*.pdf --format json

# MIME detection
docker run -v $(pwd):/data ghcr.io/kreuzberg-dev/kreuzberg:latest \
  detect /data/unknown-file.bin

MCP Server

Terminal
docker run ghcr.io/kreuzberg-dev/kreuzberg:latest mcp

# With config
docker run \
  -v $(pwd)/kreuzberg.toml:/config/kreuzberg.toml \
  ghcr.io/kreuzberg-dev/kreuzberg:latest \
  mcp --config /config/kreuzberg.toml

See API Server Guide - MCP Section for integration details.

Environment Variables

Variable Default Description
KREUZBERG_MAX_UPLOAD_SIZE_MB 100 Max upload size in MB
KREUZBERG_CORS_ORIGINS * Comma-separated allowed origins
RUST_LOG info Log level: error, warn, info, debug, trace
KREUZBERG_CACHE_DIR /app/.kreuzberg Cache directory (set explicitly in Docker; outside containers defaults to platform global cache)
HF_HOME /app/.kreuzberg/huggingface HuggingFace model cache

Host and port are set via CLI args: serve --host 0.0.0.0 --port 8000.

Volume Mounts

Terminal
# Cache persistence (embedding models, OCR cache)
docker run -p 8000:8000 \
  -v kreuzberg-cache:/app/.kreuzberg \
  ghcr.io/kreuzberg-dev/kreuzberg:latest

# Config file
docker run -p 8000:8000 \
  -v $(pwd)/kreuzberg.toml:/config/kreuzberg.toml \
  ghcr.io/kreuzberg-dev/kreuzberg:latest \
  serve --config /config/kreuzberg.toml

# Documents (read-only)
docker run -v $(pwd)/documents:/data:ro \
  ghcr.io/kreuzberg-dev/kreuzberg:latest \
  extract /data/document.pdf

Model Downloads

Embedding models download on first use (~90 MB – 1.2 GB depending on preset). Use a persistent volume for /app/.kreuzberg in production to avoid re-downloading on container restart. Outside Docker, models are cached in the platform-specific global cache directory (for example, ~/.cache/kreuzberg/ on Linux, ~/Library/Caches/kreuzberg/ on macOS).

Docker Compose

docker-compose.yaml
services:
  kreuzberg-api:
    image: ghcr.io/kreuzberg-dev/kreuzberg:latest
    ports:
      - "8000:8000"
    environment:
      - KREUZBERG_CORS_ORIGINS=https://myapp.com
      - KREUZBERG_MAX_UPLOAD_SIZE_MB=500
      - RUST_LOG=info
    volumes:
      - ./config:/config
      - cache-data:/app/.kreuzberg
    command: serve --host 0.0.0.0 --port 8000 --config /config/kreuzberg.toml
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "kreuzberg", "--version"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 5s

volumes:
  cache-data:

Security

Images run as non-root user kreuzberg (UID 1000). For hardened deployments:

Terminal
docker run --security-opt no-new-privileges \
  --read-only \
  --tmpfs /tmp \
  -p 8000:8000 \
  ghcr.io/kreuzberg-dev/kreuzberg:latest

Ensure mounted volumes have correct permissions:

Terminal
chown -R 1000:1000 /path/to/mounted/directory

Resource Allocation

Workload Memory CPU Notes
Light 512 MB 0.5 cores Small documents, low concurrency
Medium 1 GB 1 core Typical documents, moderate concurrency
Heavy 2 GB+ 2+ cores Large documents, OCR, high concurrency
Terminal
docker run -p 8000:8000 --memory=1g --cpus=1 \
  ghcr.io/kreuzberg-dev/kreuzberg:latest

Building Custom Images

Bash
docker build -f docker/Dockerfile.core -t kreuzberg:core .
Bash
docker build -f docker/Dockerfile.full -t kreuzberg:full .
Custom Dockerfile
FROM ghcr.io/kreuzberg-dev/kreuzberg:latest

USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends your-package-here && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

USER kreuzberg
COPY kreuzberg.toml /app/kreuzberg.toml
CMD ["serve", "--config", "/app/kreuzberg.toml"]

Other Image Variants

The published Core and Full images cover most use cases. For specialized needs, the docker/ directory has additional Dockerfiles:

Dockerfile What it builds
Dockerfile.cli Minimal image with just the kreuzberg binary — good for CI pipelines and batch jobs
Dockerfile.musl-build Fully static Linux binaries via MUSL — runs on any distro, no dynamic libs
Dockerfile.musl-ffi Static C FFI library for language bindings (Go, Ruby, R, PHP, Elixir)
Dockerfile.musl-rustler MUSL-based Rustler NIF for Elixir

CLI Image

A stripped-down image with only the CLI binary. No server, no API — just extraction:

Terminal
docker build -f docker/Dockerfile.cli -t kreuzberg-cli .

docker run -v $(pwd):/data kreuzberg-cli extract /data/document.pdf
docker run -v $(pwd):/data kreuzberg-cli batch /data/*.pdf --format json
docker run -v $(pwd):/data kreuzberg-cli detect /data/unknown-file.bin

MUSL Static Builds

These produce binaries with zero dynamic library dependencies. A single file that runs on any Linux — Alpine, scratch containers, bare EC2 instances, whatever.

Terminal
docker build -f docker/Dockerfile.musl-build -t kreuzberg-musl-build .
docker build -f docker/Dockerfile.musl-ffi -t kreuzberg-musl-ffi .

The FFI variant builds a shared library used by the Go, Ruby, R, PHP, and Elixir bindings for portable cross-platform distribution.

Troubleshooting

Container won't start

Check logs with docker logs <container-id>. Common causes: port conflict (change -p mapping), insufficient memory (increase --memory), volume permission errors.

Permission errors on mounted volumes

Images run as UID 1000. Fix with: chown -R 1000:1000 /path/to/mounted/directory

Large file processing fails

Increase memory limit (--memory=4g) and upload size (-e KREUZBERG_MAX_UPLOAD_SIZE_MB=1000).

Next Steps