AI Coding Assistants v4.2.15

Kreuzberg ships with an Agent Skill that teaches AI coding assistants how to use the library correctly. The skill covers extraction, configuration, OCR, chunking, embeddings, batch processing, error handling, and plugins across Python, Node.js/TypeScript, Rust, and the CLI.

Supported Assistants

The skill works with any tool that supports the Agent Skills standard: Claude Code, Codex, Gemini CLI, Cursor, Visual Studio Code (with AI extensions), Amp, Goose, and Roo Code.

Installing

Terminal

# Install into current project (recommended)
npx skills add kreuzberg-dev/kreuzberg

# Install globally
npx skills add kreuzberg-dev/kreuzberg -g

Or copy manually:

Terminal

cp -r path/to/kreuzberg/skills/kreuzberg .claude/skills/kreuzberg
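
Where a shell is not available, the manual copy can be scripted. The following is a minimal sketch, not part of Kreuzberg: `install_skill` is a hypothetical helper name, and the demo runs against throwaway directories rather than a real checkout. It roughly mirrors the `cp -r` command above by copying the skill directory to `.claude/skills/kreuzberg` under a project root.

```python
import shutil
import tempfile
from pathlib import Path


def install_skill(source: Path, project_root: Path) -> Path:
    """Copy a skill directory to <project_root>/.claude/skills/kreuzberg.

    Hypothetical helper for illustration; not shipped with Kreuzberg.
    """
    if not (source / "SKILL.md").is_file():
        raise FileNotFoundError(f"no SKILL.md found in {source}")
    target = project_root / ".claude" / "skills" / "kreuzberg"
    # copytree creates missing parent directories; dirs_exist_ok=True
    # (Python 3.8+) lets a re-run refresh an existing copy.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


# Demo against temporary directories so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "kreuzberg" / "skills" / "kreuzberg"
    (source / "references").mkdir(parents=True)
    (source / "SKILL.md").write_text("# Kreuzberg skill\n")
    (source / "references" / "python-api.md").write_text("# Python API\n")

    project = Path(tmp) / "my-project"
    project.mkdir()
    installed = install_skill(source, project)
    print(sorted(p.relative_to(installed).as_posix() for p in installed.rglob("*")))
```

The check for `SKILL.md` is a design choice of this sketch: it fails early when the source path does not point at a skill directory, instead of silently copying the wrong folder.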

What the Skill Provides

When your AI coding assistant discovers the skill, it knows:

All extraction functions and their correct signatures across languages
Configuration field names (for example, max_chars not max_characters in Python)
Rust feature gates (for example, tokio-runtime for sync functions)
Language-specific conventions (snake_case in Python/Rust, camelCase in Node.js)
Error handling patterns for each language
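
The naming convention in the list above can be illustrated with a small sketch. `snake_to_camel` is a hypothetical helper written for this page, not a Kreuzberg function, and the bindings are not claimed to derive their names mechanically. It only shows how the Python names `extract_file` and `extract_file_sync` relate to the Node.js names `extractFile` and `extractFileSync` used in the Quick Examples.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case identifier to camelCase.

    Hypothetical helper for illustration; assumes a lowercase
    snake_case input such as "extract_file".
    """
    head, *rest = name.split("_")
    # Keep the first word as is and capitalize each following word.
    return head + "".join(part.capitalize() for part in rest)


for python_name in ("extract_file", "extract_file_sync"):
    print(python_name, "->", snake_to_camel(python_name))
```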

Skill Structure

skills/kreuzberg/
├── SKILL.md                        # Main skill (~400 lines)
└── references/
    ├── python-api.md               # Complete Python API
    ├── nodejs-api.md               # Complete Node.js API
    ├── rust-api.md                 # Complete Rust API
    ├── cli-reference.md            # All CLI commands and flags
    ├── configuration.md            # Config file formats and schema
    ├── supported-formats.md        # All 90+ supported formats
    ├── advanced-features.md        # Plugins, embeddings, MCP, security
    └── other-bindings.md           # Go, Ruby, Java, C#, PHP, Elixir

The main file stays under 500 lines for efficient AI consumption. Reference files load on demand.
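
The line budget can be checked with a short script. This is a sketch under stated assumptions: `check_skill_layout` and `MAX_MAIN_LINES` are hypothetical names, the directory layout is the one shown in the tree above, and the demo builds a temporary skill directory instead of reading a real one.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

MAX_MAIN_LINES = 500  # budget stated above for SKILL.md


def check_skill_layout(skill_dir: Path) -> tuple[int, list[str]]:
    """Return the SKILL.md line count and the sorted reference file names.

    Hypothetical helper for illustration; raises ValueError when the
    main file reaches the 500-line budget.
    """
    main = skill_dir / "SKILL.md"
    line_count = len(main.read_text(encoding="utf-8").splitlines())
    if line_count >= MAX_MAIN_LINES:
        raise ValueError(f"{main} has {line_count} lines, budget is under {MAX_MAIN_LINES}")
    references = sorted(p.name for p in (skill_dir / "references").glob("*.md"))
    return line_count, references


# Demo against a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "skills" / "kreuzberg"
    (skill / "references").mkdir(parents=True)
    (skill / "SKILL.md").write_text("line\n" * 400, encoding="utf-8")
    for name in ("python-api.md", "nodejs-api.md"):
        (skill / "references" / name).write_text("# reference\n", encoding="utf-8")
    print(check_skill_layout(skill))
```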

Quick Examples

Python

import asyncio

from kreuzberg import extract_file, extract_file_sync, ExtractionConfig, OcrConfig

async def main() -> None:
    # Extract with default settings
    result = await extract_file("document.pdf")
    print(result.content)

    # Extract with OCR enabled and Markdown output
    config = ExtractionConfig(
        ocr=OcrConfig(backend="tesseract", language="eng"),
        output_format="markdown",
    )
    result = await extract_file("document.pdf", config=config)

asyncio.run(main())

Node.js

// Top-level await requires an ES module
import { extractFile, extractFileSync } from '@kreuzberg/node';

const result = await extractFile('document.pdf');
console.log(result.content);

Rust

use kreuzberg::{extract_file, ExtractionConfig};

// Runs inside an async function that returns a Result
let config = ExtractionConfig::default();
let result = extract_file("document.pdf", None, &config).await?;

CLI

kreuzberg extract document.pdf
kreuzberg extract document.pdf --format json --output-format markdown