AI Coding Assistants v4.2.15
Kreuzberg ships with an Agent Skill that teaches AI coding assistants how to use the library correctly — covering extraction, configuration, OCR, chunking, embeddings, batch processing, error handling, and plugins across Python, Node.js/TypeScript, Rust, and CLI.
Supported Assistants
Works with any tool supporting the Agent Skills standard: Claude Code, Codex, Gemini CLI, Cursor, VS Code (with AI extensions), Amp, Goose, and Roo Code.
Installing
Terminal
# Install into current project (recommended)
npx skills add kreuzberg-dev/kreuzberg
# Install globally
npx skills add kreuzberg-dev/kreuzberg -g
Or copy manually:
What the Skill Provides
When your AI coding assistant discovers the skill, it knows:
- All extraction functions and their correct signatures across languages
- Configuration field names (e.g., max_chars not max_characters in Python)
- Rust feature gates (e.g., tokio-runtime for sync functions)
- Language-specific conventions (snake_case in Python/Rust, camelCase in Node.js)
- Error handling patterns for each language
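The naming convention in the list above can be illustrated with a small helper. This is a minimal sketch, not part of Kreuzberg: `snake_to_camel` is a hypothetical function, and the camelCase names it prints are assumptions about how an option might be spelled in Node.js. Check the real field names against the skill's reference files.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case identifier to camelCase."""
    head, *tail = name.split("_")
    # The first word stays lowercase; each following word is capitalized.
    return head + "".join(part.capitalize() for part in tail)


# Illustrative only: option names taken from this page, converted mechanically.
for python_name in ("max_chars", "output_format"):
    print(f"{python_name} -> {snake_to_camel(python_name)}")
```

A mechanical conversion like this only shows the convention. An assistant guided by the skill would use the documented names rather than derive them.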
Skill Structure
skills/kreuzberg/
├── SKILL.md                      # Main skill (~400 lines)
└── references/
    ├── python-api.md             # Complete Python API
    ├── nodejs-api.md             # Complete Node.js API
    ├── rust-api.md               # Complete Rust API
    ├── cli-reference.md          # All CLI commands and flags
    ├── configuration.md          # Config file formats and schema
    ├── supported-formats.md      # All 91+ supported formats
    ├── advanced-features.md      # Plugins, embeddings, MCP, security
    └── other-bindings.md         # Go, Ruby, Java, C#, PHP, Elixir
The main file stays under 500 lines for efficient AI consumption. Reference files load on demand.
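If you maintain or fork the skill, the size limit and layout described above can be checked with a few lines of standard-library Python. This is a sketch under assumptions: `check_skill` is a hypothetical helper, and the directory layout is taken from the tree shown on this page. The location where the skill lands after installation may differ.

```python
from pathlib import Path

MAX_SKILL_LINES = 500  # limit quoted in the text above


def check_skill(skill_dir: Path, max_lines: int = MAX_SKILL_LINES) -> dict:
    """Summarize a skill directory: SKILL.md length and reference files."""
    skill_file = skill_dir / "SKILL.md"
    line_count = len(skill_file.read_text(encoding="utf-8").splitlines())
    # Reference files are the Markdown files directly under references/.
    references = sorted(p.name for p in (skill_dir / "references").glob("*.md"))
    return {
        "lines": line_count,
        "within_limit": line_count < max_lines,
        "references": references,
    }
```

Calling `check_skill(Path("skills/kreuzberg"))` from a directory that contains the tree above returns the line count of SKILL.md, whether it is under the limit, and the sorted list of reference file names.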
Quick Examples
import asyncio
from kreuzberg import extract_file, ExtractionConfig, OcrConfig

async def main():
    # Extract with default settings
    result = await extract_file("document.pdf")
    print(result.content)
    # Extract with Tesseract OCR and Markdown output
    config = ExtractionConfig(
        ocr=OcrConfig(backend="tesseract", language="eng"),
        output_format="markdown",
    )
    result = await extract_file("document.pdf", config=config)
    print(result.content)

asyncio.run(main())
Further Reading
- Agent Skills Standard — the open standard
- Extraction Basics — core extraction API
- Configuration — all configuration options
- Advanced Features — chunking, embeddings, language detection
- Plugin System — custom plugins