AI Coding Assistants¶
Kreuzberg ships with an Agent Skill that teaches AI coding assistants how to use the library correctly. The skill provides API knowledge for Python, Node.js/TypeScript, Rust, and the CLI, covering extraction, configuration, OCR, chunking, embeddings, batch processing, error handling, and plugins.
What Are Agent Skills?¶
Agent Skills are structured knowledge files that follow the open Agent Skills standard. They are automatically discovered by AI coding assistants and provide context about how to use a library, its APIs, and best practices. Unlike traditional documentation, skills are optimized for AI consumption with progressive disclosure: a concise main file for common tasks, with detailed reference files loaded on demand.
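Progressive disclosure can be pictured with a small loader. The sketch below is only an illustration of the idea, not how any particular assistant implements it: the `SkillContext` class and its method names are hypothetical, and the only assumption taken from this page is a layout with a `SKILL.md` file next to a `references/` directory.

```python
from pathlib import Path


class SkillContext:
    """Hypothetical holder: main file loaded eagerly, references lazily."""

    def __init__(self, skill_dir: Path) -> None:
        self.skill_dir = skill_dir
        # The concise main file is always read up front.
        self.main = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
        # Reference files are read on first request and then cached.
        self._loaded = {}

    def available_references(self) -> list:
        """List reference file names without reading their contents."""
        ref_dir = self.skill_dir / "references"
        if not ref_dir.is_dir():
            return []
        return sorted(p.name for p in ref_dir.glob("*.md"))

    def load_reference(self, name: str) -> str:
        """Read one reference file on demand."""
        if name not in self._loaded:
            path = self.skill_dir / "references" / name
            self._loaded[name] = path.read_text(encoding="utf-8")
        return self._loaded[name]

    def loaded_references(self) -> list:
        """Names of the reference files that have been read so far."""
        return sorted(self._loaded)
```

The design point is that listing references is cheap, while their contents only enter the context when a task actually needs them.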
Supported AI Coding Assistants¶
The Kreuzberg skill works with any tool supporting the Agent Skills standard:
- Claude Code (Anthropic)
- Codex (OpenAI)
- Gemini CLI (Google)
- Cursor
- VS Code (with AI extensions)
- Amp
- Goose
- Roo Code
What the Skill Covers¶
The main skill file (skills/kreuzberg/SKILL.md) provides quick-start guidance for all four primary interfaces. Detailed reference files are available for deep dives into specific topics.
Extraction Flows¶
The skill covers all extraction patterns across languages:
```python
import asyncio

from kreuzberg import extract_file, extract_file_sync


async def main() -> None:
    # Async extraction
    result = await extract_file("document.pdf")
    print(result.content)   # Extracted text
    print(result.metadata)  # Document metadata
    print(result.tables)    # Structured tables


asyncio.run(main())

# Sync extraction
result = extract_file_sync("document.pdf")
```

```rust
use kreuzberg::{extract_file, extract_file_sync, ExtractionConfig};

// Async extraction
let config = ExtractionConfig::default();
let result = extract_file("document.pdf", None, &config).await?;

// Sync extraction (requires tokio-runtime feature)
let result = extract_file_sync("document.pdf", None, &config)?;
```
Configuration¶
The skill covers the full configuration system, including OCR, chunking, output format, PDF options, and language detection.
Chunking and Embeddings¶
The skill covers text chunking for RAG pipelines and vector embedding generation:
```python
import asyncio

from kreuzberg import ChunkingConfig, ExtractionConfig, extract_file

config = ExtractionConfig(
    chunking=ChunkingConfig(max_chars=1000, max_overlap=200),
)


async def main() -> None:
    result = await extract_file("document.pdf", config=config)
    for chunk in result.chunks:
        print(f"Chunk {chunk.metadata.chunk_index}: {chunk.content[:100]}...")
        # Only printed when the chunk carries an embedding
        if chunk.embedding:
            print(f"  Embedding dimensions: {len(chunk.embedding)}")


asyncio.run(main())
```
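To make the two chunking parameters concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not Kreuzberg's chunker, which may well split on sentence or paragraph boundaries. The `chunk_text` function is a hypothetical helper that only borrows the parameter names `max_chars` and `max_overlap` from the example above.

```python
def chunk_text(text: str, max_chars: int = 1000, max_overlap: int = 200) -> list:
    """Naive character-window chunking; an illustration, not Kreuzberg's algorithm."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= max_overlap < max_chars:
        raise ValueError("max_overlap must be in [0, max_chars)")

    # Each new chunk starts this many characters after the previous one,
    # so consecutive chunks share max_overlap characters.
    step = max_chars - max_overlap

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # The current window already reached the end of the text
        start += step
    return chunks
```

With `max_chars=4` and `max_overlap=1`, the string `"abcdefghij"` yields `["abcd", "defg", "ghij"]`: each chunk repeats the last character of the previous one.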
Batch Processing¶
The skill covers batch extraction for processing multiple documents concurrently.
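The general shape of concurrent batch processing can be sketched with plain `asyncio`. This is a generic pattern under stated assumptions, not Kreuzberg's batch API: `extract_many` and `fake_extract` are hypothetical names, and `fake_extract` stands in for a real extraction call so the example runs without the library installed.

```python
import asyncio
from typing import Awaitable, Callable, List, Union


async def extract_many(
    paths: List[str],
    extract_one: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> List[Union[str, BaseException]]:
    """Run extract_one over all paths, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path: str) -> str:
        async with semaphore:
            return await extract_one(path)

    # return_exceptions=True keeps one bad document from failing the batch;
    # results come back in the same order as the input paths.
    return await asyncio.gather(
        *(guarded(p) for p in paths), return_exceptions=True
    )


async def fake_extract(path: str) -> str:
    """Stand-in extractor (hypothetical), used only to make the sketch runnable."""
    await asyncio.sleep(0)
    if path.endswith(".bad"):
        raise ValueError(f"cannot parse {path}")
    return f"text of {path}"
```

A call such as `asyncio.run(extract_many(["a.pdf", "b.bad"], fake_extract))` returns one entry per input: the extracted text for successes and the exception object for failures, so the caller can handle each document individually.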
Error Handling¶
The skill provides error handling patterns for each language with specific error types for parsing, OCR, validation, and missing dependencies.
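One way to picture handling those four error categories is a small exception hierarchy with a dispatch function. The classes below are local stand-ins defined for this sketch only. Their names mirror the categories listed above but are assumptions, not Kreuzberg's actual exception types, so check the language reference before relying on them.

```python
class ExtractionError(Exception):
    """Hypothetical base class for all extraction failures."""


class ParsingError(ExtractionError):
    """Stand-in: the document could not be parsed."""


class OcrError(ExtractionError):
    """Stand-in: the OCR step failed."""


class ValidationError(ExtractionError):
    """Stand-in: configuration or input was invalid."""


class MissingDependencyError(ExtractionError):
    """Stand-in: a required optional dependency is not installed."""


def classify(exc: Exception) -> str:
    """Map an error to a short recovery hint."""
    if isinstance(exc, MissingDependencyError):
        return "install the missing dependency and retry"
    if isinstance(exc, ValidationError):
        return "fix the configuration or input arguments"
    if isinstance(exc, OcrError):
        return "check the OCR backend and language settings"
    if isinstance(exc, ParsingError):
        return "the document may be corrupt or unsupported"
    return "unexpected failure"


def run_safely(extract, path):
    """Call extract(path); return (content, None) or (None, hint)."""
    try:
        return extract(path), None
    except ExtractionError as exc:
        return None, classify(exc)
```

Catching the shared base class keeps unrelated exceptions (for example a `KeyboardInterrupt` or a programming error) from being silently swallowed.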
Plugin System¶
The skill covers the plugin architecture for custom post-processors, validators, and OCR backends.
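As a rough mental model of post-processors and validators, consider the sketch below. Everything in it is hypothetical: `Document`, `PluginRegistry`, and the registration methods are invented for illustration, and the ordering (post-processors first, then validators) is an assumption rather than documented Kreuzberg behavior. OCR backends are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for an extraction result."""
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


PostProcessor = Callable[[Document], Document]
Validator = Callable[[Document], None]


class PluginRegistry:
    """Hypothetical registry that applies plugins in registration order."""

    def __init__(self) -> None:
        self._post_processors: List[PostProcessor] = []
        self._validators: List[Validator] = []

    def add_post_processor(self, plugin: PostProcessor) -> None:
        self._post_processors.append(plugin)

    def add_validator(self, plugin: Validator) -> None:
        self._validators.append(plugin)

    def apply(self, doc: Document) -> Document:
        # Post-processors transform the result...
        for post_process in self._post_processors:
            doc = post_process(doc)
        # ...then validators inspect it and raise on problems.
        for validate in self._validators:
            validate(doc)
        return doc


def strip_whitespace(doc: Document) -> Document:
    """Example post-processor: trim surrounding whitespace."""
    return Document(doc.content.strip(), dict(doc.metadata))


def require_content(doc: Document) -> None:
    """Example validator: reject empty extractions."""
    if not doc.content:
        raise ValueError("extracted content is empty")
```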
Skill File Structure¶
```
skills/kreuzberg/
├── SKILL.md                   # Main skill (~400 lines)
└── references/
    ├── python-api.md          # Complete Python API
    ├── nodejs-api.md          # Complete Node.js API
    ├── rust-api.md            # Complete Rust API
    ├── cli-reference.md       # All CLI commands and flags
    ├── configuration.md       # Config file formats and schema
    ├── supported-formats.md   # All 75+ supported formats
    ├── advanced-features.md   # Plugins, embeddings, MCP, security
    └── other-bindings.md      # Go, Ruby, Java, C#, PHP, Elixir
```
The main SKILL.md file is kept under 500 lines for efficient AI consumption. Reference files provide deep-dive details that AI tools load on demand when more context is needed.
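If you maintain a copy of the skill yourself, the size constraint is easy to check mechanically. The following is a small sketch, assuming the directory layout shown above. `check_skill_layout` is a hypothetical helper and not part of Kreuzberg or the Agent Skills tooling.

```python
from pathlib import Path

MAX_MAIN_LINES = 500  # The main file is kept under this many lines


def check_skill_layout(skill_dir: Path, max_lines: int = MAX_MAIN_LINES) -> list:
    """Return a list of human-readable problems; empty means the layout looks fine."""
    main = skill_dir / "SKILL.md"
    if not main.is_file():
        return ["missing SKILL.md"]

    problems = []
    line_count = len(main.read_text(encoding="utf-8").splitlines())
    if line_count >= max_lines:
        problems.append(f"SKILL.md has {line_count} lines (limit: under {max_lines})")
    if not (skill_dir / "references").is_dir():
        problems.append("missing references/ directory")
    return problems
```

Running `check_skill_layout(Path("skills/kreuzberg"))` in a project that contains the skill should return an empty list when the main file is short enough and the `references/` directory is present.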
How It Works¶
When you open a project that uses Kreuzberg (or a project with the skill files present), your AI coding assistant automatically discovers skills/kreuzberg/SKILL.md and loads it as context. This means the AI:
- Knows all available extraction functions and their correct signatures
- Uses the right field names for configuration (e.g., max_chars, not max_characters, in Python)
- Handles Rust feature gates correctly (e.g., tokio-runtime for sync functions)
- Follows language-specific conventions (snake_case in Python/Rust, camelCase in Node.js)
- Generates correct error handling patterns for each language
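Discovery itself is tool-specific, but conceptually it amounts to scanning a few well-known directories for `SKILL.md` files. The sketch below assumes the two locations mentioned on this page (`skills/` and `.claude/skills/`). `discover_skills` is a hypothetical function, and real assistants may search other paths or apply different precedence rules.

```python
from pathlib import Path


def discover_skills(project_root: Path, search_dirs=("skills", ".claude/skills")) -> dict:
    """Map skill name -> path of its SKILL.md, for every skill found.

    Earlier entries in search_dirs win when the same skill name appears twice.
    """
    found = {}
    for rel in search_dirs:
        base = project_root / rel
        if not base.is_dir():
            continue
        # A skill is any direct subdirectory that contains a SKILL.md file.
        for skill_file in sorted(base.glob("*/SKILL.md")):
            found.setdefault(skill_file.parent.name, skill_file)
    return found
```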
Installing the Skill¶
The easiest way to add the Kreuzberg skill to any project is with the Vercel Skills CLI:
```shell
# Install into current project (recommended)
npx skills add kreuzberg-dev/kreuzberg

# Install globally (available in all projects)
npx skills add kreuzberg-dev/kreuzberg -g
```
This places the skill files in the correct agent-specific directory (e.g., .claude/skills/kreuzberg/) so your AI coding assistant discovers them automatically.
Alternatively, you can copy the skill files manually:
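```shell
# For Claude Code, Cursor, and GitHub Copilot
# Create the parent directory first; cp fails if .claude/skills does not exist
mkdir -p .claude/skills
cp -r path/to/kreuzberg/skills/kreuzberg .claude/skills/kreuzberg
```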
Further Reading¶
- Agent Skills Standard — The open standard for AI coding assistant skills
- Extraction Basics — Detailed extraction guide
- Advanced Features — Chunking, embeddings, language detection
- Configuration — Full configuration reference
- Plugin System — Creating custom plugins
- API Server & MCP — Server deployment and MCP integration