Contributing Guide¶
Thank you for your interest in contributing to Kreuzberg! This guide covers everything you need — from picking an issue to getting your pull request merged.
First time contributing?¶
Welcome! Here's how to get started:
- Pick an issue that matches your experience level:
  - Good first issue: small, well-scoped tasks ideal for newcomers
  - Help wanted: tasks where we'd especially appreciate community help
- Read through the issue and any existing comments
- Leave a comment letting maintainers know you'd like to work on it
- Ask questions — we're here to help!
Congratulations — that's really all it takes to start contributing! Fork, fix, and open a PR. We keep the process simple so you can focus on what matters: the code.
Tip
Start small. A focused contribution you understand well is more valuable than an ambitious one that stalls.
Want to propose a larger change or new feature? Open an issue to discuss it with maintainers first.
Prerequisites¶
You only need the toolchains for the areas you plan to work on.
Required for all contributions:
- Git
- Task — our task runner for all build and test workflows
- Rust stable (via rustup), required for core and all bindings. The wasm32-unknown-unknown target is configured automatically via rust-toolchain.toml
Required for WASM builds:
- WASI SDK: provides a wasm-capable C/C++ compiler needed by tree-sitter, tesseract, and pdfium. Install to $HOME/wasi-sdk or set the WASI_SDK_PATH environment variable to your install location
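As a minimal sketch of the two options above, the snippet below keeps an existing WASI_SDK_PATH and otherwise falls back to the default location. The directory check and its warning text are illustrative assumptions, not something the Kreuzberg build is known to perform.

```shell
# Keep WASI_SDK_PATH if it is already set; otherwise use the default install location.
export WASI_SDK_PATH="${WASI_SDK_PATH:-$HOME/wasi-sdk}"

# Illustrative sanity check (an assumption, not part of the documented setup).
if [ ! -d "$WASI_SDK_PATH" ]; then
  echo "warning: no WASI SDK found at $WASI_SDK_PATH" >&2
fi
```

Placing the export in your shell profile would make the setting persist across sessions.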
Language-specific toolchains (only install what you need):
| Language | Version | Tool |
|---|---|---|
| Python | 3.10+ | uv |
| Node.js | 20+ | pnpm |
| Ruby | 3.2+ | rbenv or rvm |
| Go | 1.26+ | Official installer |
| Java | 25+ | JDK (via sdkman) |
| .NET | 10+ | dotnet |
| PHP | 8.1+ | composer |
| Elixir | 1.14+ | mix (OTP 25+) |
| R | 4.1+ | CRAN |
For platform-specific build dependencies (compilers, OpenSSL, etc.), see the Installation guide.
Development setup¶
Set up your entire environment with a single command:
This installs all toolchains and dependencies. Safe to re-run anytime.
For building individual language bindings, use the namespace pattern:
Development workflow¶
1. Fork and clone¶
Fork the repository on GitHub, then clone your fork:
git clone git@github.com:<your-username>/kreuzberg.git
cd kreuzberg
git remote add upstream https://github.com/kreuzberg-dev/kreuzberg.git
2. Create a branch¶
Use a prefix that matches your change type: feat/, fix/, docs/, perf/, chore/, test/.
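For illustration, here is a small sketch that checks a branch name against the prefixes listed above before you create it. The helper valid_branch and the branch name fix/empty-mime-type are hypothetical; nothing in the repository is assumed to provide them.

```shell
# Hypothetical helper: succeeds only for branch names that use a documented prefix
# followed by at least one more character.
valid_branch() {
  case "$1" in
    feat/?*|fix/?*|docs/?*|perf/?*|chore/?*|test/?*) return 0 ;;
    *) return 1 ;;
  esac
}

branch="fix/empty-mime-type"   # example name, not a real Kreuzberg branch
if valid_branch "$branch"; then
  echo "ok: $branch"           # next step: git switch -c "$branch"
else
  echo "unexpected prefix: $branch" >&2
fi
```

Once the name looks right, git switch -c (or git checkout -b on older Git versions) creates the branch and switches to it.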
3. Make your changes¶
Keep commits small and focused.
4. Run checks¶
Run task check before committing; it runs both linting and formatting checks. For language-specific tests:
5. Commit with conventional messages¶
We use Conventional Commits. The pre-commit hook validates this.
feat: add PDF table extraction support
fix: handle empty MIME type in archive entries
docs: update Python API reference for v4.4
perf: parallelize layout inference
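If you want a quick local sanity check before the hook runs, a rough sketch might look like the one below. It is not the project's actual pre-commit hook: the function name is_conventional is hypothetical, and the accepted type list is an assumption taken from the prefixes mentioned in this guide, so the real hook may accept more types.

```shell
# Sketch only: NOT the project's actual pre-commit hook.
# Accepts "<type>: <summary>" or "<type>(<scope>): <summary>", with an optional "!".
is_conventional() {
  printf '%s\n' "$1" | grep -Eq '^(feat|fix|docs|perf|chore|test)(\([a-z0-9_-]+\))?!?: .+'
}

for msg in "feat: add PDF table extraction support" "Added table support"; do
  if is_conventional "$msg"; then
    echo "ok:     $msg"
  else
    echo "reject: $msg"
  fi
done
```

The pattern requires a colon and a space after the type, which is the detail most often missed in hand-written messages.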
6. Update documentation¶
When adding user-facing features, add or update pages under docs/ and reference them in zensical.toml.
Issues¶
Finding issues¶
Browse the issue tracker and filter by labels: good first issue, help wanted, bug, or enhancement.
Reporting a bug¶
Include: what you expected, what happened (with error output), steps to reproduce, your environment (OS, language version, Kreuzberg version), and a minimal sample file if applicable.
Suggesting improvements¶
Search for existing issues first. Describe the use case and keep scope focused — break large ideas into smaller, actionable issues.
Filing great issues
Be specific: "PDF tables lose column alignment" is better than "PDF parsing is broken." Explain impact and link related issues with #123.
Submitting a pull request¶
PR checklist¶
Before opening a PR, verify locally:
- task check passes
- Targeted tests pass
- Docs updated (if applicable)
- Commits follow Conventional Commits
Writing a good PR description¶
Include what changed, why, and how you tested it. Use Fixes #123 to auto-close related issues.
Tip
Set your PR to Draft while it's in progress. Maintainers may leave early comments but won't do a full review until you mark it ready.
Review and merge¶
- CI runs — automated builds and tests across platforms
- Maintainers review — code correctness, style, and design
- Feedback rounds — make requested changes and push
- Merge — once approved with all checks passing
Merge requirements: all CI checks pass, at least one maintainer approval, no unresolved conversations, branch up to date with main.
Info
Don't worry about failing CI on your first PR. Maintainers will help you resolve issues.
CI/CD¶
Kreuzberg ships six GitHub Actions workflows under .github/workflows/. The first two run automatically on contributor PRs; the rest are manual or release-driven and contributors do not need to invoke them.
| Workflow | Trigger | What it does |
|---|---|---|
| ci.yaml | Push to main, every PR | Clippy, fmt, unit + integration tests, type checks for the Python and TypeScript bindings. Runs on ubuntu-24.04-arm. This is the canonical "PR is mergeable" check. |
| docs.yaml | Push/PR touching docs/**, manual dispatch | Builds the docs site in strict mode, validates --8<-- snippet includes, runs prose linting, and deploys to GitHub Pages from main. |
| publish.yaml | Manual dispatch, GitHub release event | Publishes to PyPI, npm, crates.io, Docker Hub, Homebrew, and other registries. Not run on PRs. |
| publish-docker.yaml | Manual dispatch, GitHub release event | Builds and publishes the Kreuzberg Docker images. |
| benchmarks.yaml | Manual dispatch only | Three-iteration performance run with quality metrics on ubuntu-24.04-arm. Used to compare proposed changes against main. |
| profiling.yaml | Manual dispatch only | Generates flamegraphs for six fixture types (small/medium PDFs, simple DOCX, and others) for performance investigations. |
Reading workflow failures¶
Note
Please run checks locally before you open a PR: for example, task check plus the tests for any language bindings you touched (see the Development Workflow guide for common commands). That catches most CI failures faster than iterating on GitHub alone.
Open the failing PR's Checks tab and click into the failing job to expand its log. The job name maps directly to the step in ci.yaml that failed (for example, clippy or python-test). After you push a fix, GitHub Actions picks up the new commit automatically; to re-run without a new commit (for flaky failures), use the Re-run failed jobs button at the top right of the workflow run page.
If a check is reporting "expected check missing" rather than failing outright, the workflow file probably wasn't reachable from your branch — rebase on main and the check will register on the next push.
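The rebase mentioned above can be sketched as a small helper. It assumes the upstream remote added in the fork-and-clone step and a default branch named main; the function name sync_with_upstream is hypothetical.

```shell
# Sketch: bring the current branch up to date with upstream main.
# Assumes a remote named "upstream" and a default branch named "main".
sync_with_upstream() {
  git fetch upstream || return 1
  git rebase upstream/main || return 1
  echo "rebased $(git branch --show-current) onto upstream/main"
}

# Usage, from inside your clone:
#   sync_with_upstream
#   git push --force-with-lease origin <your-branch>
```

A rebase rewrites the branch history, so the push to your fork has to be forced; --force-with-lease refuses to overwrite remote commits you have not fetched.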
Coding standards¶
- Rust: Edition 2024, no unwrap() in production paths, document all public items, SAFETY comments for unsafe blocks
- Python: frozen=True / slots=True dataclasses, function-based pytest, follow Ruff and Mypy rules
- TypeScript: Strict types, no any, bindings in packages/typescript/src
- Ruby: No global state outside Kreuzberg module, panic-free native bridge, follow RuboCop
- Go / Java / C#: Follow standard language conventions and project linters
Testing: language-specific tests live in each package; shared E2E behavior belongs in e2e/ fixtures. When adding features, regenerate with task e2e:<lang>:generate.
Community and support¶
- Star the repo: Give us a star on GitHub — it helps others discover Kreuzberg!
- Discord: Join our community
- Issues: GitHub Issues
- License: Elastic License 2.0 (ELv2)
Thank you for contributing to Kreuzberg!