Layout Detection v4.5.0
Detect document layout regions (tables, figures, headers, text blocks, etc.) in PDFs using ONNX-based deep learning models. Enables table extraction, figure isolation, reading-order reconstruction, and selective OCR.
Feature gate
Requires the layout-detection Cargo feature. Not included in the default feature set.
Model
Layout detection uses RT-DETR v2, an ONNX-based deep learning model that detects 17 layout element classes: text, titles, section headers, captions, footnotes, formulas, list items, page headers, page footers, pictures, tables, code, document indexes, selected and unselected checkboxes, forms, and key-value regions.
When to Enable
Recommended for: complex multi-column PDFs, scanned documents, academic papers, business forms, and any document where layout understanding improves extraction accuracy.
Less beneficial for: simple single-column text documents, high-throughput pipelines where latency is critical (consider GPU acceleration), or documents already well-handled by PDF structure trees.
Performance Impact
| Pipeline | Structure F1 | Text F1 | Avg time/doc |
|---|---|---|---|
| Baseline | 33.9% | 87.4% | 447 ms |
| Layout | 41.1% | 90.1% | 1500 ms |
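Measured on a 171-document PDF corpus, CPU only. Layout detection raises Structure F1 by 7.2 points and Text F1 by 2.7 points while taking roughly 3.4 times as long per document. GPU acceleration significantly reduces the time penalty.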
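To gauge what the time penalty means for a batch job, the per-document averages from the table can be scaled up. The sketch below assumes sequential, single-worker processing and that the benchmark averages carry over to your own corpus, which may not hold. The function names are illustrative and not part of Kreuzberg.

```python
# Average CPU time per document, copied from the benchmark table above.
AVG_MS_PER_DOC = {"baseline": 447, "layout": 1500}


def batch_minutes(pipeline: str, n_docs: int) -> float:
    """Estimated wall-clock minutes to process n_docs one after another."""
    return round(AVG_MS_PER_DOC[pipeline] * n_docs / 1000 / 60, 1)


def docs_per_hour(pipeline: str) -> int:
    """Estimated whole documents processed per hour by a single worker."""
    return 3_600_000 // AVG_MS_PER_DOC[pipeline]


if __name__ == "__main__":
    for name in AVG_MS_PER_DOC:
        print(name, batch_minutes(name, 10_000), "min per 10k docs,",
              docs_per_hour(name), "docs/hour")
```

If the estimate for the layout pipeline exceeds your latency budget, that is a signal to try GPU acceleration or to enable layout detection only for documents that need it.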
Layout Detection Model
Kreuzberg uses only the RT-DETR v2 model for layout detection. The preset field is not available in LayoutDetectionConfig. Configure table structure recognition separately via table_model — see "Table Structure Models" below.
Configuration

```bash
# Enable layout detection with default settings
kreuzberg extract document.pdf --layout --content-format markdown

# Custom confidence threshold
kreuzberg extract document.pdf --layout-confidence 0.5 --content-format markdown

# Specific table model
kreuzberg extract document.pdf --layout --layout-table-model slanet_wired

# Combined with GPU acceleration
kreuzberg extract document.pdf --layout --acceleration coreml
```
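As a rough illustration of what a confidence threshold does, the sketch below drops detections whose score falls under the cutoff. The `Detection` type, the `apply_threshold` function, and the inclusive `>=` comparison are assumptions made for this example. They do not describe Kreuzberg's internal data structures or its exact comparison rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection record: class label, score, bounding box."""
    label: str
    confidence: float
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates


def apply_threshold(detections, threshold):
    """Keep detections scoring at or above threshold (inclusive by assumption)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [d for d in detections if d.confidence >= threshold]


detections = [
    Detection("Table", 0.91, (50, 100, 500, 300)),
    Detection("Text", 0.42, (50, 320, 500, 400)),
    Detection("Picture", 0.50, (50, 420, 300, 600)),
]
kept = apply_threshold(detections, 0.5)
print([d.label for d in kept])
```

A higher threshold trades recall for precision: fewer spurious regions, but low-scoring tables or figures may be missed.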
Table Structure Models v4.5.3
When layout detection identifies a table region, a table structure model analyzes rows, columns, headers, and spanning cells.
| Model | Config value | Size | Speed | Best for |
|---|---|---|---|---|
| TATR | "tatr" (default) | 30 MB | Fast | General-purpose, consistent results |
| SLANeXT Wired | "slanet_wired" | 365 MB | Moderate | Bordered/gridlined tables |
| SLANeXT Wireless | "slanet_wireless" | 365 MB | Moderate | Borderless tables |
| SLANeXT Auto | "slanet_auto" | ~737 MB | Slower | Mixed documents (auto-classifies per page) |
| SLANet-plus | "slanet_plus" | 7.78 MB | Fastest | Resource-constrained environments |
Model Download
SLANeXT models are not downloaded by default. Use cache warm --all-table-models to pre-download, or they download automatically on first use.
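The trade-offs in the table can be written down as a small selection helper. This is only a sketch of one possible policy: `pick_table_model` and its parameters are hypothetical, and only the config values and approximate sizes come from the table above.

```python
from typing import Optional

# Config values and approximate download sizes (MB) from the table above.
TABLE_MODEL_SIZE_MB = {
    "tatr": 30,
    "slanet_wired": 365,
    "slanet_wireless": 365,
    "slanet_auto": 737,
    "slanet_plus": 7.78,
}


def pick_table_model(bordered: Optional[bool] = None,
                     max_model_mb: float = float("inf")) -> str:
    """Suggest a table_model value.

    bordered: True for gridlined tables, False for borderless, None if unknown.
    max_model_mb: largest model size that is acceptable to download and load.
    """
    if bordered is True:
        preferred = "slanet_wired"
    elif bordered is False:
        preferred = "slanet_wireless"
    else:
        preferred = "tatr"  # the documented default

    # Fall back to progressively smaller models when the size budget is tight.
    for candidate in (preferred, "tatr", "slanet_plus"):
        if TABLE_MODEL_SIZE_MB[candidate] <= max_model_mb:
            return candidate
    raise ValueError("no table model fits within max_model_mb")


print(pick_table_model(bordered=True))
print(pick_table_model(bordered=False, max_model_mb=100))
```

The returned string is meant to be passed as the `table_model` setting or the `--layout-table-model` CLI flag.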
GPU Acceleration
Layout detection uses ONNX Runtime with automatic provider selection:
| Provider | Platform | Notes |
|---|---|---|
| CPU | All | Default, no setup needed |
| CUDA | Linux, Windows | Requires CUDA toolkit + cuDNN |
| CoreML | macOS | Automatic on Apple Silicon |
| TensorRT | Linux | Requires TensorRT |
To override:
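```python
# Assumes these classes are exported from the top-level kreuzberg package
from kreuzberg import AccelerationConfig, ExtractionConfig, LayoutDetectionConfig

# Enable layout detection and run inference on the first CUDA device
config = ExtractionConfig(
    layout=LayoutDetectionConfig(),
    acceleration=AccelerationConfig(provider="cuda", device_id=0),
)
```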
See AccelerationConfig reference for details.
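To make the idea of automatic provider selection concrete, here is a minimal sketch of a platform-based preference list with CPU as the fallback. The preference order, the `choose_provider` function, and the `"tensorrt"` and `"cpu"` strings are assumptions for illustration. Kreuzberg's actual selection logic is not documented on this page and may differ.

```python
# Hypothetical preference order per platform, derived from the provider table.
PROVIDER_PREFERENCE = {
    "linux": ["tensorrt", "cuda", "cpu"],
    "windows": ["cuda", "cpu"],
    "macos": ["coreml", "cpu"],
}


def choose_provider(platform: str, available: set) -> str:
    """Return the first preferred provider that is available, else 'cpu'."""
    for provider in PROVIDER_PREFERENCE.get(platform, ["cpu"]):
        if provider in available:
            return provider
    return "cpu"  # CPU needs no setup, so it is always a safe fallback


print(choose_provider("macos", {"coreml", "cpu"}))
print(choose_provider("linux", {"cuda", "cpu"}))
print(choose_provider("linux", {"cpu"}))
```

An explicit `AccelerationConfig` override is still useful when the automatically chosen provider is not the one you want, for example to force CPU for reproducible benchmarks.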
Layout Classes
The RT-DETR v2 model detects 17 layout classes:
| Class | Description |
|---|---|
| Caption | Figure or table caption |
| Footnote | Page footnote |
| Formula | Mathematical formula |
| ListItem | List item or bullet point |
| PageFooter | Running page footer |
| PageHeader | Running page header |
| Picture | Image, chart, or diagram |
| SectionHeader | Section heading |
| Table | Tabular data region |
| Text | Body text paragraph |
| Title | Document or page title |
| DocumentIndex | Table of contents |
| Code | Code block |
| CheckboxSelected | Checked checkbox |
| CheckboxUnselected | Unchecked checkbox |
| Form | Form region |
| KeyValueRegion | Key-value pair region |
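One common use of these classes is to separate body content from running page headers and footers before further processing. The sketch below assumes each detected region is a plain dictionary with a `"class"` key, which is a hypothetical shape chosen for this example rather than Kreuzberg's output format. Only the class names come from the table above.

```python
# The 17 class names listed in the table above.
LAYOUT_CLASSES = (
    "Caption", "Footnote", "Formula", "ListItem", "PageFooter", "PageHeader",
    "Picture", "SectionHeader", "Table", "Text", "Title", "DocumentIndex",
    "Code", "CheckboxSelected", "CheckboxUnselected", "Form", "KeyValueRegion",
)

# Classes that repeat on every page and rarely belong in extracted body text.
PAGE_FURNITURE = {"PageHeader", "PageFooter"}


def body_regions(regions):
    """Drop running headers/footers, preserving the original region order."""
    for region in regions:
        if region["class"] not in LAYOUT_CLASSES:
            raise ValueError(f"unknown layout class: {region['class']}")
    return [r for r in regions if r["class"] not in PAGE_FURNITURE]


page = [
    {"class": "PageHeader", "text": "Annual Report 2024"},
    {"class": "Title", "text": "Financial Summary"},
    {"class": "Text", "text": "Revenue grew in all regions."},
    {"class": "PageFooter", "text": "Page 3"},
]
print([r["class"] for r in body_regions(page)])
```

The same pattern extends to other groupings, such as routing only `Table` regions to a table structure model or only `Picture` regions to OCR.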
Acknowledgments
- Docling — RT-DETR v2 model and layout classification approach
- TATR — Table structure recognition with ONNX
- PaddleOCR — SLANeXT table structure and PP-LCNet classifier models
Related
- Configuration Reference — full field reference
- Element-Based Output — using layout-aware results