Layout Detection v4.5.0¶
Detect document layout regions (tables, figures, headers, text blocks, etc.) in PDFs using ONNX-based deep learning models. Enables table extraction, figure isolation, reading-order reconstruction, and selective OCR.
Feature gate
Requires the layout-detection Cargo feature. Not included in the default feature set.
Model¶
Layout detection uses the RT-DETR v2 model, which distinguishes 17 layout classes and runs through ONNX Runtime.
When to Enable¶
Recommended for: complex multi-column PDFs, scanned documents, academic papers, business forms, documents where table extraction quality matters.
Less beneficial for: simple single-column text, high-throughput pipelines where latency is critical (consider GPU acceleration there), and documents already handled well by the PDF structure tree.
Performance Impact¶
| Pipeline | Structure F1 | Text F1 | Avg time/doc |
|---|---|---|---|
| Baseline | 33.9% | 87.4% | 447 ms |
| Layout | 41.1% | 90.1% | 1500 ms |
Measured on a 171-document PDF corpus, CPU only. GPU acceleration significantly reduces the time penalty.
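To put the trade-off in concrete terms, the gains and the slowdown can be derived directly from the table. The snippet below is a small arithmetic sketch that uses only the figures reported above; the variable names are illustrative.

```python
# Figures copied from the benchmark table (171-document corpus, CPU only).
baseline = {"structure_f1": 33.9, "text_f1": 87.4, "avg_ms": 447}
layout = {"structure_f1": 41.1, "text_f1": 90.1, "avg_ms": 1500}

# Absolute F1 gains in percentage points.
structure_gain = round(layout["structure_f1"] - baseline["structure_f1"], 1)
text_gain = round(layout["text_f1"] - baseline["text_f1"], 1)

# How many times longer the average document takes with layout detection.
slowdown = round(layout["avg_ms"] / baseline["avg_ms"], 2)

print(f"Structure F1: +{structure_gain} pts, Text F1: +{text_gain} pts, {slowdown}x time")
# prints: Structure F1: +7.2 pts, Text F1: +2.7 pts, 3.36x time
```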
preset removed
The preset field ("fast" / "accurate") was removed from LayoutDetectionConfig. If it appears in a config file it is silently ignored. Only the RT-DETR v2 model is used for layout detection.
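Because the field is ignored silently, a stale preset in an old config file can go unnoticed. The following is a minimal sketch of a pre-load check, assuming the config has already been parsed into a plain dict with layout settings under a "layout" key. That key layout, the helper name strip_removed_preset, and the other_option placeholder are all hypothetical and not part of the library.

```python
import warnings


def strip_removed_preset(config: dict) -> dict:
    """Return a copy of `config` without the removed layout `preset` field.

    Assumption: layout settings live under a "layout" key (hypothetical).
    """
    cleaned = dict(config)
    layout = cleaned.get("layout")
    if isinstance(layout, dict) and "preset" in layout:
        warnings.warn(
            f"layout.preset={layout['preset']!r} was removed and has no effect",
            DeprecationWarning,
            stacklevel=2,
        )
        # Build a new nested dict so the caller's original is not mutated.
        cleaned["layout"] = {k: v for k, v in layout.items() if k != "preset"}
    return cleaned


# "other_option" is a placeholder key used only for illustration.
raw = {"layout": {"preset": "fast", "other_option": 1}}
print(strip_removed_preset(raw))
```

Warning instead of failing matches the documented behavior (the field is ignored, not rejected) while still making the leftover setting visible.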
Configuration¶
Table Structure Models v4.5.3¶
When layout detection identifies a table region, a table structure model analyzes rows, columns, headers, and spanning cells.
| Model | Config value | Size | Speed | Best for |
|---|---|---|---|---|
| TATR | "tatr" (default) | 30 MB | Fast | General-purpose, consistent results |
| SLANeXT Wired | "slanet_wired" | 365 MB | Moderate | Bordered/gridlined tables |
| SLANeXT Wireless | "slanet_wireless" | 365 MB | Moderate | Borderless tables |
| SLANeXT Auto | "slanet_auto" | ~737 MB | Slower | Mixed documents (auto-classifies per page) |
| SLANet-plus | "slanet_plus" | 7.78 MB | Fastest | Resource-constrained environments |
Model Download
SLANeXT models are not downloaded by default. Use cache warm --all-table-models to pre-download, or they download automatically on first use.
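The guidance in the table can be expressed as a small selection routine. This is only a sketch: pick_table_model is a hypothetical helper, not a library function, and it returns one of the config values listed above based on whether tables are bordered and on a download-size budget.

```python
from typing import Optional

# Sizes in MB copied from the table; "slanet_auto" is listed as approximately 737 MB.
TABLE_MODELS = {
    "tatr": 30,
    "slanet_wired": 365,
    "slanet_wireless": 365,
    "slanet_auto": 737,
    "slanet_plus": 7.78,
}


def pick_table_model(bordered: Optional[bool], max_size_mb: float = float("inf")) -> str:
    """Hypothetical helper: choose a config value following the table's guidance.

    bordered=True means gridlined tables, False means borderless,
    None means unknown, in which case the documented default is used.
    """
    if bordered is True:
        preferred = "slanet_wired"
    elif bordered is False:
        preferred = "slanet_wireless"
    else:
        preferred = "tatr"
    if TABLE_MODELS[preferred] <= max_size_mb:
        return preferred
    # The preferred model exceeds the budget: fall back to smaller models.
    for fallback in ("tatr", "slanet_plus"):
        if TABLE_MODELS[fallback] <= max_size_mb:
            return fallback
    return "slanet_plus"  # smallest model in the table


print(pick_table_model(bordered=True))                  # slanet_wired
print(pick_table_model(bordered=True, max_size_mb=50))  # tatr
```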
GPU Acceleration¶
Layout detection uses ONNX Runtime with automatic provider selection:
| Provider | Platform | Notes |
|---|---|---|
| CPU | All | Default, no setup needed |
| CUDA | Linux, Windows | Requires CUDA toolkit + cuDNN |
| CoreML | macOS | Automatic on Apple Silicon |
| TensorRT | Linux | Requires TensorRT |
To override:
config = ExtractionConfig(
    layout=LayoutDetectionConfig(),
    acceleration=AccelerationConfig(provider="cuda", device_id=0),
)
See AccelerationConfig reference for details.
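When setting an override, it can help to check which providers the table lists for the current platform. The sketch below encodes the table's Platform column using only the standard library. Only "cuda" appears as a provider string in the example above; the other lowercase names and the providers_for helper are assumptions for illustration, and the function says nothing about whether the required toolkits are actually installed.

```python
import platform
from typing import List

# Platform column from the provider table. None means "all platforms".
# Keys other than "cuda" are assumed spellings, not confirmed config values.
PROVIDER_PLATFORMS = {
    "cpu": None,
    "cuda": {"Linux", "Windows"},
    "coreml": {"Darwin"},  # platform.system() reports macOS as "Darwin"
    "tensorrt": {"Linux"},
}


def providers_for(system: str) -> List[str]:
    """List the providers the table marks as supported on `system`."""
    return [
        name
        for name, systems in PROVIDER_PLATFORMS.items()
        if systems is None or system in systems
    ]


print(providers_for(platform.system()))
```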
Layout Classes¶
The RT-DETR v2 model detects 17 layout classes:
| Class | Description |
|---|---|
| Caption | Figure or table caption |
| Footnote | Page footnote |
| Formula | Mathematical formula |
| ListItem | List item or bullet point |
| PageFooter | Running page footer |
| PageHeader | Running page header |
| Picture | Image, chart, or diagram |
| SectionHeader | Section heading |
| Table | Tabular data region |
| Text | Body text paragraph |
| Title | Document or page title |
| DocumentIndex | Table of contents |
| Code | Code block |
| CheckboxSelected | Checked checkbox |
| CheckboxUnselected | Unchecked checkbox |
| Form | Form region |
| KeyValueRegion | Key-value pair region |
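A common use of these classes is to drop running headers and footers and keep the remaining regions in a rough top-to-bottom order. The sketch below illustrates the idea with a hypothetical Region dataclass. The actual result type, its field names, the 0.5 confidence cutoff, and the assumption that y coordinates grow downward are all illustrative choices rather than the library's API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Class names copied from the table above.
RUNNING_CLASSES = {"PageHeader", "PageFooter"}


@dataclass
class Region:
    """Hypothetical detection result; the real output type may differ."""
    label: str
    confidence: float
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def body_regions(regions: List[Region], min_confidence: float = 0.5) -> List[Region]:
    """Drop running headers/footers and low-confidence hits, then sort top to bottom."""
    kept = [
        r for r in regions
        if r.label not in RUNNING_CLASSES and r.confidence >= min_confidence
    ]
    return sorted(kept, key=lambda r: (r.bbox[1], r.bbox[0]))


detected = [
    Region("PageHeader", 0.98, (50, 20, 550, 40)),
    Region("Text", 0.91, (50, 300, 550, 500)),
    Region("Title", 0.95, (50, 80, 550, 120)),
    Region("Table", 0.42, (50, 520, 550, 700)),
]
print([r.label for r in body_regions(detected)])
# prints: ['Title', 'Text']
```

Sorting by the top edge is only a naive stand-in for reading order and will not handle multi-column pages correctly.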
Acknowledgments¶
- Docling — RT-DETR v2 model and layout classification approach
- TATR — Table structure recognition with ONNX
- PaddleOCR — SLANeXT table structure and PP-LCNet classifier models
Related¶
- Configuration Reference — full field reference
- Element-Based Output — using layout-aware results