
Layout Detection v4.5.0

Detect document layout regions (tables, figures, headers, text blocks, etc.) in PDFs using ONNX-based deep learning models. Enables table extraction, figure isolation, reading-order reconstruction, and selective OCR.

Feature gate

Requires the layout-detection Cargo feature, which is not included in the default feature set.

Model

Layout detection uses the RT-DETR v2 model, an ONNX-based deep learning model that detects 17 layout element classes: captions, footnotes, formulas, list items, page headers and footers, pictures, section headers, tables, body text, titles, document indexes, code, checkboxes (selected and unselected), forms, and key-value regions.

When to Enable

Recommended for: complex multi-column PDFs, scanned documents, academic papers, business forms, and any document where layout understanding improves extraction accuracy.

Less beneficial for: simple single-column text documents, latency-critical high-throughput pipelines (unless GPU acceleration is available), and documents whose PDF structure tree already describes the layout well.

Performance Impact

Pipeline    Structure F1    Text F1    Avg time/doc
Baseline    33.9%           87.4%      447 ms
Layout      41.1%           90.1%      1500 ms

171-document PDF corpus, CPU only. GPU acceleration significantly reduces the time penalty.
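A back-of-envelope calculation makes the trade-off in the benchmark table concrete. The sketch below uses only the figures quoted in that table. The 10,000-document projection is a hypothetical example that assumes sequential, CPU-only processing at the measured averages.

```python
# Benchmark figures from the table above (171-document corpus, CPU only).
BASELINE = {"structure_f1": 33.9, "text_f1": 87.4, "ms_per_doc": 447}
LAYOUT = {"structure_f1": 41.1, "text_f1": 90.1, "ms_per_doc": 1500}


def compare(baseline: dict, layout: dict) -> dict:
    """Return accuracy gains (percentage points) and the time cost of layout detection."""
    return {
        "structure_f1_gain": round(layout["structure_f1"] - baseline["structure_f1"], 1),
        "text_f1_gain": round(layout["text_f1"] - baseline["text_f1"], 1),
        "slowdown": round(layout["ms_per_doc"] / baseline["ms_per_doc"], 2),
        "extra_ms_per_doc": layout["ms_per_doc"] - baseline["ms_per_doc"],
    }


def projected_hours(ms_per_doc: float, n_docs: int) -> float:
    """Hypothetical wall-clock estimate for one worker processing documents sequentially."""
    return ms_per_doc * n_docs / 3_600_000


stats = compare(BASELINE, LAYOUT)
print(stats)
for name, run in (("baseline", BASELINE), ("layout", LAYOUT)):
    print(name, round(projected_hours(run["ms_per_doc"], 10_000), 2), "h")
```

At the measured averages, layout detection costs about 1,053 ms extra per document (a 3.36x slowdown). In return, it gains 7.2 points of Structure F1 and 2.7 points of Text F1. A hypothetical 10,000-document batch would take roughly 1.24 h without layout detection and 4.17 h with it on a single CPU worker.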

Layout Detection Model

Kreuzberg uses only the RT-DETR v2 model for layout detection. The preset field is not available in LayoutDetectionConfig. Configure table structure recognition separately via table_model — see "Table Structure Models" below.

Configuration

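Python
import asyncio

from kreuzberg import ExtractionConfig, LayoutDetectionConfig, extract_file

config = ExtractionConfig(
    layout=LayoutDetectionConfig(
        confidence_threshold=0.5,  # minimum detection confidence
        apply_heuristics=True,
        table_model="tatr",        # table structure model (the default)
    )
)

async def main() -> None:
    # extract_file is a coroutine, so await it inside an event loop
    result = await extract_file("document.pdf", config=config)
    print(result.content)

asyncio.run(main())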
TypeScript
const result = await extract("document.pdf", {
  layout: {
    confidenceThreshold: 0.5,
    applyHeuristics: true,
    tableModel: "tatr",
  },
});
Rust
use kreuzberg::core::{ExtractionConfig, LayoutDetectionConfig};

let config = ExtractionConfig {
    layout: Some(LayoutDetectionConfig {
        confidence_threshold: Some(0.5),
        apply_heuristics: true,
        table_model: Some("tatr".to_string()),
        ..Default::default()
    }),
    ..Default::default()
};
kreuzberg.toml
[layout]
apply_heuristics = true
# table_model = "tatr"
Terminal
# Enable layout detection with default settings
kreuzberg extract document.pdf --layout --content-format markdown

# Custom confidence threshold
kreuzberg extract document.pdf --layout-confidence 0.5 --content-format markdown

# Specific table model
kreuzberg extract document.pdf --layout --layout-table-model slanet_wired

# Combined with GPU acceleration
kreuzberg extract document.pdf --layout --acceleration coreml
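When you drive the CLI from a Python pipeline, it can help to assemble the arguments in one place. The helper below is a hypothetical sketch: `build_extract_command` is not part of Kreuzberg. It emits only the flags shown in the terminal examples above (`--layout`, `--layout-confidence`, `--layout-table-model`, `--acceleration`, `--content-format`). It always passes `--layout` explicitly rather than assuming that `--layout-confidence` alone enables detection. The 0 to 1 range check is an assumption about valid confidence values.

```python
from typing import Optional


def build_extract_command(
    path: str,
    confidence: Optional[float] = None,
    table_model: Optional[str] = None,
    acceleration: Optional[str] = None,
    content_format: str = "markdown",
) -> list[str]:
    """Assemble a `kreuzberg extract` argument list with layout detection enabled."""
    cmd = ["kreuzberg", "extract", path, "--layout"]
    if confidence is not None:
        # Assumption: confidence thresholds are probabilities in [0, 1].
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        cmd += ["--layout-confidence", str(confidence)]
    if table_model is not None:
        cmd += ["--layout-table-model", table_model]
    if acceleration is not None:
        cmd += ["--acceleration", acceleration]
    cmd += ["--content-format", content_format]
    return cmd


cmd = build_extract_command("document.pdf", confidence=0.5, table_model="slanet_wired")
print(" ".join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file paths.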

Table Structure Models v4.5.3

When layout detection identifies a table region, a table structure model analyzes rows, columns, headers, and spanning cells.
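To make "spanning cells" concrete: part of a structure model's job is to recover which grid positions each cell occupies. The sketch below is a generic illustration, not Kreuzberg's output format. The `Cell` dataclass and `to_grid` function are hypothetical. It expands cells with row and column spans into a dense grid, repeating a spanning cell's text in every slot it covers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    row: int            # top-left row index
    col: int            # top-left column index
    text: str
    row_span: int = 1
    col_span: int = 1
    is_header: bool = False


def to_grid(cells: list[Cell], n_rows: int, n_cols: int) -> list[list[Optional[str]]]:
    """Expand spanning cells into an n_rows x n_cols grid of cell texts."""
    grid: list[list[Optional[str]]] = [[None] * n_cols for _ in range(n_rows)]
    for cell in cells:
        for r in range(cell.row, cell.row + cell.row_span):
            for c in range(cell.col, cell.col + cell.col_span):
                if grid[r][c] is not None:
                    raise ValueError(f"overlapping cells at ({r}, {c})")
                grid[r][c] = cell.text
    return grid


# Two header rows: "Region" spans both rows, "Revenue" spans two columns.
cells = [
    Cell(0, 0, "Region", row_span=2, is_header=True),
    Cell(0, 1, "Revenue", col_span=2, is_header=True),
    Cell(1, 1, "Q1", is_header=True),
    Cell(1, 2, "Q2", is_header=True),
    Cell(2, 0, "EU"),
    Cell(2, 1, "10"),
    Cell(2, 2, "12"),
]
for row in to_grid(cells, n_rows=3, n_cols=3):
    print(row)
```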

Model              Config value          Size       Speed       Best for
TATR               "tatr" (default)      30 MB      Fast        General-purpose, consistent results
SLANeXT Wired      "slanet_wired"        365 MB     Moderate    Bordered/gridlined tables
SLANeXT Wireless   "slanet_wireless"     365 MB     Moderate    Borderless tables
SLANeXT Auto       "slanet_auto"         ~737 MB    Slower      Mixed documents (auto-classifies per page)
SLANet-plus        "slanet_plus"         7.78 MB    Fastest     Resource-constrained environments
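The model table above reads as a simple decision rule. The function below is a hypothetical helper, not part of Kreuzberg, that encodes it. It returns one of the five documented `table_model` config values and defaults to `"tatr"`. The `borders` categories and the `constrained` flag are assumptions introduced for illustration.

```python
from typing import Literal

TABLE_MODELS = {"tatr", "slanet_wired", "slanet_wireless", "slanet_auto", "slanet_plus"}

# Hypothetical description of the tables you expect to see.
Borders = Literal["unknown", "bordered", "borderless", "mixed"]


def choose_table_model(borders: Borders = "unknown", constrained: bool = False) -> str:
    """Pick a table_model value following the guidance in the model table."""
    if constrained:
        return "slanet_plus"       # smallest (7.78 MB) and fastest
    if borders == "bordered":
        return "slanet_wired"      # bordered/gridlined tables
    if borders == "borderless":
        return "slanet_wireless"   # borderless tables
    if borders == "mixed":
        return "slanet_auto"       # classifies per page; largest download
    return "tatr"                  # default, general-purpose


model = choose_table_model("borderless")
print(model)
```

The result can be passed as `LayoutDetectionConfig(table_model=model)` or as the value of `--layout-table-model` on the command line.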

Model Download

SLANeXT models are not downloaded by default. Pre-download them with the CLI's cache warm --all-table-models command; otherwise each model is downloaded automatically the first time it is used.

GPU Acceleration

Layout detection uses ONNX Runtime with automatic provider selection:

Provider    Platform          Notes
CPU         All               Default, no setup needed
CUDA        Linux, Windows    Requires CUDA toolkit + cuDNN
CoreML      macOS             Automatic on Apple Silicon
TensorRT    Linux             Requires TensorRT

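To override the automatic selection:

from kreuzberg import AccelerationConfig, ExtractionConfig, LayoutDetectionConfig

config = ExtractionConfig(
    layout=LayoutDetectionConfig(),
    acceleration=AccelerationConfig(provider="cuda", device_id=0),  # first CUDA GPU
)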

See the AccelerationConfig reference for details.
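Kreuzberg selects a provider automatically. The sketch below only mirrors the platform column of the provider table, to show which explicit override would suit a given machine. `suggest_provider` and its `has_cuda`/`has_tensorrt` flags are hypothetical, and detecting CUDA or TensorRT is left to the caller. The strings "cuda" and "coreml" appear in the examples on this page; "tensorrt" and "cpu" are assumed names.

```python
import platform
from typing import Optional


def suggest_provider(
    system: Optional[str] = None,
    machine: Optional[str] = None,
    has_cuda: bool = False,
    has_tensorrt: bool = False,
) -> str:
    """Map platform facts to a provider name, following the provider table."""
    system = system or platform.system()     # e.g. "Linux", "Windows", "Darwin"
    machine = machine or platform.machine()  # e.g. "x86_64", "arm64"
    if system == "Darwin" and machine == "arm64":
        return "coreml"    # Apple Silicon
    if system == "Linux" and has_tensorrt:
        return "tensorrt"  # assumed provider string; requires TensorRT
    if system in ("Linux", "Windows") and has_cuda:
        return "cuda"      # requires CUDA toolkit + cuDNN
    return "cpu"           # assumed provider string; always available


print(suggest_provider())
```

On a CUDA machine the result could feed the override shown above, e.g. `AccelerationConfig(provider=suggest_provider(has_cuda=True), device_id=0)`.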

Layout Classes

The RT-DETR v2 model detects 17 layout classes:

Class Description
Caption Figure or table caption
Footnote Page footnote
Formula Mathematical formula
ListItem List item or bullet point
PageFooter Running page footer
PageHeader Running page header
Picture Image, chart, or diagram
SectionHeader Section heading
Table Tabular data region
Text Body text paragraph
Title Document or page title
DocumentIndex Table of contents
Code Code block
CheckboxSelected Checked checkbox
CheckboxUnselected Unchecked checkbox
Form Form region
KeyValueRegion Key-value pair region
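For downstream routing, it helps to have the class names as data. Examples include selective OCR of text regions or sending only table regions to a structure model. The sketch below is hypothetical: the `Region` tuple and `select_regions` function are illustrations, not Kreuzberg's result types. It also assumes `confidence_threshold` behaves like a standard detection cutoff that drops regions scored below it.

```python
from typing import NamedTuple

# The 17 RT-DETR v2 classes listed in the table above.
LAYOUT_CLASSES = (
    "Caption", "Footnote", "Formula", "ListItem", "PageFooter", "PageHeader",
    "Picture", "SectionHeader", "Table", "Text", "Title", "DocumentIndex",
    "Code", "CheckboxSelected", "CheckboxUnselected", "Form", "KeyValueRegion",
)


class Region(NamedTuple):
    label: str
    confidence: float
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


def select_regions(regions: list[Region], wanted: set[str], threshold: float = 0.5) -> list[Region]:
    """Keep regions whose label is in `wanted` and whose score meets the threshold."""
    unknown = wanted - set(LAYOUT_CLASSES)
    if unknown:
        raise ValueError(f"unknown layout classes: {sorted(unknown)}")
    return [r for r in regions if r.label in wanted and r.confidence >= threshold]


regions = [
    Region("Table", 0.91, (50.0, 100.0, 550.0, 400.0)),
    Region("Text", 0.88, (50.0, 420.0, 550.0, 700.0)),
    Region("Table", 0.32, (50.0, 720.0, 550.0, 780.0)),
    Region("PageFooter", 0.95, (50.0, 800.0, 550.0, 820.0)),
]
tables = select_regions(regions, {"Table"})
print(tables)
```

Validating the requested labels against the class list catches typos such as "Figure" (the model calls it "Picture") before they silently filter out everything.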

Acknowledgments

  • Docling — RT-DETR v2 model and layout classification approach
  • TATR — Table structure recognition with ONNX
  • PaddleOCR — SLANeXT table structure and PP-LCNet classifier models