Layout Detection v4.5.0¶

Detect document layout regions (tables, figures, headers, text blocks, etc.) in PDFs using ONNX-based deep learning models. Enables table extraction, figure isolation, reading-order reconstruction, and selective OCR.

Feature gate

Requires the layout-detection Cargo feature. Not included in the default feature set.

Model¶

Layout detection uses the RT-DETR v2 model (17 layout classes), an ONNX-based deep learning model for accurate document layout analysis.

When to Enable¶

Recommended for: complex multi-column PDFs, scanned documents, academic papers, business forms, documents where table extraction quality matters.

Less beneficial for: simple single-column text, high-throughput pipelines where latency is critical (consider GPU), documents already well-handled by the PDF structure tree.

Performance Impact¶

Pipeline	Structure F1	Text F1	Avg time/doc
Baseline	33.9%	87.4%	447 ms
Layout	41.1%	90.1%	1500 ms

171-document PDF corpus, CPU only. GPU acceleration significantly reduces the time penalty.

preset removed

The preset field ("fast" / "accurate") was removed from LayoutDetectionConfig. If it appears in a config file it is silently ignored. Only the RT-DETR v2 model is used for layout detection.

Configuration¶

PythonTypeScriptRustTOML

from kreuzberg import ExtractionConfig, LayoutDetectionConfig, extract_file

config = ExtractionConfig(
    layout=LayoutDetectionConfig(
        confidence_threshold=0.5,
        apply_heuristics=True,
        table_model="tatr",
    )
)
result = await extract_file("document.pdf", config=config)

const result = await extract("document.pdf", {
  layout: {
    confidenceThreshold: 0.5,
    applyHeuristics: true,
    tableModel: "tatr",
  },
});

use kreuzberg::core::{ExtractionConfig, LayoutDetectionConfig};

let config = ExtractionConfig {
    layout: Some(LayoutDetectionConfig {
        confidence_threshold: Some(0.5),
        apply_heuristics: true,
        table_model: Some("tatr".to_string()),
        ..Default::default()
    }),
    ..Default::default()
};

kreuzberg.toml

[layout]
apply_heuristics = true
# table_model = "tatr"

Table Structure Models v4.5.3¶

When layout detection identifies a table region, a table structure model analyzes rows, columns, headers, and spanning cells.

Model	Config value	Size	Speed	Best for
TATR	`"tatr"` (default)	30 MB	Fast	General-purpose, consistent results
SLANeXT Wired	`"slanet_wired"`	365 MB	Moderate	Bordered/gridlined tables
SLANeXT Wireless	`"slanet_wireless"`	365 MB	Moderate	Borderless tables
SLANeXT Auto	`"slanet_auto"`	~737 MB	Slower	Mixed documents (auto-classifies per page)
SLANet-plus	`"slanet_plus"`	7.78 MB	Fastest	Resource-constrained environments

Model Download

SLANeXT models are not downloaded by default. Use cache warm --all-table-models to pre-download, or they download automatically on first use.

GPU Acceleration¶

Layout detection uses ONNX Runtime with automatic provider selection:

Provider	Platform	Notes
CPU	All	Default, no setup needed
CUDA	Linux, Windows	Requires CUDA toolkit + cuDNN
CoreML	macOS	Automatic on Apple Silicon
TensorRT	Linux	Requires TensorRT

To override:

config = ExtractionConfig(
    layout=LayoutDetectionConfig(),
    acceleration=AccelerationConfig(provider="cuda", device_id=0)
)

See AccelerationConfig reference for details.

Layout Classes¶

The RT-DETR v2 model detects 17 layout classes:

Class	Description
`Caption`	Figure or table caption
`Footnote`	Page footnote
`Formula`	Mathematical formula
`ListItem`	List item or bullet point
`PageFooter`	Running page footer
`PageHeader`	Running page header
`Picture`	Image, chart, or diagram
`SectionHeader`	Section heading
`Table`	Tabular data region
`Text`	Body text paragraph
`Title`	Document or page title
`DocumentIndex`	Table of contents
`Code`	Code block
`CheckboxSelected`	Checked checkbox
`CheckboxUnselected`	Unchecked checkbox
`Form`	Form region
`KeyValueRegion`	Key-value pair region

Acknowledgments¶

Docling — RT-DETR v2 model and layout classification approach
TATR — Table structure recognition with ONNX
PaddleOCR — SLANeXT table structure and PP-LCNet classifier models

Configuration Reference — full field reference
Element-Based Output — using layout-aware results