Layout Detection v4.5.0

Detect document layout regions (tables, figures, headers, text blocks, etc.) in PDFs using ONNX-based deep learning models. Enables table extraction, figure isolation, reading-order reconstruction, and selective OCR.

!!! Note "Feature gate"
    Requires the layout-detection Cargo feature. Not included in the default feature set.

Model

Layout detection uses the RT-DETR v2 model, which detects 17 layout element classes: captions, footnotes, formulas, list items, page headers, page footers, pictures, section headers, tables, text, titles, document indices, code, selected and unselected checkboxes, forms, and key-value regions. The exact class names are listed under Layout Classes.

When to Enable

Recommended for: complex multi-column PDFs, scanned documents, academic papers, business forms, and any document where layout understanding improves extraction accuracy.

Less beneficial for: simple single-column text documents, high-throughput pipelines where latency is critical (consider GPU acceleration), or documents already well-handled by PDF structure trees.

Performance Impact

| Pipeline | Structure F1 | Text F1 | Avg time/doc |
| --- | --- | --- | --- |
| Baseline | 33.9% | 87.4% | 447 ms |
| Layout | 41.1% | 90.1% | 1500 ms |

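Measured on a 171-document PDF corpus, CPU only. Layout detection raises Structure F1 by 7.2 points and Text F1 by 2.7 points, at about 3.4× the per-document time (1500 ms vs. 447 ms). GPU acceleration significantly reduces the time penalty.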
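To get a feel for what the per-document cost means for a batch job, the sketch below scales the two averages from the table to a larger workload. It assumes the per-document time stays constant and that work divides evenly across workers, which real pipelines only approximate; `estimate_batch_minutes` is an illustrative helper, not part of Kreuzberg.

```python
def estimate_batch_minutes(n_docs: int, ms_per_doc: float, workers: int = 1) -> float:
    """Estimate wall-clock minutes for a batch.

    Assumes constant per-document time and perfectly even division of
    work across workers (an idealisation, not a measured result).
    """
    if n_docs < 0 or ms_per_doc < 0 or workers < 1:
        raise ValueError("n_docs and ms_per_doc must be >= 0, workers >= 1")
    return n_docs * ms_per_doc / workers / 60_000


# Average times per document, copied from the benchmark table (CPU only).
BASELINE_MS = 447
LAYOUT_MS = 1500

baseline_min = estimate_batch_minutes(10_000, BASELINE_MS)
layout_min = estimate_batch_minutes(10_000, LAYOUT_MS)
print(f"10,000 docs: baseline {baseline_min:.1f} min, layout {layout_min:.1f} min")
```

Under these assumptions, 10,000 documents take about 74.5 minutes with the baseline pipeline and 250 minutes with layout detection on a single worker.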

!!! Note "Layout Detection Model"
    Kreuzberg uses only the RT-DETR v2 model for layout detection. The preset field is not available in LayoutDetectionConfig. Configure table structure recognition separately via table_model. See "Table Structure Models" below.

Configuration

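Python

```python
import asyncio

from kreuzberg import ExtractionConfig, LayoutDetectionConfig, extract_file

config = ExtractionConfig(
    layout=LayoutDetectionConfig(
        confidence_threshold=0.5,
        apply_heuristics=True,
        table_model="tatr",  # default table structure model
    )
)

async def main() -> None:
    # extract_file is awaited, so it has to run inside an async function
    result = await extract_file("document.pdf", config=config)

asyncio.run(main())
```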

TypeScript

```typescript
const result = await extract("document.pdf", {
  layout: {
    confidenceThreshold: 0.5,
    applyHeuristics: true,
    tableModel: "tatr",
  },
});
```

Rust

```rust
use kreuzberg::core::{ExtractionConfig, LayoutDetectionConfig};

let config = ExtractionConfig {
    layout: Some(LayoutDetectionConfig {
        confidence_threshold: Some(0.5),
        apply_heuristics: true,
        table_model: Some("tatr".to_string()),
        ..Default::default()
    }),
    ..Default::default()
};
```

kreuzberg.toml

```toml
[layout]
apply_heuristics = true
# table_model = "tatr"
```

Terminal

```bash
# Enable layout detection with default settings
kreuzberg extract document.pdf --layout --content-format markdown

# Custom confidence threshold
kreuzberg extract document.pdf --layout-confidence 0.5 --content-format markdown

# Specific table model
kreuzberg extract document.pdf --layout --layout-table-model slanet_wired

# Combined with GPU acceleration
kreuzberg extract document.pdf --layout --acceleration coreml
```

See LayoutDetectionConfig for all fields.

Table Structure Models v4.5.3

When layout detection identifies a table region, a table structure model analyzes rows, columns, headers, and spanning cells. Set LayoutDetectionConfig.table_model to one of:

| Value | Notes |
| --- | --- |
| `tatr` | Default. Fast (~30 MB). General-purpose. |
| `slanet_wired` | Higher accuracy for bordered/gridlined tables (~365 MB). |
| `slanet_wireless` | Higher accuracy for borderless tables (~365 MB). |
| `slanet_auto` | Auto-classifies per page (~737 MB). Slowest. |
| `slanet_plus` | Smallest (~7.78 MB). For resource-constrained environments. |
| `disabled` | Skip table structure recognition. |

!!! Note "Model Download"
    SLANeXT models are not downloaded by default. Pre-download them with cache warm --all-table-models; otherwise they download automatically on first use.
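One way to choose among these values is to start from the table style and fall back to smaller models when the download size is a concern. The sketch below encodes that idea using the sizes from the table; `pick_table_model` and its fallback order are hypothetical and not part of Kreuzberg.

```python
from typing import Optional

# Approximate download sizes in MB, copied from the table above.
TABLE_MODEL_SIZES_MB = {
    "tatr": 30,
    "slanet_wired": 365,
    "slanet_wireless": 365,
    "slanet_auto": 737,
    "slanet_plus": 7.78,
}


def pick_table_model(bordered: Optional[bool], max_download_mb: float) -> str:
    """Hypothetical helper that returns a table_model value.

    bordered: True for gridlined tables, False for borderless tables,
              None when the style is unknown or mixed.
    max_download_mb: largest model download that is acceptable.
    """
    if bordered is True:
        preferred = "slanet_wired"
    elif bordered is False:
        preferred = "slanet_wireless"
    else:
        preferred = "slanet_auto"

    # Try the specialised model first, then the general-purpose default,
    # then the smallest model, and take the first one within budget.
    for name in (preferred, "tatr", "slanet_plus"):
        if TABLE_MODEL_SIZES_MB[name] <= max_download_mb:
            return name
    # Nothing fits: skip table structure recognition entirely.
    return "disabled"


print(pick_table_model(bordered=True, max_download_mb=1000))
print(pick_table_model(bordered=None, max_download_mb=100))
```

The returned string can be passed as table_model in LayoutDetectionConfig. Falling back to disabled when nothing fits is a design choice of this sketch; raising an error would be equally reasonable.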

GPU Acceleration

Layout detection uses ONNX Runtime with automatic provider selection:

| Provider | Platform | Notes |
| --- | --- | --- |
| CPU | All | Default, no setup needed |
| CUDA | Linux, Windows | Requires CUDA toolkit + cuDNN |
| CoreML | macOS | Automatic on Apple Silicon |
| TensorRT | Linux | Requires TensorRT |

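To override the automatic selection, set the provider explicitly:

```python
from kreuzberg import AccelerationConfig, ExtractionConfig, LayoutDetectionConfig

config = ExtractionConfig(
    layout=LayoutDetectionConfig(),
    # Force the CUDA provider on the first GPU
    acceleration=AccelerationConfig(provider="cuda", device_id=0),
)
```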

See AccelerationConfig reference for details.
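The platform column of the provider table can be read as a lookup from operating system to candidate providers. The sketch below shows that lookup in plain Python. It is not Kreuzberg's selection logic: the preference order is an assumption, and the lowercase names `tensorrt` and `cpu` are assumed spellings (only `cuda` and `coreml` appear in the examples on this page).

```python
import platform

# Candidate providers per OS, derived from the table above.
# Keys match the values returned by platform.system().
# The ordering (most specialised first) is an assumption of this sketch.
PROVIDERS_BY_OS = {
    "Linux": ["tensorrt", "cuda", "cpu"],
    "Windows": ["cuda", "cpu"],
    "Darwin": ["coreml", "cpu"],  # macOS
}


def candidate_providers(os_name: str) -> list[str]:
    """Return providers that the table lists for the given OS.

    Unknown systems fall back to CPU only, which the table marks as
    available on all platforms.
    """
    return list(PROVIDERS_BY_OS.get(os_name, ["cpu"]))


print(candidate_providers(platform.system()))
```

A provider being listed for a platform does not mean it is usable on a given machine; CUDA and TensorRT still need the runtime libraries noted in the table.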

Layout Classes

The RT-DETR v2 model detects 17 classes. Each LayoutRegion.class_name is one of:

caption, footnote, formula, list_item, page_footer, page_header, picture, section_header, table, text, title, document_index, code, checkbox_selected, checkbox_unselected, form, key_value_region.

See LayoutRegion in the types reference for the full field shape.
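When post-processing results, it can help to keep the 17 names in one place and reject anything else early, for example to catch a typo such as `figure` instead of `picture`. The following is a minimal sketch operating on plain strings; `LAYOUT_CLASSES` and `class_histogram` are illustrative names, not part of the Kreuzberg API.

```python
from collections import Counter
from typing import Iterable

# The 17 class names listed above.
LAYOUT_CLASSES = frozenset({
    "caption", "footnote", "formula", "list_item", "page_footer",
    "page_header", "picture", "section_header", "table", "text", "title",
    "document_index", "code", "checkbox_selected", "checkbox_unselected",
    "form", "key_value_region",
})


def class_histogram(class_names: Iterable[str]) -> Counter:
    """Count regions per class, rejecting names outside the documented set."""
    names = list(class_names)
    unknown = sorted(set(names) - LAYOUT_CLASSES)
    if unknown:
        raise ValueError(f"unknown layout classes: {unknown}")
    return Counter(names)


# Made-up class names standing in for region.class_name values.
histogram = class_histogram(["text", "table", "text", "picture"])
print(histogram.most_common())
```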

Accessing Layout Regions

When both layout detection and page extraction are enabled, each page in the result includes layout_regions: a list of detected regions, each with a class, confidence score, bounding box, and area fraction. You can use these fields to filter and analyze specific layout elements programmatically.

Python

```python
import asyncio

from kreuzberg import ExtractionConfig, LayoutDetectionConfig, PagesConfig, extract_file


async def main() -> None:
    result = await extract_file(
        "document.pdf",
        config=ExtractionConfig(
            layout=LayoutDetectionConfig(),
            pages=PagesConfig(extract_pages=True),  # needed for layout_regions
        ),
    )

    for page in result.pages or []:
        # Skip pages without detected regions
        for region in page.layout_regions or []:
            if region.class_name == "picture" and region.confidence > 0.9:
                print(
                    f"Page {page.page_number}: diagram detected "
                    f"(confidence={region.confidence:.2f}, "
                    f"area={region.area_fraction:.0%})"
                )


asyncio.run(main())
```

TypeScript

```typescript
const result = await extract("document.pdf", {
  layout: {},
  pages: { extractPages: true },
});

for (const page of result.pages ?? []) {
  if (page.layoutRegions) {
    for (const region of page.layoutRegions) {
      if (region.className === "picture" && region.confidence > 0.9) {
        console.log(
          `Page ${page.pageNumber}: diagram detected ` +
          `(confidence=${region.confidence.toFixed(2)}, ` +
          `area=${(region.areaFraction * 100).toFixed(0)}%)`
        );
      }
    }
  }
}
```

Rust

```rust
use kreuzberg::core::{ExtractionConfig, LayoutDetectionConfig, PagesConfig};

let result = extract_file(
    "document.pdf",
    ExtractionConfig {
        layout: Some(LayoutDetectionConfig::default()),
        pages: Some(PagesConfig {
            extract_pages: true,
            ..Default::default()
        }),
        ..Default::default()
    },
).await?;

for page in &result.pages {
    if let Some(regions) = &page.layout_regions {
        for region in regions {
            if region.class_name == "picture" && region.confidence > 0.9 {
                println!(
                    "Page {}: diagram detected (confidence={:.2}, area={:.0}%)",
                    page.page_number,
                    region.confidence,
                    region.area_fraction * 100.0
                );
            }
        }
    }
}
```

Tips

- Use confidence to filter out low-confidence detections; thresholds of 0.8 to 0.9 are typical for downstream operations.
- Use area_fraction to distinguish inline images from full-page diagrams (for example, area_fraction > 0.1 for significant figures).
- Layout detection and page extraction are configured separately; enable both to get page content together with layout_regions.
- Layout regions are available across all bindings (Python, TypeScript, Rust, Ruby, Java, Go, Elixir, C#, PHP).
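The confidence and area_fraction tips combine naturally into a single filter. The sketch below applies both thresholds to a plain dataclass that stands in for LayoutRegion, so it runs without Kreuzberg installed; `Region`, `significant_pictures`, and the sample values are made up for illustration, and the default thresholds are simply the ones suggested in the tips.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Region:
    """Stand-in for LayoutRegion carrying only the fields used here."""
    class_name: str
    confidence: float
    area_fraction: float


def significant_pictures(
    regions: Iterable[Region],
    min_confidence: float = 0.8,
    min_area_fraction: float = 0.1,
) -> list[Region]:
    """Keep confident picture regions that cover a meaningful share of the page."""
    return [
        r for r in regions
        if r.class_name == "picture"
        and r.confidence >= min_confidence
        and r.area_fraction > min_area_fraction
    ]


# Made-up regions for one page.
sample = [
    Region("picture", 0.95, 0.40),  # large, confident: kept
    Region("picture", 0.92, 0.02),  # small inline image: dropped
    Region("picture", 0.55, 0.30),  # low confidence: dropped
    Region("table", 0.99, 0.50),    # not a picture: dropped
]
kept = significant_pictures(sample)
print(len(kept), kept[0].area_fraction)
```

With real results, the same function can be called on page.layout_regions, since it only reads class_name, confidence, and area_fraction.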

Acknowledgments

- Docling: RT-DETR v2 model and layout classification approach
- TATR: table structure recognition with ONNX
- PaddleOCR: SLANeXT table structure and PP-LCNet classifier models

Configuration Reference — full field reference
Element-Based Output — using layout-aware results
