Font Configuration Breaking Change (v4.0)

Summary

Custom font provider is now enabled by default for improved PDF performance.

Breaking Change

Previous behavior (v3.x):

  • Font provider always enabled, not configurable
  • Used system fonts only
  • No user control over font loading

New behavior (v4.0):

  • Font provider enabled by default
  • Configurable via FontConfig in PdfConfig
  • Can disable or add custom font directories
  • ~12-13% faster PDF processing with font caching

Impact

Who is affected?

  • Users who rely on pdfium's default font fallback behavior
  • Users who want to disable the custom font provider
  • Users who need to add custom font directories

What changes?

  • Default: Custom font provider now active (breaking change)
  • Performance: PDF extraction 12-13% faster
  • API: New font_config option in PdfConfig

Migration

For most users, no changes are needed. The default behavior provides the performance improvements:

Rust:
use kreuzberg::ExtractionConfig;

// Previous (v3.x) - no font configuration
let config = ExtractionConfig::default();

// Current (v4.0) - same code, now with font provider enabled
let config = ExtractionConfig::default();
// Font provider automatically enabled with system fonts

Python:
from kreuzberg import ExtractionConfig

# Previous (v3.x)
config = ExtractionConfig()

# Current (v4.0) - same code, now with font provider enabled
config = ExtractionConfig()
# Font provider automatically enabled with system fonts

TypeScript:
import { ExtractionConfig } from 'kreuzberg';

// Previous (v3.x)
const config: ExtractionConfig = {};

// Current (v4.0) - same code, now with font provider enabled
const config: ExtractionConfig = {};
// Font provider automatically enabled with system fonts

Java:
import dev.kreuzberg.config.*;

// Previous (v3.x)
ExtractionConfig config = ExtractionConfig.builder().build();

// Current (v4.0) - same code, now with font provider enabled
ExtractionConfig config = ExtractionConfig.builder().build();
// Font provider automatically enabled with system fonts

Go:
import "github.com/kreuzberg-dev/kreuzberg/v4"

// Previous (v3.x)
config := &kreuzberg.ExtractionConfig{}

// Current (v4.0) - same code, now with font provider enabled
config := &kreuzberg.ExtractionConfig{}
// Font provider automatically enabled with system fonts

Ruby:
require 'kreuzberg'

# Previous (v3.x)
config = Kreuzberg::ExtractionConfig.new

# Current (v4.0) - same code, now with font provider enabled
config = Kreuzberg::ExtractionConfig.new
# Font provider automatically enabled with system fonts

C#:
using Kreuzberg;

// Previous (v3.x)
var config = new ExtractionConfig();

// Current (v4.0) - same code, now with font provider enabled
var config = new ExtractionConfig();
// Font provider automatically enabled with system fonts

Disable Font Provider

If you prefer pdfium's default font handling:

Rust:
use kreuzberg::{ExtractionConfig, PdfConfig, FontConfig};

let config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: false,
            custom_font_dirs: None,
        }),
        ..Default::default()
    }),
    ..Default::default()
};

Python:
from kreuzberg import ExtractionConfig, PdfConfig, FontConfig

config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(enabled=False)
    )
)

TypeScript:
import { ExtractionConfig } from 'kreuzberg';

const config: ExtractionConfig = {
  pdfOptions: {
    fontConfig: {
      enabled: false
    }
  }
};

Java:
import dev.kreuzberg.config.*;

FontConfig fontConfig = FontConfig.builder()
    .enabled(false)
    .build();

PdfConfig pdfConfig = PdfConfig.builder()
    .fontConfig(fontConfig)
    .build();

ExtractionConfig config = ExtractionConfig.builder()
    .pdfOptions(pdfConfig)
    .build();

Go:
import "github.com/kreuzberg-dev/kreuzberg/v4"

config := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled: false,
        },
    },
}

Ruby:
require 'kreuzberg'

config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(enabled: false)
  )
)

C#:
using Kreuzberg;

var fontConfig = new FontConfig { Enabled = false };
var pdfConfig = new PdfConfig { FontConfig = fontConfig };
var config = new ExtractionConfig { PdfOptions = pdfConfig };

Add Custom Font Directories

To use fonts from custom directories (in addition to system fonts):

Rust:
use kreuzberg::{ExtractionConfig, PdfConfig, FontConfig};
use std::path::PathBuf;

let config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: true,
            custom_font_dirs: Some(vec![
                PathBuf::from("/usr/share/fonts/custom"),
                PathBuf::from("~/my-fonts"),  // Tilde expanded automatically
            ]),
        }),
        ..Default::default()
    }),
    ..Default::default()
};

Python:
from kreuzberg import ExtractionConfig, PdfConfig, FontConfig

config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(
            enabled=True,
            custom_font_dirs=[
                "/usr/share/fonts/custom",
                "~/my-fonts"  # Tilde expanded automatically
            ]
        )
    )
)

TypeScript:
import { ExtractionConfig } from 'kreuzberg';

const config: ExtractionConfig = {
  pdfOptions: {
    fontConfig: {
      enabled: true,
      customFontDirs: [
        '/usr/share/fonts/custom',
        '~/my-fonts'  // Tilde expanded automatically
      ]
    }
  }
};

Java:
import dev.kreuzberg.config.*;
import java.nio.file.Paths;
import java.util.Arrays;  // Needed for Arrays.asList below

FontConfig fontConfig = FontConfig.builder()
    .enabled(true)
    .customFontDirs(Arrays.asList(
        Paths.get("/usr/share/fonts/custom"),
        Paths.get("~/my-fonts")  // Tilde expanded automatically
    ))
    .build();

PdfConfig pdfConfig = PdfConfig.builder()
    .fontConfig(fontConfig)
    .build();

ExtractionConfig config = ExtractionConfig.builder()
    .pdfOptions(pdfConfig)
    .build();

Go:
import "github.com/kreuzberg-dev/kreuzberg/v4"

config := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled: true,
            CustomFontDirs: []string{
                "/usr/share/fonts/custom",
                "~/my-fonts",  // Tilde expanded automatically
            },
        },
    },
}

Ruby:
require 'kreuzberg'

config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(
      enabled: true,
      custom_font_dirs: [
        '/usr/share/fonts/custom',
        '~/my-fonts'  # Tilde expanded automatically
      ]
    )
  )
)

C#:
using Kreuzberg;

var fontConfig = new FontConfig
{
    Enabled = true,
    CustomFontDirs = new[]
    {
        "/usr/share/fonts/custom",
        "~/my-fonts"  // Tilde expanded automatically
    }
};

var pdfConfig = new PdfConfig { FontConfig = fontConfig };
var config = new ExtractionConfig { PdfOptions = pdfConfig };

Configuration Files

TOML Format

Font Configuration in TOML
[pdf_options.font_config]
enabled = true
custom_font_dirs = ["/usr/share/fonts/custom", "~/my-fonts"]

YAML Format

Font Configuration in YAML
pdf_options:
  font_config:
    enabled: true
    custom_font_dirs:
      - /usr/share/fonts/custom
      - ~/my-fonts

JSON Format

Font Configuration in JSON
{
  "pdf_options": {
    "font_config": {
      "enabled": true,
      "custom_font_dirs": ["/usr/share/fonts/custom", "~/my-fonts"]
    }
  }
}
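
As a quick sanity check of the JSON layout above, the following is a minimal sketch that reads the font settings out of such a file before handing it to the library. It is not part of Kreuzberg: read_font_config is a hypothetical helper, and the defaults it applies (enabled when the key is absent, no custom directories) are assumptions based on the defaults described in this guide.

```python
import json
from typing import List, Tuple


def read_font_config(json_text: str) -> Tuple[bool, List[str]]:
    """Hypothetical helper: return (enabled, custom_font_dirs) from a JSON config."""
    data = json.loads(json_text)
    # Walk pdf_options -> font_config, tolerating missing or null sections.
    font = (data.get("pdf_options") or {}).get("font_config") or {}
    enabled = font.get("enabled", True)  # assumed default: provider enabled
    dirs = font.get("custom_font_dirs") or []  # assumed default: no extra dirs
    if not isinstance(enabled, bool):
        raise ValueError("font_config.enabled must be true or false")
    if not isinstance(dirs, list) or not all(isinstance(d, str) for d in dirs):
        raise ValueError("font_config.custom_font_dirs must be a list of strings")
    return enabled, dirs
```

An empty document falls back to the assumed defaults, and a value of the wrong type is rejected with a ValueError instead of being passed along silently.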

Path Handling

The font configuration automatically handles:

  • Tilde expansion: ~/fonts → /Users/username/fonts
  • Relative paths: ./fonts → /absolute/path/to/fonts
  • Symlinks: Resolved to canonical paths (security measure)
  • Validation: Directories must exist; warnings logged if not found
  • Graceful degradation: Missing directories don't cause failures
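
The handling steps listed above can be approximated in a few lines, which is useful for checking in advance which of your directories would survive. This is a sketch under stated assumptions, not Kreuzberg's implementation: resolve_font_dirs and its base_dir parameter are hypothetical, and relative paths are assumed to resolve against the current working directory.

```python
import logging
import os
from pathlib import Path
from typing import Iterable, List, Optional

logger = logging.getLogger("font_paths_sketch")


def resolve_font_dirs(raw_dirs: Iterable[str], base_dir: Optional[str] = None) -> List[Path]:
    """Hypothetical helper mirroring the documented path handling."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    resolved: List[Path] = []
    for raw in raw_dirs:
        path = Path(os.path.expanduser(raw))      # tilde expansion
        if not path.is_absolute():
            path = base / path                    # relative -> absolute
        canonical = Path(os.path.realpath(path))  # resolve symlinks
        if not canonical.is_dir():
            # Validation with graceful degradation: warn and skip, never raise.
            logger.warning("Custom font directory not found: %s", raw)
            continue
        resolved.append(canonical)
    return resolved
```

Calling resolve_font_dirs(["/usr/share/fonts/custom", "~/my-fonts"]) returns only the directories that exist, in canonical form, and logs a warning for each one that does not.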

Global Configuration

Important: Font configuration is global per process and must be set before the first PDF extraction.

Rust:
// CORRECT: Set config before first extraction
let config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: true,
            custom_font_dirs: Some(vec![
                PathBuf::from("/usr/share/fonts/custom"),
            ]),
        }),
        ..Default::default()
    }),
    ..Default::default()
};

let result = kreuzberg::extract_file("document.pdf", &config)?;

// INCORRECT: Attempting to change config after first extraction
let new_config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: false,
            custom_font_dirs: None,
        }),
        ..Default::default()
    }),
    ..Default::default()
};
let result2 = kreuzberg::extract_file("document2.pdf", &new_config)?;
// Warning logged: "Font config already initialized"

Python:
# CORRECT: Set config before first extraction
config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(
            enabled=True,
            custom_font_dirs=["/usr/share/fonts/custom"]
        )
    )
)
result = extract_file("document.pdf", config)

# INCORRECT: Attempting to change config after first extraction
new_config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(enabled=False)
    )
)
result2 = extract_file("document2.pdf", new_config)
# Warning logged: "Font config already initialized"

TypeScript:
// CORRECT: Set config before first extraction
const config: ExtractionConfig = {
  pdfOptions: {
    fontConfig: {
      enabled: true,
      customFontDirs: ['/usr/share/fonts/custom']
    }
  }
};
const result = await extractFile('document.pdf', config);

// INCORRECT: Attempting to change config after first extraction
const newConfig: ExtractionConfig = {
  pdfOptions: {
    fontConfig: { enabled: false }
  }
};
const result2 = await extractFile('document2.pdf', newConfig);
// Warning logged: "Font config already initialized"

Java:
// CORRECT: Set config before first extraction
FontConfig fontConfig = FontConfig.builder()
    .enabled(true)
    .customFontDirs(Arrays.asList(Paths.get("/usr/share/fonts/custom")))
    .build();
PdfConfig pdfConfig = PdfConfig.builder()
    .fontConfig(fontConfig)
    .build();
ExtractionConfig config = ExtractionConfig.builder()
    .pdfOptions(pdfConfig)
    .build();
ExtractionResult result = Kreuzberg.extractFile("document.pdf", config);

// INCORRECT: Attempting to change config after first extraction
FontConfig newFontConfig = FontConfig.builder()
    .enabled(false)
    .build();
PdfConfig newPdfConfig = PdfConfig.builder()
    .fontConfig(newFontConfig)
    .build();
ExtractionConfig newConfig = ExtractionConfig.builder()
    .pdfOptions(newPdfConfig)
    .build();
ExtractionResult result2 = Kreuzberg.extractFile("document2.pdf", newConfig);
// Warning logged: "Font config already initialized"

Go:
// CORRECT: Set config before first extraction
config := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled: true,
            CustomFontDirs: []string{"/usr/share/fonts/custom"},
        },
    },
}
result, _ := kreuzberg.ExtractFile("document.pdf", config)

// INCORRECT: Attempting to change config after first extraction
newConfig := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled: false,
        },
    },
}
result2, _ := kreuzberg.ExtractFile("document2.pdf", newConfig)
// Warning logged: "Font config already initialized"

Ruby:
# CORRECT: Set config before first extraction
config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(
      enabled: true,
      custom_font_dirs: ['/usr/share/fonts/custom']
    )
  )
)
result = Kreuzberg.extract_file('document.pdf', config)

# INCORRECT: Attempting to change config after first extraction
new_config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(enabled: false)
  )
)
result2 = Kreuzberg.extract_file('document2.pdf', new_config)
# Warning logged: "Font config already initialized"

C#:
// CORRECT: Set config before first extraction
var fontConfig = new FontConfig
{
    Enabled = true,
    CustomFontDirs = new[] { "/usr/share/fonts/custom" }
};
var pdfConfig = new PdfConfig { FontConfig = fontConfig };
var config = new ExtractionConfig { PdfOptions = pdfConfig };
var result = Kreuzberg.ExtractFile("document.pdf", config);

// INCORRECT: Attempting to change config after first extraction
var newFontConfig = new FontConfig { Enabled = false };
var newPdfConfig = new PdfConfig { FontConfig = newFontConfig };
var newConfig = new ExtractionConfig { PdfOptions = newPdfConfig };
var result2 = Kreuzberg.ExtractFile("document2.pdf", newConfig);
// Warning logged: "Font config already initialized"

Performance Impact

With default settings (enabled=true, system fonts):

  • PDF extraction: ~12-13% faster
  • Memory: Minimal increase (~100KB for font cache)
  • Startup: Lazy initialization (no overhead for non-PDF workloads)

Troubleshooting

Custom fonts not working

Symptom: PDF still uses fallback fonts

Solutions:

  1. Verify directories exist and contain .ttf/.otf/.ttc files
  2. Check logs for "Custom font directory not found" warnings
  3. Ensure paths are absolute or properly expanded
  4. Verify font files are readable

"Font config already initialized" warning

Symptom: Configuration changes ignored after first PDF extraction

Solution: Set FontConfig in the first ExtractionConfig used. Subsequent config changes are not supported (global limitation).
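
The set-once behavior behind this warning can be pictured with a small model. This is an illustrative sketch only, not how Kreuzberg is implemented: FontSettings, SetOnce, and effective_font_settings are hypothetical names that stand in for the process-wide font state.

```python
import logging
import threading
from dataclasses import dataclass
from typing import Optional, Tuple

logger = logging.getLogger("font_config_sketch")


@dataclass(frozen=True)
class FontSettings:
    """Hypothetical stand-in for FontConfig (not the library's class)."""
    enabled: bool = True
    custom_font_dirs: Tuple[str, ...] = ()


class SetOnce:
    """Holds the first value it is given; later values are ignored."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value: Optional[FontSettings] = None

    def get_or_init(self, candidate: FontSettings) -> FontSettings:
        with self._lock:
            if self._value is None:
                self._value = candidate  # first caller wins
            elif candidate != self._value:
                logger.warning("Font config already initialized")
            return self._value


_PROCESS_FONTS = SetOnce()


def effective_font_settings(requested: FontSettings) -> FontSettings:
    """Return the settings actually in force for this process."""
    return _PROCESS_FONTS.get_or_init(requested)
```

In this model the second, different request returns the settings from the first call, which is why building one configuration at startup and reusing it for every extraction avoids the warning.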

Performance regression

Symptom: PDF extraction slower after upgrade

Solution: This is unexpected. Please report as a bug with:

  • PDF sample (if shareable)
  • Benchmark comparison (before/after)
  • Configuration used
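
A before/after benchmark comparison for such a report can be gathered with a small timing harness like the one below. It is a sketch under stated assumptions: time_extraction and percent_change are hypothetical helpers, and the extract argument is whatever callable wraps your own extraction call for one file path.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def time_extraction(
    extract: Callable[[str], object],
    paths: Sequence[str],
    repeats: int = 5,
) -> Dict[str, float]:
    """Time `repeats` full passes over `paths` and summarize in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for path in paths:
            extract(path)
        samples.append(time.perf_counter() - start)
    return {
        "runs": float(repeats),
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


def percent_change(before_s: float, after_s: float) -> float:
    """Negative means faster after the upgrade, positive means slower."""
    return (after_s - before_s) / before_s * 100.0
```

Run the harness once per installed version with the same files and configuration, then compare the medians with percent_change. Because font initialization is lazy, the first pass likely includes that one-time cost, so the median is a fairer figure than the first run.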

Questions?

  • Issue tracker: https://github.com/kreuzberg-dev/kreuzberg/issues
  • Discussions: https://github.com/kreuzberg-dev/kreuzberg/discussions