Font Configuration Breaking Change (v4.0)
Summary
The custom font provider is now enabled by default and configurable, which improves PDF extraction performance.
Breaking Change
Previous behavior (v3.x):
- Font provider always enabled, not configurable
- Used system fonts only
- No user control over font loading
New behavior (v4.0):
- Font provider enabled by default
- Configurable via FontConfig in PdfConfig
- Can disable or add custom font directories
- ~12-13% faster PDF processing with font caching
Impact
Who is affected?
- Users who rely on pdfium's default font fallback behavior
- Users who want to disable the custom font provider
- Users who need to add custom font directories
What changes?
- Default: Custom font provider now active (breaking change)
- Performance: PDF extraction 12-13% faster
- API: New font_config option in PdfConfig
Migration
No Action Required (Recommended)
For most users, no changes are needed. The default behavior already provides the performance improvements.
Disable Font Provider
If you prefer pdfium's default font handling, set enabled to false in FontConfig.
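As a minimal sketch, the snippet below builds a configuration that turns the provider off and renders it in the JSON config file layout used in this guide. The key names (`pdf_options`, `font_config`, `enabled`) come from this document; the helper names `disabled_font_config` and `render_config` are hypothetical, and leaving out `custom_font_dirs` is assumed to be acceptable, as the `FontConfig(enabled=False)` calls elsewhere in this guide suggest.

```python
import json


def disabled_font_config() -> dict:
    """Return a config mapping that switches the custom font provider off."""
    # Assumption: custom_font_dirs may be omitted when the provider is disabled.
    return {"pdf_options": {"font_config": {"enabled": False}}}


def render_config(config: dict) -> str:
    """Serialize the mapping as indented JSON, ready to save as a config file."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(render_config(disabled_font_config()))
```

The same mapping should translate directly to the TOML and YAML layouts, since all three formats in this guide share one key structure.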
Add Custom Font Directories
To use fonts from custom directories (in addition to system fonts):
Rust:
use kreuzberg::{ExtractionConfig, PdfConfig, FontConfig};
use std::path::PathBuf;
let config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: true,
            custom_font_dirs: Some(vec![
                PathBuf::from("/usr/share/fonts/custom"),
                PathBuf::from("~/my-fonts"), // Tilde expanded automatically
            ]),
        }),
        ..Default::default()
    }),
    ..Default::default()
};
Java:
import dev.kreuzberg.config.*;
import java.nio.file.Paths;
import java.util.Arrays; // needed for Arrays.asList below
FontConfig fontConfig = FontConfig.builder()
    .enabled(true)
    .customFontDirs(Arrays.asList(
        Paths.get("/usr/share/fonts/custom"),
        Paths.get("~/my-fonts") // Tilde expanded automatically
    ))
    .build();
PdfConfig pdfConfig = PdfConfig.builder()
    .fontConfig(fontConfig)
    .build();
ExtractionConfig config = ExtractionConfig.builder()
    .pdfOptions(pdfConfig)
    .build();
Configuration Files
TOML Format
[pdf_options.font_config]
enabled = true
custom_font_dirs = ["/usr/share/fonts/custom", "~/my-fonts"]
YAML Format
pdf_options:
  font_config:
    enabled: true
    custom_font_dirs:
      - /usr/share/fonts/custom
      - ~/my-fonts
JSON Format
{
  "pdf_options": {
    "font_config": {
      "enabled": true,
      "custom_font_dirs": ["/usr/share/fonts/custom", "~/my-fonts"]
    }
  }
}
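Before handing a config file to the library, it can help to check that the font section has the expected shape. The sketch below parses the JSON layout shown above and applies basic type checks. The function name `read_font_config` is hypothetical and not part of the library. Treating a missing `enabled` key as `true` is an assumption based on the provider being enabled by default.

```python
import json


def read_font_config(text: str) -> dict:
    """Parse a JSON config and return its font_config section after type checks."""
    data = json.loads(text)
    pdf_options = data.get("pdf_options") or {}
    font = pdf_options.get("font_config") or {}

    # Assumption: the provider is on when the key is absent (documented default).
    enabled = font.get("enabled", True)
    dirs = font.get("custom_font_dirs") or []

    if not isinstance(enabled, bool):
        raise TypeError("enabled must be a boolean")
    if not isinstance(dirs, list) or not all(isinstance(d, str) for d in dirs):
        raise TypeError("custom_font_dirs must be a list of strings")
    return {"enabled": enabled, "custom_font_dirs": dirs}
```

This only checks the shape of the file. Whether the listed directories exist is a separate question that the library answers at load time.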
Path Handling
The font configuration automatically handles:
- Tilde expansion: ~/fonts → /Users/username/fonts
- Relative paths: ./fonts → /absolute/path/to/fonts
- Symlinks: Resolved to canonical paths (security measure)
- Validation: Directories must exist; warnings logged if not found
- Graceful degradation: Missing directories don't cause failures
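The handling rules above can be approximated with the Python standard library. This is only a sketch of the described behavior, not the library's actual implementation. The function name `normalize_font_dirs` and the logger name are hypothetical, and the warning text reuses the message quoted in the troubleshooting section.

```python
import logging
from pathlib import Path

logger = logging.getLogger("font_paths")  # hypothetical logger name


def normalize_font_dirs(raw_dirs):
    """Expand, absolutize, and canonicalize font directories, skipping missing ones."""
    resolved = []
    for raw in raw_dirs:
        # expanduser() handles "~"; resolve() makes the path absolute
        # and follows symlinks to the canonical location.
        candidate = Path(raw).expanduser().resolve()
        if not candidate.is_dir():
            # Graceful degradation: warn and continue instead of raising.
            logger.warning("Custom font directory not found: %s", candidate)
            continue
        resolved.append(candidate)
    return resolved
```

Calling `normalize_font_dirs(["~/my-fonts", "./fonts"])` returns only the directories that exist, each as an absolute canonical path.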
Global Configuration
Important: Font configuration is global per process and must be set before the first PDF extraction.
Rust:
// CORRECT: Set config before first extraction
let config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: true,
            custom_font_dirs: Some(vec![
                PathBuf::from("/usr/share/fonts/custom"),
            ]),
        }),
        ..Default::default()
    }),
    ..Default::default()
};
let result = kreuzberg::extract_file("document.pdf", &config)?;
// INCORRECT: Attempting to change config after first extraction
let new_config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: false,
            custom_font_dirs: None,
        }),
        ..Default::default()
    }),
    ..Default::default()
};
let result2 = kreuzberg::extract_file("document2.pdf", &new_config)?;
// Warning logged: "Font config already initialized"
Python:
# CORRECT: Set config before first extraction
config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(
            enabled=True,
            custom_font_dirs=["/usr/share/fonts/custom"]
        )
    )
)
result = extract_file("document.pdf", config)
# INCORRECT: Attempting to change config after first extraction
new_config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(enabled=False)
    )
)
result2 = extract_file("document2.pdf", new_config)
# Warning logged: "Font config already initialized"
TypeScript:
// CORRECT: Set config before first extraction
const config: ExtractionConfig = {
  pdfOptions: {
    fontConfig: {
      enabled: true,
      customFontDirs: ['/usr/share/fonts/custom']
    }
  }
};
const result = await extractFile('document.pdf', config);
// INCORRECT: Attempting to change config after first extraction
const newConfig: ExtractionConfig = {
  pdfOptions: {
    fontConfig: { enabled: false }
  }
};
const result2 = await extractFile('document2.pdf', newConfig);
// Warning logged: "Font config already initialized"
Java:
// CORRECT: Set config before first extraction
FontConfig fontConfig = FontConfig.builder()
    .enabled(true)
    .customFontDirs(Arrays.asList(Paths.get("/usr/share/fonts/custom")))
    .build();
PdfConfig pdfConfig = PdfConfig.builder()
    .fontConfig(fontConfig)
    .build();
ExtractionConfig config = ExtractionConfig.builder()
    .pdfOptions(pdfConfig)
    .build();
ExtractionResult result = Kreuzberg.extractFile("document.pdf", config);
// INCORRECT: Attempting to change config after first extraction
FontConfig newFontConfig = FontConfig.builder()
    .enabled(false)
    .build();
PdfConfig newPdfConfig = PdfConfig.builder()
    .fontConfig(newFontConfig)
    .build();
ExtractionConfig newConfig = ExtractionConfig.builder()
    .pdfOptions(newPdfConfig)
    .build();
ExtractionResult result2 = Kreuzberg.extractFile("document2.pdf", newConfig);
// Warning logged: "Font config already initialized"
Go:
// CORRECT: Set config before first extraction
config := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled:        true,
            CustomFontDirs: []string{"/usr/share/fonts/custom"},
        },
    },
}
result, _ := kreuzberg.ExtractFile("document.pdf", config)
// INCORRECT: Attempting to change config after first extraction
newConfig := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled: false,
        },
    },
}
result2, _ := kreuzberg.ExtractFile("document2.pdf", newConfig)
// Warning logged: "Font config already initialized"
Ruby:
# CORRECT: Set config before first extraction
config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(
      enabled: true,
      custom_font_dirs: ['/usr/share/fonts/custom']
    )
  )
)
result = Kreuzberg.extract_file('document.pdf', config)
# INCORRECT: Attempting to change config after first extraction
new_config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(enabled: false)
  )
)
result2 = Kreuzberg.extract_file('document2.pdf', new_config)
# Warning logged: "Font config already initialized"
C#:
// CORRECT: Set config before first extraction
var fontConfig = new FontConfig
{
    Enabled = true,
    CustomFontDirs = new[] { "/usr/share/fonts/custom" }
};
var pdfConfig = new PdfConfig { FontConfig = fontConfig };
var config = new ExtractionConfig { PdfOptions = pdfConfig };
var result = Kreuzberg.ExtractFile("document.pdf", config);
// INCORRECT: Attempting to change config after first extraction
var newFontConfig = new FontConfig { Enabled = false };
var newPdfConfig = new PdfConfig { FontConfig = newFontConfig };
var newConfig = new ExtractionConfig { PdfOptions = newPdfConfig };
var result2 = Kreuzberg.ExtractFile("document2.pdf", newConfig);
// Warning logged: "Font config already initialized"
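To see why later configs are ignored, it may help to picture the set-once pattern that this section describes. The following is a conceptual sketch in Python and does not show the library's internals. `FontSettings` and `init_font_settings` are hypothetical stand-ins, and warning only when the later config differs from the first is an assumption.

```python
import logging
import threading
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("font_config")  # hypothetical logger name


@dataclass(frozen=True)
class FontSettings:
    """Hypothetical stand-in for the library's font configuration."""
    enabled: bool = True
    custom_font_dirs: tuple = ()


_lock = threading.Lock()
_active: Optional[FontSettings] = None  # process-wide, set at most once


def init_font_settings(requested: FontSettings) -> FontSettings:
    """Install settings on the first call; later calls keep the original."""
    global _active
    with _lock:
        if _active is None:
            _active = requested
        elif requested != _active:
            # Assumption: a differing later config is ignored with a warning.
            logger.warning("Font config already initialized")
        return _active
```

Under this model, the settings from the first extraction stay in effect until the process exits, which is why the font configuration belongs in the first ExtractionConfig used.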
Performance Impact
With default settings (enabled=true, system fonts):
- PDF extraction: ~12-13% faster
- Memory: Minimal increase (~100KB for font cache)
- Startup: Lazy initialization (no overhead for non-PDF workloads)
Troubleshooting
Custom fonts not working
Symptom: PDF still uses fallback fonts
Solutions:
1. Verify directories exist and contain .ttf/.otf/.ttc files
2. Check logs for "Custom font directory not found" warnings
3. Ensure paths are absolute or properly expanded
4. Verify font files are readable
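The directory and file checks in the list above can be scripted. The helper below is a rough diagnostic sketch and its name, `diagnose_font_dir`, is hypothetical. It inspects only the top level of each directory, which is an assumption, since this guide does not say whether the library scans subdirectories.

```python
import os
from pathlib import Path

FONT_SUFFIXES = {".ttf", ".otf", ".ttc"}


def diagnose_font_dir(raw_dir):
    """Return a list of human-readable problems found for one font directory."""
    path = Path(raw_dir).expanduser()
    if not path.is_dir():
        return [f"{path}: directory does not exist"]
    try:
        # Assumption: only top-level files are considered.
        fonts = [p for p in path.iterdir()
                 if p.is_file() and p.suffix.lower() in FONT_SUFFIXES]
    except PermissionError:
        return [f"{path}: directory is not readable"]
    if not fonts:
        return [f"{path}: no .ttf/.otf/.ttc files found"]
    # Report each font file the current user cannot read.
    return [f"{p}: not readable" for p in fonts if not os.access(p, os.R_OK)]
```

An empty list means the directory passed all three checks. Any other result names the directory or file to fix.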
"Font config already initialized" warning
Symptom: Configuration changes ignored after first PDF extraction
Solution: Set FontConfig in the first ExtractionConfig used. Subsequent config changes are not supported (global limitation).
Performance regression
Symptom: PDF extraction slower after upgrade
Solution: This is unexpected. Please report as a bug with:
- PDF sample (if shareable)
- Benchmark comparison (before/after)
- Configuration used
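For the before/after benchmark, a small timing harness is usually enough. The sketch below is generic and does not depend on the library. `median_seconds` and `percent_change` are hypothetical helper names, and the callable passed in is whatever extraction call is being measured.

```python
import statistics
import time


def median_seconds(run, repeats=5):
    """Call run() `repeats` times and return the median wall-clock duration."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_change(before, after):
    """Positive result means `after` is slower than `before`."""
    return (after - before) / before * 100.0
```

For example, `median_seconds(lambda: extract_file("document.pdf", config))` could be run once under each version. Because font configuration is global per process, each measurement should run in its own process.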
Questions?
- Issue tracker: https://github.com/kreuzberg-dev/kreuzberg/issues
- Discussions: https://github.com/kreuzberg-dev/kreuzberg/discussions