Font Configuration Breaking Change (v4.0)¶
Summary¶
The custom font provider is now enabled by default and can be configured, which improves PDF extraction performance.
Breaking Change¶
Previous behavior (v3.x):
- Font provider always enabled, not configurable
- Used system fonts only
- No user control over font loading
New behavior (v4.0):
- Font provider enabled by default
- Configurable via FontConfig in PdfConfig
- Can disable or add custom font directories
- ~12-13% faster PDF processing with font caching
Impact¶
Who is affected?
- Users who rely on pdfium's default font fallback behavior
- Users who want to disable the custom font provider
- Users who need to add custom font directories
What changes?
- Default: Custom font provider now active (breaking change)
- Performance: PDF extraction 12-13% faster
- API: New font_config option in PdfConfig
Migration¶
No Action Required (Recommended)¶
Most users do not need to change anything. The default behavior provides the performance improvement without any configuration.
Disable Font Provider¶
If you prefer pdfium's default font handling, set enabled to false in FontConfig.
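As one possible illustration, the fragment below builds a configuration mapping that turns the provider off and prints it as JSON. The key names come from the JSON format documented under Configuration Files. Leaving out custom_font_dirs is an assumption based on the FontConfig(enabled=False) calls in the Global Configuration examples, and disabled_font_config is a hypothetical helper name, not part of the library.

```python
import json


def disabled_font_config() -> dict:
    """Build a config mapping that turns the custom font provider off.

    Key names mirror the documented JSON configuration format.
    Omitting custom_font_dirs is an assumption, not confirmed behavior.
    """
    return {"pdf_options": {"font_config": {"enabled": False}}}


if __name__ == "__main__":
    # Print the fragment so it can be pasted into a JSON config file.
    print(json.dumps(disabled_font_config(), indent=2))
```

In the language bindings the same setting corresponds to passing a FontConfig with enabled set to false, as the Global Configuration examples do.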
Add Custom Font Directories¶
To use fonts from custom directories (in addition to system fonts):
Rust:
use kreuzberg::{ExtractionConfig, PdfConfig, FontConfig};
use std::path::PathBuf;

let config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: true,
            custom_font_dirs: Some(vec![
                PathBuf::from("/usr/share/fonts/custom"),
                PathBuf::from("~/my-fonts"), // Tilde expanded automatically
            ]),
        }),
        ..Default::default()
    }),
    ..Default::default()
};
Java:
import dev.kreuzberg.config.*;
import java.nio.file.Paths;
import java.util.Arrays; // required for Arrays.asList below

FontConfig fontConfig = FontConfig.builder()
    .enabled(true)
    .customFontDirs(Arrays.asList(
        Paths.get("/usr/share/fonts/custom"),
        Paths.get("~/my-fonts") // Tilde expanded automatically
    ))
    .build();

PdfConfig pdfConfig = PdfConfig.builder()
    .fontConfig(fontConfig)
    .build();

ExtractionConfig config = ExtractionConfig.builder()
    .pdfOptions(pdfConfig)
    .build();
Configuration Files¶
TOML Format¶
[pdf_options.font_config]
enabled = true
custom_font_dirs = ["/usr/share/fonts/custom", "~/my-fonts"]
YAML Format¶
pdf_options:
  font_config:
    enabled: true
    custom_font_dirs:
      - /usr/share/fonts/custom
      - ~/my-fonts
JSON Format¶
{
  "pdf_options": {
    "font_config": {
      "enabled": true,
      "custom_font_dirs": ["/usr/share/fonts/custom", "~/my-fonts"]
    }
  }
}
Path Handling¶
The font configuration automatically handles:
- Tilde expansion: ~/fonts → /Users/username/fonts
- Relative paths: ./fonts → /absolute/path/to/fonts
- Symlinks: Resolved to canonical paths (security measure)
- Validation: Directories must exist; warnings logged if not found
- Graceful degradation: Missing directories don't cause failures
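The handling steps listed above can be approximated with the Python standard library. This is a minimal sketch of the general technique (expand, resolve, validate, skip on failure), not the library's implementation. normalize_font_dirs and the logger name are hypothetical, and the warning text is borrowed from the Troubleshooting section.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("font_paths")  # hypothetical logger name


def normalize_font_dirs(raw_dirs: list[str]) -> list[Path]:
    """Expand, resolve, and validate font directories (illustrative only)."""
    usable: list[Path] = []
    for raw in raw_dirs:
        # expanduser handles "~"; resolve makes the path absolute and
        # follows symlinks to the canonical location.
        candidate = Path(raw).expanduser().resolve()
        if candidate.is_dir():
            usable.append(candidate)
        else:
            # Graceful degradation: warn and continue instead of raising.
            logger.warning("Custom font directory not found: %s", candidate)
    return usable


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # The second entry does not exist and is skipped with a warning.
        print(normalize_font_dirs([tmp, "./does-not-exist-fonts"]))
```

Skipping missing directories rather than raising keeps one bad entry from preventing extraction, which matches the graceful-degradation behavior described above.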
Global Configuration¶
Important: Font configuration is global per process and must be set before the first PDF extraction.
Rust:
// CORRECT: Set config before first extraction
let config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: true,
            custom_font_dirs: Some(vec![
                PathBuf::from("/usr/share/fonts/custom"),
            ]),
        }),
        ..Default::default()
    }),
    ..Default::default()
};
let result = kreuzberg::extract_file("document.pdf", &config)?;

// INCORRECT: Attempting to change config after first extraction
let new_config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: false,
            custom_font_dirs: None,
        }),
        ..Default::default()
    }),
    ..Default::default()
};
let result2 = kreuzberg::extract_file("document2.pdf", &new_config)?;
// Warning logged: "Font config already initialized"
Python:
# CORRECT: Set config before first extraction
config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(
            enabled=True,
            custom_font_dirs=["/usr/share/fonts/custom"]
        )
    )
)
result = extract_file("document.pdf", config)

# INCORRECT: Attempting to change config after first extraction
new_config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(enabled=False)
    )
)
result2 = extract_file("document2.pdf", new_config)
# Warning logged: "Font config already initialized"
TypeScript:
// CORRECT: Set config before first extraction
const config: ExtractionConfig = {
  pdfOptions: {
    fontConfig: {
      enabled: true,
      customFontDirs: ['/usr/share/fonts/custom']
    }
  }
};
const result = await extractFile('document.pdf', config);

// INCORRECT: Attempting to change config after first extraction
const newConfig: ExtractionConfig = {
  pdfOptions: {
    fontConfig: { enabled: false }
  }
};
const result2 = await extractFile('document2.pdf', newConfig);
// Warning logged: "Font config already initialized"
Java:
// CORRECT: Set config before first extraction
FontConfig fontConfig = FontConfig.builder()
    .enabled(true)
    .customFontDirs(Arrays.asList(Paths.get("/usr/share/fonts/custom")))
    .build();
PdfConfig pdfConfig = PdfConfig.builder()
    .fontConfig(fontConfig)
    .build();
ExtractionConfig config = ExtractionConfig.builder()
    .pdfOptions(pdfConfig)
    .build();
ExtractionResult result = Kreuzberg.extractFile("document.pdf", config);

// INCORRECT: Attempting to change config after first extraction
FontConfig newFontConfig = FontConfig.builder()
    .enabled(false)
    .build();
PdfConfig newPdfConfig = PdfConfig.builder()
    .fontConfig(newFontConfig)
    .build();
ExtractionConfig newConfig = ExtractionConfig.builder()
    .pdfOptions(newPdfConfig)
    .build();
ExtractionResult result2 = Kreuzberg.extractFile("document2.pdf", newConfig);
// Warning logged: "Font config already initialized"
Go:
// CORRECT: Set config before first extraction
config := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled:        true,
            CustomFontDirs: []string{"/usr/share/fonts/custom"},
        },
    },
}
result, _ := kreuzberg.ExtractFile("document.pdf", config)

// INCORRECT: Attempting to change config after first extraction
newConfig := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled: false,
        },
    },
}
result2, _ := kreuzberg.ExtractFile("document2.pdf", newConfig)
// Warning logged: "Font config already initialized"
Ruby:
# CORRECT: Set config before first extraction
config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(
      enabled: true,
      custom_font_dirs: ['/usr/share/fonts/custom']
    )
  )
)
result = Kreuzberg.extract_file('document.pdf', config)

# INCORRECT: Attempting to change config after first extraction
new_config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(enabled: false)
  )
)
result2 = Kreuzberg.extract_file('document2.pdf', new_config)
# Warning logged: "Font config already initialized"
C#:
// CORRECT: Set config before first extraction
var fontConfig = new FontConfig
{
    Enabled = true,
    CustomFontDirs = new[] { "/usr/share/fonts/custom" }
};
var pdfConfig = new PdfConfig { FontConfig = fontConfig };
var config = new ExtractionConfig { PdfOptions = pdfConfig };
var result = Kreuzberg.ExtractFile("document.pdf", config);

// INCORRECT: Attempting to change config after first extraction
var newFontConfig = new FontConfig { Enabled = false };
var newPdfConfig = new PdfConfig { FontConfig = newFontConfig };
var newConfig = new ExtractionConfig { PdfOptions = newPdfConfig };
var result2 = Kreuzberg.ExtractFile("document2.pdf", newConfig);
// Warning logged: "Font config already initialized"
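The first-configuration-wins behavior in the examples above can be modeled with a small set-once holder. This is only a sketch of the general pattern, not the library's implementation. FontSettings and init_font_settings are hypothetical names, and warning only when the later settings differ from the stored ones is an assumption.

```python
import logging
import threading
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("font_config")  # hypothetical logger name


@dataclass(frozen=True)
class FontSettings:
    """Hypothetical stand-in for the library's FontConfig."""
    enabled: bool = True
    custom_font_dirs: tuple = ()


_lock = threading.Lock()
_active: Optional[FontSettings] = None  # process-wide, set at most once


def init_font_settings(requested: FontSettings) -> FontSettings:
    """Store the first settings seen; later calls return the stored value."""
    global _active
    with _lock:
        if _active is None:
            _active = requested
        elif requested != _active:
            # Later, different settings are ignored rather than applied.
            logger.warning("Font config already initialized")
        return _active
```

Under this model, calling init_font_settings a second time with enabled=False returns the original settings unchanged, which is why the font configuration should be decided once at startup and reused for every extraction.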
Performance Impact¶
With default settings (enabled=true, system fonts):
- PDF extraction: ~12-13% faster
- Memory: Minimal increase (~100KB for font cache)
- Startup: Lazy initialization (no overhead for non-PDF workloads)
Troubleshooting¶
Custom fonts not working¶
Symptom: PDF still uses fallback fonts
Solutions:
- Verify directories exist and contain .ttf/.otf/.ttc files
- Check logs for "Custom font directory not found" warnings
- Ensure paths are absolute or properly expanded
- Verify font files are readable
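The checks in this list can be scripted. The sketch below reports whether a directory exists and which .ttf, .otf, or .ttc files in it are readable by the current user. check_font_dir is a hypothetical helper. It inspects only the top level of the directory, because whether the library scans subdirectories is not stated here.

```python
import os
import tempfile
from pathlib import Path

FONT_SUFFIXES = {".ttf", ".otf", ".ttc"}


def check_font_dir(raw: str) -> dict:
    """Report whether a directory exists and which font files are readable."""
    directory = Path(raw).expanduser().resolve()
    report = {
        "path": str(directory),
        "exists": directory.is_dir(),
        "readable_fonts": [],
        "unreadable_fonts": [],
    }
    if not report["exists"]:
        return report
    for entry in sorted(directory.iterdir()):
        # Match on the lower-cased suffix so "Font.TTF" is also counted.
        if entry.is_file() and entry.suffix.lower() in FONT_SUFFIXES:
            readable = os.access(entry, os.R_OK)
            key = "readable_fonts" if readable else "unreadable_fonts"
            report[key].append(entry.name)
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Empty placeholder file, not a real font; it only exercises the check.
        (Path(tmp) / "Example.ttf").write_bytes(b"")
        print(check_font_dir(tmp))
```

An empty readable_fonts list for a directory that exists suggests the directory holds no supported font files at its top level.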
"Font config already initialized" warning¶
Symptom: Configuration changes ignored after first PDF extraction
Solution: Set FontConfig in the first ExtractionConfig used. Subsequent config changes are not supported (global limitation).
Performance regression¶
Symptom: PDF extraction slower after upgrade
Solution: This is unexpected. Please report as a bug with:
- PDF sample (if shareable)
- Benchmark comparison (before/after)
- Configuration used
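For the before/after comparison, a small timing harness is usually enough. The sketch below measures the median wall-clock time of a callable and computes the percentage change between two measurements. median_seconds and percent_change are hypothetical helpers, and the workload shown is a placeholder to be replaced by your own extraction call under each version.

```python
import statistics
import time
from typing import Callable


def median_seconds(task: Callable[[], object], runs: int = 5) -> float:
    """Run task several times and return the median wall-clock duration."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_change(before: float, after: float) -> float:
    """Positive result means 'after' is slower than 'before'."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Placeholder workload; replace with a call that extracts your PDF.
    baseline = median_seconds(lambda: sum(range(100_000)))
    candidate = median_seconds(lambda: sum(range(100_000)))
    print(f"change: {percent_change(baseline, candidate):+.1f}%")
```

Using the median of several runs reduces the effect of one-off delays such as a cold file cache, which makes the numbers in a bug report easier to compare.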
Questions?¶
- Issue tracker: https://github.com/kreuzberg-dev/kreuzberg/issues
- Discussions: https://github.com/kreuzberg-dev/kreuzberg/discussions