# File Size Limits
Kreuzberg enforces size limits on file uploads and API requests to manage server resources effectively. This page documents the default limits, how to configure them, and recommendations for optimal performance.
## Overview
File size limits protect your server from resource exhaustion and unexpected memory spikes. The Kreuzberg API implements two complementary limit types:
| Limit Type | Purpose | Default |
|---|---|---|
| Request Body Limit | Total size of all files in a single request | 100 MB |
| Multipart Field Limit | Maximum size of an individual file | 100 MB |
Both limits are configurable via environment variables (`KREUZBERG_MAX_REQUEST_BODY_BYTES`, `KREUZBERG_MAX_MULTIPART_FIELD_BYTES`, or the legacy `KREUZBERG_MAX_UPLOAD_SIZE_MB`) or programmatically via the `ApiSizeLimits` type.
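For example, to set the two byte-level limits independently (the values below are illustrative):

```bash
# 200 MB total request body, 100 MB per individual file
export KREUZBERG_MAX_REQUEST_BODY_BYTES=209715200
export KREUZBERG_MAX_MULTIPART_FIELD_BYTES=104857600
```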
## Default Configuration

### Default Limits: 100 MB

The default configuration allows:

- Total request body: 100 MB (104,857,600 bytes)
- Individual file: 100 MB (104,857,600 bytes)

These defaults are suitable for typical document processing workloads, including:

- Standard PDF documents and scanned pages
- Office documents (Word, Excel, PowerPoint)
- High-resolution images
- Single document uploads and small batches
### When to Increase
Increase limits if you need to process:
- Large scanned document archives (200+ MB)
- High-resolution images (50+ MB each)
- Video presentations (500+ MB)
- Bulk batch uploads (multiple 50 MB documents)
### When to Decrease
Decrease limits if:
- You want to enforce strict file size policies
- Your server has limited memory
- You're processing only small structured documents
- You need to rate-limit aggressive clients
## Configuration Methods

### 1. Environment Variable (Simplest)

Set the `KREUZBERG_MAX_UPLOAD_SIZE_MB` environment variable to specify limits in megabytes:

```bash
# Set to 200 MB
export KREUZBERG_MAX_UPLOAD_SIZE_MB=200
kreuzberg serve -H 0.0.0.0 -p 8000

# Set to 500 MB for large documents
export KREUZBERG_MAX_UPLOAD_SIZE_MB=500
kreuzberg serve -H 0.0.0.0 -p 8000
```
### 2. Docker Compose

Configure limits in your Docker Compose setup:

```yaml
version: '3.8'
services:
  kreuzberg-api:
    image: goldziher/kreuzberg:latest
    ports:
      - "8000:8000"
    environment:
      # Set maximum upload size to 500 MB
      KREUZBERG_MAX_UPLOAD_SIZE_MB: "500"
      # Configure CORS for production
      KREUZBERG_CORS_ORIGINS: "https://myapp.com,https://api.myapp.com"
    volumes:
      - ./cache:/app/.kreuzberg
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```
### 3. Kubernetes Deployment

Configure size limits in your Kubernetes deployment (a `selector` and matching pod labels are required for an `apps/v1` Deployment):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kreuzberg-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kreuzberg-api
  template:
    metadata:
      labels:
        app: kreuzberg-api
    spec:
      containers:
        - name: kreuzberg
          image: goldziher/kreuzberg:latest
          env:
            - name: KREUZBERG_MAX_UPLOAD_SIZE_MB
              value: "500"
            - name: KREUZBERG_CORS_ORIGINS
              value: "https://myapp.com"
          resources:
            limits:
              memory: "2Gi"
              cpu: "2000m"
```
### 4. Programmatic Configuration

When embedding the API server, configure limits via the `ApiSizeLimits` type:

**C#:**

```csharp
using Kreuzberg;

// Create limits: 50 MB for both request body and individual files
var limits = ApiSizeLimits.FromMB(50, 50);

// Or create with custom byte values
var customLimits = new ApiSizeLimits
{
    MaxRequestBodyBytes = 100 * 1024 * 1024,   // 100 MB
    MaxMultipartFieldBytes = 100 * 1024 * 1024 // 100 MB
};
```

**Go:**

```go
import "kreuzberg"

// Create limits: 200 MB for both request body and individual files
limits := kreuzberg.NewApiSizeLimits(
    200*1024*1024, // max_request_body_bytes
    200*1024*1024, // max_multipart_field_bytes
)

// Or use the convenience constructor with MB values
limitsFromMB := kreuzberg.ApiSizeLimitsFromMB(200, 200)
```

**Java:**

```java
import com.kreuzberg.api.ApiSizeLimits;

// Create limits: 200 MB for both request body and individual files
ApiSizeLimits limits = new ApiSizeLimits(
    200 * 1024 * 1024, // maxRequestBodyBytes
    200 * 1024 * 1024  // maxMultipartFieldBytes
);

// Or use the convenience method with MB values
ApiSizeLimits limitsFromMb = ApiSizeLimits.fromMB(200, 200);
```

**Python:**

```python
from kreuzberg import ExtractionConfig
from kreuzberg.api import ApiSizeLimits, create_router_with_limits

# Create limits: 200 MB for both request body and individual files
limits = ApiSizeLimits.from_mb(200, 200)

# Or create with custom byte values
limits = ApiSizeLimits(
    max_request_body_bytes=200 * 1024 * 1024,
    max_multipart_field_bytes=200 * 1024 * 1024,
)

# Create a router with the custom limits
config = ExtractionConfig()
router = create_router_with_limits(config, limits)
```

**Ruby:**

```ruby
require 'kreuzberg'

# Create limits: 200 MB for both request body and individual files
limits = Kreuzberg::Api::ApiSizeLimits.from_mb(200, 200)

# Or create with custom byte values
limits = Kreuzberg::Api::ApiSizeLimits.new(
  max_request_body_bytes: 200 * 1024 * 1024,
  max_multipart_field_bytes: 200 * 1024 * 1024
)
```

**Rust:**

```rust
use kreuzberg::{ExtractionConfig, api::{create_router_with_limits, ApiSizeLimits}};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create limits: 200 MB for both request body and individual files
    let limits = ApiSizeLimits::from_mb(200, 200);

    // Or create with custom byte values (this shadows the binding above)
    let limits = ApiSizeLimits::new(
        200 * 1024 * 1024, // max_request_body_bytes
        200 * 1024 * 1024, // max_multipart_field_bytes
    );

    let config = ExtractionConfig::default();
    let router = create_router_with_limits(config, limits);
    Ok(())
}
```

**TypeScript:**

```typescript
import { ApiSizeLimits, createRouterWithLimits } from 'kreuzberg';

// Create limits: 200 MB for both request body and individual files
const limits = ApiSizeLimits.fromMb(200, 200);

// Or create with custom byte values
const customLimits = new ApiSizeLimits({
  maxRequestBodyBytes: 200 * 1024 * 1024,
  maxMultipartFieldBytes: 200 * 1024 * 1024
});

// Create a router with the custom limits
// (config is your extraction configuration object; see the Configuration Guide)
const router = createRouterWithLimits(config, limits);
```
## Configuration Scenarios

### Small Documents (Default)

For standard business documents and PDFs under 50 MB:
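The default 100 MB limit already covers this case, so no configuration is required; set it explicitly only if you want the policy visible in your deployment scripts:

```bash
# Matches the built-in default
export KREUZBERG_MAX_UPLOAD_SIZE_MB=100
kreuzberg serve -H 0.0.0.0 -p 8000
```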
### Medium Documents

For typical scanned document batches and office files up to 200 MB:
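Using the environment variable described above:

```bash
export KREUZBERG_MAX_UPLOAD_SIZE_MB=200
kreuzberg serve -H 0.0.0.0 -p 8000
```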
### Large Scans and Archives

For high-resolution scans, video content, and large archives up to 1 GB:
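Raise the limit and budget RAM accordingly (see the memory impact table below):

```bash
export KREUZBERG_MAX_UPLOAD_SIZE_MB=1000
kreuzberg serve -H 0.0.0.0 -p 8000
```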
### Constrained Environments

For development environments or memory-limited servers:
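A lower limit keeps peak memory per request small; 50 MB is a reasonable starting point:

```bash
export KREUZBERG_MAX_UPLOAD_SIZE_MB=50
kreuzberg serve -H 0.0.0.0 -p 8000
```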
## Performance Considerations

### Memory Usage
File size limits directly impact memory consumption:
- Larger limits require more RAM to buffer request bodies
- Streaming extraction processes files incrementally, reducing peak memory
- Batch requests with multiple files consume memory for all files simultaneously
### Memory Impact Examples
| Upload Limit | Memory Impact | Recommended RAM |
|---|---|---|
| 50 MB | ~50-100 MB per request | 512 MB |
| 100 MB (default) | ~100-200 MB per request | 1 GB |
| 500 MB | ~500 MB-1 GB per request | 2-4 GB |
| 1000 MB | ~1-2 GB per request | 4-8 GB |
### Handling Large Files

When processing very large files (multi-GB):

- **Allocate adequate RAM**: use the memory impact table above as a guideline
- **Increase timeouts**: large files take longer to upload and process
- **Monitor concurrency**: limit concurrent uploads to prevent memory exhaustion (see the client-side sketch after this list)
- **Use streaming**: where possible, process files as streams to reduce memory peaks
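If your clients drive many uploads at once, capping concurrency on the client side is the simplest control. A minimal sketch in Python using `httpx` (the endpoint and `files` field follow the `curl` example in Error Handling below; file names and the concurrency value are illustrative):

```python
import asyncio
from pathlib import Path

import httpx

MAX_CONCURRENT_UPLOADS = 3  # tune to your server's memory budget

async def extract_file(client: httpx.AsyncClient, sem: asyncio.Semaphore, path: Path) -> dict:
    """Upload one file, holding a semaphore slot to cap concurrency."""
    async with sem:
        with path.open("rb") as f:
            response = await client.post(
                "http://localhost:8000/extract",
                files={"files": (path.name, f)},
                timeout=600,  # generous timeout for large files
            )
        response.raise_for_status()
        return response.json()

async def main(paths: list[Path]) -> None:
    sem = asyncio.Semaphore(MAX_CONCURRENT_UPLOADS)
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*(extract_file(client, sem, p) for p in paths))
    print(f"Extracted {len(results)} files")

# asyncio.run(main([Path("report.pdf"), Path("scan.tiff")]))
```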
### Docker Memory Limits

Configure Docker resource limits appropriately:

```yaml
services:
  kreuzberg-api:
    image: goldziher/kreuzberg:latest
    environment:
      KREUZBERG_MAX_UPLOAD_SIZE_MB: "500"
    deploy:
      resources:
        limits:
          memory: 4G   # Limit container to 4 GB
          cpus: '2'    # Limit to 2 CPU cores
        reservations:
          memory: 2G   # Reserve 2 GB minimum
          cpus: '1'    # Reserve 1 CPU core
```
### Reverse Proxy Configuration

When using a reverse proxy (Nginx, Caddy), ensure the proxy's limits match or exceed Kreuzberg's limits:

**Nginx:**

```nginx
server {
    listen 443 ssl http2;
    server_name api.example.com;

    # Match or exceed Kreuzberg's limit
    client_max_body_size 500M;

    location / {
        proxy_pass http://kreuzberg;

        # Extended timeouts for large file processing
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
        proxy_request_buffering off;  # Stream instead of buffer
    }
}
```

**Caddy** (Caddy v2 sets the body limit with the `request_body` directive):

```caddyfile
api.example.com {
    # Match Kreuzberg's limit
    request_body {
        max_size 500MB
    }

    reverse_proxy localhost:8000 {
        # Stream large responses instead of buffering
        flush_interval -1
    }
}
```
## Error Handling

### Exceeding Limits

When a request exceeds the configured limits, the server returns an HTTP 413 Payload Too Large response:

```bash
# Try to upload a 500 MB file against the 100 MB default limit
curl -F "files=@large_file_500mb.zip" http://localhost:8000/extract
```

Response (HTTP 413):

```http
HTTP/1.1 413 Payload Too Large
Content-Type: application/json

{
  "error_type": "ValidationError",
  "message": "Request body exceeds maximum allowed size",
  "status_code": 413
}
```
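On the client side, treat a 413 as terminal for that payload; retrying it unchanged will fail again. A minimal sketch in Python using `httpx` as the client (the file name is illustrative; the error fields match the response shown above):

```python
import httpx

with open("large_file.pdf", "rb") as f:
    response = httpx.post(
        "http://localhost:8000/extract",
        files={"files": ("large_file.pdf", f)},
    )

if response.status_code == 413:
    # Shrink, split, or compress the payload instead of retrying as-is
    detail = response.json()
    print(f"Upload rejected: {detail['message']}")
else:
    response.raise_for_status()
    print(response.json())
```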
### Client-Side Validation

Validate file sizes before upload to provide a better user experience:

**Python:**

```python
import os
from pathlib import Path

def validate_file_size(file_path: str, max_size_mb: int) -> bool:
    """Check if file size is within limits."""
    file_size_bytes = os.path.getsize(file_path)
    file_size_mb = file_size_bytes / (1024 * 1024)
    if file_size_mb > max_size_mb:
        print(f"File {Path(file_path).name} exceeds {max_size_mb} MB limit")
        return False
    return True

# Validate before upload
if validate_file_size("document.pdf", max_size_mb=100):
    # Proceed with upload
    pass
```
**TypeScript:**

```typescript
function validateFileSize(file: File, maxSizeMB: number): boolean {
  const fileSizeMB = file.size / (1024 * 1024);
  if (fileSizeMB > maxSizeMB) {
    console.error(`File ${file.name} exceeds ${maxSizeMB} MB limit`);
    return false;
  }
  return true;
}

// Validate before upload
const fileInput = document.getElementById('fileInput') as HTMLInputElement;
fileInput.addEventListener('change', (e) => {
  const file = (e.target as HTMLInputElement).files?.[0];
  if (file && validateFileSize(file, 100)) {
    // Proceed with upload
  }
});
```
import "os"
import "fmt"
func validateFileSize(filePath string, maxSizeMB int64) bool {
fileInfo, err := os.Stat(filePath)
if err != nil {
return false
}
fileSizeMB := fileInfo.Size() / (1024 * 1024)
if fileSizeMB > maxSizeMB {
fmt.Printf("File exceeds %d MB limit\n", maxSizeMB)
return false
}
return true
}
// Validate before upload
if validateFileSize("document.pdf", 100) {
// Proceed with upload
}
## Troubleshooting

### "Request body exceeds maximum allowed size"

**Problem:** Upload fails with an HTTP 413 error.

**Solutions:**

1. Increase the limit (see the example after this list)
2. Check reverse proxy limits (see Reverse Proxy Configuration above)
3. Validate file sizes before upload (see Client-Side Validation above)
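For the first solution, raise the limit with the environment variable and restart the server:

```bash
export KREUZBERG_MAX_UPLOAD_SIZE_MB=500
kreuzberg serve -H 0.0.0.0 -p 8000
```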
### Server crashes with large files

**Problem:** Memory exhaustion when processing large files.

**Solutions:**

1. Increase container memory (see Docker Memory Limits above)
2. Reduce the upload limit to match available RAM
3. Process files sequentially: send one file per request instead of batch uploads
4. Implement request queuing at the application level
5. Monitor memory usage (see the example after this list)
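For monitoring, `docker stats` reports live per-container memory (the container name here is an example):

```bash
# Live memory and CPU usage for the API container
docker stats kreuzberg-api

# One-shot snapshot, useful in scripts
docker stats --no-stream kreuzberg-api
```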
### Slow uploads

**Problem:** Large file uploads time out.

**Solutions:**

1. Increase reverse proxy timeouts (`proxy_read_timeout`, `proxy_send_timeout` in the Nginx example above)
2. Enable streaming (`proxy_request_buffering off` in Nginx, `flush_interval -1` in Caddy)
3. Check network bandwidth: a 500 MB file over a 10 Mbps connection needs 500 MB × 8 bits/byte ÷ 10 Mbps ≈ 400 seconds
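As a back-of-envelope check before tuning timeouts, you can compute the minimum transfer time from that same formula:

```python
def min_upload_seconds(size_mb: float, bandwidth_mbps: float) -> float:
    """Lower bound on upload time: size in megabits divided by link speed."""
    return size_mb * 8 / bandwidth_mbps

print(min_upload_seconds(500, 10))  # 400.0 seconds, before any processing time
```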
## Best Practices
- Match limits to use case: Set limits based on your actual file sizes, not theoretical maximums
- Monitor and adjust: Track actual file sizes and adjust limits quarterly
- Use reverse proxy buffering: Configure reverse proxies to handle buffering, not Kreuzberg
- Implement client-side validation: Validate file sizes before sending to server
- Plan for scaling: Run multiple Kreuzberg instances behind a load balancer for high-throughput scenarios
- Set appropriate timeouts: Increase timeouts for large files (5-10 minutes recommended)
- Document your limits: Keep configuration in version control with clear documentation
- Test with real files: Test with actual document types you'll process in production
- Monitor disk space: Large files consume both RAM and disk (if streaming to disk)
- Consider compression: If applicable, compress large document batches before upload
## See Also
- Configuration Guide - Extraction configuration options
- API Server Guide - Complete API server documentation
- Docker Deployment - Docker setup and configuration
- Performance Tuning - Advanced performance optimization