C API Reference

Complete reference for the Kreuzberg C FFI library. The kreuzberg-ffi crate exposes a C-compatible API (extern "C") for text extraction from documents, with thread-local error storage and explicit memory management.

All functions are synchronous. Error state is stored per-thread, so multiple threads can call extraction concurrently without interference.

Requirements

  • C11 or later compiler (GCC, Clang, MSVC)
  • libkreuzberg_ffi static or shared library (built from the kreuzberg-ffi crate)
  • kreuzberg.h header file (auto-generated by cbindgen)
  • Tesseract OCR (optional, for image and scanned document extraction)

Installation

Build from source

Terminal
# Build the FFI crate (produces static library)
cargo build --release -p kreuzberg-ffi

# The static library is at target/release/libkreuzberg_ffi.a (Unix)
# or target/release/kreuzberg_ffi.lib (Windows)
Terminal
# Linux
cc -I/path/to/kreuzberg-ffi \
   -o myapp myapp.c \
   -L/path/to/target/release \
   -lkreuzberg_ffi -lpthread -ldl -lm

# macOS
cc -I/path/to/kreuzberg-ffi \
   -o myapp myapp.c \
   -L/path/to/target/release \
   -lkreuzberg_ffi -lpthread \
   -framework CoreFoundation -framework Security

# Windows (MSVC)
cl /I path\to\kreuzberg-ffi myapp.c /link /LIBPATH:path\to\target\release kreuzberg_ffi.lib

Static linking

For static linking, define KREUZBERG_STATIC before including the header:

C
#define KREUZBERG_STATIC
#include "kreuzberg.h"

Quick Start

Basic file extraction

extract_basic.c
#include <stdio.h>
#include <string.h>  /* strlen */
#include "kreuzberg.h"

int main(void) {
    CExtractionResult *result = kreuzberg_extract_file_sync("document.pdf");
    if (result == NULL) {
        const char *error = kreuzberg_last_error();
        fprintf(stderr, "Extraction failed: %s\n", error);
        return 1;
    }

    if (result->success) {
        printf("MIME type: %s\n", result->mime_type);
        printf("Content length: %zu\n", strlen(result->content));
        printf("Content: %.200s...\n", result->content);
    }

    kreuzberg_free_result(result);
    return 0;
}

Extraction with error handling

extract_with_errors.c
#include <stdio.h>
#include <string.h>  /* strlen */
#include "kreuzberg.h"

int main(void) {
    const char *config = "{\"force_ocr\": true, \"ocr\": {\"language\": \"eng\"}}";
    CExtractionResult *result = kreuzberg_extract_file_sync_with_config(
        "scanned.pdf", config
    );

    if (result == NULL) {
        CErrorDetails details = kreuzberg_get_error_details();
        fprintf(stderr, "Error [%s]: %s (code=%u)\n",
                details.error_type, details.message, details.error_code);
        if (details.source_file != NULL) {
            fprintf(stderr, "  at %s:%u in %s\n",
                    details.source_file, details.source_line,
                    details.source_function);
        }
        kreuzberg_free_string(details.message);
        kreuzberg_free_string(details.error_type);
        if (details.source_file) kreuzberg_free_string(details.source_file);
        if (details.source_function) kreuzberg_free_string(details.source_function);
        if (details.context_info) kreuzberg_free_string(details.context_info);
        return 1;
    }

    printf("Extracted %zu characters\n", strlen(result->content));
    kreuzberg_free_result(result);
    return 0;
}
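
Hand-escaped JSON literals such as the config string above are easy to get wrong once values come from user input. The following is a minimal sketch of building the same JSON shape into a caller-supplied buffer. `build_ocr_config` is a hypothetical helper, not part of kreuzberg.h, and it uses only the `force_ocr` and `ocr.language` keys shown in the example above. It assumes the language value needs no JSON escaping and rejects values containing quotes or backslashes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of kreuzberg.h): writes
 * {"force_ocr": <bool>, "ocr": {"language": "<language>"}} into buf.
 * Returns 0 on success, -1 on invalid arguments, on a language value that
 * would need JSON escaping, or if the buffer is too small. */
int build_ocr_config(char *buf, size_t buf_size, bool force_ocr,
                     const char *language) {
    if (buf == NULL || buf_size == 0 || language == NULL) {
        return -1;
    }
    /* Refuse quotes and backslashes instead of attempting to escape them. */
    if (strpbrk(language, "\"\\") != NULL) {
        return -1;
    }
    int written = snprintf(buf, buf_size,
                           "{\"force_ocr\": %s, \"ocr\": {\"language\": \"%s\"}}",
                           force_ocr ? "true" : "false", language);
    /* snprintf reports the length it wanted; >= buf_size means truncation. */
    if (written < 0 || (size_t)written >= buf_size) {
        return -1;
    }
    return 0;
}
```

On success, the buffer can be passed as the config argument of kreuzberg_extract_file_sync_with_config in place of the string literal.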

Core Extraction Functions

kreuzberg_batch_extract_bytes_sync

Batch extract text from multiple in-memory documents.

Signature:

CBatchResult *kreuzberg_batch_extract_bytes_sync(
    const CBytesWithMime *items,
    uintptr_t count,
    const char *config_json
);

Parameters:

  • items (const CBytesWithMime*): Array of byte-buffer/MIME-type pairs
  • count (uintptr_t): Number of items
  • config_json (const char*): JSON configuration, or NULL

Returns:

  • CBatchResult*: Batch result; free with kreuzberg_free_batch_result
  • NULL on error

kreuzberg_batch_extract_files_sync

Batch extract text from multiple files in a single call.

Signature:

CBatchResult *kreuzberg_batch_extract_files_sync(
    const char *const *file_paths,
    uintptr_t count,
    const char *config_json
);

Parameters:

  • file_paths (const char *const *): Array of null-terminated file path strings
  • count (uintptr_t): Number of file paths in the array
  • config_json (const char*): JSON configuration applied to all files, or NULL

Returns:

  • CBatchResult*: Batch result containing an array of individual results; free with kreuzberg_free_batch_result
  • NULL on error (for example, invalid arguments)

kreuzberg_extract_batch_parallel

Parallel variant of streaming batch extraction using a thread pool.

Signature:

int kreuzberg_extract_batch_parallel(
    const char *const *files,
    uintptr_t count,
    const char *config_json,
    ResultCallback result_callback,
    void *user_data,
    Option_ErrorCallback error_callback,
    uintptr_t max_parallel
);

Parameters:

  • files (const char *const *): Array of file paths
  • count (uintptr_t): Number of files
  • config_json (const char*): JSON config or NULL
  • result_callback (ResultCallback): Called for each successful extraction
  • user_data (void*): Opaque pointer passed through to callbacks
  • error_callback (Option_ErrorCallback): Called on per-file failures
  • max_parallel (uintptr_t): Maximum concurrent extractions (0 = number of CPUs)

Returns:

  • 0 on success, -1 on error

kreuzberg_extract_batch_streaming

Stream-process multiple files with a callback for each result, avoiding memory accumulation.

Signature:

int kreuzberg_extract_batch_streaming(
    const char *const *files,
    uintptr_t count,
    const char *config_json,
    ResultCallback result_callback,
    void *user_data,
    Option_ErrorCallback error_callback
);

Parameters:

  • files (const char *const *): Array of file paths
  • count (uintptr_t): Number of files
  • config_json (const char*): JSON config or NULL
  • result_callback (ResultCallback): Called for each successful extraction
  • user_data (void*): Opaque pointer passed through to callbacks
  • error_callback (Option_ErrorCallback): Called on per-file failures (optional)

Returns:

  • 0 on success, -1 on error

kreuzberg_extract_bytes_sync

Extract text from an in-memory byte buffer with a specified MIME type.

Signature:

CExtractionResult *kreuzberg_extract_bytes_sync(
    const uint8_t *data,
    uintptr_t data_len,
    const char *mime_type
);

Parameters:

  • data (const uint8_t*): Pointer to the document bytes
  • data_len (uintptr_t): Length of the byte buffer
  • mime_type (const char*): Null-terminated MIME type string (for example, "application/pdf")

Returns:

  • CExtractionResult*: Populated result on success; caller must free with kreuzberg_free_result
  • NULL on error; check kreuzberg_last_error() or kreuzberg_get_error_details() for details

kreuzberg_extract_bytes_sync_with_config

Extract text from bytes with a JSON configuration string.

Signature:

CExtractionResult *kreuzberg_extract_bytes_sync_with_config(
    const uint8_t *data,
    uintptr_t data_len,
    const char *mime_type,
    const char *config_json
);

Parameters:

  • data (const uint8_t*): Document bytes
  • data_len (uintptr_t): Length of data
  • mime_type (const char*): MIME type string
  • config_json (const char*): JSON configuration, or NULL for defaults

Returns:

  • CExtractionResult*: Result on success; free with kreuzberg_free_result
  • NULL on error

kreuzberg_extract_file_sync

Extract text and metadata from a file.

Signature:

CExtractionResult *kreuzberg_extract_file_sync(const char *file_path);

Parameters:

  • file_path (const char*): Null-terminated path to the document file

Returns:

  • CExtractionResult*: Populated result on success; caller must free with kreuzberg_free_result
  • NULL on error; check kreuzberg_last_error() or kreuzberg_get_error_details() for details

kreuzberg_extract_file_sync_with_config

Extract text and metadata from a file with custom JSON configuration.

Signature:

CExtractionResult *kreuzberg_extract_file_sync_with_config(
    const char *file_path,
    const char *config_json
);

Parameters:

  • file_path (const char*): Null-terminated path to the document
  • config_json (const char*): Null-terminated JSON configuration string, or NULL for defaults

Returns:

  • CExtractionResult*: Result on success; free with kreuzberg_free_result
  • NULL on error

Configuration

Config Builder

Construct an ExtractionConfig programmatically using the builder pattern.

Signature:

ConfigBuilder *kreuzberg_config_builder_new(void);
int32_t kreuzberg_config_builder_set_chunking(ConfigBuilder *builder, const char *chunking_json);
int32_t kreuzberg_config_builder_set_image_extraction(ConfigBuilder *builder, const char *image_json);
int32_t kreuzberg_config_builder_set_include_document_structure(ConfigBuilder *builder, int32_t include);
int32_t kreuzberg_config_builder_set_language_detection(ConfigBuilder *builder, const char *ld_json);
int32_t kreuzberg_config_builder_set_layout(ConfigBuilder *builder, const char *layout_json);
int32_t kreuzberg_config_builder_set_ocr(ConfigBuilder *builder, const char *ocr_json);
int32_t kreuzberg_config_builder_set_pdf(ConfigBuilder *builder, const char *pdf_json);
int32_t kreuzberg_config_builder_set_post_processor(ConfigBuilder *builder, const char *pp_json);
int32_t kreuzberg_config_builder_set_use_cache(ConfigBuilder *builder, int32_t use_cache);
ExtractionConfig *kreuzberg_config_builder_build(ConfigBuilder *builder);
void kreuzberg_config_builder_free(ConfigBuilder *builder);

Returns: Setter functions return 0 on success, -1 on error.

Example:

C
ConfigBuilder *builder = kreuzberg_config_builder_new();
kreuzberg_config_builder_set_use_cache(builder, 1);
kreuzberg_config_builder_set_output_format(builder, "Markdown");
kreuzberg_config_builder_set_ocr(builder,
    "{\"backend\": \"tesseract\", \"languages\": [\"eng\"]}");
kreuzberg_config_builder_set_chunking(builder,
    "{\"max_chars\": 1000, \"max_overlap\": 200}");

ExtractionConfig *config = kreuzberg_config_builder_build(builder);
/* builder is consumed -- do NOT call kreuzberg_config_builder_free */

/* Use config... */
kreuzberg_config_free(config);

Important: After calling kreuzberg_config_builder_build, the builder is consumed. Do not call kreuzberg_config_builder_free on a consumed builder. To discard a builder without building, call kreuzberg_config_builder_free instead.


JSON Configuration

Parse, serialize, validate, and discover configuration from JSON.

Signature:

ExtractionConfig *kreuzberg_config_from_json(const char *json_config);
char *kreuzberg_config_get_field(const ExtractionConfig *config, const char *field_name);
int32_t kreuzberg_config_is_valid(const char *json_config);
int32_t kreuzberg_config_merge(ExtractionConfig *base, const ExtractionConfig *override_config);
char *kreuzberg_config_to_json(const ExtractionConfig *config);

Parameters/Returns:

  • kreuzberg_config_from_json: Parses JSON into ExtractionConfig*; free with kreuzberg_config_free. Returns NULL on error.
  • kreuzberg_config_get_field: Returns a specific field as JSON; free with kreuzberg_free_string.
  • kreuzberg_config_is_valid: Returns 1 if valid, 0 if invalid.
  • kreuzberg_config_merge: Merges override into base (in-place). Returns 1 on success, 0 on error.
  • kreuzberg_config_to_json: Serializes config to JSON string; free with kreuzberg_free_string.

Embeddings

Retrieve information about available embedding presets and models.

Signature:

char *kreuzberg_get_embedding_preset(const char *preset_name);
char *kreuzberg_list_embedding_presets(void);

  • kreuzberg_get_embedding_preset: Returns JSON for a specific preset; free with kreuzberg_free_string.
  • kreuzberg_list_embedding_presets: Returns JSON array of all preset names; free with kreuzberg_free_string.

File-based Configuration

char *kreuzberg_config_discover(void);
ExtractionConfig *kreuzberg_config_from_file(const char *path);
char *kreuzberg_load_extraction_config_from_file(const char *file_path);

  • kreuzberg_config_discover: Searches parent directories for a config file. Returns JSON string or NULL; free with kreuzberg_free_string.
  • kreuzberg_config_from_file: Loads config from a file. Returns ExtractionConfig*; free with kreuzberg_config_free.
  • kreuzberg_load_extraction_config_from_file: Loads config and returns raw JSON string; free with kreuzberg_free_string.

PDF Rendering

Added in v4.6.2

kreuzberg_render_pdf_page

Render a single page of a PDF as a PNG image.

Signature:

C
CRenderPageResult* kreuzberg_render_pdf_page(const char* file_path, size_t page_index, int dpi);

Parameters:

  • file_path (const char*): Path to the PDF file (UTF-8 encoded)
  • page_index (size_t): Zero-based page index to render
  • dpi (int): Resolution for rendering (for example 150)

Returns:

  • CRenderPageResult*: Pointer to a single page render result, or NULL on error. Free with kreuzberg_free_render_page_result.

kreuzberg_free_render_page_result

Free a single page result returned by kreuzberg_render_pdf_page.

C
void kreuzberg_free_render_page_result(CRenderPageResult* page);

Error Handling

kreuzberg_last_error

Get the last error message for the current thread.

Signature:

const char *kreuzberg_last_error(void);

Returns:

  • Pointer to a null-terminated error message string
  • The returned pointer should NOT be freed

kreuzberg_last_error_code

Get the numeric error code for the last error.

Signature:

int32_t kreuzberg_last_error_code(void);

Returns:

  • Numeric error code (0-7), or -1 if no error occurred

kreuzberg_last_panic_context

Get the panic context if the last error was a caught panic.

Signature:

const char *kreuzberg_last_panic_context(void);

Returns:

  • Static string with panic details, or NULL. Do NOT free.

kreuzberg_get_error_details

Retrieve structured error information from thread-local storage.

Signature:

CErrorDetails kreuzberg_get_error_details(void);

Returns:

  • CErrorDetails struct describing the last error. The optional string fields (source_file, source_function, context_info) may be NULL; every non-NULL string field must be freed with kreuzberg_free_string.

Example:

C
CExtractionResult *result = kreuzberg_extract_file_sync("bad_file.xyz");
if (result == NULL) {
    CErrorDetails details = kreuzberg_get_error_details();
    fprintf(stderr, "Error: %s (code=%u, type=%s)\n",
            details.message, details.error_code, details.error_type);
    if (details.source_file != NULL) {
        fprintf(stderr, "  at %s:%u in %s\n",
                details.source_file, details.source_line, details.source_function);
    }
    if (details.is_panic) {
        fprintf(stderr, "  [PANIC]\n");
    }
    /* Free all non-NULL string fields */
    kreuzberg_free_string(details.message);
    kreuzberg_free_string(details.error_type);
    if (details.source_file) kreuzberg_free_string(details.source_file);
    if (details.source_function) kreuzberg_free_string(details.source_function);
    if (details.context_info) kreuzberg_free_string(details.context_info);
}

For language bindings that have trouble returning structs by value, use the heap-allocated variant:

CErrorDetails *kreuzberg_get_error_details_ptr(void);
void kreuzberg_free_error_details(CErrorDetails *details);

Error Codes

| Code | Name | Description |
|------|------|-------------|
| 0 | validation | Input validation error |
| 1 | parsing | Document parsing error |
| 2 | ocr | OCR processing error |
| 3 | missing_dependency | Required library not found |
| 4 | io | File I/O error |
| 5 | plugin | Plugin registration or execution error |
| 6 | unsupported_format | MIME type not supported |
| 7 | internal | Internal/unexpected error |

Error Code Functions

uint32_t kreuzberg_error_code_count(void);             /* returns 8 */
uint32_t kreuzberg_error_code_internal(void);          /* returns 7 */
uint32_t kreuzberg_error_code_io(void);                /* returns 4 */
uint32_t kreuzberg_error_code_missing_dependency(void); /* returns 3 */
uint32_t kreuzberg_error_code_ocr(void);               /* returns 2 */
uint32_t kreuzberg_error_code_parsing(void);           /* returns 1 */
uint32_t kreuzberg_error_code_plugin(void);            /* returns 5 */
uint32_t kreuzberg_error_code_unsupported_format(void); /* returns 6 */
uint32_t kreuzberg_error_code_validation(void);       /* returns 0 */
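
One way an application might react to these codes is to map each one to a coarse recovery action. The sketch below is illustrative only: the `ErrorAction` enum, the `choose_action` function, and the policy itself are hypothetical and not part of the library. The local constants mirror the table above so that the snippet compiles on its own; real code would compare against the kreuzberg_error_code_*() accessors instead.

```c
#include <stdint.h>

/* Local mirror of the documented codes, for illustration only. */
enum {
    CODE_VALIDATION = 0,
    CODE_PARSING = 1,
    CODE_OCR = 2,
    CODE_MISSING_DEPENDENCY = 3,
    CODE_IO = 4,
    CODE_PLUGIN = 5,
    CODE_UNSUPPORTED_FORMAT = 6,
    CODE_INTERNAL = 7
};

/* Hypothetical application-level reactions; not defined by the library. */
typedef enum {
    ACTION_RETRY,      /* possibly transient: try the same file again */
    ACTION_SKIP_FILE,  /* problem with this document: move on to the next */
    ACTION_FIX_SETUP,  /* caller input or environment must change first */
    ACTION_ABORT       /* unexpected: stop and report */
} ErrorAction;

/* Maps an error code to an action. Unknown codes are treated like
 * internal errors, so new codes fail safe rather than being retried. */
ErrorAction choose_action(uint32_t code) {
    switch (code) {
    case CODE_IO:
        return ACTION_RETRY;
    case CODE_PARSING:
    case CODE_OCR:
    case CODE_UNSUPPORTED_FORMAT:
        return ACTION_SKIP_FILE;
    case CODE_VALIDATION:
    case CODE_MISSING_DEPENDENCY:
    case CODE_PLUGIN:
        return ACTION_FIX_SETUP;
    case CODE_INTERNAL:
    default:
        return ACTION_ABORT;
    }
}
```

The grouping is a judgment call; for instance, an application that only reads local files may prefer to treat I/O errors as fatal rather than transient.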

Error Introspection

uint32_t kreuzberg_classify_error(const char *error_message);
const char *kreuzberg_error_code_description(uint32_t code);
const char *kreuzberg_error_code_name(uint32_t code);

  • kreuzberg_error_code_name: Returns a static string like "validation", "ocr". Do NOT free.
  • kreuzberg_error_code_description: Returns a static description. Do NOT free.
  • kreuzberg_classify_error: Classifies an arbitrary error message string into one of the error codes (0-7).

Example:

C
uint32_t code = kreuzberg_classify_error("Failed to open file: permission denied");
if (code == kreuzberg_error_code_io()) {
    printf("This is an I/O error: %s\n", kreuzberg_error_code_description(code));
}

Memory Management

Correct memory management is critical when using the C API. Every allocation has a specific free function. Mixing allocators (for example, calling free() on a Kreuzberg-allocated string) causes undefined behavior.

Rules

| Allocated by | Free with |
|--------------|-----------|
| kreuzberg_extract_* (returns CExtractionResult*) | kreuzberg_free_result |
| kreuzberg_batch_extract_* (returns CBatchResult*) | kreuzberg_free_batch_result |
| Functions returning char* | kreuzberg_free_string |
| kreuzberg_config_from_json / kreuzberg_config_from_file / kreuzberg_config_builder_build | kreuzberg_config_free |
| kreuzberg_config_builder_new | kreuzberg_config_builder_free (only if NOT built) |
| kreuzberg_get_error_details_ptr | kreuzberg_free_error_details |
| kreuzberg_result_pool_new | kreuzberg_result_pool_free |

Free Functions

void kreuzberg_config_builder_free(ConfigBuilder *builder);
void kreuzberg_config_free(ExtractionConfig *config);
void kreuzberg_free_batch_result(CBatchResult *batch_result);
void kreuzberg_free_error_details(CErrorDetails *details);
void kreuzberg_free_result(CExtractionResult *result);
void kreuzberg_free_string(char *s);

All free functions accept NULL (no-op).

kreuzberg_clone_string

Duplicate a null-terminated string using the Kreuzberg allocator.

Signature:

char *kreuzberg_clone_string(const char *s);

Parameters:

  • s (const char*): Null-terminated UTF-8 string to clone

Returns:

  • char*: Cloned string; free with kreuzberg_free_string
  • NULL on error

Example:

C
CExtractionResult *result = kreuzberg_extract_file_sync("doc.pdf");
if (result != NULL) {
    /* Clone content before freeing result */
    char *saved_content = NULL;
    if (result->success) {
        saved_content = kreuzberg_clone_string(result->content);
    }
    kreuzberg_free_result(result); /* always free, even if success is false */

    /* Use saved_content... (NULL if extraction or cloning failed) */
    if (saved_content != NULL) {
        printf("%s\n", saved_content);
        kreuzberg_free_string(saved_content);
    }
}

MIME Type Utilities

kreuzberg_detect_mime_type

Detect MIME type from a file path, optionally checking that the file exists.

Signature:

char *kreuzberg_detect_mime_type(const char *file_path, bool check_exists);

Parameters:

  • file_path (const char*): Path to file
  • check_exists (bool): If true, verifies the file exists before detection

Returns:

  • char*: Detected MIME type string; free with kreuzberg_free_string
  • NULL on error

kreuzberg_detect_mime_type_from_bytes

Detect MIME type from raw byte content.

Signature:

char *kreuzberg_detect_mime_type_from_bytes(const uint8_t *bytes, uintptr_t len);

Example:

C
uint8_t data[512];
/* ... read data ... */
char *mime = kreuzberg_detect_mime_type_from_bytes(data, 512);
if (mime != NULL) {
    printf("Detected: %s\n", mime);
    kreuzberg_free_string(mime);
}
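
The example above passes a fixed length of 512 regardless of how many bytes were read. When the data comes from a file shorter than the buffer, it is safer to pass only the bytes actually read. A minimal sketch follows; `read_file_prefix` is a hypothetical helper, not part of the library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of kreuzberg.h): reads up to `cap` leading
 * bytes of the file at `path` into `buf`. Returns the number of bytes
 * actually read, or 0 if the arguments are invalid, the file cannot be
 * opened, or the file is empty. */
size_t read_file_prefix(const char *path, uint8_t *buf, size_t cap) {
    if (path == NULL || buf == NULL || cap == 0) {
        return 0;
    }
    FILE *fp = fopen(path, "rb"); /* binary mode: no newline translation */
    if (fp == NULL) {
        return 0;
    }
    size_t n = fread(buf, 1, cap, fp);
    fclose(fp);
    return n;
}
```

The returned count, rather than the buffer size, would then be passed as the len argument of kreuzberg_detect_mime_type_from_bytes. A count of 0 means there is nothing to detect.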

kreuzberg_detect_mime_type_from_path

Detect MIME type by reading both the file extension and file content.

Signature:

char *kreuzberg_detect_mime_type_from_path(const char *file_path);

kreuzberg_get_extensions_for_mime

Get file extensions for a given MIME type.

Signature:

char *kreuzberg_get_extensions_for_mime(const char *mime_type);

Returns:

  • char*: JSON array of extensions (for example, ["pdf"]); free with kreuzberg_free_string
  • NULL on error

Example:

C
char *exts = kreuzberg_get_extensions_for_mime("application/pdf");
if (exts != NULL) {
    printf("Extensions: %s\n", exts); /* ["pdf"] */
    kreuzberg_free_string(exts);
}

kreuzberg_validate_mime_type

Validate that a MIME type is supported by Kreuzberg.

Signature:

char *kreuzberg_validate_mime_type(const char *mime_type);

Returns:

  • char*: Validated MIME type if supported; free with kreuzberg_free_string
  • NULL if unsupported or on error

Plugin System

Document Extractors

Register custom document extractors to handle new or proprietary formats.

bool kreuzberg_register_document_extractor(
    const char *name,
    DocumentExtractorCallback callback,
    const char *mime_types,
    int32_t priority
);
bool kreuzberg_unregister_document_extractor(const char *name);
char *kreuzberg_list_document_extractors(void);
bool kreuzberg_clear_document_extractors(void);

Callback signature:

typedef char *(*DocumentExtractorCallback)(
    const uint8_t *content,
    uintptr_t content_len,
    const char *mime_type,
    const char *config_json
);

The callback must return a null-terminated JSON string containing the extraction result, or NULL on error. The returned string must be freeable by kreuzberg_free_string.

Example:

C
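char *my_extractor(const uint8_t *content, uintptr_t content_len,
                   const char *mime_type, const char *config_json) {
    /* Process content, return JSON ExtractionResult. The string is allocated
       with kreuzberg_clone_string so that kreuzberg_free_string can release it. */
    return kreuzberg_clone_string("{\"content\":\"extracted text\",\"mime_type\":\"text/plain\",\"metadata\":{}}");
}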

bool ok = kreuzberg_register_document_extractor(
    "my-extractor", my_extractor,
    "application/x-custom,text/x-custom", 100
);
if (!ok) {
    fprintf(stderr, "Registration failed: %s\n", kreuzberg_last_error());
}
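
The mime_types argument in the example above is a single comma-separated string. When the supported types are kept in an array, they can be joined before registration. The sketch below assumes the separator is a bare comma with no surrounding spaces, as in the example; `join_mime_types` is a hypothetical helper, not part of the library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of kreuzberg.h): joins `count` MIME type
 * strings with commas into `buf`. Returns 0 on success, -1 on invalid
 * arguments or if the result (including the terminating NUL) does not fit. */
int join_mime_types(const char *const *types, size_t count,
                    char *buf, size_t buf_size) {
    if (types == NULL || buf == NULL || buf_size == 0) {
        return -1;
    }
    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (types[i] == NULL) {
            return -1;
        }
        size_t len = strlen(types[i]);
        size_t sep = (i > 0) ? 1 : 0;
        /* Need room for the separator, the type, and the terminating NUL. */
        if (len + sep >= buf_size - used) {
            return -1;
        }
        if (sep) {
            buf[used++] = ',';
        }
        memcpy(buf + used, types[i], len);
        used += len;
        buf[used] = '\0';
    }
    return 0;
}
```

On success, buf can be passed as the mime_types argument of kreuzberg_register_document_extractor.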

OCR Backends

Register custom OCR backends for image text recognition.

bool kreuzberg_register_ocr_backend(const char *name, OcrBackendCallback callback);
bool kreuzberg_register_ocr_backend_with_languages(
    const char *name,
    OcrBackendCallback callback,
    const char *languages_json
);
bool kreuzberg_unregister_ocr_backend(const char *name);
char *kreuzberg_list_ocr_backends(void);
bool kreuzberg_clear_ocr_backends(void);
char *kreuzberg_get_ocr_languages(const char *backend);
int32_t kreuzberg_is_language_supported(const char *backend, const char *language);
char *kreuzberg_list_ocr_backends_with_languages(void);

Callback signature:

typedef char *(*OcrBackendCallback)(
    const uint8_t *image_bytes,
    uintptr_t image_length,
    const char *config_json
);

Example:

C
char *my_ocr(const uint8_t *image_bytes, uintptr_t image_length,
             const char *config_json) {
    /* Run custom OCR on image data */
    return strdup("Recognized text from image");
}

kreuzberg_register_ocr_backend("my-ocr", my_ocr);

/* Check language support */
char *langs = kreuzberg_get_ocr_languages("tesseract");
if (langs != NULL) {
    printf("Tesseract languages: %s\n", langs);
    kreuzberg_free_string(langs);
}

Post-Processors

Register custom post-processing steps to modify extraction results.

bool kreuzberg_register_post_processor(
    const char *name,
    PostProcessorCallback callback,
    int32_t priority
);
bool kreuzberg_register_post_processor_with_stage(
    const char *name,
    PostProcessorCallback callback,
    int32_t priority,
    const char *stage       /* "early", "middle", or "late" */
);
bool kreuzberg_unregister_post_processor(const char *name);
bool kreuzberg_clear_post_processors(void);
char *kreuzberg_list_post_processors(void);

Callback signature:

typedef char *(*PostProcessorCallback)(const char *result_json);

The callback receives the current result as JSON, modifies it, and returns a new JSON string.
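
Because the stage is passed as a free-form string, a typo would only surface at registration time. A small guard can catch it earlier. The sketch below assumes the three stage names listed in the signature comment above are matched exactly and case-sensitively, which this reference does not state; `is_known_stage` is a hypothetical helper, not part of the library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of kreuzberg.h): returns true if `stage`
 * is one of the documented stage names. Assumes exact, case-sensitive
 * matching. */
bool is_known_stage(const char *stage) {
    static const char *const known[] = { "early", "middle", "late" };
    if (stage == NULL) {
        return false;
    }
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(stage, known[i]) == 0) {
            return true;
        }
    }
    return false;
}
```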


Validators

Register custom validation logic that runs after extraction.

bool kreuzberg_register_validator(
    const char *name,
    ValidatorCallback callback,
    int32_t priority
);
bool kreuzberg_unregister_validator(const char *name);
bool kreuzberg_clear_validators(void);
char *kreuzberg_list_validators(void);

Callback signature:

typedef char *(*ValidatorCallback)(const char *result_json);

Return NULL if validation passes. Return an error message string if validation fails.

Example:

C
char *check_not_empty(const char *result_json) {
    /* Simple check: reject if content is empty */
    if (strstr(result_json, "\"content\":\"\"") != NULL) {
        return strdup("Validation failed: extracted content is empty");
    }
    return NULL; /* passes */
}

kreuzberg_register_validator("not-empty", check_not_empty, 100);

Result Pool

Pre-allocate memory for extraction results to reduce allocation overhead in batch scenarios.

ResultPool *kreuzberg_result_pool_new(uintptr_t capacity);
void kreuzberg_result_pool_reset(ResultPool *pool);
void kreuzberg_result_pool_free(ResultPool *pool);
CResultPoolStats kreuzberg_result_pool_stats(const ResultPool *pool);

Example:

C
ResultPool *pool = kreuzberg_result_pool_new(100);
if (pool == NULL) {
    fprintf(stderr, "Pool creation failed: %s\n", kreuzberg_last_error());
    return;
}

/* Process batches, resetting between them */
kreuzberg_result_pool_reset(pool);

/* Extract directly into the pool */
kreuzberg_extract_file_into_pool(pool, "doc.pdf", NULL);

/* Check pool efficiency */
CResultPoolStats stats = kreuzberg_result_pool_stats(pool);
printf("Pool: %zu/%zu results, %zu allocs, %zu bytes\n",
       stats.current_count, stats.capacity,
       stats.total_allocations, stats.estimated_memory_bytes);

kreuzberg_result_pool_free(pool);

Pool Functions

int32_t kreuzberg_extract_file_into_pool(ResultPool *pool, const char *file_path, const char *config_json);
CExtractionResultView *kreuzberg_extract_file_into_pool_view(ResultPool *pool, const char *file_path, const char *config_json);
void kreuzberg_result_pool_free(ResultPool *pool);
ResultPool *kreuzberg_result_pool_new(uintptr_t capacity);
void kreuzberg_result_pool_reset(ResultPool *pool);
CResultPoolStats kreuzberg_result_pool_stats(ResultPool *pool);

Utility Functions

kreuzberg_version

Get the library version string.

Signature:

const char *kreuzberg_version(void);

Returns:

  • Pointer to a static null-terminated string (for example, "4.3.8"). Do NOT free this pointer.

Example:

C
printf("Kreuzberg version: %s\n", kreuzberg_version());

Version Macros

The header also defines compile-time version macros:

#define KREUZBERG_VERSION_MAJOR 4
#define KREUZBERG_VERSION_MINOR 3
#define KREUZBERG_VERSION_PATCH 8
#define KREUZBERG_VERSION "4.3.8"
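
These macros can gate code on the header version at compile time, for example around functions that this reference marks as added in a later release. The sketch below is one way to do that. `APP_KREUZBERG_AT_LEAST` and `header_has_pdf_rendering` are hypothetical names, not part of the header. The fallback definitions repeat the values shown above only so that the snippet compiles on its own.

```c
/* Fallback values copied from the macros shown above, only so this sketch
 * compiles standalone; with kreuzberg.h included first they are skipped. */
#ifndef KREUZBERG_VERSION_MAJOR
#define KREUZBERG_VERSION_MAJOR 4
#define KREUZBERG_VERSION_MINOR 3
#define KREUZBERG_VERSION_PATCH 8
#endif

/* Hypothetical macro: true if the header version is >= major.minor.patch. */
#define APP_KREUZBERG_AT_LEAST(major, minor, patch)                         \
    ((KREUZBERG_VERSION_MAJOR > (major)) ||                                 \
     (KREUZBERG_VERSION_MAJOR == (major) &&                                 \
      KREUZBERG_VERSION_MINOR > (minor)) ||                                 \
     (KREUZBERG_VERSION_MAJOR == (major) &&                                 \
      KREUZBERG_VERSION_MINOR == (minor) &&                                 \
      KREUZBERG_VERSION_PATCH >= (patch)))

/* Hypothetical helper: reports whether the header in use is new enough for
 * the PDF rendering functions, documented here as added in v4.6.2. */
int header_has_pdf_rendering(void) {
    return APP_KREUZBERG_AT_LEAST(4, 6, 2) ? 1 : 0;
}
```

This reflects the header seen at compile time. When linking against a shared library, the runtime version reported by kreuzberg_version() may differ.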

Data Types

CExtractionResult

The primary result structure returned by extraction functions.

typedef struct CExtractionResult {
    char *annotations_json;         /* JSON PDF annotations (may be NULL) */
    char *chunks_json;              /* JSON array of text chunks (may be NULL) */
    char *content;                  /* Extracted text (UTF-8, null-terminated) */
    char *date;                     /* Document date (may be NULL) */
    char *detected_languages_json;  /* JSON array of detected languages (may be NULL) */
    char *djot_content_json;        /* JSON Djot content (may be NULL) */
    char *document_json;            /* JSON document structure (may be NULL) */
    char *elements_json;            /* JSON semantic elements (may be NULL) */
    char *extracted_keywords_json;  /* JSON keywords (may be NULL) */
    char *images_json;              /* JSON array of extracted images (may be NULL) */
    char *language;                 /* Document language (may be NULL) */
    char *metadata_json;            /* JSON object with metadata (may be NULL) */
    char *mime_type;                /* Detected MIME type */
    char *ocr_elements_json;        /* JSON OCR elements (may be NULL) */
    char *page_structure_json;      /* JSON page structure (may be NULL) */
    char *pages_json;               /* JSON per-page content (may be NULL) */
    char *processing_warnings_json; /* JSON warnings array (may be NULL) */
    char *quality_score_json;       /* JSON quality score (may be NULL) */
    char *subject;                  /* Document subject (may be NULL) */
    char *tables_json;              /* JSON array of tables (may be NULL) */
    bool success;                   /* true if extraction succeeded */
    uint8_t _padding1[7];           /* Internal padding */
} CExtractionResult;

All char* fields are null-terminated UTF-8 strings. Fields marked "may be NULL" are optional and depend on the document type and configuration. Free the entire struct with kreuzberg_free_result (which frees all string fields automatically).


CBatchResult

Container for batch extraction results.

typedef struct CBatchResult {
    uintptr_t count;              /* Number of results */
    CExtractionResult **results;  /* Array of result pointers */
    bool success;                 /* true if batch operation succeeded */
    uint8_t _padding2[7];         /* Internal padding */
} CBatchResult;

Free with kreuzberg_free_batch_result (frees all individual results and the array).


CBytesWithMime

Input structure for byte-based batch extraction.

typedef struct CBytesWithMime {
    const uint8_t *data;     /* Pointer to document bytes */
    uintptr_t data_len;      /* Length in bytes */
    const char *mime_type;   /* MIME type as null-terminated string */
} CBytesWithMime;

The caller retains ownership of data and mime_type pointers.


CErrorDetails

Structured error information.

typedef struct CErrorDetails {
    char *context_info;      /* Additional context (may be NULL; free if non-NULL) */
    uint32_t error_code;     /* Numeric error code (0-7) */
    char *error_type;        /* Human-readable type name (free with kreuzberg_free_string) */
    char *message;           /* Error message (free with kreuzberg_free_string) */
    char *source_file;       /* Source file (may be NULL; free if non-NULL) */
    char *source_function;   /* Source function (may be NULL; free if non-NULL) */
    uint32_t source_line;    /* Line number (0 if unknown) */
    int32_t is_panic;        /* 1 if from a panic, 0 otherwise */
} CErrorDetails;

CExtractionResultView

Zero-copy view into an ExtractionResult. Provides direct pointers to string data without allocation. All pointers are UTF-8 byte slices (not null-terminated).

typedef struct CExtractionResultView {
    uintptr_t chunk_count;
    uintptr_t content_len;
    const uint8_t *content_ptr;
    uintptr_t date_len;
    const uint8_t *date_ptr;
    uintptr_t detected_language_count;
    uintptr_t image_count;
    uintptr_t language_len;
    const uint8_t *language_ptr;
    uintptr_t mime_type_len;
    const uint8_t *mime_type_ptr;
    uintptr_t page_count;
    uintptr_t subject_len;
    const uint8_t *subject_ptr;
    uintptr_t table_count;
    uintptr_t title_len;
    const uint8_t *title_ptr;
} CExtractionResultView;
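
Because the view's pointer/length pairs are not null-terminated, they cannot be passed directly to functions such as printf("%s") or strlen. The sketch below shows one way to copy a slice into an ordinary C string. `slice_to_cstring` is a hypothetical helper, not part of the library. It allocates with malloc, so the copy is released with free(), not kreuzberg_free_string.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of kreuzberg.h): copies a UTF-8 slice given
 * as pointer + length into a newly malloc'ed, null-terminated string.
 * A NULL pointer or zero length yields an empty string. Returns NULL if
 * allocation fails. The caller releases the copy with free(). */
char *slice_to_cstring(const uint8_t *ptr, uintptr_t len) {
    if (ptr == NULL) {
        len = 0;
    }
    if (len >= SIZE_MAX) {
        return NULL; /* len + 1 would overflow size_t */
    }
    char *copy = malloc((size_t)len + 1);
    if (copy == NULL) {
        return NULL;
    }
    if (len > 0) {
        memcpy(copy, ptr, (size_t)len);
    }
    copy[len] = '\0';
    return copy;
}
```

With a populated view `v`, a call such as slice_to_cstring(v.content_ptr, v.content_len) would produce a copy that remains valid after the view itself is no longer usable.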

View Functions

int32_t kreuzberg_get_result_view(const CExtractionResult *result, CExtractionResultView *out_view);
int32_t kreuzberg_view_get_content(const CExtractionResultView *view, const uint8_t **out_ptr, uintptr_t *out_len);
int32_t kreuzberg_view_get_mime_type(const CExtractionResultView *view, const uint8_t **out_ptr, uintptr_t *out_len);

Views are used in streaming callbacks. They are valid only during the callback invocation. Copy any data you need to keep.

CMetadataField

Returned by kreuzberg_result_get_metadata_field.

typedef struct CMetadataField {
    const char *name;       /* Field name (do NOT free) */
    char *json_value;       /* JSON value string (free with kreuzberg_free_string if non-NULL) */
    int32_t is_null;        /* 1 if field does not exist, 0 if it does */
} CMetadataField;

Standard Metadata Fields

The following fields are standard across all document types and can be queried via kreuzberg_result_get_metadata_field:

| Field | Description | Type (JSON) |
|-------|-------------|-------------|
| author | Document author | string |
| characters | Character count | number |
| created_at | Creation timestamp | string (ISO 8601) |
| creator | Document creator/application | string |
| description | Document description | string |
| keywords | Document keywords | array<string> |
| language | Primary language | string (ISO 639-1) |
| modified_at | Modification timestamp | string (ISO 8601) |
| pages | Page count info | object |
| producer | PDF producer | string |
| subject | Document subject | string |
| title | Document title | string |
| version | Document version | string |

## Result Retrieval

Functions for retrieving specific data from `CExtractionResult`.

### kreuzberg_result_get_chunk_count

Get the number of text chunks in the result.

**Signature:**

```c
uintptr_t kreuzberg_result_get_chunk_count(const CExtractionResult *result);
```

### kreuzberg_result_get_detected_language

Get the detected language at a specific index.

**Signature:**

```c
char *kreuzberg_result_get_detected_language(const CExtractionResult *result, uintptr_t index);
```

### kreuzberg_result_get_metadata_field

Get a specific metadata field by name.

**Signature:**

```c
CMetadataField kreuzberg_result_get_metadata_field(const CExtractionResult *result, const char *name);
```

### kreuzberg_result_get_page_count

Get the number of pages in the result.

**Signature:**

```c
uintptr_t kreuzberg_result_get_page_count(const CExtractionResult *result);
```

### CResultPoolStats

Statistics for result pool tracking.

```c
typedef struct CResultPoolStats {
    uintptr_t capacity;               /* Maximum capacity */
    uintptr_t current_count;          /* Current results in pool */
    uintptr_t estimated_memory_bytes; /* Estimated memory usage */
    uintptr_t growth_events;          /* Number of capacity growths */
    uintptr_t total_allocations;      /* Total allocations made */
} CResultPoolStats;
```

### CStringInternStats

Statistics for string interning efficiency.

```c
typedef struct CStringInternStats {
    uintptr_t cache_hits;             /* Cache hits */
    uintptr_t cache_misses;           /* Cache misses */
    uintptr_t estimated_memory_saved; /* Memory saved by deduplication */
    uintptr_t total_memory_bytes;     /* Total memory used */
    uintptr_t total_requests;         /* Total intern requests */
    uintptr_t unique_count;           /* Unique strings interned */
} CStringInternStats;
```
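The counters above are raw totals, and a ratio is often easier to monitor. The following is a small sketch that derives a cache hit rate from them; `intern_hit_rate` is a hypothetical helper, and the struct is redeclared locally only so the example compiles without `kreuzberg.h`. It divides by `cache_hits + cache_misses` rather than `total_requests`, to avoid assuming how the library counts requests.

```c
#include <stdint.h>

/* Local copy of the struct documented above so this sketch compiles on its
 * own. Real code would include kreuzberg.h instead of redeclaring it. */
typedef struct CStringInternStats {
    uintptr_t cache_hits;
    uintptr_t cache_misses;
    uintptr_t estimated_memory_saved;
    uintptr_t total_memory_bytes;
    uintptr_t total_requests;
    uintptr_t unique_count;
} CStringInternStats;

/* Hypothetical helper: fraction of recorded lookups that hit the cache.
 * Returns 0.0 when no lookups have been recorded, avoiding division by zero. */
double intern_hit_rate(const CStringInternStats *stats) {
    uintptr_t lookups = stats->cache_hits + stats->cache_misses;
    if (lookups == 0) {
        return 0.0;
    }
    return (double)stats->cache_hits / (double)lookups;
}
```

In an application linked against the library, the input would come from `kreuzberg_string_intern_stats()`, which returns the struct by value.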

---

### Intern Functions

```c
void kreuzberg_free_interned_string(char *s);
char *kreuzberg_intern_string(const char *s);
void kreuzberg_string_intern_reset(void);
CStringInternStats kreuzberg_string_intern_stats(void);
```

## Format Conversion Utilities

Utilities for parsing and serializing configuration enums.

### kreuzberg_code_block_style_to_string

```c
const char *kreuzberg_code_block_style_to_string(int32_t style);
```

### kreuzberg_heading_style_to_string

```c
const char *kreuzberg_heading_style_to_string(int32_t style);
```

### kreuzberg_highlight_style_to_string

```c
const char *kreuzberg_highlight_style_to_string(int32_t style);
```

### kreuzberg_list_indent_type_to_string

```c
const char *kreuzberg_list_indent_type_to_string(int32_t type);
```

### kreuzberg_newline_style_to_string

```c
const char *kreuzberg_newline_style_to_string(int32_t style);
```

### kreuzberg_parse_code_block_style

```c
int32_t kreuzberg_parse_code_block_style(const char *s);
```

### kreuzberg_parse_heading_style

```c
int32_t kreuzberg_parse_heading_style(const char *s);
```

### kreuzberg_parse_highlight_style

```c
int32_t kreuzberg_parse_highlight_style(const char *s);
```

### kreuzberg_parse_list_indent_type

```c
int32_t kreuzberg_parse_list_indent_type(const char *s);
```

### kreuzberg_parse_newline_style

```c
int32_t kreuzberg_parse_newline_style(const char *s);
```

### kreuzberg_parse_preprocessing_preset

```c
int32_t kreuzberg_parse_preprocessing_preset(const char *s);
```

### kreuzberg_parse_whitespace_mode

```c
int32_t kreuzberg_parse_whitespace_mode(const char *s);
```

### kreuzberg_preprocessing_preset_to_string

```c
const char *kreuzberg_preprocessing_preset_to_string(int32_t preset);
```

### kreuzberg_whitespace_mode_to_string

```c
const char *kreuzberg_whitespace_mode_to_string(int32_t mode);
```
The string interning statistics (`CStringInternStats`) can be read and reset via:

```c
CStringInternStats kreuzberg_string_intern_stats(void);
void kreuzberg_string_intern_reset(void);
```

## Thread Safety

All extraction functions are thread-safe. Error state is stored in thread-local storage, so each thread maintains its own independent error context, and multiple threads can call extraction functions concurrently without interference.

Key points:

- `kreuzberg_last_error()` and `kreuzberg_get_error_details()` return per-thread state.
- `CExtractionResultView` structs are NOT thread-safe (they are used in streaming callbacks).
- `ResultPool` uses internal mutex synchronization; it is safe for concurrent access but may serialize operations.
- For maximum parallel throughput, use a separate pool per thread.
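To illustrate what per-thread error state means in practice, here is a self-contained sketch that imitates the pattern with C11 `_Thread_local` and POSIX threads. Every name in it (`demo_last_error`, `demo_set_error`, `demo_thread_local_errors`) is hypothetical; it does not call the library and makes no claim about how Kreuzberg implements its storage internally.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a per-thread error slot; zero-initialized
 * separately in every thread. */
static _Thread_local char demo_last_error[64];

static void demo_set_error(const char *msg) {
    snprintf(demo_last_error, sizeof demo_last_error, "%s", msg);
}

typedef struct {
    const char *message; /* error this worker records */
    int saw_own_error;   /* 1 if the worker read back its own message */
} WorkerArgs;

static void *worker(void *arg) {
    WorkerArgs *args = arg;
    demo_set_error(args->message);
    args->saw_own_error = strcmp(demo_last_error, args->message) == 0;
    return NULL;
}

/* Runs two workers concurrently. Returns 1 if each worker saw its own error
 * and the calling thread's slot stayed empty, 0 otherwise. */
int demo_thread_local_errors(void) {
    WorkerArgs a = { "file not found", 0 };
    WorkerArgs b = { "unsupported format", 0 };
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, worker, &a) != 0) {
        return 0;
    }
    if (pthread_create(&tb, NULL, worker, &b) != 0) {
        pthread_join(ta, NULL);
        return 0;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return a.saw_own_error && b.saw_own_error && demo_last_error[0] == '\0';
}
```

The calling thread never records an error, so its slot remains empty even after both workers have written theirs. The same reasoning explains why an error message should be read on the thread that made the failing call. On some toolchains this sketch needs the `-pthread` compiler flag.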

## LLM Integration

Kreuzberg integrates with LLMs via the `liter-llm` crate for structured extraction and VLM-based OCR. The C FFI layer accepts LLM configuration as a JSON string passed in the config parameter. See the LLM Integration Guide for full details.

### Structured Extraction

Pass a JSON config string containing `structured_extraction` to the extraction function:

```c title="structured_extraction.c"
#include <stdio.h>
#include "kreuzberg.h"

int main(void) {
    /* JSON schema describing the fields the LLM should return. */
    const char *config_json =
        "{"
        "  \"structured_extraction\": {"
        "    \"schema\": {"
        "      \"type\": \"object\","
        "      \"properties\": {"
        "        \"title\": {\"type\": \"string\"},"
        "        \"authors\": {\"type\": \"array\", \"items\": {\"type\": \"string\"}},"
        "        \"date\": {\"type\": \"string\"}"
        "      },"
        "      \"required\": [\"title\", \"authors\", \"date\"],"
        "      \"additionalProperties\": false"
        "    },"
        "    \"llm\": {\"model\": \"openai/gpt-4o-mini\"},"
        "    \"strict\": true"
        "  }"
        "}";

    CExtractionResult *result = kreuzberg_extract_file_sync("paper.pdf", NULL, config_json);
    if (result == NULL) {
        /* The error message is stored per thread. */
        fprintf(stderr, "Extraction failed: %s\n", kreuzberg_last_error());
        return 1;
    }

    const char *structured = kreuzberg_result_structured_output(result);
    if (structured) {
        printf("Structured output: %s\n", structured);
    }
    kreuzberg_result_free(result);
    return 0;
}
```

### VLM OCR

Pass an OCR config with `backend` set to `"vlm"` and a `vlm_config` object in the JSON config string:

```c title="vlm_ocr.c"
const char *config_json =
    "{"
    "  \"force_ocr\": true,"
    "  \"ocr\": {"
    "    \"backend\": \"vlm\","
    "    \"vlm_config\": {\"model\": \"openai/gpt-4o-mini\"}"
    "  }"
    "}";

CExtractionResult *result = kreuzberg_extract_file_sync("scan.pdf", NULL, config_json);
```
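When the model name is only known at runtime, the config string can be assembled with `snprintf` instead of a fixed literal. The following is a minimal sketch; `build_vlm_ocr_config` is a hypothetical helper, not a library function. It assumes the model name contains no characters that require JSON escaping (quotes, backslashes, or control characters).

```c
#include <stdio.h>

/* Hypothetical helper (not part of kreuzberg.h): writes the VLM OCR config
 * shown above into buf with the given model name substituted.
 * Returns 0 on success, -1 if formatting fails or buf is too small.
 * Assumes model needs no JSON escaping. */
int build_vlm_ocr_config(char *buf, size_t buf_size, const char *model) {
    int written = snprintf(buf, buf_size,
        "{\"force_ocr\": true, \"ocr\": {\"backend\": \"vlm\", "
        "\"vlm_config\": {\"model\": \"%s\"}}}",
        model);
    if (written < 0 || (size_t)written >= buf_size) {
        return -1; /* encoding error or truncated output */
    }
    return 0;
}
```

Checking the return value matters here, because a truncated buffer would hold invalid JSON. The resulting buffer can then be passed wherever the examples above pass `config_json`.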

For configuration details including API keys, model selection, and provider setup, see the LLM Integration Guide.