Searchable PDF

Client.make_searchable_pdf(...) returns a PDF with an invisible OCR text layer aligned to the original page geometry, so the text is selectable, copyable, and full-text searchable. The bundled glyphless font covers every Basic Multilingual Plane (BMP) codepoint, so non-Latin scripts need no setup. The call is thread-safe.

`Client.make_searchable_pdf`

See Client.make_searchable_pdf on the Clients page for the full signature.

`turboocr.searchable_pdf` module

make_searchable_pdf

make_searchable_pdf(original: bytes, response: PdfResponse | OcrResponse, *, dpi: int | None = None, font_path: str | None = None) -> bytes

Overlay an invisible OCR text layer on the input.

Accepts a PDF or any single-page image. Tested input formats: PDF, PNG, JPEG, BMP, TIFF, GIF, WebP. Image inputs are first wrapped into a single-page PDF sized to the image's pixel dimensions at dpi. The input format is detected from its magic bytes, so the caller does not need to say which format the input is in.
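To make the two mechanisms in this paragraph concrete, here is a minimal sketch of magic-byte sniffing and of converting pixel dimensions to a PDF page size. It is not the library's implementation: `sniff_format` and `page_size_points` are hypothetical names, the signature table uses the publicly documented file signatures for each format, and the `pixels * 72 / dpi` conversion is an assumption about what "sized to the image's pixel dimensions at dpi" means.

```python
from __future__ import annotations

# Well-known leading byte signatures for the tested input formats.
_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"BM", "bmp"),
]


def sniff_format(data: bytes) -> str | None:
    """Return a format name based on leading bytes, or None if unrecognised."""
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    for magic, name in _SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def page_size_points(width_px: int, height_px: int, dpi: int) -> tuple[float, float]:
    """Convert pixel dimensions to PDF points (assumed: 72 points per inch)."""
    return (width_px * 72.0 / dpi, height_px * 72.0 / dpi)
```

Under that assumption, a 2550 x 3300 pixel scan at 300 dpi maps to a 612 x 792 point page, which is US Letter.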

By default uses a bundled glyphless font that covers every Basic Multilingual Plane codepoint, so non-Latin scans (CJK, Arabic, Cyrillic, …) work out of the box with zero configuration.
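The Basic Multilingual Plane spans U+0000 through U+FFFF. This page does not say how codepoints above that range (for example, most emoji) are handled, so the sketch below only shows how one might find them in OCR text. `non_bmp_chars` is a hypothetical helper, not part of turboocr.

```python
from __future__ import annotations

BMP_MAX = 0xFFFF  # last codepoint of the Basic Multilingual Plane


def non_bmp_chars(text: str) -> list[str]:
    """Return the characters in text whose codepoint lies above the BMP."""
    return [ch for ch in text if ord(ch) > BMP_MAX]
```

CJK, Arabic, and Cyrillic text in common use falls inside the BMP, so for such input this helper returns an empty list.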

Pass font_path=<my.ttf> only if you have a specific reason to embed a real visible font instead.
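A minimal usage sketch based on the signature above. The wrapper `write_searchable_pdf` and its `overlay` parameter are hypothetical, added only so the example can be read and exercised without the package installed. How `response` is obtained is not covered on this page, so it is taken as given.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable


def write_searchable_pdf(
    src: str,
    dst: str,
    response: Any,
    *,
    dpi: int | None = None,
    overlay: Callable[..., bytes] | None = None,
) -> int:
    """Read src, overlay the OCR text layer, write dst; return bytes written."""
    if overlay is None:
        # Lazy import: module path and function name as documented above.
        from turboocr.searchable_pdf import make_searchable_pdf as overlay
    original = Path(src).read_bytes()
    # font_path is left unset, so the bundled glyphless font is used.
    pdf_bytes = overlay(original, response, dpi=dpi)
    Path(dst).write_bytes(pdf_bytes)
    return len(pdf_bytes)
```

Because the input format is detected from magic bytes, the same helper works for a scanned PNG and for an existing PDF without any format argument.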

Font errors

FontError

Bases: RuntimeError

Raised when a caller-supplied font cannot render the OCR text.

The default code path never raises this, because the bundled glyphless font covers every Basic Multilingual Plane codepoint. You can only hit it by pointing font_path=<my.ttf> at a font that lacks glyphs the OCR text needs.

FontGlyphMissing

Bases: FontError

A user-supplied font has no glyph for some OCR character.
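Since FontGlyphMissing derives from FontError, a single `except FontError` clause covers both. The sketch below shows one possible pattern: try the custom font first, then retry with the bundled default. The two classes here are local stand-ins that mirror the documented hierarchy so the example is self-contained. In real code you would import them from turboocr (the exact import path is not stated on this page). `overlay_with_fallback` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any, Callable


# Stand-ins mirroring the documented hierarchy; not the library's classes.
class FontError(RuntimeError):
    pass


class FontGlyphMissing(FontError):
    pass


def overlay_with_fallback(
    overlay: Callable[..., bytes],
    original: bytes,
    response: Any,
    font_path: str,
) -> bytes:
    """Try a caller-supplied font; fall back to the default font on FontError."""
    try:
        return overlay(original, response, font_path=font_path)
    except FontError:
        # Also catches FontGlyphMissing, which subclasses FontError.
        # Omitting font_path selects the bundled glyphless font.
        return overlay(original, response)
```

Whether a silent fallback is appropriate depends on why the visible font was requested. Logging the exception before retrying is a reasonable variation.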