Searchable PDF¶
Client.make_searchable_pdf(...) returns a PDF with an invisible OCR
text layer aligned to the original page geometry. The text is selectable,
copyable, and full-text searchable. The bundled glyphless font covers every
Basic Multilingual Plane (BMP) codepoint, so non-Latin scripts need no setup.
The method is thread-safe.
Client.make_searchable_pdf¶
See Client.make_searchable_pdf on the Clients page for the full signature.
turboocr.searchable_pdf module¶
make_searchable_pdf¶
make_searchable_pdf(original: bytes, response: PdfResponse | OcrResponse, *, dpi: int | None = None, font_path: str | None = None) -> bytes
Overlay an invisible OCR text layer on the input.
Accepts a PDF or any single-page image. Tested input formats: PDF,
PNG, JPEG, BMP, TIFF, GIF, WebP. An image input is first wrapped into a
single-page PDF whose page is sized from the image's pixel dimensions at
dpi. The input format is detected from its magic bytes, so the caller
does not need to specify it.
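To illustrate what magic-byte detection and dpi-based page sizing can look like, here is a minimal sketch. It is not the library's implementation: the helper names sniff_format and page_size_points are hypothetical, and the sizing formula assumes the usual PDF convention of 72 points per inch, which this page does not state explicitly.

```python
from typing import Optional, Tuple


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the input format from its leading magic bytes (hypothetical helper)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # TIFF starts with a byte-order mark followed by the number 42.
    if data.startswith((b"II*\x00", b"MM\x00*")):
        return "tiff"
    # WebP is a RIFF container whose form type at offset 8 is "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data.startswith(b"BM"):
        return "bmp"
    return None


def page_size_points(width_px: int, height_px: int, dpi: int) -> Tuple[float, float]:
    """Convert pixel dimensions to a PDF page size, assuming 72 points per inch."""
    return (width_px * 72.0 / dpi, height_px * 72.0 / dpi)
```

Under these assumptions, a 2550 x 3300 pixel scan at 300 dpi maps to a 612 x 792 point page, which is US Letter.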
By default uses a bundled glyphless font that covers every Basic Multilingual Plane codepoint, so non-Latin scans (CJK, Arabic, Cyrillic, …) work out of the box with zero configuration.
Pass font_path=<my.ttf> only if you have a specific reason to embed a
real visible font instead.
Font errors¶
FontError¶
Bases: RuntimeError
Raised when a caller-supplied font cannot render the OCR text.
The default code path never raises this, because the bundled glyphless font
covers every Basic Multilingual Plane codepoint. You can only hit it
by passing font_path=<my.ttf> with a font that lacks glyphs the OCR
text needs.
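One way to handle this error is to retry without font_path, so the bundled glyphless font takes over. The sketch below is only a suggested pattern: overlay_with_font_fallback is a hypothetical helper, and it takes the overlay function and the exception class as arguments so that it makes no assumption about import paths.

```python
from typing import Callable, Optional, Type


def overlay_with_font_fallback(
    make_pdf: Callable[..., bytes],
    font_error: Type[Exception],
    original: bytes,
    response: object,
    font_path: Optional[str] = None,
) -> bytes:
    """Try the caller-supplied font first, then fall back to the default font."""
    if font_path is not None:
        try:
            return make_pdf(original, response, font_path=font_path)
        except font_error:
            # The custom font lacks glyphs the OCR text needs;
            # fall through and retry with the default font.
            pass
    return make_pdf(original, response)
```

In real use, make_pdf would be make_searchable_pdf and font_error would be FontError. The exact import location of FontError is not stated on this page, so check it against the installed package.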