Examples
Every example below is a runnable script in examples/.
Each runs end-to-end against the bundled ACME invoice fixture. No extra
configuration is needed beyond a running TurboOCR server reachable at
http://localhost:8000 (override with TURBO_OCR_BASE_URL).
export TURBO_OCR_BASE_URL=http://localhost:8000 # optional, this is the default
python examples/00_quickstart.py
python examples/02_pdf_to_markdown.py
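The base-URL override described above can be mirrored in your own scripts. The following is a minimal sketch that assumes only what the text states: the variable name TURBO_OCR_BASE_URL and the default http://localhost:8000. The helper name resolve_base_url is hypothetical and is not part of the SDK.

```python
import os

DEFAULT_BASE_URL = "http://localhost:8000"  # default stated in the text above


def resolve_base_url(env=None):
    """Return the server URL, preferring TURBO_OCR_BASE_URL when it is set.

    Hypothetical helper for illustration; not part of the TurboOCR SDK.
    """
    env = os.environ if env is None else env
    # Treat an empty value like an unset one, and drop any trailing slash
    # so later path joins stay predictable.
    value = env.get("TURBO_OCR_BASE_URL") or DEFAULT_BASE_URL
    return value.rstrip("/")


print(resolve_base_url())
```

Passing a plain dict instead of os.environ keeps the helper easy to exercise in tests without touching the real environment.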
00 Quickstart
The smallest useful script: synchronous image OCR with Client.
01 Image OCR with layout
Image OCR using layout, reading_order, and include_blocks. Dumps both the
text items (each with bbox and confidence) and the blocks grouped in reading order.
examples/01_image_ocr_with_layout.py
02 PDF → Markdown
recognize_pdf + render_to_markdown on a multi-page invoice.
examples/02_pdf_to_markdown.py
03 Searchable PDF
Converts a PDF into a searchable PDF with an invisible text overlay, then verifies the result with pypdf's extract_text().
04 Async client
AsyncClient + asyncio.gather for concurrent OCR.
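The fan-out pattern this example relies on can be sketched with the standard library alone. The coroutine fake_recognize below is a hypothetical stand-in for an async OCR call; it does not reproduce the AsyncClient API, which is shown in the example script itself.

```python
import asyncio


async def fake_recognize(path):
    """Stand-in for an async OCR call (hypothetical; not the SDK API)."""
    await asyncio.sleep(0.01)  # simulates network and inference latency
    return f"text from {path}"


async def ocr_all(paths):
    # gather schedules every coroutine at once and returns the results
    # in the same order as the inputs, regardless of completion order.
    return await asyncio.gather(*(fake_recognize(p) for p in paths))


results = asyncio.run(ocr_all(["a.png", "b.png", "c.png"]))
print(results)
```

Because gather preserves input order, results can be zipped back against the list of paths without extra bookkeeping.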
05 Batch
recognize_batch over multiple images, walked via iter_results() so
per-slot failures surface cleanly.
06 gRPC
GrpcClient exposes the same surface as Client over a gRPC transport. Requires
pip install 'turboocr[grpc]'.
07 Retry and timeout
Custom RetryPolicy (5 attempts, bounded backoff) plus per-request
timeout=.
examples/07_retry_and_timeout.py
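To make "5 attempts, bounded backoff" concrete, here is a generic sketch of capped exponential backoff. It is not the SDK's RetryPolicy: the function names, the base delay of 0.5 s, the 4 s cap, and the choice to retry only on ConnectionError are all assumptions made for illustration.

```python
import time


def backoff_delays(attempts=5, base=0.5, cap=4.0):
    """Delays (seconds) slept between attempts: exponential, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts - 1)]


def call_with_retry(fn, attempts=5, base=0.5, cap=4.0, sleep=time.sleep):
    """Call fn() up to `attempts` times, sleeping a bounded delay between tries."""
    delays = backoff_delays(attempts, base, cap)
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])


print(backoff_delays())
```

The cap is what makes the backoff bounded: however many attempts are configured, no single wait exceeds it. Injecting sleep as a parameter lets tests run without real delays.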
08 Custom httpx.Client
Pass your own httpx.Client for custom TLS / mTLS / connection limits /
proxies.
examples/08_custom_httpx_client.py
09 Markdown style
Register a custom layout label + renderer on MarkdownStyle to extend the
default label-to-Markdown mapping.
10 Tables and formulas
Iterate response.tables / response.formulas. See Layout →
Tables/Formulas for the partial-support caveat.
examples/10_tables_and_formulas.py
11 Folder pipeline
AsyncClient + asyncio.Semaphore for a bounded-concurrency PDF→Markdown
folder pipeline.
examples/11_folder_pipeline.py
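The bounded-concurrency idea can be sketched independently of the SDK. In the snippet below, convert_one is a hypothetical stand-in for "OCR one PDF and render Markdown", and the limit of 4 in-flight tasks is an illustrative choice, not a value taken from the example script.

```python
import asyncio

MAX_IN_FLIGHT = 4  # illustrative bound, not taken from the example script


async def convert_one(path, sem, stats):
    """Stand-in for 'OCR one PDF and render Markdown' (hypothetical)."""
    async with sem:  # at most MAX_IN_FLIGHT bodies run at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulates the OCR round trip
        stats["active"] -= 1
        return path.rsplit(".", 1)[0] + ".md"


async def convert_folder(paths):
    # Create the semaphore inside the running event loop.
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"active": 0, "peak": 0}
    outputs = await asyncio.gather(*(convert_one(p, sem, stats) for p in paths))
    return outputs, stats["peak"]


outputs, peak = asyncio.run(convert_folder([f"invoice_{i}.pdf" for i in range(10)]))
print(outputs[:2], "peak concurrency:", peak)
```

All tasks are created up front, but the semaphore ensures that only a fixed number hold a server connection at any moment, which keeps a large folder from overwhelming the OCR server.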
12 Hooks and logging
httpx on_request / on_response event hooks plus the SDK's stdlib logger.