Markdown rendering
render_to_markdown(...) walks the reading order and maps each layout class to
a Markdown construct (doc_title → # H1, display_formula → $$ ... $$,
table → HTML or fenced block, etc.). The label-to-construct map is pluggable
via MarkdownStyle.
Shorter forms are available too:
- render_ocr_to_markdown(ocr_response): single image
- render_pdf_to_markdown(pdf_response): multi-page PDF
- Client.to_markdown(...): one-call convenience that does recognize_image (or recognize_pdf) + render_to_markdown in one step
MarkdownDocument.structured() returns the parsed nodes as a list of plain
dicts, one per MarkdownNode, for programmatic inspection.
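The label-to-construct mapping described above can be pictured with a plain dictionary. This is a minimal sketch of the idea, not the library's implementation: the (label, text) item shape, the FORMATTERS name and the render_items helper are all hypothetical, and only the doc_title and display_formula labels come from the text above.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical label -> formatter map. Only doc_title and display_formula
# are taken from the description above; everything else is illustrative.
FORMATTERS: dict[str, Callable[[str], str]] = {
    "doc_title": lambda text: f"# {text}",
    "display_formula": lambda text: f"$$ {text} $$",
}


def render_items(items: list[tuple[str, str]]) -> str:
    """Render (label, text) pairs, already in reading order, to Markdown."""
    blocks = []
    for label, text in items:
        # Labels without a formatter fall back to a plain paragraph.
        formatter = FORMATTERS.get(label, lambda t: t)
        blocks.append(formatter(text))
    return "\n\n".join(blocks)


print(render_items([
    ("doc_title", "Annual Report"),
    ("text", "Revenue grew this year."),
    ("display_formula", "E = mc^2"),
]))
```

Swapping an entry in the dictionary changes the output for one label without touching the walk itself, which is the kind of pluggability the text attributes to MarkdownStyle.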
render_to_markdown
render_to_markdown(response: OcrResponse | PdfResponse, *, style: MarkdownStyle | None = None) -> MarkdownDocument
Render an OCR or PDF response to a MarkdownDocument.
Dispatches on the response type: PdfResponse
is rendered page-by-page with --- page-break separators;
OcrResponse is rendered as a single page.
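The type dispatch described above might look roughly like the following. SinglePage and MultiPage are hypothetical stand-ins for the real response types, and the exact whitespace around the --- separator is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SinglePage:
    """Stand-in for a single-image response (hypothetical shape)."""
    markdown: str


@dataclass
class MultiPage:
    """Stand-in for a multi-page response (hypothetical shape)."""
    pages: list[SinglePage] = field(default_factory=list)


# Assumed separator: a thematic break surrounded by blank lines.
PAGE_BREAK = "\n\n---\n\n"


def render(response: SinglePage | MultiPage) -> str:
    """Dispatch on the response type, joining pages with a --- break."""
    if isinstance(response, MultiPage):
        return PAGE_BREAK.join(page.markdown for page in response.pages)
    if isinstance(response, SinglePage):
        return response.markdown
    raise TypeError(f"unsupported response type: {type(response).__name__}")
```

A single page is returned unchanged, so the separator only ever appears between pages, never before the first or after the last.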
| PARAMETER | DESCRIPTION |
|---|---|
| response | Either an OcrResponse or a PdfResponse. TYPE: OcrResponse \| PdfResponse |
| style | TYPE: MarkdownStyle \| None. DEFAULT: None |

| RETURNS | DESCRIPTION |
|---|---|
| MarkdownDocument | A MarkdownDocument holding the rendered output; structured() exposes its parsed nodes. |

| RAISES | DESCRIPTION |
|---|---|
| ProtocolError | A layout-enabled response is internally inconsistent, e.g. a text item without … |
MarkdownStyle
MarkdownStyle(*, rules: Mapping[str, StyleRule] | None = None, default: StyleRule | None = None, renderers: Mapping[NodeKind, RenderFn] | None = None)
Configurable label-to-Markdown classifier plus per-kind renderers.
Holds two mappings used by
render_to_markdown: a label →
StyleRule dict and a
NodeKind → renderer-function dict.
Override either via register / register_renderer to customise the
output for project-specific layouts.
Build a MarkdownStyle from optional rule/renderer overrides.
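| PARAMETER | DESCRIPTION |
|---|---|
| rules | Replacement label → StyleRule mapping. TYPE: Mapping[str, StyleRule] \| None. DEFAULT: None |
| default | Fallback rule for labels not present in rules. TYPE: StyleRule \| None. DEFAULT: None |
| renderers | Replacement NodeKind → renderer-function mapping. TYPE: Mapping[NodeKind, RenderFn] \| None. DEFAULT: None |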

| RETURNS | DESCRIPTION |
|---|---|
| None | A new MarkdownStyle. |

| METHOD | DESCRIPTION |
|---|---|
| register | Register or override a label-to-kind classification rule. |
| register_renderer | Register or override the render function for a node kind. |
| classify | |
| render | |
| copy | |
register
register(label: str, kind: NodeKind, level: int = 0) -> None
Register or override a label-to-kind classification rule.
| PARAMETER | DESCRIPTION |
|---|---|
| label | Layout label as it appears on … TYPE: str |
| kind | Target NodeKind. TYPE: NodeKind |
| level | Heading level (1-6) when … TYPE: int. DEFAULT: 0 |

| RETURNS | DESCRIPTION |
|---|---|
| None | None. Mutates the style in place. |
register_renderer
register_renderer(kind: NodeKind, renderer: RenderFn) -> None
Register or override the render function for a node kind.
| PARAMETER | DESCRIPTION |
|---|---|
| kind | The node kind whose render function is set. TYPE: NodeKind |
| renderer | A callable … TYPE: RenderFn |

| RETURNS | DESCRIPTION |
|---|---|
| None | None. Mutates the style in place. |
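
To make the interplay between register and register_renderer concrete, here is a toy stand-in that mirrors only the documented method names. It is a sketch under stated assumptions, not the real MarkdownStyle: the Rule and Renderer shapes, the "heading" and "paragraph" kind strings, the section_header label and the render(label, text) signature are all invented for illustration.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical shapes; the real StyleRule, NodeKind and RenderFn types
# are not reproduced here.
Rule = tuple[str, int]                 # (kind, level)
Renderer = Callable[[str, int], str]   # (text, level) -> Markdown


class ToyStyle:
    """Toy model of a label -> rule map plus a kind -> renderer map."""

    def __init__(self) -> None:
        self._rules: dict[str, Rule] = {}
        self._renderers: dict[str, Renderer] = {}
        self._default: Rule = ("paragraph", 0)

    def register(self, label: str, kind: str, level: int = 0) -> None:
        """Map a layout label to a node kind (and heading level)."""
        self._rules[label] = (kind, level)

    def register_renderer(self, kind: str, renderer: Renderer) -> None:
        """Set the function used to render nodes of the given kind."""
        self._renderers[kind] = renderer

    def render(self, label: str, text: str) -> str:
        # Classify the label first, then dispatch on the resulting kind.
        kind, level = self._rules.get(label, self._default)
        renderer = self._renderers.get(kind, lambda t, _level: t)
        return renderer(text, level)


style = ToyStyle()
style.register("doc_title", "heading", level=1)
style.register("section_header", "heading", level=2)
style.register_renderer("heading", lambda text, level: f"{'#' * level} {text}")
print(style.render("doc_title", "Intro"))  # prints "# Intro"
```

Keeping classification and rendering in separate maps means a project-specific label can be pointed at an existing kind without writing a new renderer, and a renderer can be replaced once for every label that shares its kind.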
MarkdownDocument
Bases: BaseModel
| METHOD | DESCRIPTION |
|---|---|
| structured | Return the parsed nodes as plain dicts, for programmatic inspection. |

| ATTRIBUTE | DESCRIPTION |
|---|---|
| model_config | |
| source | TYPE: … |
| pages | TYPE: … |
| nodes | TYPE: … |
| markdown | TYPE: … |
structured
Return the parsed nodes as plain dicts, for programmatic inspection.
Equivalent to [n.model_dump(exclude_none=True) for n in self.nodes].
Useful when you want to walk the document tree without depending on
the MarkdownNode type — e.g. when handing
the structure to a templating engine or another language runtime.
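As an illustration of walking that list of dicts, the sketch below builds an indented outline from heading nodes. The sample data is hand-written rather than real library output: the keys follow the MarkdownNode attributes, while the "heading" and "paragraph" kind values and the outline helper are assumptions.

```python
from __future__ import annotations

# Hypothetical result of MarkdownDocument.structured(). Keys follow the
# MarkdownNode attributes; the kind values are assumed for illustration.
nodes = [
    {"kind": "heading", "text": "Annual Report", "level": 1, "order_index": 0},
    {"kind": "paragraph", "text": "Revenue grew.", "order_index": 1},
    {"kind": "heading", "text": "Outlook", "level": 2, "order_index": 2},
]


def outline(nodes: list[dict]) -> list[str]:
    """Build an indented outline from the heading nodes."""
    lines = []
    for node in nodes:
        if node.get("kind") != "heading":
            continue
        # exclude_none=True drops unset fields, so optional keys may be
        # missing entirely; read them with .get() and a fallback.
        level = node.get("level", 1)
        lines.append("  " * (level - 1) + node["text"])
    return lines


for line in outline(nodes):
    print(line)
```

Because the dicts come from model_dump(exclude_none=True), code that consumes them should treat every optional field as possibly absent rather than present with a None value.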
MarkdownNode
Bases: BaseModel
| ATTRIBUTE | DESCRIPTION |
|---|---|
| model_config | |
| kind | TYPE: … |
| text | TYPE: … |
| level | TYPE: … |
| label | TYPE: … |
| bounding_box | TYPE: … |
| layout_id | TYPE: … |
| order_index | TYPE: … |