Markdown rendering

render_to_markdown(...) walks the reading order and maps each layout class to a Markdown construct (doc_title → # H1, display_formula → $$ ... $$, table → HTML or fenced block, etc.). The label-to-construct map is pluggable via MarkdownStyle.
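The label-to-construct idea can be sketched with a plain dictionary. This is an illustrative stand-in and not the SDK's implementation: render_block and the lambda renderers are hypothetical names, and only the doc_title, paragraph_title, and display_formula mappings come from this page.

```python
# Illustrative stand-in, not SDK code: a dict plays the role of the
# label-to-construct map, keyed by layout label.
RENDERERS = {
    "doc_title": lambda text: f"# {text}",
    "paragraph_title": lambda text: f"## {text}",
    "display_formula": lambda text: f"$$\n{text}\n$$",
}


def render_block(label: str, text: str) -> str:
    """Render one layout block; unknown labels fall back to a plain paragraph."""
    renderer = RENDERERS.get(label, lambda t: t)
    return renderer(text)


# Blocks are assumed to arrive already sorted in reading order.
blocks = [
    ("doc_title", "Report"),
    ("text", "Hello."),
    ("display_formula", "E = mc^2"),
]
markdown = "\n\n".join(render_block(label, text) for label, text in blocks)
print(markdown)
```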

Shorter forms are also available:

  • render_ocr_to_markdown(ocr_response) — single image
  • render_pdf_to_markdown(pdf_response) — multi-page PDF
  • Client.to_markdown(...) — one-call convenience that does recognize_image (or recognize_pdf) + render_to_markdown in one step

MarkdownDocument.structured() returns the parsed nodes as a list of plain dicts (one per MarkdownNode) for programmatic inspection.

render_to_markdown

render_to_markdown(response: OcrResponse | PdfResponse, *, style: MarkdownStyle | None = None) -> MarkdownDocument

Render an OCR or PDF response to a MarkdownDocument.

Dispatches on the response type: PdfResponse is rendered page-by-page with --- page-break separators; OcrResponse is rendered as a single page.
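As a rough sketch of the page-by-page behaviour, the snippet below joins per-page Markdown with a --- separator. The helper name join_pages and the exact blank-line spacing around the separator are assumptions; the SDK's real output may differ in whitespace.

```python
# Hypothetical helper, not part of the SDK: joins already-rendered pages
# with a thematic-break line, as a PdfResponse render is described to do.
def join_pages(pages: list[str]) -> str:
    """Join per-page Markdown strings with '---' page-break separators."""
    return "\n\n---\n\n".join(page.strip() for page in pages)


# A single page (the OcrResponse case) gets no separator at all.
print(join_pages(["# Page one"]))
print(join_pages(["# Page one", "Text on page two."]))
```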

PARAMETER DESCRIPTION
response

Either an OcrResponse (single image) or a PdfResponse (multi-page). For the richest output, produce the response with include_blocks=True; bare token-only responses fall back to one paragraph per token line.

TYPE: OcrResponse | PdfResponse

style

MarkdownStyle controlling label-to-node classification and per-kind renderers. None uses DEFAULT_STYLE.

TYPE: MarkdownStyle | None DEFAULT: None

RETURNS DESCRIPTION
MarkdownDocument

A MarkdownDocument with the rendered .markdown string, the structured .nodes list, the .source discriminator ("image" / "pdf"), and .pages.

RAISES DESCRIPTION
ProtocolError

A layout-enabled response is internally inconsistent — e.g. a text item without layout_id, or a reading_order index pointing at a layout box that is not in the response.

MarkdownStyle

MarkdownStyle(*, rules: Mapping[str, StyleRule] | None = None, default: StyleRule | None = None, renderers: Mapping[NodeKind, RenderFn] | None = None)

Configurable label-to-Markdown classifier plus per-kind renderers.

Holds two mappings used by render_to_markdown: a label → StyleRule dict and a NodeKind → renderer-function dict. Override either via register / register_renderer to customise the output for project-specific layouts.

Build a MarkdownStyle from optional rule/renderer overrides.

PARAMETER DESCRIPTION
rules

Replacement label → StyleRule mapping. When None, the SDK's built-in defaults are used (doc_title → heading level 1, paragraph_title → heading level 2, table → table, …).

TYPE: Mapping[str, StyleRule] | None DEFAULT: None

default

Fallback rule for labels not present in rules. Defaults to StyleRule(NodeKind.paragraph).

TYPE: StyleRule | None DEFAULT: None

renderers

Replacement NodeKind → render-function mapping. When None, the SDK's built-in renderers are used.

TYPE: Mapping[NodeKind, RenderFn] | None DEFAULT: None

RETURNS DESCRIPTION
None

A new MarkdownStyle instance.

METHOD DESCRIPTION
register

Register or override a label-to-kind classification rule.

register_renderer

Register or override the render function for a node kind.

classify
render
copy

register

register(label: str, kind: NodeKind, level: int = 0) -> None

Register or override a label-to-kind classification rule.

PARAMETER DESCRIPTION
label

Layout label as it appears on LayoutBox.class_name or Block.class_name (e.g. "doc_title", "table", or a custom server-emitted label).

TYPE: str

kind

Target NodeKind.

TYPE: NodeKind

level

Heading level (1-6) when kind=NodeKind.heading. Ignored for other kinds. Default 0.

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
None

None. Mutates the style in place.
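The register and classify behaviour described above can be modelled in a few lines. SketchStyle, SketchRule, and SketchKind are hypothetical stand-ins that mirror the documented signatures; unlike the real MarkdownStyle, this sketch starts with an empty rule table instead of the SDK's built-in defaults.

```python
from dataclasses import dataclass
from enum import Enum


class SketchKind(str, Enum):
    """Stand-in for NodeKind with only the members this sketch needs."""
    heading = "heading"
    paragraph = "paragraph"
    table = "table"


@dataclass(frozen=True)
class SketchRule:
    """Stand-in for StyleRule: a node kind plus an optional heading level."""
    kind: SketchKind
    level: int = 0


class SketchStyle:
    """Stand-in for the label-to-rule half of MarkdownStyle."""

    def __init__(self, rules=None, default=None):
        # Assumption: start empty; the real class falls back to built-in rules.
        self._rules = dict(rules or {})
        self._default = default or SketchRule(SketchKind.paragraph)

    def register(self, label: str, kind: SketchKind, level: int = 0) -> None:
        # Mutates the style in place, overriding any existing rule for label.
        self._rules[label] = SketchRule(kind, level)

    def classify(self, label: str) -> SketchRule:
        # Labels without a rule get the fallback rule.
        return self._rules.get(label, self._default)


style = SketchStyle()
style.register("chapter_heading", SketchKind.heading, level=2)
print(style.classify("chapter_heading"))
print(style.classify("some_unknown_label"))
```

With the real SDK, the equivalent call would presumably be style.register("chapter_heading", NodeKind.heading, level=2) on a MarkdownStyle instance, where "chapter_heading" is a made-up custom label.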

register_renderer

register_renderer(kind: NodeKind, renderer: RenderFn) -> None

Register or override the render function for a node kind.

PARAMETER DESCRIPTION
kind

The NodeKind to customise.

TYPE: NodeKind

renderer

A callable (MarkdownNode) -> str that produces the Markdown for one node. Returning an empty string skips the node.

TYPE: RenderFn

RETURNS DESCRIPTION
None

None. Mutates the style in place.
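A minimal sketch of the renderer contract, assuming a simplified node type: each renderer takes one node and returns its Markdown, and an empty string drops the node from the output. SketchNode, render_all, and the two renderer functions are hypothetical names, not SDK symbols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchNode:
    """Simplified stand-in for MarkdownNode."""
    kind: str
    text: str


def render_paragraph(node: SketchNode) -> str:
    return node.text


def drop_footer(node: SketchNode) -> str:
    # Returning an empty string skips the node entirely.
    return ""


# Stand-in for the NodeKind -> render-function mapping.
renderers = {"paragraph": render_paragraph, "footer": drop_footer}


def render_all(nodes: list[SketchNode]) -> str:
    """Render every node and join the non-empty results with blank lines."""
    parts = [renderers[node.kind](node) for node in nodes]
    return "\n\n".join(part for part in parts if part)


nodes = [SketchNode("paragraph", "Body text."), SketchNode("footer", "Page 3")]
print(render_all(nodes))
```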

classify

classify(label: str) -> StyleRule

render

render(node: MarkdownNode) -> str

copy

copy() -> MarkdownStyle

MarkdownDocument

Bases: BaseModel

METHOD DESCRIPTION
structured

Return the parsed nodes as plain dicts, for programmatic inspection.

ATTRIBUTE DESCRIPTION
model_config

source

TYPE: Literal['image', 'pdf']

pages

TYPE: int

nodes

TYPE: list[MarkdownNode]

markdown

TYPE: str

model_config class-attribute instance-attribute

model_config = ConfigDict(frozen=True)

source class-attribute instance-attribute

source: Literal['image', 'pdf'] = 'image'

pages class-attribute instance-attribute

pages: int = 1

nodes instance-attribute

nodes: list[MarkdownNode]

markdown instance-attribute

markdown: str

structured

structured() -> list[dict[str, object]]

Return the parsed nodes as plain dicts, for programmatic inspection.

Equivalent to [n.model_dump(exclude_none=True) for n in self.nodes]. Useful when you want to walk the document tree without depending on the MarkdownNode type — e.g. when handing the structure to a templating engine or another language runtime.
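To show the shape of the result without depending on the SDK, the sketch below mimics exclude_none=True with standard-library dataclasses. SketchNode and this free-standing structured function are hypothetical; the real method is documented as calling model_dump on each node.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchNode:
    """Stand-in carrying a subset of the MarkdownNode fields."""
    kind: str
    text: str
    level: int = 0
    label: Optional[str] = None
    layout_id: Optional[int] = None


def structured(nodes: list[SketchNode]) -> list[dict[str, object]]:
    """Dump nodes to plain dicts, dropping fields whose value is None."""
    return [
        {key: value for key, value in asdict(node).items() if value is not None}
        for node in nodes
    ]


nodes = [
    SketchNode("heading", "Intro", level=1, label="doc_title"),
    SketchNode("paragraph", "Hello."),
]
for entry in structured(nodes):
    print(entry)
```

Fields set to None are omitted, while falsy but non-None values such as level=0 are kept.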

MarkdownNode

Bases: BaseModel

ATTRIBUTE DESCRIPTION
model_config

kind

TYPE: NodeKind

text

TYPE: str

level

TYPE: int

label

TYPE: str | None

bounding_box

TYPE: BoundingBox | None

layout_id

TYPE: int | None

order_index

TYPE: int | None

model_config class-attribute instance-attribute

model_config = ConfigDict(frozen=True)

kind instance-attribute

kind: NodeKind

text instance-attribute

text: str

level class-attribute instance-attribute

level: int = 0

label class-attribute instance-attribute

label: str | None = None

bounding_box class-attribute instance-attribute

bounding_box: BoundingBox | None = None

layout_id class-attribute instance-attribute

layout_id: int | None = None

order_index class-attribute instance-attribute

order_index: int | None = None

NodeKind

Bases: StrEnum

ATTRIBUTE DESCRIPTION
heading

paragraph

list_item

table

figure

formula

inline_formula

code

footer

header

aside

page_break

heading class-attribute instance-attribute

heading = 'heading'

paragraph class-attribute instance-attribute

paragraph = 'paragraph'

list_item class-attribute instance-attribute

list_item = 'list_item'

table class-attribute instance-attribute

table = 'table'

figure class-attribute instance-attribute

figure = 'figure'

formula class-attribute instance-attribute

formula = 'formula'

inline_formula class-attribute instance-attribute

inline_formula = 'inline_formula'

code class-attribute instance-attribute

code = 'code'

footer class-attribute instance-attribute

footer = 'footer'

header class-attribute instance-attribute

header = 'header'

aside class-attribute instance-attribute

aside = 'aside'

page_break class-attribute instance-attribute

page_break = 'page_break'
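Because NodeKind is a string enum, its members compare equal to their string values and can be looked up from them. The stand-in below uses a str-mixin Enum (SketchKind, a hypothetical name with a subset of the members) so that it also runs on Python versions older than 3.11, where StrEnum is not available.

```python
from enum import Enum


class SketchKind(str, Enum):
    """Stand-in for NodeKind; only three of the documented members."""
    heading = "heading"
    paragraph = "paragraph"
    page_break = "page_break"


def is_page_break(kind_value: str) -> bool:
    """Look a member up by its string value and compare by identity."""
    return SketchKind(kind_value) is SketchKind.page_break


# Members compare equal to plain strings, which helps when filtering
# dicts such as those produced by a structured dump.
print(SketchKind.heading == "heading")
print(is_page_break("page_break"))
```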