mdka

mdka is an HTML to Markdown converter written in Rust. “ka” means “化 (か)”, which denotes conversion.

It aims to strike a practical balance between conversion quality and runtime efficiency — readable output from real-world HTML, without sacrificing speed or memory.

At a Glance

What you give it	What you get back
Any HTML string — a full page, a snippet, CMS output, SPA-rendered DOM	Clean, readable Markdown
A list of HTML files	Parallel Markdown output via rayon
A conversion mode (`minimal`, `semantic`, …)	Pre-processed output tuned for your use case

Key Properties

Parser foundation: scraper, which is built on html5ever, the battle-tested parser used by the Servo browser engine. It handles malformed, deeply nested, real-world HTML gracefully.
Crash-resistant: a non-recursive DFS traversal means even 10,000 levels of nesting will not overflow the stack.
Configurable: five conversion modes let you tune the pre-processing pipeline — from noise-free LLM input to lossless archiving.
Multi-language: available as a Rust library, a Node.js package (napi-rs), and a Python package (PyO3).
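
The crash-resistance property above rests on a general technique: replacing recursion with an explicit, heap-allocated stack. The sketch below illustrates that idea only; it is not mdka's actual code. The `Tree`, `Node`, and `visit_preorder` names are hypothetical, and the arena-style tree is a simplified stand-in for a parsed DOM.

```rust
/// A toy DOM node (hypothetical; not mdka's or scraper's real type).
struct Node {
    tag: &'static str,
    /// Indices of child nodes inside `Tree::nodes`.
    children: Vec<usize>,
}

/// Arena-backed tree: all nodes live in one flat Vec, and index 0 is the root.
struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    fn new(root_tag: &'static str) -> Self {
        Tree { nodes: vec![Node { tag: root_tag, children: Vec::new() }] }
    }

    /// Append a child under `parent` and return the new node's index.
    fn add_child(&mut self, parent: usize, tag: &'static str) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { tag, children: Vec::new() });
        self.nodes[parent].children.push(id);
        id
    }
}

/// Depth-first, pre-order traversal without recursion.
/// Pending work is kept in a Vec on the heap, so nesting depth
/// does not consume call-stack frames.
fn visit_preorder(tree: &Tree) -> Vec<(usize, &'static str)> {
    let mut visited = Vec::new();
    // Each stack entry is (node index, depth); start at the root.
    let mut stack: Vec<(usize, usize)> = vec![(0, 0)];
    while let Some((id, depth)) = stack.pop() {
        let node = &tree.nodes[id];
        visited.push((depth, node.tag));
        // Push children in reverse so the first child is popped first.
        for &child in node.children.iter().rev() {
            stack.push((child, depth + 1));
        }
    }
    visited
}
```

With this shape, a chain of 10,000 nested nodes only grows the `stack` vector; the call stack stays at a constant depth for the whole traversal.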

When to Choose mdka

mdka is a good fit if you need:

Stable, predictable output from diverse HTML sources (CMS, SPA, scraped pages)
Mode-based pre-processing to strip navigation, preserve ARIA, or retain attributes
Memory efficiency at scale (bulk file conversion, streaming pipelines)
Multi-language access from a single underlying Rust implementation

If raw speed on simple, well-formed HTML is the only concern, a streaming rewriter will be faster.

Quick Navigation

New to mdka? Start with Installation.
Ready to integrate? Jump to Usage & Examples.
Evaluating? Read Design Philosophy and Performance Characteristics.
