mdka
mdka is an HTML-to-Markdown converter written in Rust. The “ka” in the name comes from “化 (か)”, which denotes conversion.
It aims to strike a practical balance between conversion quality and runtime efficiency: readable output from real-world HTML, without sacrificing speed or memory.
At a Glance
| What you give it | What you get back |
|---|---|
| Any HTML string — a full page, a snippet, CMS output, SPA-rendered DOM | Clean, readable Markdown |
| A list of HTML files | Parallel Markdown output via rayon |
| A conversion mode (minimal, semantic, …) | Pre-processed output tuned for your use case |
Key Properties
- Parser foundation: scraper, which is built on html5ever, the same battle-tested parser used by the Servo browser engine. It handles malformed, deeply nested, and real-world HTML gracefully.
- Crash-resistant: an iterative (non-recursive) depth-first traversal means even 10,000 levels of nesting will not overflow the stack.
- Configurable: five conversion modes let you tune the pre-processing pipeline, from noise-free LLM input to lossless archiving.
- Multi-language: available as a Rust library, a Node.js package (napi-rs), and a Python package (PyO3).
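The crash-resistance property comes from walking the tree with an explicit, heap-allocated work list instead of recursive calls. The following is a minimal Rust sketch of that technique, not mdka's actual code. `Tree`, `Node`, `Step`, `marker`, `render`, and `nested` are hypothetical names. Nodes live in an index-based arena, so dropping a very deep tree is not recursive either. Separate enter and exit steps let closing Markdown markers such as `**` be emitted after an element's children.

```rust
/// Hypothetical arena-backed tree: nodes live in one Vec and refer to
/// children by index, so even dropping a very deep tree is not recursive.
struct Tree {
    nodes: Vec<Node>,
}

enum Node {
    Text(String),
    Element { tag: &'static str, children: Vec<usize> },
}

/// A traversal step: entering a node, or leaving an element after its children.
enum Step {
    Enter(usize),
    Exit(usize),
}

/// Markdown marker emitted around an inline element (only a few tags handled here).
fn marker(tag: &str) -> &'static str {
    match tag {
        "strong" | "b" => "**",
        "em" | "i" => "*",
        _ => "",
    }
}

/// Renders the subtree rooted at `root` without recursion: the work list is a
/// heap-allocated Vec, so nesting depth is limited by memory, not the call stack.
fn render(tree: &Tree, root: usize) -> String {
    let mut out = String::new();
    let mut stack = vec![Step::Enter(root)];
    while let Some(step) = stack.pop() {
        match step {
            Step::Enter(id) => match &tree.nodes[id] {
                Node::Text(text) => out.push_str(text),
                Node::Element { tag, children } => {
                    out.push_str(marker(tag));
                    // Schedule the exit first so it runs after every child.
                    stack.push(Step::Exit(id));
                    // Push children in reverse so the first child is popped first.
                    stack.extend(children.iter().rev().map(|&child| Step::Enter(child)));
                }
            },
            Step::Exit(id) => {
                if let Node::Element { tag, .. } = &tree.nodes[id] {
                    out.push_str(marker(tag));
                }
            }
        }
    }
    out
}

/// Builds `depth` nested <div> elements around a single text node.
fn nested(depth: usize) -> Tree {
    let mut nodes = Vec::with_capacity(depth + 1);
    for i in 0..depth {
        nodes.push(Node::Element { tag: "div", children: vec![i + 1] });
    }
    nodes.push(Node::Text("deep".to_string()));
    Tree { nodes }
}

fn main() {
    // 10,000 levels deep: a naive recursive renderer risks overflowing the stack here.
    println!("{}", render(&nested(10_000), 0));
}
```

Because the work list grows on the heap, the only cost of extreme nesting is memory proportional to depth. Thread stack size is never a limit.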
When to Choose mdka
mdka is a good fit if you need:
- Stable, predictable output from diverse HTML sources (CMS, SPA, scraped pages)
- Mode-based pre-processing to strip navigation, preserve ARIA, or retain attributes
- Memory efficiency at scale (bulk file conversion, streaming pipelines)
- Multi-language access from a single underlying Rust implementation
If raw speed on simple, well-formed HTML is the only concern, a streaming rewriter will be faster.
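Mode-based pre-processing can be pictured as a filter pass that runs before Markdown generation. The sketch below is illustrative only. `Profile`, its `NoiseFree` and `Lossless` variants, `Element`, `CHROME_TAGS`, and `preprocess` are hypothetical names, not mdka's conversion modes or API. The set of tags treated as navigation chrome is an assumption. For brevity the sketch works on a flat element list rather than a DOM tree.

```rust
/// Hypothetical pre-processing profiles (NOT mdka's actual mode names or API).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Profile {
    /// Drop page chrome and every attribute except aria-*: compact, low-noise input.
    NoiseFree,
    /// Keep every element and attribute unchanged.
    Lossless,
}

#[derive(Clone, Debug, PartialEq)]
struct Element {
    tag: String,
    attrs: Vec<(String, String)>,
}

/// Tags treated as navigation/page chrome in this sketch (an assumption).
const CHROME_TAGS: &[&str] = &["nav", "header", "footer", "aside", "script", "style"];

/// Applies a profile to a flat list of elements before Markdown generation.
fn preprocess(elements: Vec<Element>, profile: Profile) -> Vec<Element> {
    match profile {
        Profile::Lossless => elements,
        Profile::NoiseFree => elements
            .into_iter()
            // Drop navigation and other chrome elements entirely.
            .filter(|e| !CHROME_TAGS.iter().any(|&t| t == e.tag))
            // Keep ARIA attributes (useful semantic hints), drop the rest.
            .map(|mut e| {
                e.attrs.retain(|(name, _)| name.starts_with("aria-"));
                e
            })
            .collect(),
    }
}

fn main() {
    let page = vec![
        Element { tag: "nav".into(), attrs: vec![] },
        Element {
            tag: "p".into(),
            attrs: vec![
                ("class".into(), "lead".into()),
                ("aria-label".into(), "intro".into()),
            ],
        },
    ];
    println!("{:?}", preprocess(page.clone(), Profile::NoiseFree));
    println!("{:?}", preprocess(page, Profile::Lossless));
}
```

Running the filter as a separate pass keeps the Markdown writer itself mode-agnostic. Each profile only decides what reaches it.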
Quick Navigation
- New to mdka? Start with Installation.
- Ready to integrate? Jump to Usage & Examples.
- Evaluating? Read Design Philosophy and Performance Characteristics.