
mdka

mdka is an HTML-to-Markdown converter written in Rust. The “ka” in the name comes from “化 (か)”, which denotes conversion.

It aims for a practical balance between conversion quality and runtime efficiency: readable Markdown from real-world HTML, without sacrificing speed or memory use.

At a Glance

| What you give it | What you get back |
|---|---|
| Any HTML string (a full page, a snippet, CMS output, or an SPA-rendered DOM) | Clean, readable Markdown |
| A list of HTML files | Parallel Markdown output via rayon |
| A conversion mode (minimal, semantic, …) | Pre-processed output tuned for your use case |
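The rayon row comes down to converting independent documents concurrently and collecting the results in input order. The sketch below illustrates that idea without external crates. It uses the standard library's `std::thread::scope` as a stand-in for rayon. `convert_all` and its `convert` parameter are hypothetical names, not part of mdka's API, and any `&str -> String` converter can be plugged in.

```rust
use std::thread;

/// Hypothetical helper (not mdka's API): converts every input on scoped
/// threads and returns the results in the same order as `inputs`.
/// `convert` stands in for any HTML-to-Markdown function.
fn convert_all<F>(inputs: &[String], convert: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    // One chunk per available CPU; fall back to a single worker.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Round up so every input lands in a chunk; `chunks` panics on size 0.
    let chunk_size = ((inputs.len() + workers - 1) / workers).max(1);
    let convert = &convert;

    thread::scope(|s| {
        // One scoped thread per chunk; the threads may borrow `inputs`
        // because the scope guarantees they finish before it returns.
        let handles: Vec<_> = inputs
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|html| convert(html.as_str()))
                        .collect::<Vec<String>>()
                })
            })
            .collect();

        // Joining in spawn order keeps the output aligned with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("conversion thread panicked"))
            .collect()
    })
}
```

With rayon itself, the same pattern is usually written as `inputs.par_iter().map(|html| convert(html)).collect()`. Rayon's work-stealing pool then balances documents of uneven size instead of relying on the fixed chunks used here.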

Key Properties

  • Parser foundation: scraper, which is built on html5ever, the same battle-tested parser used by the Servo browser engine. It handles malformed, deeply nested, and other real-world HTML gracefully.
  • Crash-resistant: a non-recursive DFS traversal means even 10,000 levels of nesting will not overflow the stack.
  • Configurable: five conversion modes let you tune the pre-processing pipeline — from noise-free LLM input to lossless archiving.
  • Multi-language: available as a Rust library, a Node.js package (napi-rs), and a Python package (PyO3).
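The crash-resistance property relies on a standard technique. Recursion is replaced with an explicit stack kept on the heap, so nesting depth is limited by available memory rather than by the thread's call stack. The sketch below applies the idea to a toy arena-allocated tree. Nodes are stored in a `Vec` and linked by index, similar in spirit to the `ego_tree` arena that scraper uses. `Node`, `Step`, `render`, and `nested` are hypothetical names for illustration, and only `<strong>` is translated to Markdown. This is not mdka's actual traversal code.

```rust
/// Toy DOM node stored in a flat arena; children are indices into the arena.
/// (Hypothetical type, not mdka's or scraper's.)
enum Node {
    Element { tag: String, children: Vec<usize> },
    Text(String),
}

/// Work items for the explicit stack: enter a node, or finish it after its children.
enum Step {
    Enter(usize),
    Exit(usize),
}

/// Renders the subtree at `root` without recursion; only `<strong>` gets Markdown syntax.
fn render(nodes: &[Node], root: usize) -> String {
    let mut out = String::new();
    let mut stack = vec![Step::Enter(root)];

    while let Some(step) = stack.pop() {
        match step {
            Step::Enter(id) => match &nodes[id] {
                Node::Text(text) => out.push_str(text),
                Node::Element { tag, children } => {
                    if tag.as_str() == "strong" {
                        out.push_str("**");
                    }
                    // Exit goes on first so it is popped after every child.
                    stack.push(Step::Exit(id));
                    // Reverse order so the first child is processed first.
                    stack.extend(children.iter().rev().map(|&c| Step::Enter(c)));
                }
            },
            Step::Exit(id) => {
                if let Node::Element { tag, .. } = &nodes[id] {
                    if tag.as_str() == "strong" {
                        out.push_str("**");
                    }
                }
            }
        }
    }
    out
}

/// Builds `depth` nested <div>s around a single text node.
fn nested(depth: usize, text: &str) -> Vec<Node> {
    let mut nodes: Vec<Node> = (0..depth)
        .map(|i| Node::Element { tag: "div".to_string(), children: vec![i + 1] })
        .collect();
    nodes.push(Node::Text(text.to_string()));
    nodes
}
```

Pushing an `Exit` marker before a node's children lets an iterative walk emit closing syntax, here the trailing `**`, once the subtree is finished. A plain pre-order stack cannot do this. Storing nodes in a flat `Vec` also keeps the drop of a 10,000-level tree from recursing, which a `Box`-based tree would do by default.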

When to Choose mdka

mdka is a good fit if you need:

  • Stable, predictable output from diverse HTML sources (CMS, SPA, scraped pages)
  • Mode-based pre-processing to strip navigation, preserve ARIA, or retain attributes
  • Memory efficiency at scale (bulk file conversion, streaming pipelines)
  • Multi-language access from a single underlying Rust implementation
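Mode-based pre-processing can be pictured as a per-element decision made before conversion. Each element is either dropped entirely or kept along with some subset of its attributes. The sketch below is a hypothetical illustration of that decision. `Options`, its three switches, `is_navigation`, and `preprocess` are invented names. Treating `<nav>` and `role="navigation"` as navigation is an assumption, not a description of mdka's modes.

```rust
/// Hypothetical pre-processing switches; mdka's real modes may be shaped differently.
struct Options {
    strip_navigation: bool,
    keep_aria: bool,
    keep_all_attributes: bool,
}

/// Assumed navigation test: a <nav> element or an explicit role="navigation".
fn is_navigation(tag: &str, attrs: &[(&str, &str)]) -> bool {
    tag == "nav" || attrs.iter().any(|&(k, v)| k == "role" && v == "navigation")
}

/// Returns `None` to drop the element, or `Some(kept_attributes)` to keep it.
fn preprocess<'a>(
    tag: &str,
    attrs: &[(&'a str, &'a str)],
    opts: &Options,
) -> Option<Vec<(&'a str, &'a str)>> {
    if opts.strip_navigation && is_navigation(tag, attrs) {
        return None;
    }
    // Keep everything, only ARIA-related attributes, or nothing.
    let kept = attrs
        .iter()
        .copied()
        .filter(|&(name, _)| {
            opts.keep_all_attributes
                || (opts.keep_aria && (name == "role" || name.starts_with("aria-")))
        })
        .collect();
    Some(kept)
}
```

Keeping the decision in one pure function means that, in this sketch, a mode only changes the switches. The traversal and the Markdown rendering stay the same.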

If raw speed on simple, well-formed HTML is the only concern, a streaming rewriter, which transforms the token stream without building a full DOM tree, will usually be faster.

Quick Navigation