Features

Crash Resistance

mdka uses non-recursive depth-first search (DFS) traversal throughout. An explicit Vec stack replaces the call stack, so nesting depth is bounded by heap memory rather than by the thread's stack size, and arbitrarily deep documents will not cause a stack overflow. This has been tested with 10,000 levels of nested <div> elements.

Some fast converters use recursive tree traversal and can crash with a stack overflow on deeply nested input. If you do not fully control your input source, crash resistance matters.
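The explicit-stack idea can be illustrated with a small, self-contained sketch. It does not use mdka's or ego-tree's types: `Tree`, `Node`, `preorder_tags`, and `nested_divs` are hypothetical names for illustration. The tree is a minimal arena (all nodes in one `Vec`, children referenced by index), so building and dropping a 10,000-level tree does not recurse either.

```rust
/// One node in a minimal arena-backed tree (hypothetical, for illustration).
/// Children are indices into `Tree::nodes` rather than owned boxes, so
/// dropping a very deep tree does not recurse.
struct Node {
    tag: &'static str,
    children: Vec<usize>,
}

struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    /// Create a tree containing only a root node at index 0.
    fn new(root_tag: &'static str) -> Self {
        Tree { nodes: vec![Node { tag: root_tag, children: Vec::new() }] }
    }

    /// Append a child under `parent` and return the new node's index.
    fn push_child(&mut self, parent: usize, tag: &'static str) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { tag, children: Vec::new() });
        self.nodes[parent].children.push(id);
        id
    }
}

/// Pre-order depth-first traversal driven by an explicit Vec stack.
/// The work list lives on the heap; the call stack stays one frame deep
/// no matter how deeply the tree is nested.
fn preorder_tags(tree: &Tree, root: usize) -> Vec<&'static str> {
    let mut out = Vec::with_capacity(tree.nodes.len());
    let mut stack = vec![root];
    while let Some(id) = stack.pop() {
        let node = &tree.nodes[id];
        out.push(node.tag);
        // Push children in reverse so the first child is popped (visited) first.
        for &child in node.children.iter().rev() {
            stack.push(child);
        }
    }
    out
}

/// Build a <body> root followed by `depth` nested <div> elements.
fn nested_divs(depth: usize) -> Tree {
    let mut tree = Tree::new("body");
    let mut parent = 0;
    for _ in 0..depth {
        parent = tree.push_child(parent, "div");
    }
    tree
}
```

A recursive visitor would use one call frame per nesting level, which is what overflows on deep input; here `nested_divs(10_000)` is traversed with a loop, and the only thing that grows with the document is heap-allocated data.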

Five Conversion Modes

Rather than a single fixed conversion strategy, mdka offers five named modes that tune the pre-processing pipeline:

Balanced — readable output for general use
Strict — maximum attribute retention for debugging
Minimal — body text only; good for LLM input preparation
Semantic — preserves ARIA and document structure
Preserve — maximum fidelity for archiving

Each mode can be further customised with per-call option flags. See Conversion Modes and ConversionOptions.
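The "named preset plus per-call overrides" pattern can be sketched as follows. `Mode`, `Options`, and every field name here are hypothetical stand-ins, not mdka's actual ConversionOptions API, and the flag values assigned to each mode are illustrative guesses based only on the one-line descriptions above. The point is the shape: a mode picks a full set of defaults, and a caller overrides individual flags.

```rust
/// Hypothetical stand-in for the five named modes (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Balanced,
    Strict,
    Minimal,
    Semantic,
    Preserve,
}

/// Hypothetical per-call flags; field names are invented for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Options {
    keep_attributes: bool,
    keep_aria: bool,
    body_text_only: bool,
    keep_comments: bool,
}

impl Options {
    /// Start from a mode's preset. These values are guesses for
    /// illustration, not mdka's real defaults.
    fn from_mode(mode: Mode) -> Self {
        let base = Options {
            keep_attributes: false,
            keep_aria: false,
            body_text_only: false,
            keep_comments: false,
        };
        match mode {
            Mode::Balanced => base,
            Mode::Strict => Options { keep_attributes: true, keep_aria: true, ..base },
            Mode::Minimal => Options { body_text_only: true, ..base },
            Mode::Semantic => Options { keep_aria: true, ..base },
            Mode::Preserve => Options {
                keep_attributes: true,
                keep_aria: true,
                keep_comments: true,
                ..base
            },
        }
    }
}
```

A caller can then start from a preset and change a single flag with struct update syntax, for example `Options { keep_aria: true, ..Options::from_mode(Mode::Minimal) }`, which keeps every other Minimal default.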

Parallel File Conversion

html_files_to_markdown and html_files_to_markdown_with use rayon to convert multiple files in parallel. Each file produces its own independent result, so one failed file does not stop the batch.

The Node.js and Python bindings expose this as an async function (htmlFilesToMarkdown in Node.js, html_files_to_markdown in Python), so the thread-pool work does not block the caller’s event loop or hold the GIL.
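The per-file independence described above can be sketched without external crates. mdka itself uses rayon; this sketch swaps in the standard library's `std::thread::scope` with one thread per input, whereas rayon schedules work on a shared pool. `convert_one` and `convert_all` are hypothetical names, and `convert_one`'s "conversion" is a toy string replacement, not mdka's converter. What carries over is the shape of the result: one `Result` per input, in input order.

```rust
use std::thread;

/// Hypothetical single-document conversion (a toy, NOT mdka's API):
/// fails on blank input, otherwise strips <p> tags.
fn convert_one(html: &str) -> Result<String, String> {
    if html.trim().is_empty() {
        Err("empty input".to_string())
    } else {
        Ok(html.replace("<p>", "").replace("</p>", "\n"))
    }
}

/// Convert every input concurrently and return one Result per input,
/// in input order. An error (or even a panic) in one worker is recorded
/// in that input's slot and does not affect the others.
fn convert_all(inputs: &[&str]) -> Vec<Result<String, String>> {
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently...
        let handles: Vec<_> = inputs
            .iter()
            .map(|&html| s.spawn(move || convert_one(html)))
            .collect();
        // ...then join them in the original order.
        handles
            .into_iter()
            .map(|h| h.join().unwrap_or_else(|_| Err("worker panicked".to_string())))
            .collect()
    })
}
```

With rayon, a similar batch would look roughly like `inputs.par_iter().map(|h| convert_one(h)).collect::<Vec<_>>()`; collecting an indexed parallel iterator into a `Vec` also preserves input order.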

Multi-Language API

The same Rust implementation is accessible from three languages:

| Language | Package | Mechanism |
|---|---|---|
| Rust | `mdka` on crates.io | native library |
| Node.js | `mdka` on npm | napi-rs native module |
| Python | `mdka` on PyPI | PyO3 extension module |

All three call the same underlying conversion code and produce identical output for identical input.

html5ever Parser Foundation

mdka parses HTML with scraper, which is built on html5ever. html5ever implements the HTML5 parsing algorithm, the same algorithm web browsers use.

This means:

Missing closing tags are inferred correctly
Unknown elements are preserved (not silently dropped)
Malformed attribute syntax is normalised
The result is always a valid DOM tree, no matter the input

Predictable, Deterministic Output

For a given HTML input and ConversionOptions, mdka always produces the same Markdown string. There is no randomisation, no date-stamping, and no version-dependent output variation within a semver major version.

Minimal Dependencies

The runtime dependencies of the mdka library crate are:

| Crate | Purpose |
|---|---|
| `scraper` | HTML parsing (html5ever wrapper) |
| `ego-tree` | DOM tree traversal |
| `rayon` | Parallel file conversion |
| `tikv-jemallocator`, `tikv-jemalloc-ctl` | jemalloc memory allocator and its control interface, used to reduce heap fragmentation and scale under concurrent allocation |
| `thiserror` | `MdkaError` derive macro |

Benchmarking dependencies (criterion, plus the competing converters used for comparison) are declared under `[dev-dependencies]`, so they are not built for library consumers.
