Benchmark results

This page is the reader’s view: a curated summary of matten’s benchmark results so they are readable from inside the book. It is a small representative selection, not the full matrix — the complete numbers, environment details, and regeneration steps live in the reports under benchmarks/reports/. If you want to run the benchmarks, see the methodology and the harness README.md.

These numbers are workload-specific and environment-specific. They were produced on one virtualized machine with microbenchmark methodology. They are a positioning and regression-visibility reference — not a ranking, and not a “faster than X” claim. matten optimizes for time to a runnable PoC, not benchmark leadership.

The numbers below are the v0.2 maintainer refresh at workspace 0.28.3, produced under the unchanged RFC-049 methodology. The architect-accepted reference baseline is v0.1 (see the reports); the relative positioning matches v0.1. Absolute timings drift run-to-run with VM load — all libraries move together — so the shape of the results is the signal, not the exact microseconds.

Phase 1 — internal baseline

matten measured against itself, to establish a reference point and make future regressions visible (RFC-049 Phase 1).

Baseline ID: matten-rfc049-internal-baseline-v0.2 — maintainer refresh at v0.28.3 (reference: …-v0.1, accepted 2026-06-24).
Environment: Ubuntu 26.04, 8 vCPU AMD (virtualized), rustc 1.93.1, profile bench (opt-level 3), Criterion defaults; git 5953c9f, workspace 0.28.3. Not comparable across machines.

Representative medians (full table in the report):

Workload	Time (median)
construction (4096-element vector)	~1.0 µs
elementwise add (4096 elements)	~10.3 µs
`matmul` (64×64)	~78 µs
`sum_axis` + `mean_axis` (64×64, combined)	~1.30 ms
cosine similarity (len 512)	~803 ns
linear-regression GD step (m=256)	~2.23 µs

Peak RSS was not captured in this refresh because the VM lacked GNU /usr/bin/time. Peak RSS is informative only and never a gate. The accepted v0.1 baseline recorded ~44 MiB for the full scenario run under the same methodology, dominated by Criterion’s own footprint rather than the small tensors.
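
If GNU /usr/bin/time is unavailable, one possible fallback on Linux is for a process to read its own high-water mark from /proc/self/status. The sketch below is an illustration under stated assumptions, not what the RFC-049 harness does: the helper name parse_vm_hwm_kib is hypothetical, and it reports the peak of the calling process, whereas /usr/bin/time measures a child process from outside.

```rust
use std::fs;

/// Extract the peak resident set size in KiB from the text of /proc/<pid>/status.
/// The kernel reports it on a line of the form "VmHWM:   <number> kB".
/// Hypothetical helper for illustration; not part of matten or its harness.
fn parse_vm_hwm_kib(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmHWM:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|num| num.parse().ok())
}

fn main() {
    // Linux-only: other platforms have no /proc/self/status.
    match fs::read_to_string("/proc/self/status") {
        Ok(status) => match parse_vm_hwm_kib(&status) {
            Some(kib) => println!("peak RSS: {:.1} MiB", kib as f64 / 1024.0),
            None => println!("VmHWM line not found"),
        },
        Err(err) => println!("could not read /proc/self/status: {err}"),
    }
}
```

As with the v0.1 figure, a number obtained this way would mostly reflect the benchmark process itself rather than the small tensors.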

The clearest signal is that axis reductions are currently matten’s most expensive core path — the combined sum_axis/mean_axis workload (~1.30 ms) is roughly 400× the whole-tensor sum/mean (~3.23 µs) and ~17× a 64×64 matmul. This is recorded as positioning / regression-visibility information, not a defect: it is the natural first place to look if axis-reduction cost ever matters for your workload.
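
The two ratios quoted above follow directly from the medians on this page. As a quick check, here is a minimal Rust sketch; the helper name ratio is illustrative, and the inputs are the rounded medians quoted here, so the results are approximate.

```rust
/// Ratio of two timings expressed in the same unit (nanoseconds here).
fn ratio(slower_ns: f64, faster_ns: f64) -> f64 {
    slower_ns / faster_ns
}

fn main() {
    // Rounded medians quoted on this page, converted to nanoseconds.
    let axis_reductions_ns = 1.30e6; // sum_axis + mean_axis, 64×64 (~1.30 ms)
    let whole_tensor_ns = 3.23e3; // whole-tensor sum/mean (~3.23 µs)
    let matmul_ns = 78.0e3; // matmul, 64×64 (~78 µs)

    // prints "axis vs whole-tensor: 402x"
    println!(
        "axis vs whole-tensor: {:.0}x",
        ratio(axis_reductions_ns, whole_tensor_ns)
    );
    // prints "axis vs matmul: 16.7x"
    println!("axis vs matmul: {:.1}x", ratio(axis_reductions_ns, matmul_ns));
}
```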

Phase 2 — Rust peer comparison

The same small problems placed next to two established Rust numeric crates, ndarray and nalgebra, each in its native type (RFC-049 Phase 2). This shows where matten’s approachable Tensor API sits — including where it is slower but acceptable — not a ranking of libraries.

Report ID: matten-rfc049-rust-peer-comparison-v0.2 — maintainer refresh at v0.28.3 (reference: …-v0.1, accepted 2026-06-25).
Environment: same machine class as the baseline; git 5953c9f, workspace 0.28.3, ndarray 0.17.2, nalgebra 0.33.3. Peer tasks are opt-in behind the peers feature (off by default). At ndarray 0.17.2 the harness now uses the same ndarray version that the matten-ndarray bridge supports. Not comparable across machines.

Representative Criterion medians (full six-task table in the report):

Task	matten	ndarray	nalgebra
markov step (v·P, n=64)	~924 ns	~1.16 µs	~2.15 µs
cosine similarity (len 512)	~626 ns	~175 ns	~138 ns
`matmul` (64×64)	~80.8 µs	~10.8 µs	~10.7 µs
heat step (operator·u, n=64)	~6.77 µs	~752 ns	~741 ns
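
The markov and heat rows differ only in which side of the matrix the vector sits on. As a minimal sketch in plain Rust over row-major slices (deliberately not matten’s, ndarray’s, or nalgebra’s API; the function names vec_mat and mat_vec are hypothetical), the markov row has the shape v·P and the heat row has the shape operator·u:

```rust
/// Vector×matrix, the "markov step" shape: out[j] = sum over i of v[i] * p[i][j].
/// `p` is an n×n matrix stored row-major, so element (i, j) lives at i * n + j.
fn vec_mat(v: &[f64], p: &[f64], n: usize) -> Vec<f64> {
    let mut out = vec![0.0; n];
    for i in 0..n {
        for j in 0..n {
            out[j] += v[i] * p[i * n + j];
        }
    }
    out
}

/// Matrix×vector, the "heat step" shape: out[i] = sum over j of a[i][j] * u[j].
fn mat_vec(a: &[f64], u: &[f64], n: usize) -> Vec<f64> {
    (0..n)
        .map(|i| (0..n).map(|j| a[i * n + j] * u[j]).sum::<f64>())
        .collect()
}

fn main() {
    let n = 2;
    let m = [1.0, 2.0, 3.0, 4.0]; // the matrix [[1, 2], [3, 4]]
    let ones = [1.0, 1.0];

    println!("{:?}", vec_mat(&ones, &m, n)); // column sums: [4.0, 6.0]
    println!("{:?}", mat_vec(&m, &ones, n)); // row sums: [3.0, 7.0]
}
```

The sketch only pins down what each row computes. It makes no claim about how any of the three libraries implement these products.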

On these small dense kernels the production-oriented peers generally carry less overhead than matten’s Tensor API, which is expected and consistent with matten’s DX-first role. The size of the gap is the useful part, and it is not uniform. The vector×matrix step (markov) is competitive, ahead of both peers at this size, while dense matmul and the matrix×vector steps (heat, and pagerank, which is not among the rows shown above) show the widest gaps (~7.5–9×). Cosine similarity sits in between at roughly 3.6–4.5×. Within matten the pattern is consistent: the matrix×vector path has the widest gap and the vector×matrix path is competitive, echoing the axis-reduction signal from Phase 1.
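
Because absolute timings drift with VM load, the gaps are easiest to read as ratios. The following minimal sketch recomputes them from the rounded medians in the table above; the Row struct and the rows and relative helpers are illustrative names, not part of the harness.

```rust
/// One row of the peer-comparison table, all medians in nanoseconds.
struct Row {
    task: &'static str,
    matten_ns: f64,
    ndarray_ns: f64,
    nalgebra_ns: f64,
}

/// matten's time relative to a peer: above 1.0 means matten is slower.
fn relative(matten_ns: f64, peer_ns: f64) -> f64 {
    matten_ns / peer_ns
}

/// The four representative rows quoted on this page (rounded medians).
fn rows() -> Vec<Row> {
    vec![
        Row { task: "markov step", matten_ns: 924.0, ndarray_ns: 1160.0, nalgebra_ns: 2150.0 },
        Row { task: "cosine similarity", matten_ns: 626.0, ndarray_ns: 175.0, nalgebra_ns: 138.0 },
        Row { task: "matmul", matten_ns: 80_800.0, ndarray_ns: 10_800.0, nalgebra_ns: 10_700.0 },
        Row { task: "heat step", matten_ns: 6_770.0, ndarray_ns: 752.0, nalgebra_ns: 741.0 },
    ]
}

fn main() {
    for r in rows() {
        println!(
            "{:<18} vs ndarray {:>5.2}x   vs nalgebra {:>5.2}x",
            r.task,
            relative(r.matten_ns, r.ndarray_ns),
            relative(r.matten_ns, r.nalgebra_ns),
        );
    }
}
```

Ratios computed this way carry the same caveats as the medians: they describe one machine and one run, and are a positioning reference rather than a ranking.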
