Benchmark results

This page is the reader’s view: a curated summary of matten’s benchmark results so they are readable from inside the book. It is a small representative selection, not the full matrix — the complete numbers, environment details, and regeneration steps live in the reports under benchmarks/reports/. If you want to run the benchmarks, see the methodology and the harness README.md.

These numbers are workload-specific and environment-specific. They were produced on one virtualized machine with microbenchmark methodology. They are a positioning and regression-visibility reference — not a ranking, and not a “faster than X” claim. matten optimizes for time to a runnable PoC, not benchmark leadership.

The numbers below are the v0.2 maintainer refresh at workspace 0.28.3, produced under the unchanged RFC-049 methodology. The architect-accepted reference baseline is v0.1 (see the reports); the relative positioning matches v0.1. Absolute timings drift run-to-run with VM load — all libraries move together — so the shape of the results is the signal, not the exact microseconds.

Phase 1 — internal baseline

matten measured against itself, to establish a reference point and make future regressions visible (RFC-049 Phase 1).

  • Baseline ID: matten-rfc049-internal-baseline-v0.2 — maintainer refresh at v0.28.3 (reference: …-v0.1, accepted 2026-06-24).
  • Environment: Ubuntu 26.04, 8 vCPU AMD (virtualized), rustc 1.93.1, profile bench (opt-level 3), Criterion defaults; git 5953c9f, workspace 0.28.3. Not comparable across machines.

Representative medians (full table in the report):

| Workload | Time (median) |
|---|---|
| construction (4096-element vector) | ~1.0 µs |
| elementwise add (4096 elements) | ~10.3 µs |
| matmul (64×64) | ~78 µs |
| sum_axis + mean_axis (64×64, combined) | ~1.30 ms |
| cosine similarity (len 512) | ~803 ns |
| linear-regression GD step (m=256) | ~2.23 µs |
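
To make "median" concrete, here is a minimal std-only sketch of how a median timing can be taken. It is not the harness and not Criterion (which also does warm-up, adaptive iteration counts, and outlier analysis). The names `median`, `median_time`, `elementwise_add`, and `time_add_4096` are hypothetical, and the workload uses plain slices as a stand-in rather than matten's Tensor type.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Middle sample after sorting (the upper of the two middles for even counts).
fn median(samples: &mut [Duration]) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[samples.len() / 2]
}

/// Run `work` `runs` times and return the median wall-clock duration.
fn median_time<F: FnMut()>(runs: usize, mut work: F) -> Duration {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        work();
        samples.push(start.elapsed());
    }
    median(&mut samples)
}

/// Stand-in workload: elementwise add over plain slices (NOT matten's API).
fn elementwise_add(a: &[f64], b: &[f64]) -> Vec<f64> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

/// Time the 4096-element stand-in workload; `black_box` keeps the
/// optimizer from discarding the inputs or the result.
fn time_add_4096(runs: usize) -> Duration {
    let a = vec![1.0_f64; 4096];
    let b = vec![2.0_f64; 4096];
    median_time(runs, || {
        black_box(elementwise_add(black_box(&a), black_box(&b)));
    })
}
```

Calling `time_add_4096(101)` returns one median for this machine and this moment. A single-shot loop like this is far noisier than Criterion, so treat its output as an illustration of the statistic rather than a way to reproduce the table.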

Peak RSS was not captured in this refresh (the VM lacked GNU /usr/bin/time); it is informative-only and never a gate. The accepted v0.1 baseline recorded ~44 MiB for the full scenario run under the same methodology, dominated by Criterion’s own footprint rather than the small tensors.

The clearest signal is that axis reductions are currently matten’s most expensive core path — the combined sum_axis/mean_axis workload (~1.30 ms) is roughly 400× the whole-tensor sum/mean (~3.23 µs) and ~17× a 64×64 matmul. This is recorded as positioning / regression-visibility information, not a defect: it is the natural first place to look if axis-reduction cost ever matters for your workload.
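
The two ratios quoted above follow directly from the rounded medians. A small sketch of the arithmetic, using only the figures given on this page (the constant and function names are illustrative):

```rust
/// Rounded medians quoted on this page, converted to nanoseconds.
const AXIS_REDUCTIONS_NS: f64 = 1_300_000.0; // sum_axis + mean_axis, ~1.30 ms
const WHOLE_TENSOR_NS: f64 = 3_230.0; // whole-tensor sum/mean, ~3.23 µs
const MATMUL_64_NS: f64 = 78_000.0; // matmul 64×64, ~78 µs

/// How many times larger `numerator_ns` is than `denominator_ns`.
fn ratio(numerator_ns: f64, denominator_ns: f64) -> f64 {
    numerator_ns / denominator_ns
}

/// (axis reductions vs whole-tensor reductions, axis reductions vs matmul).
fn phase1_ratios() -> (f64, f64) {
    (
        ratio(AXIS_REDUCTIONS_NS, WHOLE_TENSOR_NS),
        ratio(AXIS_REDUCTIONS_NS, MATMUL_64_NS),
    )
}
```

With these inputs the first ratio is about 402 and the second about 16.7, which is where "roughly 400×" and "~17×" come from. Because the inputs are already rounded, the ratios are only good to about two significant figures.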

Phase 2 — Rust peer comparison

The same small problems placed next to two established Rust numeric crates, ndarray and nalgebra, each in its native type (RFC-049 Phase 2). This shows where matten’s approachable Tensor API sits — including where it is slower but acceptable — not a ranking of libraries.

  • Report ID: matten-rfc049-rust-peer-comparison-v0.2 — maintainer refresh at v0.28.3 (reference: …-v0.1, accepted 2026-06-25).
  • Environment: same machine class as the baseline; git 5953c9f, workspace 0.28.3, ndarray 0.17.2, nalgebra 0.33.3. Peer tasks are opt-in behind the peers feature (off by default). Because this run used ndarray 0.17.2, the harness now matches the ndarray version supported by the matten-ndarray bridge. Not comparable across machines.
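
"Opt-in behind a feature" refers to Cargo's general feature-gating mechanism. The sketch below shows the idea only; how the harness actually wires its peers feature is not shown on this page, and the function name and task labels here are hypothetical.

```rust
/// Hypothetical task list. With the feature off (the default), only the
/// matten tasks are present; peer tasks appear only when it is enabled.
#[allow(unexpected_cfgs)] // this standalone snippet has no Cargo.toml declaring the feature
fn enabled_tasks() -> Vec<&'static str> {
    let mut tasks = vec!["matten"];
    // `cfg!` is evaluated at compile time from the enabled Cargo features.
    if cfg!(feature = "peers") {
        tasks.push("ndarray");
        tasks.push("nalgebra");
    }
    tasks
}
```

In a real crate the gate is more often the `#[cfg(feature = "...")]` attribute on whole modules or benchmark functions, so the peer crates are not even compiled unless the feature is requested. That keeps the default build free of the extra dependencies.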

Representative Criterion medians (full six-task table in the report):

| Task | matten | ndarray | nalgebra |
|---|---|---|---|
| markov step (v·P, n=64) | ~924 ns | ~1.16 µs | ~2.15 µs |
| cosine similarity (len 512) | ~626 ns | ~175 ns | ~138 ns |
| matmul (64×64) | ~80.8 µs | ~10.8 µs | ~10.7 µs |
| heat step (operator·u, n=64) | ~6.77 µs | ~752 ns | ~741 ns |

On these small dense kernels the production-oriented peers generally carry less overhead than matten’s Tensor API, which is expected and consistent with matten’s DX-first role. The size of the gap is the useful part, and it is not uniform. A vector×matrix step (markov) is competitive here, ahead of both peers at this size, while dense matmul and the matrix×vector steps (heat, and pagerank, which is not among the rows shown here) show the widest gaps (~7.5–9×). Within matten the pattern is consistent: the matrix×vector path is the slowest relative to the peers and the vector×matrix path is competitive, echoing the axis-reduction signal from Phase 1.
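
The gap sizes can be recomputed from the four rows shown in the table. A minimal sketch, with the rounded medians from this page converted to nanoseconds; the `Row` type and `gaps` function are illustrative names, not part of the harness:

```rust
/// One row of the Phase 2 table, all medians in nanoseconds.
struct Row {
    task: &'static str,
    matten_ns: f64,
    ndarray_ns: f64,
    nalgebra_ns: f64,
}

/// The four representative rows quoted on this page (rounded values).
const ROWS: [Row; 4] = [
    Row { task: "markov", matten_ns: 924.0, ndarray_ns: 1_160.0, nalgebra_ns: 2_150.0 },
    Row { task: "cosine", matten_ns: 626.0, ndarray_ns: 175.0, nalgebra_ns: 138.0 },
    Row { task: "matmul", matten_ns: 80_800.0, ndarray_ns: 10_800.0, nalgebra_ns: 10_700.0 },
    Row { task: "heat", matten_ns: 6_770.0, ndarray_ns: 752.0, nalgebra_ns: 741.0 },
];

/// matten's median divided by each peer's median: (vs ndarray, vs nalgebra).
/// A value below 1.0 means matten was faster on that task.
fn gaps(row: &Row) -> (f64, f64) {
    (row.matten_ns / row.ndarray_ns, row.matten_ns / row.nalgebra_ns)
}

/// Tasks where matten's median is below both peers' medians.
fn ahead_of_both() -> Vec<&'static str> {
    ROWS.iter()
        .filter(|r| {
            let (a, b) = gaps(r);
            a < 1.0 && b < 1.0
        })
        .map(|r| r.task)
        .collect()
}
```

On these figures markov comes out at roughly 0.8× and 0.4×, cosine at about 3.6× and 4.5×, matmul at about 7.5×, and heat at about 9×. Only markov lands in `ahead_of_both()`. As with the absolute timings, these factors describe one machine and one run.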

  • Methodology — what is measured, what is not, and the rules that keep the program honest.
  • Full reports with complete tables, environment, and regeneration commands: benchmarks/reports/internal-baseline-v0.2.md and benchmarks/reports/peer-comparison-v0.2.md (and the accepted v0.1 references alongside them).

Phases 3 (NumPy/Pandas reference) and 4 (regression gates) are designed in RFC-049 but deferred and not yet measured.