Methodology

How the numbers are produced

Published benchmark results are only credible with a documented, reproducible methodology. All measurements on this site satisfy the four non-negotiable commitments below.

Reference machine

CPU: AMD Ryzen 7 3800X (Zen 2), 8C/16T silicon; SMT disabled, so 8 logical CPUs are exposed during benchmarks
RAM: 32 GB DDR4-3200
Board: ASUS ROG STRIX B550-F GAMING
OS: Ubuntu Server LTS (dual-boot)
Mode: Headless boot with no display server and no graphical desktop, which eliminates compositor and display-server noise from measurements
Boot: isolcpus=1-7 nohz_full=1-7 rcu_nocbs=1-7; benchmarks pin to cores 4–7 via taskset
BIOS: Core Performance Boost disabled, SMT disabled
ISA: SSE4.2 · AVX · AVX2 · FMA · no AVX-512

Zen 2 executes 256-bit AVX2 operations natively on 256-bit datapaths (Zen 1 split them into two 128-bit µops); this is called out explicitly in any SIMD post. Full lscpu --extended output, kernel version, and compiler version are committed to the repo alongside each benchmark result.

Four non-negotiable commitments

1. CPU governor pinned to performance

Every benchmark run begins with cpupower frequency-set -g performance. Frequency scaling during a measurement run is a leading cause of run-to-run variance, and it can make results look better or worse than the steady-state behaviour really is.
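
Setting the governor is only useful if it actually took effect on every core. As a minimal sketch (check_governor is a hypothetical helper, not something the repo is known to contain), the check below reads each CPU's scaling_governor file from sysfs and fails if any core reports something other than performance, or if no governor file can be read at all. The sysfs root is a parameter so the function can be pointed at a test directory.

```shell
# Sketch: confirm every CPU reports the "performance" governor.
# check_governor is a hypothetical helper name used for illustration only.
# $1: sysfs CPU root (defaults to the standard Linux location).
check_governor() {
  local root="${1:-/sys/devices/system/cpu}"
  local f gov seen=0
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue        # glob did not match, or file unreadable
    seen=1
    gov="$(cat "$f")"
    if [ "$gov" != "performance" ]; then
      echo "governor is '$gov' in $f" >&2
      return 1
    fi
  done
  # No readable governor file means the state is unknown: treat as failure.
  [ "$seen" -eq 1 ]
}
```

A runner script could call check_governor right after cpupower frequency-set -g performance and abort when it returns non-zero.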

2. Turbo Boost disabled

Core Performance Boost is disabled in BIOS. State is verified by run_one.sh via cpupower frequency-info before any benchmark binary runs; the result is exported as CRUCIBLE_TURBO=off and recorded in the JSON machine.turbo field. If turbo state cannot be determined the script exits non-zero rather than silently recording a wrong value. Boost obscures the true steady-state throughput the predictor and cache hierarchy deliver at nominal frequency.
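
The "fail rather than guess" behaviour described above can be sketched as follows. This is not the logic in run_one.sh, which uses cpupower frequency-info; as a stand-in, the sketch reads the global boost switch that some cpufreq drivers (acpi-cpufreq, for example) expose in sysfs. The helper name turbo_state and the default path are assumptions for illustration.

```shell
# Sketch: derive the boost state from a cpufreq "boost" sysfs file and
# report "unknown" through a non-zero status instead of guessing.
# turbo_state is a hypothetical helper; the default path is an assumption
# that only holds for drivers exposing a global boost switch.
turbo_state() {
  local f="${1:-/sys/devices/system/cpu/cpufreq/boost}"
  local v
  [ -r "$f" ] || return 2          # state unknown: file missing or unreadable
  v="$(cat "$f")"
  case "$v" in
    0) echo off ;;
    1) echo on ;;
    *) return 2 ;;                 # state unknown: unexpected contents
  esac
}

# Possible use inside a runner script:
#   CRUCIBLE_TURBO="$(turbo_state)" || { echo "turbo state unknown" >&2; exit 1; }
#   [ "$CRUCIBLE_TURBO" = off ] || { echo "boost is enabled" >&2; exit 1; }
#   export CRUCIBLE_TURBO
```

Keeping "unknown" distinct from "on" and "off" is the point of the design: a missing or malformed file stops the run instead of recording a value that was never observed.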

3. Core isolation

Cores 1–7 are isolated at the kernel level via isolcpus=1-7 nohz_full=1-7 rcu_nocbs=1-7 boot parameters — scoped to a dedicated GRUB entry (“Ubuntu (benchmark — cores 0-7 isolated)”) distinct from the standard development entry. Within that isolated set, benchmarks are additionally pinned to cores 4–7 via taskset (invoked by the per-demo wrapper scripts), with cores 0–3 absorbing any residual kernel housekeeping the isolation directives cannot redirect. SMT is disabled at the BIOS level — verified via /sys/devices/system/cpu/smt/active returning 0 and lscpu reporting 8 CPUs — to remove SMT-sibling resource sharing (L1, L2, execution ports, frontend) from all measurements. Isolated CPU IDs are recorded in each demo’s JSON machine.isolated_cpus field.
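
The two verifications mentioned above (SMT inactive, pinned cores inside the isolated set) can be sketched in a few lines. The helper names cpu_in_list and check_isolation and the SYSFS_CPU override are hypothetical; the sketch assumes the kernel reports the isolated set in /sys/devices/system/cpu/isolated in list form such as 1-7.

```shell
# cpu_in_list CPU LIST: succeed if CPU is covered by a kernel-style CPU list
# such as "1-7" or "0,4-7". Hypothetical helper for illustration.
cpu_in_list() {
  local cpu="$1" list="$2" part lo hi
  local IFS=','
  for part in $list; do
    lo="${part%-*}"                # "4-7" -> 4, "4" -> 4
    hi="${part#*-}"                # "4-7" -> 7, "4" -> 4
    if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
      return 0
    fi
  done
  return 1
}

# check_isolation CPU...: verify SMT is off and every listed CPU is isolated.
# SYSFS_CPU is a hypothetical override so the check can run against a copy.
check_isolation() {
  local root="${SYSFS_CPU:-/sys/devices/system/cpu}"
  local isolated cpu
  [ "$(cat "$root/smt/active")" = "0" ] || { echo "SMT is active" >&2; return 1; }
  isolated="$(cat "$root/isolated")"
  for cpu in "$@"; do
    cpu_in_list "$cpu" "$isolated" || { echo "CPU $cpu is not isolated" >&2; return 1; }
  done
}

# Possible use before pinning a benchmark (binary path is a placeholder):
#   check_isolation 4 5 6 7 && taskset -c 4-7 ./some_benchmark
```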

Cross-CCX results. CPU 0 still carries the system timer and other unmovable kernel work that isolcpus= cannot fully evict, so cross-CCX measurements (which use cores from both CCXs, 0–3 and 4–7) carry slightly higher ambient noise than intra-CCX measurements and are labelled accordingly in every post.

4. Statistical reporting

Each benchmark uses one of three rep-count conventions, depending on what kind of statistic it reports. Every post’s footer states which convention it used.
  • Throughput / steady-state median (demos 1, 2, 3): ≥20 outer repetitions (Google Benchmark --benchmark_repetitions); aggregates computed across those repetitions.
  • Tail-latency distribution (demos 4, 5): 5 outer runs × 1 M timed samples per run through a custom latency pipeline; percentiles computed from histograms merged across runs.
  • Working-set sweep (demos 6, 7): 5 outer repetitions per cell; median ns_per_op reported. Sweep coverage substitutes for higher per-cell rep count.
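
To illustrate what "percentiles computed from histograms merged across runs" means for the tail-latency convention, here is a minimal sketch. It assumes a hypothetical on-disk format of one "bucket_ns count" pair per line per run; the actual latency pipeline's format is not documented on this page, and both function names are made up for the example.

```shell
# merge_histograms FILE...: sum the counts per bucket across all runs and
# print the merged histogram sorted by bucket value.
# Assumed (hypothetical) input format: one "bucket_ns count" pair per line.
merge_histograms() {
  awk '{ count[$1] += $2 } END { for (b in count) print b, count[b] }' "$@" | sort -n
}

# hist_percentile P: read a merged, sorted histogram on stdin and print the
# smallest bucket whose cumulative count reaches P percent of all samples.
hist_percentile() {
  awk -v p="$1" '
    { bucket[NR] = $1; cnt[NR] = $2; total += $2 }
    END {
      target = total * p / 100
      for (i = 1; i <= NR; i++) {
        cum += cnt[i]
        if (cum >= target) { print bucket[i]; exit }
      }
    }'
}

# Possible use:
#   merge_histograms run1.hist run2.hist | hist_percentile 99
```

Merging counts before taking the percentile treats all runs as one sample population, which is different from averaging the per-run percentiles.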
Every chart states which statistic it shows:
  • Median — typical-case latency
  • Min — best the hardware can do (cache warm, predictor trained)
  • p99 / p99.9 — tail-latency claims
  • IQR — spread around the median
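
For readers who want to recompute these statistics from raw samples, the sketch below derives min, median, p99 and IQR from one numeric sample per line. It uses the nearest-rank percentile definition, which is an assumption: the site's own tooling (Google Benchmark aggregates, the custom latency pipeline) may interpolate differently, and with an even sample count nearest-rank reports the lower of the two middle values as the median. The function name summarize is hypothetical.

```shell
# summarize: read one numeric sample per line on stdin and print
# min, median, p99 and IQR (p75 - p25) using the nearest-rank method.
# Exits non-zero when there are no samples.
summarize() {
  sort -n | awk '
    function rank(p,   r) {          # nearest-rank index: ceil(NR * p / 100)
      r = int(NR * p / 100)
      if (r < NR * p / 100) r++
      return (r < 1) ? 1 : r
    }
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      printf "min=%s median=%s p99=%s iqr=%s\n", v[1], v[rank(50)], v[rank(99)], v[rank(75)] - v[rank(25)]
    }'
}

# Possible use (samples.txt is a placeholder file of ns_per_op values):
#   summarize < samples.txt
```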

Additional best-practice items

Building and reproducing

Each demo lives under bench/demos/<NN-slug>/ with its own README.md documenting the harness, inputs, and any demo-specific build flags. Headline captures use the per-demo orchestration script under bench/scripts/.

git clone https://github.com/GarethCooke/Crucible
cd Crucible/bench
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release
cmake --build build --target bench_<NN>_<slug>
./scripts/run_one.sh <NN-slug>

run_one.sh requires sudo and the cpuset package (sudo apt install cpuset on Ubuntu); it runs the benchmark binary inside a cset shield on cores 4–7 and tears the shield down automatically.

Source on GitHub: GarethCooke/Crucible. Each demo’s directory has its own README with demo-specific notes; this page covers the conventions that apply across all of them.

References