Physarium results — reconcile (errata-grade)

Date: 2026-04-27

Status: This document is a mandatory errata insert for every Physarium-v1 claim. Read it before reading any other Physarium number.

Companion audit: PHYSARIUM_COVERAGE_AUDIT.md (denominator forensics).

The two rules

1. There are two distinct experiments; never mix them.
2. Every kill-percentage must state its denominator.

Experiment A — `final_results` / organic surgery line

Source: surgery binary's running totals on ~/gigachad_native/Physarium-7B-Native/ and ~/Physarum-05B-Organic/.

| Metric | Value | Denominator |
|----------------------------------------------|---------------|------------------------------|
| Killed weights (7B) | 1,450,103,613 | 6,525,288,448 target proj |
| Kill rate over target proj weights | 22.22 % | 100 % of target proj covered |
| Kill rate over full 7B model | 19.04 % | 7,615,616,512 total weights |
| Held-out perplexity delta (organic 0.5B run) | +15.3 % | held-out test set |
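The rule that every kill-percentage carries its denominator can be sketched as a small record type that refuses to print a percentage on its own. This is a minimal illustration, not part of the surgery code: the class `KillRate` and its field names are hypothetical, and the counts are the ones in the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KillRate:
    """A kill count that cannot be reported without its denominator."""
    killed: int
    denominator: int
    denominator_label: str  # hypothetical field, for illustration only

    def percent(self) -> float:
        return 100.0 * self.killed / self.denominator

    def __str__(self) -> str:
        # The denominator is always printed next to the percentage.
        return (f"{self.percent():.2f} % of "
                f"{self.denominator:,} {self.denominator_label}")


KILLED_7B = 1_450_103_613  # killed weights, from the Experiment A table

over_target = KillRate(KILLED_7B, 6_525_288_448, "target proj weights")
over_model = KillRate(KILLED_7B, 7_615_616_512, "total 7B weights")

print(over_target)  # 22.22 % of 6,525,288,448 target proj weights
print(over_model)   # 19.04 % of 7,615,616,512 total 7B weights
```

The same kill count yields two different percentages, which is why a bare "22 %" or "19 %" is ambiguous without the denominator beside it.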

processed / target = 100 % per PHYSARIUM_COVERAGE_AUDIT.md — non-overlap 256×256 tiling sees every weight in every target tensor, so the 22.22 % denominator is whole-target, not a sub-window.
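The coverage claim can be checked with a few lines of arithmetic. The sketch below enumerates non-overlapping 256×256 tiles over a tensor and confirms that the tile areas sum to the full tensor. It assumes that ragged edge tiles are clipped to the tensor boundary, which is an assumption of this example rather than a documented property of the surgery binary; the shapes are illustrative and are not read from the audit JSON.

```python
TILE = 256  # tile edge length stated in the text


def tile_bounds(rows: int, cols: int, tile: int = TILE):
    """Yield (r0, r1, c0, c1) for non-overlapping tiles.

    Edge tiles are clipped to the tensor boundary (assumption of this sketch).
    """
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            yield r0, min(r0 + tile, rows), c0, min(c0 + tile, cols)


def coverage(rows: int, cols: int, tile: int = TILE) -> float:
    """Return processed / target for one tensor of shape (rows, cols)."""
    processed = sum((r1 - r0) * (c1 - c0)
                    for r0, r1, c0, c1 in tile_bounds(rows, cols, tile))
    return processed / (rows * cols)


# Illustrative shapes only, including one that is not a multiple of 256.
for shape in [(3584, 3584), (512, 3584), (1000, 300)]:
    print(shape, coverage(*shape))
```

Because the tiles neither overlap nor leave gaps, `coverage` is 1.0 for any shape under the clipping assumption, which is what makes the whole-target denominator valid.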

Per-projection 7B kill range:

- min 21.79 % (o_proj), max 23.19 % (k_proj).
- Layer-1 MLP projections were the most heavily pruned (43–46 %).

Experiment B — `lm-eval metric_results`

Source: lm-eval-harness external probe on a separately-hosted run.

| Metric | Value |
|------------------------------|---------------|
| Perplexity | 19.62 → 19.94 |
| MMLU machine_learning subset | +0.9 pp |

Not directly comparable to Experiment A's PPL because of different test sets, tokenization, and a different surgery snapshot. Cite it as "lm-eval B" in any joint report.
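For orientation only, the lm-eval B perplexity pair can be turned into a relative change with the arithmetic below. The helper name `relative_change_percent` is hypothetical. The result describes Experiment B alone and, for the reasons just given, must not be set beside Experiment A's +15.3 %.

```python
def relative_change_percent(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Perplexity values from the lm-eval B table.
lm_eval_b = relative_change_percent(19.62, 19.94)
print(f"lm-eval B perplexity change: {lm_eval_b:+.2f} %")  # +1.63 %
```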

What Physarium v1 actually is

physarum_block() operates on the magnitude of each weight:

input feature: D_i = |w_i| (or a smooth function of magnitude)
dynamics: nutrient flow + decay biased toward high-magnitude paths
output: a binary keep / kill mask over each tile's weights

It does not see activations. It does not see gradients. It does not see contribution to the loss or to specific output logits. The slime-mold geometry is honest, but the food signal it eats is static weight magnitude inside each non-overlapping 256×256 tile.

Physarium v1 = static magnitude-flow surgery, tile-local.
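To make "tile-local" and "magnitude only" concrete, here is a deliberately simplified stand-in. It is not the nutrient-flow dynamics of physarum_block(): it replaces them with a plain per-tile magnitude cut, and the function name `tile_local_magnitude_mask` and the `kill_fraction` parameter are hypothetical. What it shares with v1 is the input (only |w|) and the scope (each tile decided in isolation).

```python
from typing import List

Matrix = List[List[float]]


def tile_local_magnitude_mask(weights: Matrix, tile: int = 256,
                              kill_fraction: float = 0.22) -> List[List[bool]]:
    """Return a keep mask (True = keep), computed independently per tile.

    Stand-in rule: inside each tile, kill the `kill_fraction` of weights
    with the smallest |w|. No activations, gradients, or loss are used.
    """
    rows, cols = len(weights), len(weights[0])
    keep = [[True] * cols for _ in range(rows)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            cells = [(abs(weights[r][c]), r, c)
                     for r in range(r0, min(r0 + tile, rows))
                     for c in range(c0, min(c0 + tile, cols))]
            cells.sort()  # ascending magnitude
            n_kill = int(len(cells) * kill_fraction)
            for _, r, c in cells[:n_kill]:
                keep[r][c] = False
    return keep


# Toy 4x4 matrix with 2x2 tiles: each tile loses its single smallest weight.
w = [[0.9, -0.1, 0.5, 0.4],
     [0.3, 0.8, -0.05, 0.7],
     [0.2, 0.6, 1.0, -0.9],
     [-0.7, 0.01, 0.3, 0.2]]
mask = tile_local_magnitude_mask(w, tile=2, kill_fraction=0.25)
for row in mask:
    print(row)
```

In the toy run, the weight 0.2 at row 3, column 3 is killed while the equally sized 0.2 at row 2, column 0 survives, purely because they sit in different tiles. That is the tile-local effect: the decision depends on a weight's neighbours within its tile, not on its role in the model.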

What Physarium v2 needs

A proper activation-aware Physarium-v2 must compute per-weight importance as

importance(w_ij) = act_norm(input_i) · stability(w_ij) · contribution(w_ij → output)

…where:

act_norm(input_i) = per-feature activation magnitude over a calibration set
stability(w_ij) = robustness of w_ij's saliency under input noise (low variance counts as high stability)
contribution(w_ij → output) = signed influence on downstream loss (e.g. Hessian-based)

Physarium v2 = activation-aware flow.
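The product form above can be sketched on toy data. The example assumes the three factors have already been estimated and are passed in as plain lists; how each is actually measured (calibration set, noise model, Hessian approximation) is left open, and the function name `importance` and all values are illustrative.

```python
from typing import List


def importance(act_norm: List[float],
               stability: List[List[float]],
               contribution: List[List[float]]) -> List[List[float]]:
    """importance[i][j] = act_norm[i] * stability[i][j] * contribution[i][j].

    i indexes input features, j indexes the weights fed by feature i.
    """
    return [[act_norm[i] * stability[i][j] * contribution[i][j]
             for j in range(len(stability[i]))]
            for i in range(len(act_norm))]


# Toy values: feature 0 is active, feature 1 never fires on the calibration set.
act_norm = [2.0, 0.0]
stability = [[1.0, 0.5],
             [1.0, 1.0]]
contribution = [[0.3, -0.4],
                [5.0, 5.0]]

scores = importance(act_norm, stability, contribution)
for row in scores:
    print(row)
```

The second row comes out as zero even though its contribution terms are the largest: a weight attached to an input that is never active scores nothing. A magnitude-only criterion such as v1 cannot express that, which is the gap v2 is meant to close.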

Until v2 exists, every v1 number must travel with this errata insert.

Wording template (mandatory in every report that quotes v1)

Physarium-v1 numbers must be read through PHYSARIUM_RESULTS_RECONCILE.md: two different experiments (organic surgery vs lm-eval), v1 is magnitude-flow (not activation-aware), and every killed-% must state its denominator. Tile coverage of target proj tensors is 100 % (audit: PHYSARIUM_COVERAGE_AUDIT.md); the 22.22 % figure is over those target weights, not a sub-sample.

Where the numbers live

- 7B surgery raw log: reports/physarium7b_surgery_run.log
- 7B surgery summary: reports/PHYSARIUM7B_SURGERY_REPORT.md
- Coverage audit (denominator forensics): reports/PHYSARIUM_COVERAGE_AUDIT.md + reports/physarium_coverage_audit.json
- Phase-7 consolidated: reports/GIGACHAD_PHASE7_CONSOLIDATED.md
- 0.5B organic source: ~/Physarum-05B-Organic/
- Surgery code: src/physarium/physarium_engine.cpp::prune_matrix()

TL;DR

- Two experiments, not one.
- v1 is magnitude-flow, not activation-aware.
- Tile coverage of target proj = 100 % (not 6 %).
- 22.22 % kills relative to all target proj weights (denominator stated).
- 19.04 % kills relative to the full 7B model (denominator stated).
- v2 is the next research line and requires real activations.
