# PHYSARIUM-7B SURGERY REPORT

**Phase 7 — Physarium-7B top brain surgery**
**Date:** 2026-04-27
**Binary:** `build/physarium7b_surgery` (native C++17, no Python in hot path)

> **Read errata first:** the numbers below are Physarium-v1 magnitude-flow
> results and should be read alongside
> `reports/PHYSARIUM_RESULTS_RECONCILE.md` and
> `reports/PHYSARIUM_COVERAGE_AUDIT.md`. Tile coverage of the target projection
> tensors is 100 %. The 22.22 % kill rate uses those target weights as the
> denominator (6,525,288,448); against the full 7B model (7,615,616,512) the
> same kill count is 19.04 %.

## Inputs / outputs

| Item                 | Value                                              |
|----------------------|----------------------------------------------------|
| Donor model          | `/home/pc/qwen7b/instruct/` (Qwen2.5-7B-Instruct)  |
| Donor disk size      | 15 GB                                              |
| Output dir           | `/home/pc/gigachad_native/Physarium-7B-Native/`    |
| Output disk size     | 15 GB (15,231,233,024 bytes per index)             |
| Run log              | `reports/physarium7b_surgery_run.log`              |
| Surgery params       | block_size=256, n_iter=30, beta=2.0                |

The Qwen model is used as **DONOR ONLY** per `ARCHITECTURE_LOCK.md`. The output
directory holds the top brain `physarium_7b`; the original Qwen weights are not
part of the runtime.

## Aggregate sparsity

| Metric                                | Value                       |
|---------------------------------------|-----------------------------|
| Target proj weights total             | 6,525,288,448               |
| Weights killed (set to 0)             | 1,450,103,613               |
| Killed % (target weights)             | **22.22 %**                 |
| Tensors logged                        | 196 (28 layers × 7 projs)   |
| Min per-tensor sparsity               | 16.32 %                     |
| Max per-tensor sparsity               | 45.92 %                     |
| Mean per-tensor sparsity              | 22.33 %                     |
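
The two kill rates quoted in this report differ only in the denominator. The sketch below recomputes both from the counts above; the function and constant names are illustrative and are not taken from the surgery binary.

```cpp
#include <cassert>
#include <cstdint>

// Counts quoted in this report.
constexpr std::uint64_t kKilled        = 1'450'103'613ULL;  // weights set to 0
constexpr std::uint64_t kTargetWeights = 6'525'288'448ULL;  // target proj weights
constexpr std::uint64_t kFullModel     = 7'615'616'512ULL;  // full 7B model

// Percentage of killed weights relative to the chosen denominator.
// Illustrative helper, not part of the surgery binary.
double kill_rate_percent(std::uint64_t killed, std::uint64_t denominator) {
    assert(denominator > 0 && killed <= denominator);
    return 100.0 * static_cast<double>(killed) /
           static_cast<double>(denominator);
}
```

Calling `kill_rate_percent(kKilled, kTargetWeights)` gives the target-weights figure, and `kill_rate_percent(kKilled, kFullModel)` gives the full-model figure.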

## Per-projection sparsity (28 layers each)

| Projection             | n  | mean   | min    | max    |
|------------------------|----|--------|--------|--------|
| `mlp.down_proj`        | 28 | 22.41% | 19.91% | 43.54% |
| `mlp.gate_proj`        | 28 | 22.15% | 18.48% | 45.92% |
| `mlp.up_proj`          | 28 | 22.09% | 19.42% | 45.52% |
| `self_attn.k_proj`     | 28 | 23.19% | 16.50% | 30.04% |
| `self_attn.o_proj`     | 28 | 21.79% | 18.86% | 26.19% |
| `self_attn.q_proj`     | 28 | 22.63% | 19.35% | 32.48% |
| `self_attn.v_proj`     | 28 | 22.06% | 16.32% | 29.85% |

Layer 1 was the most heavily pruned (43–46% on the MLP projections): physarium
found a lot of weak channels to flatten there. Layers 0 and 2 also showed
elevated MLP sparsity (~30–36%). Deeper layers settled in the 18–23% band,
which is consistent with the 0.5B Phase-1 surgery profile.
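
As a consistency check, the per-projection rows can be rolled back up into the aggregate min, max, and mean per-tensor sparsity. The sketch below hard-codes the table rows; the `ProjStats` and `Rollup` types are hypothetical and exist only for this example. Because every projection has n = 28, the weighted mean equals the plain mean of the seven row means.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// One row of the per-projection table (values in percent).
struct ProjStats { const char* name; int n; double mean, min, max; };

constexpr std::array<ProjStats, 7> kRows{{
    {"mlp.down_proj",    28, 22.41, 19.91, 43.54},
    {"mlp.gate_proj",    28, 22.15, 18.48, 45.92},
    {"mlp.up_proj",      28, 22.09, 19.42, 45.52},
    {"self_attn.k_proj", 28, 23.19, 16.50, 30.04},
    {"self_attn.o_proj", 28, 21.79, 18.86, 26.19},
    {"self_attn.q_proj", 28, 22.63, 19.35, 32.48},
    {"self_attn.v_proj", 28, 22.06, 16.32, 29.85},
}};

struct Rollup { int tensors; double mean, min, max; };

// Combine per-projection rows into one aggregate, weighting means by n.
Rollup rollup(const std::array<ProjStats, 7>& rows) {
    Rollup r{0, 0.0, rows[0].min, rows[0].max};
    double weighted_sum = 0.0;
    for (const auto& p : rows) {
        assert(p.n > 0);
        r.tensors += p.n;
        weighted_sum += p.mean * p.n;
        r.min = std::min(r.min, p.min);
        r.max = std::max(r.max, p.max);
    }
    r.mean = weighted_sum / r.tensors;
    return r;
}
```

The rows are rounded to two decimals, so the recomputed mean can only be expected to agree with the aggregate table to about that precision.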

## Output integrity

| Check                                       | Result            |
|---------------------------------------------|-------------------|
| 4 shards present                            | ✅ all 4          |
| `model.safetensors.index.json` parseable    | ✅                |
| `weight_map` entry count                    | 339               |
| `total_size` field                          | 15,231,233,024    |
| All shards open via safetensors loader      | ✅ 339/339        |
| Sample tensor dtype                         | `torch.bfloat16`  |
| Sample shape (`embed_tokens.weight`)        | `[152064, 3584]`  |
| dtype preserved (BF16 → BF16)               | ✅                |
| Shapes preserved (donor → output)           | ✅ (header copy)  |
| Failed tensors                              | 0                 |
| Tokenizer / config / merges copied          | ✅ all            |

The native surgery binary streams each tensor in as BF16, widens each block to
FP32, runs the physarium block, narrows the result back to BF16, and writes it
into a header-preserving copy of each shard. Tensor offsets in the safetensors
header are reused unchanged, so the index file from the donor remains valid.
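
The widen/process/narrow round trip described above can be sketched as follows. This is a minimal illustration, not the code in `build/physarium7b_surgery`: the conversion helpers use the standard BF16 layout (the top 16 bits of an IEEE-754 binary32 value) with round-to-nearest-even on the way back, and `process_block` with its magnitude threshold is a hypothetical stand-in for the physarium rule, whose `n_iter` and `beta` logic is not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Widen BF16 to FP32: BF16 is the upper half of a binary32 bit pattern.
inline float bf16_to_f32(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Narrow FP32 to BF16 with round-to-nearest-even; NaN stays a quiet NaN.
inline std::uint16_t f32_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {  // NaN payload
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    bits += 0x7FFFu + ((bits >> 16) & 1u);     // rounding bias, ties to even
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical stand-in for the physarium block: zero every nonzero value
// whose magnitude is below `threshold`, in place, and return the kill count.
std::size_t process_block(std::uint16_t* data, std::size_t count,
                          float threshold) {
    assert(data != nullptr || count == 0);
    std::vector<float> work(count);
    for (std::size_t i = 0; i < count; ++i) work[i] = bf16_to_f32(data[i]);

    std::size_t killed = 0;
    for (float& w : work) {
        if (w != 0.0f && std::fabs(w) < threshold) {
            w = 0.0f;
            ++killed;
        }
    }

    for (std::size_t i = 0; i < count; ++i) data[i] = f32_to_bf16(work[i]);
    return killed;
}
```

Because the buffer is rewritten in place with the same element count and width, the byte offsets recorded in the shard header stay valid, which is the property the header-preserving copy relies on.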

## Wall-clock

| Stage                         | Time                    |
|-------------------------------|-------------------------|
| Total wall                    | **2775.6 s** (46.3 min) |
| Shard 3 (layers 14–22)        | ~14 min                 |
| Shard 4 (layers 22–27 + head) | ~9 min                  |
| Shard 1 (layers 0–6)          | ~11 min                 |
| Shard 2 (layers 7–13 + head6) | ~16 min                 |

Shards were processed in directory-iterator order (3, 4, 1, 2, by file mtime).
Cumulative weights killed at each shard boundary:
- after shard 3: 402,974,047
- after shard 4: 653,435,633 (Δ 250,461,586)
- after shard 1: 1,048,031,185 (Δ 394,595,552)
- after shard 2: 1,450,103,613 (Δ 402,072,428, final)
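
The Δ values above are adjacent differences of the cumulative counter. A small sketch of that bookkeeping, using the four cumulative figures from the run log (the function name is illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Turn a running "cumulative killed" counter into per-shard kill counts.
// The first entry is kept as is; each later entry is current minus previous.
std::vector<std::uint64_t> per_shard_killed(
        const std::vector<std::uint64_t>& cumulative) {
    // A cumulative counter must never decrease.
    assert(std::is_sorted(cumulative.begin(), cumulative.end()));
    std::vector<std::uint64_t> deltas(cumulative.size());
    std::adjacent_difference(cumulative.begin(), cumulative.end(),
                             deltas.begin());
    return deltas;
}
```

Feeding it the boundary values in processing order (shards 3, 4, 1, 2) yields the per-shard counts, and their sum equals the final total of 1,450,103,613.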

## What is not in this report

- **Inference quality.** No forward pass was run against the pruned model.
  The 7B has no native CUDA backend in this tree yet (organ farm is the
  near-term path); a quality probe needs either an HF transformers eval or a
  native model runner that does not yet exist.
- **Per-layer sparsity for layers 22–27.** Layer 22 straddles the shard 3/4
  boundary (shard 4 starts with the tail of layer 22 and ends with the
  lm_head), and this report does not break sparsity out per layer for that
  range. The 196 logged tensors do cover the full 28 layers × 7 projs; see
  `reports/physarium7b_surgery_run.log` for raw per-tensor output.
- **Comparison to a Python `pipeline_organic.py` run.** This was a
  C++-native run; no parity probe was performed against the legacy Python
  surgery script.

## Honest assessment

- ✅ Native C++ surgery binary works end-to-end on a real 15 GB BF16 donor.
- ✅ Output is a structurally valid safetensors model (loadable, indexable,
  dtype-preserving, shape-preserving).
- ✅ 22.22 % of weights killed in target projections, within the expected
  "organic" sparsity band for physarium block surgery.
- ⚠️ "It loads" ≠ "it generates". Inference quality of `Physarium-7B-Native`
  is **untested**; the next step needed to claim a working top brain is a
  forward-pass eval, which is out of scope for this Phase-7 task.
- ⚠️ Output sits at 15 GB BF16. With the `tier_default: VRAM` policy in
  `organs/organ_farm.json`, this does not fit on a single 8 GB RTX 3060 Ti
  and will need quantization or partial offload before live use. This is a
  known constraint, not a regression.
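
The VRAM constraint in the last point follows from simple arithmetic on the weight footprint. The sketch below is a rough estimate under stated assumptions: the parameter count is the full-model figure from the errata note (which equals `total_size` divided by 2 bytes per BF16 value), "8 GB" is treated as 8 GiB, and only weights are counted (no quantization scales, KV cache, or activations). The names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kParams    = 7'615'616'512ULL;           // full model
constexpr std::uint64_t kVramBytes = 8ULL * 1024 * 1024 * 1024;  // assumes 8 GiB

// Weight-only footprint in bytes at a given storage width, rounded up.
inline std::uint64_t weight_bytes(std::uint64_t params,
                                  unsigned bits_per_weight) {
    assert(bits_per_weight > 0);
    return (params * bits_per_weight + 7) / 8;
}

// True if the weights alone fit in the given memory budget.
inline bool weights_fit(std::uint64_t params, unsigned bits_per_weight,
                        std::uint64_t budget_bytes) {
    return weight_bytes(params, bits_per_weight) <= budget_bytes;
}
```

At 16 bits per weight the footprint reproduces the 15,231,233,024-byte `total_size` and does not fit. At 8 bits the weights alone fit with under 1 GiB to spare, which leaves little room for runtime buffers, so a narrower format or partial offload is the more realistic option under these assumptions.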
