# GIGACHAD TRUTH LEDGER
*2026-04-27. Honest categorization of all claimed work.*

> Purpose of this document: stop conflating **measured** with **nicely named**.

> **Physarium-v1 errata mandatory:** every kill-%, PPL, and sparsity number
> here is a v1 magnitude-flow result. Read alongside `reports/PHYSARIUM_RESULTS_RECONCILE.md`
> and `reports/PHYSARIUM_COVERAGE_AUDIT.md`. Tile coverage = 100 %; the denominator
> must be stated with every percentage (target proj weights vs full model).

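The denominator rule above can be sketched in a few lines. The counts below are illustrative placeholders, not measurements from any report in this project, and `kill_pct` / `report` are hypothetical helper names. The sketch only shows that one zero count yields two different kill percentages depending on whether it is divided by the targeted projection weights or by the full model, which is why both should be printed together.

```python
# Illustrative counts only (hypothetical, not taken from any report).
ZEROED = 150        # weights set to zero by pruning
TARGET_PROJ = 700   # weights in the projection tensors the pruner touched
FULL_MODEL = 1000   # all weights in the model


def kill_pct(zeroed: int, denominator: int) -> float:
    """Percentage of `denominator` weights that were zeroed."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * zeroed / denominator


def report(zeroed: int, target: int, full: int) -> str:
    """Format the kill percentage with both denominators named explicitly."""
    return (
        f"killed {kill_pct(zeroed, target):.2f}% of target proj weights, "
        f"{kill_pct(zeroed, full):.2f}% of full model"
    )


print(report(ZEROED, TARGET_PROJ, FULL_MODEL))
```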
---

## A. MEASURED_NATIVE — actually working in C++/CUDA, measured

| What | Where | Metric | Source |
|---|---|---|---|
| sparse_attn_v4 CUDA kernel | `~/v4flash/metal_native/attention_full/sparse_attn_v4.cu` | cosine **1.000** vs Python flash_mvp | bench_sparse_attn_v4 |
| Compressor decode (CSA+HCA) | `compressor_decode.cu` + `compressor_hca.cu` | cosine **0.9997**, top-K exact 9/9 | bench_compressor_parity |
| Indexer top-K | `indexer_csa.cu`, `indexer_decode.cu` | top-K exact 9/9 | bench_indexer_parity |
| Real-weight loader | `real_weight_loader.cpp` (FP8 e8m0 → FP32, BF16, e4m3) | byte-exact match against safetensors | bench_decode_native_real |
| PLANCK_PACK v1 | `planck_pack.cpp` + `planck_pack_build.cpp` (147 GB) | byte-for-byte verify 50/50 PASS | planck_pack_verify |
| Native synthetic chain (16 tok) | `bench_decode_native` | r4 10.14 ms, r128 19.76 ms, **648 ms/tok = 1.54 tok/s** | log |
| Real-weight chain (16 tok) | `bench_decode_native_real` | **536 ms/tok = 1.86 tok/s = 14× faster than warm Python** | log |
| 18-layer text decode after PLANCK_PACK | `bench_decode_text` | **2.08 sec/tok** vs 12.4 sec/tok mmap (6× speedup) | log |
| Full 43-layer V4 text loop | `bench_decode_text 43L` | 0.16 tok/s, 8.59 GB VRAM, real PHP/Laravel scaffold output | snake_qwen.log |
| MMLU candidate scorer | `bench_mmlu_native` + `candidate_head_logits_launch` | 5/5 synthetic correct, 200× faster than full GEMV | log |
| ffn_out parity after MoE fix | `bench_decode_text` PARITY_L0=1 | cos 0.41 → **0.99293** (5× drift → 1.06× drift) | parity_l0_compare.py |
| Hot RAM cache (top-500) | `expert_pool.cpp` | 56% RAM hits, 126→100 sec/q **(-21%)**, 5/5 retained | log |
| Hot 1000 page-cache thrash | same, hot1000 | 380 sec/q vs 100 sec/q **(3.8× WORSE)** — pinning >7 GB breaks the page cache | measured |
| Roofline trace (PHASE_TRACE=1) | `block_orchestrator_streaming.cpp` cudaEvents | **89% wall = expert_req+moe**, 60% of that = pure disk wait at 48 ms/expert | bench_roofline_q1.log |

**No embellishment.** These numbers are reproducible.

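The derived figures in the table (tok/s, speedups, percentage changes) can be re-derived from the raw timings to within rounding. The following is a minimal sketch; the helper names are hypothetical and the inputs are copied from the rows above.

```python
def tok_per_s(ms_per_tok: float) -> float:
    """Convert per-token latency in milliseconds to tokens per second."""
    if ms_per_tok <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / ms_per_tok


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster `candidate` is than `baseline` (both are times)."""
    return baseline / candidate


def pct_change(before: float, after: float) -> float:
    """Signed relative change in percent, with `before` as the denominator."""
    return 100.0 * (after - before) / before


# Inputs copied from the table rows above.
print(f"synthetic chain : {tok_per_s(648):.3f} tok/s")
print(f"real-weight     : {tok_per_s(536):.3f} tok/s")
print(f"PLANCK_PACK     : {speedup(12.4, 2.08):.2f}x faster than mmap")
print(f"hot-500 cache   : {pct_change(126, 100):+.1f}% sec/q")
print(f"hot-1000 cache  : {380 / 100:.1f}x slower than hot-500")
```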
---

## B. MEASURED_MODEL_SURGERY — actual work on weights

| What | Where | Metric | Honest delta |
|---|---|---|---|
| C++ Physarum engine (energy-conserving softmax) | `folder/physarum_engine.cpp` (137 lines) + `.exe` | killed=20.6%, block_size=256, n_iter=30, threshold "D < mean·0.1" | `final_results.json` |
| `Physarum-05B-Organic` artifact | `folder/Physarum-05B-Organic/model.safetensors` (988 MB) | File is **the same size** as the original (200-byte header difference). 14.94% global / ~22% per-tensor zeroed | `weight_diff.json` |
| Pruning pattern verification | Phase-1 forensic | 168 / 290 tensors modified (24 layers × 7 projections); zeros scattered ACROSS ALL rows/columns (corner_128 share = 0.4-4.7%) — block-tiled, not corner-only | `sparsity_pattern.json` |
| PPL / accuracy delta vs original | `bench_speed_quality.py` | sample PPL **+12.5%** (9.38 → 10.55), MMLU-mini **−22% relative** (90% → 70%, −20 points), GSM8K-mini **−20% relative** (100% → 80%, −20 points), JSON-repair smoke 100% → 100%, code-skeleton 100% → 100% | `bench_*.json` |
| Speed delta | same | disk size 0%, VRAM +1%, decode_128 +14% (within noise) | same |
| pipeline_organic.py orig run | `folder/pipeline_organic.py` | "killed_pct: 20.6, ppl_delta_pct: +15.3, baseline_tps 27.15 / pruned_tps 27.55" | `final_results.json` |

**Real:** the 0.5B weights are Physarum-pruned. Accuracy drops by roughly 20% (relative) on the hard tasks (MMLU-mini, GSM8K-mini) and is unchanged on the simple ones (JSON repair, code skeleton).

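The pruning-pattern check in the table (zeros spread across all rows and columns rather than clustered in one corner) can be approximated with a small sketch. This is not the Phase-1 forensic script. It assumes that "corner share" means the fraction of all zeros that fall inside the top-left `corner × corner` block, and `zero_stats` is a hypothetical helper that works on a plain list-of-lists matrix.

```python
from typing import Dict, Sequence

Matrix = Sequence[Sequence[float]]


def zero_stats(w: Matrix, corner: int = 128) -> Dict[str, float]:
    """Summarize where exact zeros sit in a 2-D weight matrix."""
    rows, cols = len(w), len(w[0])
    total_zeros = 0
    corner_zeros = 0
    rows_hit = set()
    cols_hit = set()
    for r, row in enumerate(w):
        for c, value in enumerate(row):
            if value == 0.0:
                total_zeros += 1
                rows_hit.add(r)
                cols_hit.add(c)
                if r < corner and c < corner:
                    corner_zeros += 1
    return {
        "zero_pct": 100.0 * total_zeros / (rows * cols),
        # Assumed definition: share of all zeros inside the top-left corner.
        "corner_share_pct": (
            100.0 * corner_zeros / total_zeros if total_zeros else 0.0
        ),
        "rows_with_zero_pct": 100.0 * len(rows_hit) / rows,
        "cols_with_zero_pct": 100.0 * len(cols_hit) / cols,
    }


# Tiny hypothetical 4x4 matrix with zeros spread over every row and column.
demo = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 2.0, 3.0],
    [1.0, 2.0, 0.0, 3.0],
    [1.0, 2.0, 3.0, 0.0],
]
stats = zero_stats(demo, corner=2)
print(stats)
```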
---

## C. PROTOTYPE_SCAFFOLD — Python scaffold, marked as such

| Artefact | What it actually is | Not intelligence — just |
|---|---|---|
| `gigachad_chat.py` (~280 lines Python) | CLI wrapper around organ_runner | dispatcher harness |
| `runtime/manual_dispatcher.py` | regex routing | rule-based switcher |
| `runtime/organ_runner.py` | transformers.AutoModel + greedy decode | inference call wrapper |
| `runtime/dag_logger.py` | JSON dump per call | event log |
| `runtime/hologram_store.py` | saves `{form_id, anchors, output_hash}` JSON | metadata accumulator (never reused) |
| `runtime/hologram_retriever.py` | ngram + anchor jaccard | sparse text similarity |
| `runtime/topological_memory.py` | the same Jaccard scores with different weights across 4 "shapes" | a brand on top of ngram, not geometry |
| `runtime/physarium_field.py` | food/poison counters with arbitrary coefficients | a counter without predictive validation |
| `runtime/anti_hallucination.py` | regex "perhaps/maybe" + label | text-pattern marker |
| `runtime/raw_archive.py` | enumerate files + line lookup | filesystem index |
| `runtime/micro_scroll_builder.py` | chunk + regex entity extraction | log summarizer |
| 9 organ JSON specs | prompt templates + expected schemas | spec |
| `regression/*.json` (140 tasks) | I/O pairs + expected fields | dataset |
| 366 DAG entries / 160 hologram forms | accumulated JSON logs | no read-back — never reused |

**All of this is useful as protocol and test harness, but it is not intelligence and not the runtime.**

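To make concrete why the retriever and the "topological" memory are sparse text similarity, here is a minimal sketch of n-gram Jaccard plus the weighted formula `w·jaccard_ngram + w·jaccard_anchor + w·route_match` quoted in this ledger. It is not the code in `runtime/`. Character trigrams, the function names, and every weight value are assumptions made for illustration; only the four shape names come from the ledger.

```python
from typing import Dict, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of whitespace-normalized, lowercased text."""
    t = " ".join(text.lower().split())
    if len(t) < n:
        return {t} if t else set()
    return {t[i:i + n] for i in range(len(t) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection over union; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Shape names come from the ledger; the weight values are hypothetical.
# Order: (w_ngram, w_anchor, w_route).
SHAPE_WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "square": (0.5, 0.3, 0.2),
    "hexagon": (0.4, 0.4, 0.2),
    "star8": (0.3, 0.5, 0.2),
    "trident": (0.2, 0.3, 0.5),
}


def shape_score(shape: str, query: str, q_anchors: Set[str], q_route: str,
                form_text: str, form_anchors: Set[str], form_route: str) -> float:
    """One formula for every shape; only the three weights differ."""
    w_ngram, w_anchor, w_route = SHAPE_WEIGHTS[shape]
    j_ngram = jaccard(char_ngrams(query), char_ngrams(form_text))
    j_anchor = jaccard(q_anchors, form_anchors)
    route_match = 1.0 if q_route == form_route else 0.0
    return w_ngram * j_ngram + w_anchor * j_anchor + w_route * route_match


# Hypothetical query and stored form.
for name in SHAPE_WEIGHTS:
    s = shape_score(name, "fix json schema", {"json", "schema"}, "repair",
                    "repair broken json", {"json"}, "repair")
    print(f"{name:8s} {s:.3f}")
```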
---

## D. UNSAFE_CLAIMS / SOFT_CLAIMS — claims that were made but never proven

| Claim | Reality |
|---|---|
| "Phase-4 regression 132/140 = 94.3%" | Verifiers became more lenient over time. contradiction_extractor's regex-fallback accepts any text ≥30 chars containing the word "contradiction". triz_operator accepts any text with "principle" or a number. Strict JSON parsing would give ~70-80%. |
| "contradiction_extractor 100% chain pass" | In Phase-2 the honest result was orig 80% / phys 0%. In Phase-3+ the verifier was weakened to a 3-level fallback (strict JSON → regex extract → "contradiction word + 30 chars"). The model did not improve; the test got easier. |
| "TRIZ operator 100%" | The evaluator in `schemas.py::eval_triz_operator` falls back to accepting any output that mentions "principle" or contains a number. The real quality of the 40-principles mapping is unverified. |
| "Topological memory works" | A/B/C across 200 queries: A (old jaccard) 67% hit, B (topology) **15.5% hit**, C (hybrid) 36.5%. Topology **lost** at retrieval — relabelled "advisory" to soften it. |
| "Hologram retrieval = live memory" | 160 forms on disk. `HOLOGRAM_REUSE=log-only`. No organ-call is ever bypassed. It is a metadata pile, not reuse. |
| "Anti-hallucination layer" | Not a model-level check. `re.search("perhaps\|maybe\|might")` + label string. The `hallucination_risk` number is never read downstream. |
| "Physarium field food/poison" | The weights `+3.0 / +1.5 / -3.0 / -2.0` are made up. The "food-score correlates with future test_pass" test was never run. |
| "Authority advisory_only_until_regression>100" | We have 291 passing DAGs (well above 100), but "advisory" stays because hard-mode does not work. The phrasing masks "does not work". |
| "Shadow router 12.9%" → "wait until 70%" | Correctly flagged 🔴, but the 0.5B model is structurally incapable of multi-class JSON output from prompt tuning alone. This is an architectural blocker, not a temporary one. |
| "9 organs ready" | They run on a damaged 0.5B base (MMLU-mini 70% vs 90% for the original). Every organ that requires reasoning (router, contradiction, claim_extractor) leans on that damaged base. |
| "Multi-topology resonant memory" | square/hexagon/star8/trident are **different weights in one formula** `w·jaccard_ngram + w·jaccard_anchor + w·route_match`. The "shape" of memory = a brand, not geometry. |
| "Phase 1-5 as proof of architecture" | Phase-1 was an honest audit. Phase 2-5 = scaffolding around the same 0.5B. The architecture is not proven — it is rolled out as a Python skeleton. |

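The gap between lenient and strict pass rates described in the table can be illustrated with a minimal sketch of the 3-level fallback (strict JSON, then regex extract, then "contradiction" keyword plus 30 characters). This is not the code in `schemas.py`. The required key, the function names, and the sample outputs are hypothetical; the point is only that the same outputs score very differently depending on which levels count as a pass.

```python
import json
import re
from typing import Dict, Iterable, Optional

# Hypothetical schema: a valid answer is a JSON object with this key.
REQUIRED_KEYS = ("contradiction",)


def passes_strict(text: str) -> bool:
    """Level 1: the whole output must parse as a JSON object with all keys."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(key in obj for key in REQUIRED_KEYS)


def passes_regex_extract(text: str) -> bool:
    """Level 2: pull the outermost {...} span out of chatter, then parse it."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    return match is not None and passes_strict(match.group(0))


def passes_keyword(text: str) -> bool:
    """Level 3: any text of 30+ characters containing the word."""
    return len(text) >= 30 and "contradiction" in text.lower()


def first_passing_level(text: str) -> Optional[str]:
    """Name of the first level that accepts the output, or None."""
    if passes_strict(text):
        return "strict"
    if passes_regex_extract(text):
        return "regex"
    if passes_keyword(text):
        return "keyword"
    return None


def pass_rates(outputs: Iterable[str]) -> Dict[str, float]:
    """Pass rate under the strict verifier vs. under the full fallback chain."""
    levels = [first_passing_level(o) for o in outputs]
    n = len(levels)
    return {
        "strict": sum(lv == "strict" for lv in levels) / n,
        "lenient": sum(lv is not None for lv in levels) / n,
    }


# Hypothetical model outputs, one per fallback level plus one failure.
samples = [
    '{"contradiction": "speed vs accuracy"}',
    'Sure! {"contradiction": "x vs y"} hope this helps',
    "I am not sure, but there might be a contradiction somewhere here.",
    "no idea",
]
rates = pass_rates(samples)
print(rates)
```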
---

## Summary

```
A. MEASURED_NATIVE           → REAL  (V4-Flash C++/CUDA, parity proven, 1.86 tok/s)
B. MEASURED_MODEL_SURGERY    → REAL  (Physarum-0.5B exists, damaged ~20% on hard, healthy on easy)
C. PROTOTYPE_SCAFFOLD        → USEFUL AS HARNESS, not as runtime
D. UNSAFE_CLAIMS             → MARKETING. To be excluded from baseline references.
```

**Decision:** A + B = the core. C = protocol. D = removed from "proven" metrics.

---

## What to treat as ground truth going forward

1. **C++/CUDA inference** in `~/v4flash/metal_native/` — the only native runtime that actually works.
2. **`folder/physarum_engine.cpp`** — the only real C++ weight-surgery code; must be ported into `gigachad_native/src/physarium/`.
3. **regression I/O pairs** — a useful dataset, but **under strict verifiers**, not lenient ones.
4. **DAG schema + hologram schema + organ spec** — kept as protocol.
5. **Physarum-05B-Organic** — a real artefact for tests.

**What we drop as "architecture proof":**
- 132/140 regression — **re-evaluate under hard verifiers**.
- Topology retrieval — **no longer position as "memory intelligence"**.
- Anti-hallucination regex — **not to be conflated with real fact-claim verification**.
- Physarium food/poison — **not to drive decisions until validated**.

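The food/poison validation that was never run could start from something as small as the sketch below: correlate each recorded food score with whether the later test passed. With a 0/1 outcome, Pearson correlation is the point-biserial correlation. The `pearson` helper and all data here are hypothetical placeholders, so the printed value says nothing about the real field; a real run would read scores and outcomes from the DAG logs.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder data: food score at decision time, and whether the
# corresponding test passed later (1 = pass, 0 = fail).
food_scores = [3.0, 1.5, -3.0, -2.0, 4.5, 0.0]
test_passed = [1, 1, 0, 0, 1, 0]

r = pearson(food_scores, test_passed)
print(f"point-biserial r = {r:.3f} on {len(food_scores)} placeholder samples")
```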