GIGACHAD TRUTH LEDGER
2026-04-27. Honest categorization of all claimed work.
Purpose of this document: stop conflating measured with nicely named.
Physarium-v1 errata mandatory: every kill-%, PPL, and sparsity number
here is v1 magnitude-flow. Read with reports/PHYSARIUM_RESULTS_RECONCILE.md
+ reports/PHYSARIUM_COVERAGE_AUDIT.md. Tile coverage = 100 %, denominator
must be stated (target proj weights vs full model).
A. MEASURED_NATIVE — actually working in C++/CUDA, measured
| What | Where | Metric | Source |
|---|---|---|---|
| sparse_attn_v4 CUDA kernel | ~/v4flash/metal_native/attention_full/sparse_attn_v4.cu | cosine 1.000 vs Python flash_mvp | bench_sparse_attn_v4 |
| Compressor decode (CSA+HCA) | compressor_decode.cu + compressor_hca.cu | cosine 0.9997, top-K exact 9/9 | bench_compressor_parity |
| Indexer top-K | indexer_csa.cu, indexer_decode.cu | top-K exact 9/9 | bench_indexer_parity |
| Real-weight loader | real_weight_loader.cpp (FP8 e8m0 → FP32, BF16, e4m3) | bytes-exact match safetensors | bench_decode_native_real |
| PLANCK_PACK v1 | planck_pack.cpp + planck_pack_build.cpp (147 GB) | byte-for-byte verify 50/50 PASS | planck_pack_verify |
| Native synthetic chain (16 tok) | bench_decode_native | r4 10.14 ms, r128 19.76 ms, 648 ms/tok = 1.54 tok/s | log |
| Real-weight chain (16 tok) | bench_decode_native_real | 536 ms/tok = 1.86 tok/s = 14× Python warm | log |
| 18-layer text decode after PLANCK_PACK | bench_decode_text | 2.08 sec/tok vs 12.4 sec/tok mmap (6× speedup) | log |
| Full 43-layer V4 text loop | bench_decode_text 43L | 0.16 tok/s, 8.59 GB VRAM, real PHP/Laravel scaffold output | snake_qwen.log |
| MMLU candidate scorer | bench_mmlu_native + candidate_head_logits_launch | 5/5 synthetic correct, 200× faster than full GEMV | log |
| ffn_out parity after MoE fix | bench_decode_text PARITY_L0=1 | cos 0.41 → 0.99293 (5× drift → 1.06× drift) | parity_l0_compare.py |
| Hot RAM cache (top-500) | expert_pool.cpp | 56% RAM hits, 126→100 sec/q (-21%), 5/5 retained | log |
| Hot 1000 page-cache thrash | same, hot1000 | 380 sec/q vs 100 sec/q (3.8× WORSE); pinning >7 GB breaks the page cache | measured |
| Roofline trace (PHASE_TRACE=1) | block_orchestrator_streaming.cpp cudaEvents | 89% wall = expert_req+moe, 60% of that = pure disk wait at 48 ms/expert | bench_roofline_q1.log |
No embellishment. These numbers are reproducible.
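The derived figures in the table (tok/s, speedup factors, percentage changes) can be re-checked from the raw measurements. A minimal sketch, assuming nothing beyond the numbers quoted above; the helper names are hypothetical and not part of the repo.

```python
def tok_per_s(ms_per_tok: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / ms_per_tok


def speedup(baseline: float, improved: float) -> float:
    """How many times faster `improved` is than `baseline` (same units)."""
    return baseline / improved


def rel_change_pct(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Raw inputs are the values quoted in section A.
    print(f"synthetic chain: {tok_per_s(648):.2f} tok/s")
    print(f"real-weight chain: {tok_per_s(536):.2f} tok/s")
    print(f"PLANCK_PACK vs mmap: {speedup(12.4, 2.08):.2f}x")
    print(f"hot cache top-500: {rel_change_pct(126, 100):.1f}%")
    print(f"hot1000 vs top-500: {speedup(380, 100):.1f}x slower")
```

Keeping the raw measurement next to every derived number makes rounding visible instead of hiding it in the headline figure.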
B. MEASURED_MODEL_SURGERY — actual work on weights
| What | Where | Metric / honest delta | Source |
|---|---|---|---|
| C++ Physarum engine (energy-conserving softmax) | folder/physarum_engine.cpp (137 lines) + .exe | killed=20.6%, block_size=256, n_iter=30, threshold "D < mean·0.1" | final_results.json |
| Physarium-05B-Organic artifact | folder/Physarium-05B-Organic/model.safetensors (988 MB) | File is the same size as the original (200-byte header difference). 14.94% global / ~22% per-tensor zeroed | weight_diff.json |
| Pruning pattern verification | Phase-1 forensic | 168 / 290 tensors modified (24 layers × 7 projections); zeros scattered ACROSS ALL rows/columns (corner_128 share = 0.4-4.7%), so the pattern is block-tiled, not corner-only | sparsity_pattern.json |
| PPL / accuracy delta vs original | bench_speed_quality.py | sample PPL +12.5% (9.38 → 10.55), MMLU-mini −22% relative (90→70%), GSM8K-mini −20% relative (100→80%), JSON-repair smoke 100→100%, code-skeleton 100→100% | bench_*.json |
| Speed delta | same | disk size 0%, VRAM +1%, decode_128 +14% (within noise) | same |
| pipeline_organic.py orig run | folder/pipeline_organic.py | "killed_pct: 20.6, ppl_delta_pct: +15.3, baseline_tps 27.15 / pruned_tps 27.55" | final_results.json |
Real: the 0.5B weights are Physarum-pruned, ~20% damaged on hard tasks, undamaged on simple ones (JSON, code skeleton).
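The gap between "14.94% global" and "~22% per-tensor" is a denominator effect: the same zeroed weights are divided by different totals. A minimal sketch of that bookkeeping, assuming tensors are available as flat lists of floats; the tensor names and the `_proj` filter below are hypothetical toy values, not the real checkpoint layout.

```python
from typing import Dict, Iterable, List


def zero_fraction(tensors: Dict[str, List[float]], names: Iterable[str]) -> float:
    """Fraction of exactly-zero weights over the selected tensors only."""
    total = 0
    zeros = 0
    for name in names:
        values = tensors[name]
        total += len(values)
        zeros += sum(1 for v in values if v == 0.0)
    return zeros / total if total else 0.0


if __name__ == "__main__":
    # Toy model: one pruned projection plus two untouched tensors.
    toy = {
        "layers.0.q_proj": [0.0, 0.5, 0.0, 1.0],
        "layers.0.norm": [1.0, 1.0],
        "embed": [0.3, 0.2, 0.1, 0.4],
    }
    target = [n for n in toy if n.endswith("_proj")]
    print("target-proj denominator:", zero_fraction(toy, target))
    print("full-model denominator: ", zero_fraction(toy, toy.keys()))
```

Reporting both values, each with its denominator named, avoids quoting the larger number as if it applied to the whole model.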
C. PROTOTYPE_SCAFFOLD — Python scaffold, marked as such
| Artefact | What it actually is | Not intelligence, just |
|---|---|---|
| gigachad_chat.py (~280 lines Python) | CLI wrapper around organ_runner | dispatcher harness |
| runtime/manual_dispatcher.py | regex routing | rule-based switcher |
| runtime/organ_runner.py | transformers.AutoModel + greedy decode | inference call wrapper |
| runtime/dag_logger.py | JSON dump per call | event log |
| runtime/hologram_store.py | saves {form_id, anchors, output_hash} JSON | metadata accumulator (never reused) |
| runtime/hologram_retriever.py | ngram + anchor jaccard | sparse text similarity |
| runtime/topological_memory.py | the same jaccards with different weights across 4 "shapes" | a brand on top of ngram, not geometry |
| runtime/physarium_field.py | food/poison counters with arbitrary coefficients | a counter without predictive validation |
| runtime/anti_hallucination.py | regex "perhaps/maybe" + label | text-pattern marker |
| runtime/raw_archive.py | enumerate files + line lookup | filesystem index |
| runtime/micro_scroll_builder.py | chunk + regex entity extraction | log summarizer |
| 9 organ JSON specs | prompt templates + expected schemas | spec |
| regression/*.json (140 tasks) | I/O pairs + expected fields | dataset |
| 366 DAG entries / 160 hologram forms | accumulated JSON logs | no read-back; never reused |
All of this is useful as protocol and test harness, but it is not intelligence and not the runtime.
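To make the "same jaccards with different weights" point concrete, here is a minimal sketch of a retrieval score built from ngram Jaccard, anchor Jaccard, and a route match, the three terms named in section D. It is not the code in runtime/topological_memory.py: the `Form` type, the character-trigram choice, and every weight value are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class Form:
    text: str
    anchors: Set[str]
    route: str


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Set of lowercase character n-grams (empty if the text is shorter than n)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical weights (w_ngram, w_anchor, w_route). Only these tuples differ per "shape".
SHAPES: Dict[str, Tuple[float, float, float]] = {
    "square": (0.5, 0.3, 0.2),
    "hexagon": (0.4, 0.4, 0.2),
    "star8": (0.3, 0.5, 0.2),
    "trident": (0.2, 0.3, 0.5),
}


def score(query: Form, stored: Form, weights: Tuple[float, float, float]) -> float:
    """One linear formula for every shape; no geometry is involved."""
    w_ngram, w_anchor, w_route = weights
    return (
        w_ngram * jaccard(char_ngrams(query.text), char_ngrams(stored.text))
        + w_anchor * jaccard(query.anchors, stored.anchors)
        + w_route * (1.0 if query.route == stored.route else 0.0)
    )
```

Under this reading, switching "shape" only swaps one weight tuple for another, which is why the ledger calls it a brand on top of ngram rather than geometry.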
D. UNSAFE_CLAIMS / SOFT_CLAIMS — claims that were made but never proven
| Claim | Reality |
|---|---|
| "Phase-4 regression 132/140 = 94.3%" | Verifiers became more lenient over time. contradiction_extractor's regex-fallback accepts any text ≥30 chars containing the word "contradiction". triz_operator accepts any text with "principle" or a number. Strict JSON parsing would give ~70-80%. |
| "contradiction_extractor 100% chain pass" | In Phase-2 the honest result was orig 80% / phys 0%. In Phase-3+ the verifier was weakened to a 3-level fallback (strict JSON → regex extract → "contradiction word + 30 chars"). The model did not improve, the test did. |
| "TRIZ operator 100%" | The evaluator in schemas.py::eval_triz_operator falls back to "any output mentioning principle/numbers". The real quality of 40-principles mapping is unverified. |
| "Topological memory works" | A/B/C across 200 queries: A (old jaccard) 67% hit, B (topology) 15.5% hit, C (hybrid) 36.5%. Topology lost at retrieval and was relabelled "advisory" to soften it. |
| "Hologram retrieval = live memory" | 160 forms on disk. HOLOGRAM_REUSE=log-only. No organ-call is ever bypassed. It is a metadata pile, not reuse. |
| "Anti-hallucination layer" | Not a model-level check. re.search("perhaps\|maybe\|might") + label string. The hallucination_risk number is never read downstream. |
| "Physarium field food/poison" | The weights +3.0 / +1.5 / -3.0 / -2.0 are made up. The "food-score correlates with future test_pass" test was never run. |
| "Authority advisory_only_until_regression>100" | We have 291 passing DAGs (far >100). But "advisory" stays because hard-mode does not work. The phrasing masks "does not work". |
| "Shadow router 12.9%" → "wait until 70%" | Correctly flagged 🔴, but the 0.5B model is structurally incapable of multi-class JSON output from prompt tuning alone. This is an architectural blocker, not a temporary one. |
| "9 organs ready" | On a damaged 0.5B (MMLU 70% vs orig 90%). Every organ that requires reasoning (router, contradiction, claim_extractor) leans on a damaged base. |
| "Multi-topology resonant memory" | square/hexagon/star8/trident are different weights in one formula w·jaccard_ngram + w·jaccard_anchor + w·route_match. The "shape" of memory = a brand, not geometry. |
| "Phase 1-5 as proof of architecture" | Phase-1 was an honest audit. Phase 2-5 = scaffolding around the same 0.5B. The architecture is not proven; it is rolled out as a Python skeleton. |
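The verifier drift described in the table can be sketched as two functions: a strict check that requires parseable JSON with the expected field, and a lenient 3-level fallback of the kind the table describes. This is an illustration only, not the code in schemas.py; the required field name `contradiction` and both function names are assumptions.

```python
import json
import re
from typing import Sequence


def verify_strict(output: str, required: Sequence[str] = ("contradiction",)) -> bool:
    """Pass only if the whole output is a JSON object containing every required key."""
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(key in obj for key in required)


def verify_lenient(output: str) -> bool:
    """3-level fallback: strict JSON, then regex-extracted JSON, then keyword + length."""
    if verify_strict(output):
        return True
    match = re.search(r"\{.*\}", output, re.DOTALL)
    if match and verify_strict(match.group(0)):
        return True
    # Level 3: any text of 30+ chars that mentions the word passes.
    return len(output) >= 30 and "contradiction" in output.lower()


if __name__ == "__main__":
    samples = [
        '{"contradiction": "speed vs accuracy"}',
        "I think there may be a contradiction somewhere in this text.",
    ]
    for s in samples:
        print(verify_strict(s), verify_lenient(s), "|", s)
```

The second sample passes the lenient check without containing any structured answer, which is how a pass rate can rise while the model stays the same.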
Summary
A. MEASURED_NATIVE → REAL (V4-Flash C++/CUDA, parity proven, 1.86 tok/s)
B. MEASURED_MODEL_SURGERY → REAL (Physarum-0.5B exists, damaged ~20% on hard, healthy on easy)
C. PROTOTYPE_SCAFFOLD → USEFUL AS HARNESS, not as runtime
D. UNSAFE_CLAIMS → MARKETING. To be excluded from baseline references.
Decision: A + B = the core. C = protocol. D = removed from "proven" metrics.
What to treat as ground truth going forward
- C++/CUDA inference in ~/v4flash/metal_native/: the only native runtime that actually works.
- folder/physarum_engine.cpp: the only real C++ weight-surgery code; must be ported into gigachad_native/src/physarium/.
- regression I/O pairs: a useful dataset, but under strict verifiers, not lenient ones.
- DAG schema + hologram schema + organ spec — kept as protocol.
- Physarium-05B-Organic — a real artefact for tests.
What we drop as "architecture proof":
- 132/140 regression — re-evaluate under hard verifiers.
- Topology retrieval — no longer position as "memory intelligence".
- Anti-hallucination regex — not to be conflated with real fact-claim verification.
- Physarium food/poison — not to drive decisions until validated.
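The validation that was never run for food/poison ("food-score correlates with future test_pass") could start as a plain correlation between the score and the binary outcome. A minimal sketch, assuming per-run pairs of (food_score, test_pass) can be extracted from the logs; the function name is hypothetical and the numbers in the demo are synthetic placeholders, not measurements.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; with binary ys this equals the point-biserial coefficient."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Synthetic placeholder data, for demonstrating the call only.
    food_scores = [1.0, 2.0, 3.0, 4.0]
    test_pass = [0, 0, 1, 1]
    print(f"r = {pearson(food_scores, test_pass):.3f}")
```

A correlation near zero on held-out runs would mean the counters carry no predictive signal, whatever coefficients they use.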