# GIGACHAD_NATIVE LAB — full history tree & chronometry

Compiled 2026-04-30 from reports/, docs/, archive/, DAG entries, task tracker.
Single source of truth for "what happened, when, with which model, what was claimed, what was true".

---

## 0. The big arc

```
2026-04-?? .. 2026-04-25  V4-FLASH ERA           pure-PyTorch DeepSeek-V4-Flash on 8 GB
2026-04-25 .. 2026-04-26  FLASH ENGINE PLAN      pivot to native C++/CUDA inference
2026-04-26 .. 2026-04-27  PHYSARIUM RESEARCH     organic surgery + 0.5B donor experiments
2026-04-27                PHASE 6                native consolidation, truth ledger
2026-04-27                PHASE 7                Physarium-7B surgery + organ farm
2026-04-27 .. 2026-04-28  PHASE 8 (A..F3)        native inference pipeline, identity surgery
2026-04-28 .. 2026-04-29  PHASE 9 (A..F.1)       Fusion Mass + identity LoRA 14/14
2026-04-29                PHASE 10               universal LLM surgery protocol doc
2026-04-29                PHASE 11 (A..D)        acceptance run 18/18
2026-04-29 .. 2026-04-30  PHASE 12 (.0/.H1/.CR/.TC/.TR/.HFR)  NanoOS + native loops + form replay
2026-04-29 .. 2026-04-30  CLEAN-ROOM DOCTRINE    patient autopsies, no external runtime deps
2026-04-30                PHASE 13 (BD1..BD3)    Black-Dog Organ Colony — audit + bug-fix
```

DAG span on disk (`dag/runs/`): **2026-04-27T11:57:20Z → 2026-04-30T15:10:49Z**, 2266 entries.

---

## 1. Pre-history — V4-Flash era (full chapter, ≈ 2026-04-21 → 2026-04-26)

Before GIGACHAD existed there was a long Python+C-extension era pushing
**DeepSeek-V4-Flash** onto a single **RTX 3060 Ti 8 GB** + 13 GB RAM + 80 GB swap inside WSL2. This was not "pre-history" — it was the entire prior phase. Multiple models were downloaded, sliced, repacked, and benchmarked here. Tools and concepts that survived into GIGACHAD (Physarium surgery, planck packs, monolith VRAM pool, hot-expert cache, singularity index) were **born in this folder**.

### 1.1 Physical scale of what was downloaded and run

| asset                                         | size              | path / log                                                |
|-----------------------------------------------|-------------------|-----------------------------------------------------------|
| DeepSeek-V4-Flash full HF download            | **148.7 GB**, 74 shards | `/mnt/c/Users/pc/Desktop/folder/deepseek_v4_flash/`, log: `v4_download.log` ("Done in 102.5 min") |
| adam_monolith.raw (packed Singularity Monolith) | **14.64 GB** uint32 | `/mnt/c/Users/pc/Desktop/folder/adam_monolith.raw` (mtime 2026-04-23 23:56) |
| Singularity Monolith → VRAM                    | 1.60 GB resident  | log: `[+] Monolith: 430,594,648 uint32 (1.60 GB in VRAM)` |
| Singularity index                              | 4992 projection entries | `singularity_index.json`                              |
| Singularity scales (BitNet-1.58b rescue)       | per-projection bf16 scalars | `singularity_scales.json` + `extract_scales.py`     |
| Build outputs                                  | `build/temp.linux-x86_64-cpython-312/` | C-extension intermediates                       |

C-extensions wired via `ctypes.CDLL`:

| `.so / .dll`            | role                                                            |
|--------------------------|-----------------------------------------------------------------|
| `planck_core{.so,.dll}`  | CPU decode of packed q8 blocks → fp32 output pointer            |
| `planck_core_v3{.so}`    | batched decode (`decode_batched_to_pointer`)                    |
| `planck_cuda{.so}`       | CUDA decode of q8 blocks (`cuda_decode_block_q8`)               |

These are the **direct precursors** to `Physarium-7B-Native/`'s `.planck` and `.q4planck` formats that landed in PHASE-8A. The cold-start move from packed q8 → 4 KB-aligned `.planck` mmap is the lineage.

### 1.2 Models downloaded, sliced, run (chronological)

| model                                  | role                                       | evidence                                  |
|----------------------------------------|--------------------------------------------|-------------------------------------------|
| **DeepSeek-V2-Lite 16B**               | predecessor pipeline (~3.4 s/tok, Python + io_uring + per-file Q8 experts) | referenced in `FLASH_ENGINE_PLAN.md` §0 |
| **DeepSeek-V4-Flash** (154B / 43L MoE / FP4 + FP8) | main optimization target                  | `v4_download.log`, `flash_mvp.py`, `pipeline_moe*.py`, `chat.py` |
| **Qwen2-14B (NF4 bnb_4bit)**           | secondary probe via Singularity Monolith   | `snake_qwen.log`: `[*] Loading Qwen2-14B in NF4 (bnb_4bit)`, `run_snake_qwen14b.py` |
| **Qwen 0.5B**                          | snake-organic probe                        | `snake_05b.log`, `run_snake_05b.py`        |
| **BitNet-1.58b**                       | Amplitude Rescue resurrection              | `CHAT_PATCH.md`: 2 edits / ~15 lines / `singularity_scales.json` mul at decode |

Output checks scattered through the era: `check_bf16.py`, `check_ds.py`, `check_keys.py`, `check_mlp.py`, `check_ppl.py`, `check_rope.py`, `check_v2.py` (per-component sanity probes), plus `bench_quant.py`, `measure_7b_ppl.py`.

### 1.3 The "snake" / "mycelium" / "ghost" / "blackhole" engine names

The naming evolved as the engine took shape:

```
mycelium_test.log  →  mycelium_v2.log
                          ↓
                      snake_05b.log / snake_qwen.log / snake_clean.log
                          ↓
                      ghost_test.log
                          ↓
                      blackhole_final_run.log
                          ↓
                      blackhole_absolute_final.log
                          ↓
                      working.log  →  current.log  →  final_benchmark.log
                          ↓
                      [pivot to GIGACHAD_NATIVE PHASE 6, 2026-04-27]
```

Each rename was a stage of the engine — the C-extension stack growing, the Singularity Monolith being packed, the lazy SSD-bound expert fetch maturing. The "Amplitude Rescue" (`rms_final.log`, `output_1bit.log`, `scales_test.log`) was the BitNet-1.58b sub-experiment.

### 1.4 The flash_mvp / chat.py architecture

`flash_mvp.py` (757 LoC) — the meta-init loader:
- meta-init the model (no storage allocated)
- index all 46 V4-Flash safetensors shards via `safe_open` (mmap, zero-copy)
- small params (norms, attn_sink, hc_*, biases, gate, embed, head) → GPU once
- linear weights: stay mmap'd, lazy-fetch per forward, discard
- routed experts: same lazy fetch (SSD-bound, but only 6 of 256 per token per layer)

Two PyTorch shims to make it compile under newer transformers:
- `fht_fallback.py` — fast_hadamard_transform shim
- `kernel_pytorch.py` — `kernel` module with PyTorch fallbacks

`chat.py` (824 LoC) — production chat path with the C-extension stack:
- `planck_core.decode_to_pointer` (CPU q8 → fp32 buffer)
- `planck_core_v3.decode_batched_to_pointer` (multi-tensor batched)
- `planck_cuda.cuda_decode_block_q8` (GPU q8 → buffer)
- `Monolith_VRAM_Pool` initialised with `singularity_index.json` (4992 proj entries)
- BitNet-1.58b Amplitude Rescue path: read `singularity_scales.json`, hold per-projection bf16 scalars on VRAM, zero-copy `mul` at decode time

### 1.5 Hard physical floor (V4_FLASH_TECH_BRIEF, 2026-04-25)

After 19 cumulative optimizations on top of the naive baseline:

| metric                       | value             |
|------------------------------|-------------------|
| cold prefill (1 prompt token) | 47.3 s           |
| warm decode best              | 7.5 s/tok        |
| warm decode median (p50)      | 9.6 s/tok        |
| warm decode p95               | 27.7 s/tok       |
| warm decode avg               | 13.8 s/tok       |
| steady-state (toks 9–16)      | ~9.5 s/tok       |
| target (user)                 | 3-4 s/tok        |
| stretch                       | 1 s/tok          |
| output                        | semantically valid (PHP/Laravel scaffolding from "hi" prompt) |

Authoritative artefact: `/mnt/c/Users/pc/Desktop/folder/V4_FLASH_TECH_BRIEF.md` (snapshot 2026-04-25 by Claude Opus 4.7, "hand to GPT-5.5 for advice"). Companion plan: `FLASH_ENGINE_PLAN.md` (369 LoC) — proposed targets ≤ 500 ms/tok decode, ≥ 20 tok/s prefill, ≤ 7.5 GB peak VRAM, ≤ 90 s cold start, 32K context usable.

### 1.6 Physarium surgery toolkit (born in V4-Flash folder)

Five Python tools that became the spine of GIGACHAD's PHASE 7:

| script                       | what                                                              |
|------------------------------|-------------------------------------------------------------------|
| `physarum_full.py`           | full Physarum-style mask: scipy COO matrix flow, mu=1.8 / γ=0.1 / n_iter=100 / threshold=0.01 / max_size=256 |
| `physarum_generator.py`      | dataset generator for the surgery loop                            |
| `physarum_global.py`         | global magnitude-flow (tile_size=256 sweeps)                       |
| `physarum_llm.py`            | binding into transformers `AutoModelForCausalLM`                    |
| `physarum_surgeon.py`        | the operator that walks tensors and applies the mask                |

These were NOT random Python noise — they are the algorithm that became `build/physarium7b_surgery` in PHASE 7 (the C++ port that produced `Physarium-7B-Native/`). The 22.22 % kill-percentage on Qwen2.5-7B-Instruct was achieved by the C++ port of THIS algorithm.

### 1.7 The MoE pipeline iterations

| script                         | role                                                          |
|--------------------------------|---------------------------------------------------------------|
| `pipeline.py`                  | naive single-shot                                              |
| `pipeline_moe.py`              | first MoE-aware variant                                        |
| `pipeline_moe_v2.py`           | corrected expert routing                                       |
| `pipeline_moe_v2_fixup.py`     | bug-patch over v2                                              |
| `pipeline_organic.py`          | organic surgery ablation pipeline                              |
| `pack_monolith.py`             | first Singularity Monolith packer                              |
| `pack_monolith_aligned.py`     | 4 KB-aligned variant (the alignment idea reused in `.planck`)  |

Plus diagnostics: `diagnose_cascade.py`, `diagnose_quant.py`, `cleanup.py`, `t.py / temp.py / test_*.py` for ad-hoc kernel + BMM + norm + MoE-layer testing.

### 1.8 Why the era ended (2026-04-26 → 2026-04-27)

Ground truth observed: **Python hot-path overhead** (busy-waits, ctypes round-trips, torch op-launch cost) dominated per-token latency. The 7.5 s/tok best-case under V4-Flash + 19 optimizations vs the 250-500 ms/tok target meant Python had to leave the hot path entirely. The answer was: rewrite the engine in C++/CUDA, keep Python only for tokenization, dataset generation, and forensic reports.

That decision became PHASE 6 — the Linux port of `physarum_engine.cpp` to `gigachad_physarium`, the new `gigachad_native` skeleton, the 4-category TRUTH_LEDGER. The V4-Flash folder was preserved as the historical lab; its surgery toolkit was C++-ported into PHASE 7's `physarium7b_surgery` binary; its Singularity Monolith concept reappeared as `.planck` packs in PHASE 8A.

### 1.9 Active artefacts still on disk (Windows lab, 2026-04-30)

- `/mnt/c/Users/pc/Desktop/folder/deepseek_v4_flash/` — 148.7 GB shards (kept for reference, not loaded by current runtime)
- `/mnt/c/Users/pc/Desktop/folder/adam_monolith.raw` — 14.64 GB packed (kept)
- `/mnt/c/Users/pc/Desktop/folder/Physarum-7B-Final/` (model.safetensors.index + tokenizer) — staging dir from V4-era
- `/mnt/c/Users/pc/Desktop/folder/Physarum-05B-Organic/` (model.safetensors single shard + tokenizer) — the donor that became `physarum05b.planck`
- `V4_FLASH_TECH_BRIEF.md` (365 LoC) + `FLASH_ENGINE_PLAN.md` (369 LoC) + `CHAT_PATCH.md` (174 LoC)
- 17 logs (`mycelium_*`, `ghost_*`, `snake_*`, `blackhole_*`, `working.log`, `current.log`, `final_*`, `output*.log`, `rms_final.log`, `scales_test.log`, `v4_download.log`)
- 53 Python scripts (chat / flash_mvp / physarum_* / pipeline_* / pack_monolith_* / kernel_pytorch / fht_fallback / extract_scales / measure_7b_ppl + per-component check_*.py)
- 1 build dir: `build/temp.linux-x86_64-cpython-312/`

---

## 2. Flash Engine plan (2026-04-25)

Plan doc: `/mnt/c/Users/pc/Desktop/folder/FLASH_ENGINE_PLAN.md`.

Targets:
- end-to-end decode ≤ 500 ms / token, stretch ≤ 250 ms
- prefill ≥ 20 tok/s @ batch 1
- peak VRAM ≤ 7.5 GB
- cold start ≤ 90 s
- 32K context usable

Per-token budget breakdown:
- Attention (CSA+HCA × 43 L) ~250 ms
- MoE compute (top-6 of 256 FP4) × 42 L ~120 ms
- Active weights disk read ~95 ms
- PCIe CPU→GPU cold experts ~30 ms
- Python/IO overhead target < 10 ms

Plan was scrapped in favour of GIGACHAD_NATIVE on 2026-04-27 — see §3.

---

## 3. PHASE 6 — Native consolidation (2026-04-27)

**Pivot.** Stop growing the Python swamp. Hot path = C++/CUDA. Python = forensic + dataset + report only.

Built (no LLM, no new models):

| binary                  | what                                                                  |
|-------------------------|-----------------------------------------------------------------------|
| `gigachad_physarium`    | Linux port of energy-economical Physarium engine (was Windows .exe). Selftest PASS. Determinism PASS. |
| `gigachad_native`       | demo CLI: dispatcher (regex) → STUB organ → hard_verifier → DAG entry. 8/8 selftest PASS. |

Hard verifier (`json_mini.cpp` recursive-descent + `hard_verifier.cpp`). No regex fallback, no "accept any text containing the word `contradiction`".

**Phase-6A — TRUTH_LEDGER** split everything into 4 categories:
- A. MEASURED_NATIVE (V4-Flash C++/CUDA, parity proven, 1.86 tok/s, 14× Python warm)
- B. MEASURED_MODEL_SURGERY (Physarum-0.5B exists, +12.5 % PPL, −22 % MMLU-mini)
- C. MARKETING (Phase-2 100 %-pass numbers re-scored under hard verifier → mostly 0 %)
- D. SCAFFOLDING (raw_archive, scrolls, holograms, physarium_field — Python only, port mandatory)

Re-score artefact: `reports/hard_verifier_rescore.json`. Authoritative report: `reports/GIGACHAD_PHASE6_NATIVE_CONSOLIDATION.md`.

---

## 4. PHASE 7 — Physarium-7B surgery + organ farm (2026-04-27)

First real surgical operation. **Donor: `/home/pc/qwen7b/instruct/` (Qwen2.5-7B-Instruct, BF16, 15 GB).**
**Output: `/home/pc/gigachad_native/Physarium-7B-Native/` (BF16, 15 GB, 339 tensors).**

Tool: `build/physarium7b_surgery` (193 KB native C++17, no Python in hot path).
Surgery: magnitude-flow (Physarium-v1), block_size=256, n_iter=30, β=2.0.

| metric                        | value               |
|-------------------------------|---------------------|
| target proj weights total     | 6,525,288,448       |
| weights killed (set to 0)     | 1,450,103,613       |
| kill % (over target proj)     | **22.22 %**         |
| kill % (over full 7B model)   | 19.04 %             |
| held-out PPL delta (organic 0.5B) | **+15.3 %**     |
| min / mean / max per-tensor sparsity | 16.32 / 22.33 / 45.92 % |
| wall                          | 2775.6 s (46.3 min) |

**Two distinct experiments — never mix.** `reports/PHYSARIUM_RESULTS_RECONCILE.md` is mandatory errata for every Physarium-v1 number. A vs B:

- **Experiment A — organic surgery:** kill % above, PPL delta on held-out test.
- **Experiment B — `lm-eval-harness` external probe:** PPL 19.62 → 19.94, MMLU `machine_learning` +0.9 pp.

Coverage audit: `reports/PHYSARIUM_COVERAGE_AUDIT.md` proves processed / target = 100 % (non-overlap 256×256 tiling sees every weight).

### Organ farm v1 declared

`organs/organ_farm.json` + 9 prompt templates under `organs/prompts/`. **8×0.5B** + **1×7B**:

| organ                          | role                       | tier  | verifier         |
|--------------------------------|----------------------------|-------|------------------|
| `phys05_json_repair`           | JSON syntax repair         | RAM   | json_strict      |
| `phys05_code_skeleton`         | code skeleton emit         | RAM   | code_compile     |
| `phys05_test_writer`           | unit test draft            | RAM   | test_runs        |
| `phys05_claim_extractor`       | source-bound claim extract | RAM   | source_present   |
| `phys05_triz_contradiction`    | TRIZ-style contradiction   | RAM   | hard_verifier    |
| `phys05_renderer`              | final answer render        | SSD   | hard_verifier    |
| `phys05_cache_matcher`         | route/pattern picker       | SSD   | source_present   |
| `phys05_critic_lite`           | rate 0-3 axes              | SSD   | json_strict      |
| `physarium_7b`                 | top brain                  | VRAM  | hard_verifier    |

(`phys05_wound` was MISSING — added 2026-04-30 in PHASE-13.)

Native tier manager: `src/runtime/tier_manager.cpp`. Persisted state: `physarium/tier_state.json`.

Authoritative reports: `PHYSARIUM7B_SURGERY_REPORT.md`, `GIGACHAD_PHASE7_CONSOLIDATED.md`.

---

## 5. PHASE 8 — Native inference pipeline (2026-04-27 → 2026-04-28)

The most surgically dense block. Built native Qwen2-arch runner that loads `.planck` packs (custom format), then collapses every Python ML dependency.

| sub-phase | landing                                                                                  |
|-----------|------------------------------------------------------------------------------------------|
| **8A**    | `PLANCK7B_PACK` format + writer + reader + verifier                                       |
| **8B**    | dense Qwen2-arch native runner                                                           |
| **8C0**   | sanity guard — determinism, top-k, NaN, checksum                                         |
| **8C**    | `organ_manager` + `planck_runner` wiring                                                  |
| **8D**    | first real E2E (ARIZ pipeline ariz_e2e_v1)                                                |
| **8E**    | CUDA fast path: `cuda_gemv_bf16` + pinned stager + `--backend cuda`                       |
| **8E.1**  | full GPU forward for 0.5B (kernels + GpuRunner)                                           |
| **8E.1b** | `--backend cuda` через `organ_manager` → все phys05 на CUDA                              |
| **8E.1c** | CUDA schema calibration (per-backend sampling + JSON balanced extract)                    |
| **8E.2**  | 7B layer streaming on CUDA (single-slot ring + per-step H2D)                              |
| **8E2_NUCLEAR** | Physarium-7B Q4 resident pack + fused CUDA dequant GEMV                            |
| **8E3b**  | hyperspeed — remove per-token syncs + persistent argmax buffer + microprofile             |
| **8E3 quality** | ChatML wrapper + identity / JSON smokes BF16 vs Q4                                  |
| **8E4**   | ChatML everywhere + identity hallucination gate + Q5_K track                              |
| **8E5/5b/6A/6B** | hyperspeed: CUDA graphs + fusion, fused residual+rmsnorm, fused silu_mul + down GEMV, tile-K shared-mem staged swiglu+down |
| **8E7**   | external kernel shootout — llama.cpp / ExLlamaV2 / AWQ                                   |
| **8E7B**  | llama.cpp backend integration                                                             |
| **8E8a**  | DP4A inner loop + delayed scale → 18.27 → 28.99 tok/s (×1.59), tg128 26.43 → 41.69        |
| **8F**    | decoder controls (repeat penalty, stop strings, balanced JSON, sampling)                  |
| **8F**    | prompt template surgery + memory injection                                                |
| **8F**    | regression — 5×{json_repair, code_skel, triz, claim}                                      |
| **8F0**   | identity audit (grep donor/chat tokens) + manifest + per-organ identity prompts + DAG `identity_version` + identity_probe regression (6 questions, DOD ≥5/6) |
| **8F1b**  | native batch regression runner (`gigachad_regression_native`)                             |
| **8F1c**  | calibration sweep (DAG defaults, json/code tuning, neg harness, audit fix)                |
| **8F2**   | ARIZ kernel + Black-Dog reinforcement loop ⇐ FIRST APPEARANCE OF BLACK-DOG                |
| **8F3**   | ARIZ trace builder (rule-based stages 1/4/5/6) — ariz_e2e_v4 → 3/3 verifier pass + full ARIZ trace + Black-Dog reinforcement |

End-of-phase artefacts: `Physarium-7B-Native/` (BF16 master), `physarium7b_identity.planck` + `.q4planck` (chatable Q4), `physarum05b.planck` (organs), `qwen2_5_7b_instruct.planck` + `.q4planck` (donor archived).

GPU runtime: `src/cuda/cuda_gemv.cu`, `src/cuda/cuda_q4_gemv.cu`, `gpu_runner.cpp`. Tokenizer: `src/tokenizer/qwen2_tokenizer.cpp`.

---

## 6. PHASE 9 — Fusion Mass + identity LoRA (2026-04-28 → 2026-04-29)

| sub-phase | landing                                                                          |
|-----------|----------------------------------------------------------------------------------|
| **9**     | Fusion Mass `--chat` + wire Q4 resident to `physarium_7b` organ                   |
| **9B**    | Fusion Compaction — shrink 7B prefill                                             |
| **9C**    | fast route pruning before parallel                                                |
| **9D**    | parallel organ slots + claim-skip                                                 |
| **9E**    | identity surgery reseed (gate→fail, no replace)                                   |
| **9F**    | identity LoRA surgery pipeline                                                    |
| **9F-RUN**| train + merge + repack identity LoRA                                              |
| **9F.1**  | memory-anchored seeds → **14/14 stretch**                                         |

Identity surgery loop codified: gate-fail → harvest DAG → seed → train QLoRA → merge → pack to Q4 → probe → acceptance. `Physarium-7B → Physarium-7B-identity`. Pack file: `physarium7b_identity.q4planck`.

Authoritative protocol: `docs/UNIVERSAL_LLM_SURGERY_PROTOCOL.md` (PHASE 10).

---

## 7. PHASE 11 — Acceptance run 18/18 (2026-04-29)

| sub-phase | landing                                                            |
|-----------|--------------------------------------------------------------------|
| **11**    | GIGACHAD_ACCEPTANCE_RUN_V1                                          |
| **11A**   | wire code/claim/test/memory routes into `--chat`                    |
| **11B**   | verifier alignment + code/PC fallbacks                              |
| **11C**   | code prose-strip + bench window 2000                                |
| **11D**   | organ-name LoRA patch → **18/18**                                   |

Architecture integrity acceptance held. The "18/18" is internal canon — see `reports/GIGACHAD_ACCEPTANCE_RUN_V1.json`.

---

## 8. PHASE 12 — NanoOS + native loops + form replay (2026-04-29 → 2026-04-30)

The most user-facing chunk. Phase 12 split into many sub-axes:

| sub-phase     | landing                                                                          |
|---------------|----------------------------------------------------------------------------------|
| **12 spec**   | `docs/PHASE_12_NANO_OS_EXECUTION_SUBSTRATE.md`                                    |
| **12.0**      | `chat_context_builder` (scrolls → system_msg)                                     |
| **12.H1**     | hologram exact-match cache in `run_chat` — sub-ms shortcut on identical prompts   |
| **12.CR**     | code-repair loop in native runtime, NOT in bench                                  |
| **12.CR.PAR** | parallel retry, kill the wall penalty (k=4 in parallel via `std::async`)          |
| **12.TC**     | tool-call schema-verified path (BFCL-shape)                                       |
| **12.TR**     | port terminal-task retry loop into C++ runtime                                    |
| **12.TR.B**   | 10-task Terminal NanoOS bench V1                                                  |
| **HEREDOC**   | `term_extract_bash_commands` heredoc + line-continuation aware                    |
| **HFR**       | holographic FORM replay — 4 patterns × 5 variants, 20/20 form-recognition         |

### NanoOS Capsule Substrate

`tools/capsule/shell_capsule.py` ships an Evidence object with replay_recipe + sha256 artifact hashes; gates 5 verifier kinds (`exit_zero / stdout_contains / file_exists / file_content / regex_match`).

### x100 axis claim trail

| claim file                                | what                                                                             |
|-------------------------------------------|----------------------------------------------------------------------------------|
| `EXACT_REPLAY_CACHE_V1.md` (renamed from `HOLOGRAM_REPLAY_X100.md`) | 5/5 exact repeats: ~2 ms warm, 140-403× speedup. **GREEN-utility, NOT strategic x100** (memoization). |
| `HOLOGRAPHIC_FORM_REPLAY_V1.md`           | 20/20 non-identical variants pass through 4 hand-curated `FormPattern` recognizers, all `model_called: false`, all 38-56 ms. **First REAL x100 advantage axis.** |

### Public-bench numbers landed

- MBPP n=100 (native_overtake_v1): PARROT 58 / MONSTER 73 (**+15**)
- HumanEval n=164 (native_overtake_v1): PARROT 106 / MONSTER 110 (**+4**)
- Combined n=264: **+18 / +16**
- Terminal NanoOS V1 (10 tasks): PARROT 7 / MONSTER 8 stable, 9 best-of-N (**+1 / +2**)
- Terminal NanoOS V2 (30 tasks): PARROT 20 / MONSTER 22 (**+2**), wall ratio 2.35×
- BFCL_SUBSET_V1 (10 hand): tied 100/100 (too easy for differential)

### llama.cpp integration

PARROT speed: 80+ tok/s raw via vendored llama-server (`tools/bench_external/gguf/physarium7b_identity-Q4_K_M.gguf`).

---

## 9. CLEAN-ROOM DOCTRINE — patient autopsies (2026-04-29)

External systems are **patients or env-flagged temporary oracles**, never runtime dependencies. Three legal modes only.

| autopsy file                        | what                                                              |
|-------------------------------------|-------------------------------------------------------------------|
| `PATIENT_LLAMA_CPP_AUTOPSY.md`      | kernel-level extraction (DP4A inner loop + per-32 Q8_1 activation scale) |
| `PATIENT_EXLLAMAV2_AUTOPSY.md`      | EXL2 mixed-bit autopsy                                             |
| `PATIENT_AWQ_MARLIN_AUTOPSY.md`     | activation-aware + Marlin int4                                     |
| `CLAUDE_CODE_OPUS47_AGENT_AUTOPSY.md` | autopsy of upstream agent harness                                |

Doctrine: `docs/CLEAN_ROOM_DOCTRINE.md`. Companion: `docs/RESEARCH_REFLEX_PROTOCOL.md`, `docs/GIGACHAD_AGENT_TEAM.md`, `docs/OBTEK_RULES.md`.

DP4A landing payoff: 18.27 → 28.99 tok/s (×1.59), tg128 26.43 → 41.69.

---

## 10. PHASE 13 — Black-Dog Organ Colony (2026-04-30, current)

Audit revealed:

```
last 2262 DAG entries:
  physarium_7b_chat       1966   86.9%
  physarium_7b             139    6.1%
  phys05_claim_extractor    57    2.5%
  phys05_code_skeleton      28    1.2%
  phys05_json_repair        20    0.9%
last 500 entries:
  physarium_7b_chat        500   100%   ← single-organ, no chain
```

5/8 0.5B organs had ZERO production calls; `phys05_wound` did not exist; Black-Dog conductance moved on ARIZ runs only (3 % of traffic).

### BD ladder

| step  | status      | what                                                                    |
|-------|-------------|-------------------------------------------------------------------------|
| BD1   | done        | `BLACK_DOG_ORGAN_AUDIT.md` — found four wiring bugs                     |
| BD2   | done        | All four bugs fixed. Conductance verified moving on terminal_native.    |
| BD3   | partial     | `tools/surgery/poison_to_dataset.py` shipped; harvested 663 poison rows. |
| BD4   | gated       | wire `Route::ariz` into `--chat` (currently legacy `--task` only).      |
| BD5   | gated       | learning curve probe — 30 tasks × 5 runs                                |
| BD6   | gated       | QLoRA per organ from BD3 datasets                                       |

### Four bugs that BD2 closed

1. `main.cpp:348` `(void)cond_*` discarded BD update from identity_fast DAG.
2. `run_native_*` paths skipped `bd_store` entirely — now wired in `terminal_native`, `tool_call`, code_repair attempts, and the new `organ_first` chain.
3. `run_chat_organ_route` hardcoded `food=1.0 / poison=0.0` — now verifier-driven (`bd_food = vr.ok ? 1 : 0`).
4. `cr_eval_one` (code-repair attempt) wrote food/poison but no conductance — now under thread-safe BD update.

Per-task verification on a single repeat run shows conductance growth across iterations:

```
0.0000 → 0.2000 → 0.3600 → 0.4880 → 0.5904
```

### New organs / templates

* `phys05_wound` registered (`organ_manager.cpp:279`); template at `organs/prompts/phys05_wound.txt` (compact patch + retry).
* `phys05_renderer` template re-purposed from prose to shell-drafter (was DEAD anyway).
* `phys05_critic_lite` template re-purposed from "rate 0-3 axes" to one-line failure diagnosis.

### Docs

* `docs/PHASE_13_BLACK_DOG_ORGAN_COLONY.md` — single source of truth.
* `docs/ORGAN_CANONICAL_MAP.md` — narrative ↔ code-side reconciliation.
* `reports/ORGAN_TRAFFIC_AUDIT.md` + `BLACK_DOG_ORGAN_AUDIT.md` — the audit pair that triggered the phase.
* `reports/poison_to_surgery_dataset.json` — first poison harvest summary.

---

## 11. Models traversed (chronological)

| model                                   | role                          | source                                  | when                |
|-----------------------------------------|-------------------------------|-----------------------------------------|---------------------|
| **DeepSeek-V4-Flash** (154B / 284B FP4) | original optimization target  | `/mnt/c/Users/pc/Desktop/folder/`       | pre-history         |
| **DeepSeek-V2-Lite 16B**                | predecessor pipeline (~3.4 s/tok) | folder                                | pre-history         |
| **BitNet-1.58b**                        | resurrection experiment       | `CHAT_PATCH.md`                         | pre-history         |
| **Qwen2.5-7B-Instruct**                 | top-brain donor (BF16 15 GB)  | `/home/pc/qwen7b/instruct/`             | 2026-04-27          |
| **Qwen2.5-0.5B**                        | organ donor                   | `Physarum-05B-Organic/`                 | pre-PHASE-7         |
| **Physarium-7B-Native**                 | post-magnitude-flow surgery (22.22 % killed) | `Physarium-7B-Native/`   | 2026-04-27 PHASE 7  |
| **Physarum-05B-Organic**                | 8 organs share one .planck pack | `physarum05b.planck` (Linux) + `Physarum-05B-Organic/` (Windows) | PHASE 7-8 |
| **physarium_7b_native**                 | BF16 master + planck repack   | `physarium7b.planck`                    | PHASE 8A            |
| **physarium_7b Q4**                     | resident Q4 GEMV pack         | `physarium7b.q4planck`                  | PHASE 8E2_NUCLEAR   |
| **Physarium-7B-Identity**               | LoRA-merged with identity training | `physarium7b_identity.planck/.q4planck` | PHASE 9F-RUN     |
| **physarium7b_identity-Q4_K_M.gguf**    | external benchmark target via llama.cpp | `tools/bench_external/gguf/`  | PHASE 8E7B          |
| **physarium7b_identity-Q5_K_M.gguf**    | quality probe (BF16 vs Q5)    | same                                    | PHASE 8E4           |
| **physarium7b_identity-f16.gguf**       | parity reference               | same                                    | PHASE 8E7B          |

(Original Qwen weights are donor only, never the runtime — per `ARCHITECTURE_LOCK.md`.)

---

## 12. Operations performed (alphabetical)

* **Magnitude-flow surgery (Physarium-v1)** — block_size=256, n_iter=30, β=2.0. Static, not activation-aware. Tile coverage of target proj tensors = 100 %.
* **QLoRA identity training** — bitsandbytes 4-bit base, LoRA r=16 attn-only, multi-stage seed → train → merge → pack → probe → 14/14 stretch.
* **Q4_K_M / Q5_K_M GGUF quantization** — via llama.cpp tooling, used for external-bench parity.
* **`.planck` / `.q4planck` custom format** — fixed-offset, 4 KB-aligned, mmap-ready, Phase-8A.
* **DP4A inner loop CUDA kernel** — `cuda_q4_gemv.cu` v3, sign-extended nibbles, 8 int8 activations re-packed, 2 dp4a per iter, delayed scale.
* **ChatML wrapping** — Phase-8E4, `s.use_chatml = true` only on 7B (0.5B not robust to ChatML).
* **Black-Dog reinforcement** — `food / poison / conductance` per `(pattern_hash, action_chain)`.
* **ARIZ kernel** — `src/reasoning/ariz_trace.cpp` rule-based stages 1/4/5/6.
* **Organ Manager + tier manager** — VRAM/RAM/SSD policy.
* **Hard verifier** — `json_mini` + `hard_verifier`, no regex fallback.
* **Shell capsule (NanoOS)** — `tools/capsule/shell_capsule.py`, sterile sandbox with Evidence + replay_recipe.
* **Form pattern recognition** — `src/main.cpp:FormPattern` registry.
* **Hologram exact cache** — `dag/hologram_cache.jsonl`, sha256_16 keyed.
* **Acceptance benches** — MBPP n=100, HumanEval n=164, Terminal NanoOS V1/V2/30, BFCL_SUBSET_V1, repeat_learning_torture.
* **Identity LoRA "organ-name patch"** — Phase-11D, 18/18.

---

## 13. Claim trail vs measured trail

```
CLAIMED                       MEASURED (under hard verifier)
─────────────────────────     ──────────────────────────────
"Phase-2 100 % pass"          → re-scored to mostly 0 % under hard rules (Phase-6A)
"Monster organ chain"         → 86.9 % traffic = single organ; 5/8 organs DEAD (Phase-13 BD1 audit)
"Hologram replay x100 GREEN"  → was exact-cache, RENAMED to EXACT_REPLAY_CACHE_V1 (utility)
"Holographic form replay"     → 20/20 non-identical variants, real form recognition (PHASE-12.HFR)
"Acceptance 18/18"            → internal canon, not external bench; held under regression
"MBPP +15 / HumanEval +4"     → external-bench measured under MONSTER_NATIVE_RETRY=1, replicable
"Terminal mini +2 best-of-N"  → +1 stable across 4 runs, model-class ceiling
"DP4A +59 % decode"           → 18.27 → 28.99 tok/s, 1/18 acceptance regression (opt-in flag)
"x100 effective speed"        → measured 140-403× on EXACT-INPUT repeats only — not x100 on novel
"Black-Dog learning loop"     → wired in 3 sites, only ARIZ moved conductance; bug fixed in BD2
```

The lab's culture is to write the claim, then re-score under hard verifier, then publish whichever number is true. The `archive/2026-04-29-noise/` graveyard preserves the failed claims; `site/surgery-open/07_graveyard/` mirrors the trail publicly.

---

## 14. Authoritative artefact list (read these in order to understand state)

1. `docs/X100_SCOREBOARD.md` — current axes / status / targets
2. `docs/CURRENT_TRUTH_LEDGER.md` — what is GREEN / YELLOW / RED right now
3. `ARCHITECTURE_LOCK.md` — laws of the project
4. `LLM_SURGERY_LAB.md` — architecture tree
5. `docs/PHASE_13_BLACK_DOG_ORGAN_COLONY.md` — current phase
6. `docs/ORGAN_CANONICAL_MAP.md` — what organs exist
7. `docs/UNIVERSAL_LLM_SURGERY_PROTOCOL.md` — surgery loop
8. `docs/CLEAN_ROOM_DOCTRINE.md` — external dependency policy
9. `reports/ORGAN_TRAFFIC_AUDIT.md` + `BLACK_DOG_ORGAN_AUDIT.md` — most recent honest accounting
10. `reports/HOLOGRAPHIC_FORM_REPLAY_V1.md` — first real x100 axis evidence
11. `reports/PHYSARIUM_RESULTS_RECONCILE.md` + `PHYSARIUM_COVERAGE_AUDIT.md` — mandatory errata
12. `GIGACHAD_LAB_MASTER_REPORT.md` — running master log (3500+ lines)

Mirrored to: `C:\Users\pc\Desktop\folder\` (Windows lab side).

---

## 15. The phrase that survived everything

```
The model wins on internal acceptance.
The runtime currently loses on external coding gauntlet.
The diagnostic loop says exactly where to fix.
That is honest. That is the lab.
```

Updated 2026-04-30 PM with: "Black-Dog is now wired everywhere it should have been wired in PHASE-8F2. The colony can finally feed and poison itself. Surgery datasets follow."
