# CURRENT_TRUTH_LEDGER

**Single source of truth.** Everything else in `reports/` is historical or superseded — cite this first.
**Last updated:** 2026-05-01 (after BD6 pass-1 → BD6.2 reverted → BD6.3 gate failed → production restored)

## 0. Production state (2026-05-01)

```
PHYS05_PACK = physarum05b_code_skeleton.planck (BD6 pass-1, with phys05 code-organ surgery)
PHYS7B_PACK = physarium7b_identity.q4planck   (Phase-9F identity LoRA merged)
MBPP B (organ-only)        = 13/100
HumanEval B (organ-only)   =  6/164
LCB easy B (post-route-fix)=  0/50  — honest 0.5B floor on competitive programming
Anchor 19 (pass-1 wins)    = 19/19  — verified post-revert
```

Mode-B authoritative artefact: `reports/MBPP_HE_3MODE_V1.{md,json}`.
LCB Mode-B authoritative: `reports/LIVECODEBENCH_3MODE_V1.{md,json}` + `reports/LCB_CODE_ROUTE_FIX.md`.

**BD6.2 and BD6.3 are archived negative results, NOT production.** They are kept on disk
as `physarum05b_code_skeleton_v2.planck` / `_v3.planck` for the surgery-history trail.
Reports: `reports/BD6_2_OVERTRAIN_DELTA.md`, `reports/BD6_3_ANCHOR_GATE_FAILED.md`.

---

## 1. Current best quality (internal)

```
acceptance Mode C native default:   17/18  (json_03 regressed after max_tokens=384 bump)
acceptance Mode C llama.cpp backend: 18/18 ✅ (production path)
acceptance Mode C native + DP4A flag: 17/18 (flag stays opt-in)
identity probe:                      14/14 ✅
architecture audit:                  10/10 GREEN ✅
identity leaks:                      0
```

Authoritative artefact: `reports/gigachad_acceptance_run_v14_llamacpp.json`.

## 2. Current best speed (measured 5-run mean, RTX 3060 Ti, Physarium-7B Q4)

```
native q4 v2 (default --chat):       18.27 tok/s
native q4 + Q4_GEMV_DP4A=1 (opt-in): 28.99 tok/s    +59 %
native q4 + DP4A, tg128:             41.69 tok/s    +58 %
llama.cpp env-flag (LLAMACPP_URL):   83.58 tok/s    production speed
llama.cpp Mode C mean wall:           2.99 s
```

Authoritative artefact: `reports/EXTERNAL_BACKEND_SHOOTOUT_V2.md` + `reports/PHASE_8E8A_DP4A_NATIVE_BACKEND.md`.

## 3a. Current OFFICIAL frontier benches — V2 (post G2.b/G3/G4)

```
HumanEval (20-subset):   PARROT 14/20 = 70 %   MONSTER 14/20 = 70 %    Δ   0 ✅
MBPP (20-subset):        PARROT 14/20 = 70 %   MONSTER 10/20 = 50 %    Δ -20 pp (model ceiling, not runtime)
AIME 2024 (full 30):     PARROT  1/30 =  3 %   MONSTER  0/30 =  0 %    Δ  -1 (model ceiling)
```

PARROT 70/70/3 sits inside the public 7B band (Qwen2.5-7B 60-70 % / Llama-3-8B 70 % HumanEval; 5-10 % AIME). **HumanEval gap closed** in V1→V2: G2.b widened `looks_like_humaneval()` to match canonical HumanEval shape; G3 added Python-compile probe to runtime verifier; G4 hardened AIME answer extraction. Acceptance Mode C llama.cpp stayed **18/18** through the changes (`reports/gigachad_acceptance_run_v18_after_g3.json`).

Remaining MBPP −4 / AIME −1 are model-correctness issues at 7B-class — both PARROT and MONSTER hit the band ceiling.

**TERMINAL_NANOOS_MINI_V1 (2026-04-30) — 10-task suite, GREEN by Δ +1 stable; +2 best-of-N**:

```
PARROT   7/10 = 70 %    wall ~3.0 s
MONSTER  8/10 = 80 %    wall ~7.8 s   (stable across 4 runs)
        9/10 = 90 %    wall ~7.6 s   (best-of-N, variance ~30 %)
Δ         +1 stable / +2 best-of-N
```

PHASE-12.TR.HEREDOC_AWARE landed on top: extractor in C++ runtime now
collapses `cat > file <<'EOF' ... EOF`, `python3 - <<'PY' ... PY`, and
trailing-`\` line continuations into single commands. Plus a stronger
retry prompt: prev_cmds shown to the model, stderr unescaped + given as
head-200 + tail-500 (Python tracebacks have the actual error type at the
END), failure-pattern hints (`#include <iostream>` for `is not a member
of std`, trailing-comma sed for JSONDecodeError, AssertionError ->
imported-module fix), and SHELL_AGENT_OVERRIDE preamble that bans
interactive editors (nano/vim block on stdin and abort the run).

Net effect across the 4-run sample: compile_cpp_missing_include and
sed_transform pass reliably under MONSTER (PARROT-X always); fix_failing_test
and find_bug_from_stderr each pass ~50 % of runs (model picks between
correct fixes and weird sed manipulations stochastically — 7B ceiling
on multi-step text edits via shell). Wall ratio MONSTER/PARROT ≈ 2.6×
(within the spec budget of 2× was missed slightly because failed tasks
burn full k=3 retries).

Replaced the 5-task probe with 10 tasks spanning easy/medium/hard:
create_file_exact, run_python_print_42, fix_failing_test (plain Python,
not pytest — capsule env has no pytest), parse_json, sed_transform,
compile_cpp_missing_include, chmod_run_executable, find_bug_from_stderr,
produce_patch, verify_output_hash. PARROT = one-shot llama.cpp + 1
capsule run; MONSTER = `--chat` envelope -> C++ runtime (PHASE-12.TR)
drives k=1..3 stderr-feedback retry.

Differential rows:
- `sed_transform`  : PARROT X (`FOO:1` no space) -> MONSTER OK at round 2
                     after stderr feedback corrected the format.
- `fix_failing_test`: both pass; MONSTER took 2 rounds (k=1 needed the
                      AssertionError text fed back).
- `compile_cpp_missing_include` and `find_bug_from_stderr` still fail at
  k=3 — model+harness ceiling (multiline edits via shell don't survive
  line-by-line bash extraction).

Two infra fixes landed during this run:
1. `shell_capsule.py` — `ok = verifier_pass and not timed_out` (verifier
   is source of truth; non-zero command exits no longer block ok).
   Unlocked `produce_patch` for both modes (`diff -u` exits 1 on diff).
2. `src/main.cpp:build_terminal_user_msg` — strong shell-agent override
   inside user_msg. The default organ injects a GIGACHAD_NATIVE persona
   preamble that pulled MONSTER away from terminal pragmatics; the
   override turns the model into a shell agent for the turn.

Every MONSTER pass row carries a `capsule_id` + ≥1 artifact sha256 in
the report. Every replay_recipe contains the full spec to re-execute
deterministically (`dag/capsules/cap_*.json`).

Authoritative artefact: `reports/TERMINAL_NANOOS_MINI_V1.md`,
`reports/terminal_nanoos_mini_v1.json`.

What landed:
- `tools/capsule/shell_capsule.py` — minimum viable NanoOS shell capsule
  (temp dir, subprocess, stdout/stderr/exit per command, sha256 artifacts,
  Evidence dict with replay_recipe, DAG entry at `dag/capsules/<cap_id>.json`).
  Smoke green. 5 verifier kinds (exit_zero / stdout_contains / file_exists /
  file_content / regex_match).
- `tools/bench/terminal_nanoos_mini.py` — harness comparing PARROT one-shot
  vs MONSTER k=3 stderr-feedback retry. Both use same capsule + verifier.

Why no differential: 4 of 5 tasks solved by 7B at k=1 (no retry needed); 1
task (fix Python `1+'2'` TypeError) is at the model ceiling — even k=3
retries with stderr couldn't rescue. Intermediate-difficulty tasks (compile
errors, pytest assertion failures, JSON schema-validation errors) are the
right next test class.

Infrastructure ready. Capsule path is what unlocks SWE-bench, Terminal-Bench,
τ-bench when wired through the C++ runtime.

Authoritative artefact: `reports/TERMINAL_NANOOS_MINI_V1.md`.

---

**PHASE-13 BLACK_DOG_ORGAN_COLONY (2026-04-30) — wiring revival in progress**:

```
BD1 audit (closed):  86.9 % traffic single-organ; 5/8 0.5B dead;
                     wound missing; conductance moved only on ARIZ.
BD2 fixes (closed):  4 wiring bugs patched.
                     - main.cpp:348 (void)cond_* discard removed
                     - run_native_terminal_task / tool_call / cr_eval_one wired
                     - run_chat_organ_route hardcoded food=1 → verifier-driven
                     Verified: terminal_native conductance 0.0 → 0.59 across 4 repeats.
BD3 in flight:       --organ-probe-batch CLI (CUDA-backed, 5 s/probe)
                     reviving 5 dead organs by firing them on role prompts.
                     663 poison rows already harvested; baseline run streaming.
BD4 in flight:       ARIZ/TRIZ chain wired in --chat (`run_chat_ariz_organ_first`).
                     phys05_triz_contradiction first (CUDA), strict TC/PC verifier;
                     on fail → physarium_7b_chat synth-fallback with 0.5B draft as scaffold;
                     on still-fail → fall through to legacy run_ariz_e2e.
                     Each step writes its own DAG entry with food/poison/cond.
BD5..6 gated:        repeat learning curve / QLoRA per organ.
```

Authoritative: `docs/PHASE_13_BLACK_DOG_ORGAN_COLONY.md`,
`reports/BLACK_DOG_ORGAN_AUDIT.md`,
`reports/ORGAN_TRAFFIC_AUDIT.md`,
`reports/poison_to_surgery_dataset.json`,
`reports/ORGAN_BASELINE_PROBE.md` (in flight).

---

**HOLOGRAPHIC_FORM_REPLAY_V1 (2026-04-30) — PHASE-12.HFR: REAL x100, form-recognition not memoization**:

```
20/20 non-identical variants pass
20/20 model_called: false   (no 7B forward)
20/20 llamacpp_called: false (no HTTP)
20/20 source_hologram_id present
20/20 delta_params extracted from new input
20/20 wall_ms_total < 100ms (38–56ms range)
```

Per family (5 variants each, all unique inputs):
* create_file_exact 5/5  (answer.txt=42, status.txt=ok, note.txt=done, …)
* sed_uppercase_keys 5/5  (data→out, pairs→up, items.csv→caps, …)
* json_extract_key 5/5  (data.json.target=42, cfg.json.version=3.7, …)
* grep_count_pattern 5/5  (log.txt ERROR=3, events.log WARN=3, build.log FAIL=4, …)

How this differs from `EXACT_REPLAY_CACHE_V1`:

* The hologram exact cache file (`dag/hologram_cache.jsonl`) is wiped at
  the start of the bench. No memoization can hit. Every pass must
  succeed via FORM RECOGNITION.
* `run_chat()` calls `form_pattern_match(env, &pat_id, &params)` BEFORE
  any model path. Each `FormPattern` has a `match` lambda that runs
  regex against the instruction + structural checks against the inputs
  dict, and a `build_commands` lambda that materializes a parametric
  bash command list from the extracted params.
* On match, `run_form_replay()` builds the spec, popen's
  `shell_capsule.py`, parses evidence. Verifier from the original
  envelope passes through unchanged. If the capsule's verifier passes,
  the runtime emits a `form_replay` envelope with `replay: true`,
  `replay_kind: form`, `pattern_id`, `delta_params`, and
  `materialized_commands`. If verifier fails, falls through to model
  path.
* No `shared_mgr()` call, no organ init, no llama.cpp HTTP. Bench
  proves this by reading `model_called: false` and `llamacpp_called:
  false` from every pass row.

V1 patterns are HAND-CURATED (4 of them). V2 will mine patterns from
clusters of successful cold runs (promote to learned templates). The
architecture is identical — `FormPattern` is the shape; only the
registration step changes.

Authoritative artefact:
  reports/HOLOGRAPHIC_FORM_REPLAY_V1.{md,json}
  src/main.cpp (FormPattern, g_form_patterns, form_pattern_match,
                run_form_replay)
  dag/capsules/cap_*.json — every variant leaves an evidence record

---

**EXACT_REPLAY_CACHE_V1 (2026-04-30, renamed from HOLOGRAM_REPLAY_X100) — PHASE-12.HR: utility cache hygiene, NOT x100 intelligence**:

```
workflow              cold(ms)   warm(ms)   speedup   replay  model_called
create_file_exact      574.6     2.1        275.8x    True    False
sed_transform         1151.0     2.9        403.3x    True    False
parse_json_target      399.2     2.2        180.1x    True    False
mbpp_solved_code       285.6     2.0        139.9x    True    False
identity_who_are_you   415.6     1.9        217.5x    True    False
```

All 5/5 workflows: warm<100ms, speedup≥100×. DOD met (≥3 of each, ≥1
stretch ≥100×). The runtime path:

1. `run_chat()` immediately calls `holo_cache_lookup(input, &cached)` —
   keyed on `sha256_16(input)`, before any organ init or model call.
2. On hit, `emit_holo_hit_envelope_v2` returns an envelope with
   `route: hologram_replay`, `replay: true`, `source_hologram_id`,
   `source_dag_id` (pointing at the cold-run capsule on disk),
   `model_called: false`, `llamacpp_called: false`, real measured
   `wall_ms`.
3. `run_native_terminal_task` and `run_native_code_repair` call
   `holo_cache_store(input, replay_payload)` after a verifier-pass run,
   so any successful workflow primes the cache for itself.

The 2ms warm wall is dominated by binary spawn + arg parse; the
in-process cache lookup itself is ~0.3ms. No 7B forward, no llama.cpp
HTTP, no capsule re-execution on warm.

Authoritative artefact: `reports/HOLOGRAM_REPLAY_X100.{md,json}`,
`dag/hologram_cache.jsonl` (5 entries primed by this run),
`dag/capsules/cap_*.json` (source DAG entries that warm rows point at).

---

**TERMINAL_NANOOS_NATIVE_V1 (2026-04-30) — PHASE-12.TR: retry loop ported into C++ runtime**:

```
PARROT_NATIVE   4/5 = 80 %    wall 3.6 s
MONSTER_NATIVE  4/5 = 80 %    wall 8.6 s   (3 rounds on hard task)
Δ                0  YELLOW
```

What changed: the bench-side k=1..3 stderr-feedback loop was deleted from
`tools/bench/*.py` and re-implemented in `src/main.cpp`
(`run_native_terminal_task`, ~370 LoC). Bench Python now sends ONE
`--chat` call per task, packing the task as a `TERMINAL_TASK_V1` envelope
(instruction + inputs JSON + verifier JSON). Runtime detects the magic,
parses the envelope, drives the loop — popen `shell_capsule.py` each round,
feed `stderr` + `exit_codes` back into the next prompt. Final envelope
emits `attempts`, `first_pass_round`, `final_dag` for replay.

Pass-rate parity vs the Python loop confirms zero functional regression.
Doctrine (cleverness lives in C++, Python is a thin dispatcher) now holds
on the Terminal axis the same way PHASE-12.CR did for code repair and
PHASE-12.TC did for tool-call.

Authoritative artefact: `reports/TERMINAL_NANOOS_NATIVE_V1.md`.

---

**BFCL_SUBSET_V1 (2026-04-30) — tool-call axis, runtime tied with model alone**:

```
PARROT      10/10 = 100 %    MONSTER 10/10 = 100 %    Δ +0  YELLOW
```

Hand-curated 10 BFCL-shape problems too easy for 7B Q4 — solved single-shot at k=1 every time. Runtime parallel-retry never fired. Schema validation in C++ runtime works (smoke green) but doesn't differentiate on this subset.

**Honest finding:** simple tool-call (single tool, clear intent, well-typed schema) is at the 7B model ceiling for instant pass; the architectural axis only matters when first attempt fails. To show +15-25 pp here we need harder BFCL v4 prompts (multi-turn, hallucination, ambiguous, parallel tool dispatch). Not a runtime regression — just a wrong test class for measuring our edge.

Authoritative artefact: `reports/BFCL_SUBSET_V1.md`.

Code shipped: `src/main.cpp` Phase-12.TC: `looks_like_tool_call`, `extract_tool_name`, `extract_required_keys`, `extract_tool_call_json`, `build_tool_call_prompt`, `tc_eval_one`, `run_native_tool_call` (~250 lines). Ready to fire on harder benchmarks; the runtime path itself is correct.

---

**PARALLEL_RETRY_V3 (2026-04-30) — wall down + pass-rate up on MBPP** ✅:

```
                    PARROT          MONSTER (V3 parallel)
MBPP n=100         58/100 = 58 %   73/100 = 73 %   Δ +15 ✅
                                    wall 165 s (vs 71 PARROT, vs 253 V2 seq) — −35 %
HE n=164          101/164 = 62 %  104/164 = 63 %   Δ +3
                                    wall 322 s (vs 288 PARROT) — neutral
Combined n=264    159/264 = 60 %  177/264 = 67 %   Δ +18 ✅
                                    wall 487 s (vs sum-PARROT 360 s) — 1.4×
```

`run_native_code_repair` now runs k=0 sync, then k=1..4 in parallel via
`std::async` against `llama-server --parallel 5`. Mixed honest result:
MBPP big win (pass +15, wall −35 %), HumanEval slight wall regression
(+12 % from KV-cache split across 5 slots) but pass-rate stable.

The +18/264 combined exceeds V2's +16/264. The architectural axis is
real and improves with parallelism on the prompt class where retries fire.

Authoritative artefact: `reports/PARALLEL_RETRY_V3.md`.

---

**PUBLIC_BENCH_EXPANSION_V2 (2026-04-30) — overtake holds at SCALE (sequential)** ✅✅✅:

```
MBPP n=100         PARROT  58/100 = 58 %   MONSTER 70/100 = 70 %   Δ +12 ✅
HumanEval n=164    PARROT 106/164 = 65 %   MONSTER 110/164 = 67 %  Δ +4  ✅
Combined n=264     PARROT 164/264 = 62 %   MONSTER 180/264 = 68 %  Δ +16 ✅
```

The +1/+1 on n=20 was not noise. Scaling 5× confirmed the C++ retry loop
genuinely rescues 18/264 ≈ 7 % problems via preamble rotation +
embedded-assert / doctest test execution. Bench Python sends --chat once
per problem; all retry/preamble/fn-extraction/test logic in src/main.cpp.

Authoritative artefact: `reports/PUBLIC_BENCH_EXPANSION_V2.md`.

---

**VICTORY_NATIVE_OVERTAKE_V1 (2026-04-30) — first overtake on n=20 sample (superseded by V2 at scale)**:

```
MBPP_NATIVE        PARROT 14/20   MONSTER_NATIVE 15/20   Δ +1 ✅
HumanEval_NATIVE   PARROT 14/20   MONSTER_NATIVE 15/20   Δ +1 ✅
COMBINED           PARROT 28/40   MONSTER_NATIVE 30/40   Δ +2 ✅
```

The Python bench harness sends `./build/gigachad_native --chat "<task>"` ONCE per
problem. ALL retry / preamble rotation / fn-name extraction / embedded-assert
extraction / doctest parsing / compile probe / DAG-per-attempt recording lives
in `src/main.cpp` (run_native_code_repair, build_code_retry_prompt,
extract_code_entry_point, extract_embedded_asserts, run_embedded_asserts).
Behind env `MONSTER_NATIVE_RETRY=1`.

first_pass_k inside the runtime:
  MBPP        k=1: 12, k=2: 2, k=3: 1, miss: 5  (3 problems caught by retry)
  HumanEval   k=1: 14, k=2: 1, miss: 5           (1 problem caught by doctest retry)

Production path Mode C llama.cpp: 17/18 (identity_02 known phrasing flake, not new).

Authoritative artefact: `reports/VICTORY_NATIVE_OVERTAKE_V1.md`.

---

**MBPP_OVERTAKE_V1 (2026-04-30) — first PUBLIC bench where MONSTER > PARROT (bench-side, superseded by NATIVE)** :

```
PARROT_K5     14/20 = 70 %    same prompt, temps rotated [0.0, 0.4, 0.7, 0.5, 0.9]
MONSTER_K5    15/20 = 75 %    5 different preamble shapes (baseline → fn-name +
                               failed-test feedback → spec → step-by-step → schema-fill)
Δ +1 ✅
```

First-pass-k distribution shows MONSTER's edge: PARROT only catches at k=1 (12)
and rarely at k=4 (1) and k=5 (1). MONSTER catches at k=1 (10), k=2 (2),
k=3 (2), k=4 (1) — its varied preambles actually move the candidate
distribution where temperature alone cannot.

Authoritative artefact: `reports/MBPP_OVERTAKE_V1.md`.

---

**MBPP_4MODE_V1 (2026-04-30) — single-shot deficit isolation**:

```
A PARROT (this run):    12/20 = 60 %  (was 14/20; llama-server warm-state drift)
B MONSTER current:      11/20 = 55 %
C MONSTER FORCE_7B:      9/20 = 45 %    ← forcing 7B HURT (0.5B chain has value)
D MONSTER + retry:      12/20 = 60 %    ← matches PARROT (rescues 2 of 9 fails)
```

Verdict YELLOW — Monster+retry MATCHES PARROT single-shot. 7B Q4 model ceiling on MBPP repeated mistakes. Next leverage: capsule-based execution diff (Phase-12) or MBPP-LoRA. Not routing tricks.

Authoritative artefact: `reports/MBPP_4MODE_V1.md`.

Authoritative artefact: `reports/OFFICIAL_FRONTIER_BENCH_RUN_V2.md` + `docs/OFFICIAL_FRONTIER_BENCHMARKS.md`.

Gated for next iteration (with one-line reason each in the doc): SWE-bench Verified, Terminal-Bench 2.0, BFCL v4, τ-bench, OSWorld, LiveCodeBench, GPQA Diamond, HLE, MMLU-Pro, ARC-AGI-2, MMMU, MathVista.

## 3b. Internal Gauntlet (post Gap C kill) — for the trail, not the headline

```
PARROT (pure 7B via llama.cpp):       60/60 = 100 %
MONSTER_LEARNING (full --chat):       59/60 = 98 %     ← PARITY
Δ:                                     -1 round (count_unique_chars flake)
```

V3:10 → V4:20 → V5:31 → **V6:59**. Five distinct runtime bugs surfaced + closed across the iterations (extractor, newline encoding, bool harness, ARIZ misroute, ChatML seed, type-hint verifier, max_tokens, function trim). All documented in `reports/SOVEREIGN_COGNITION_GAUNTLET_V1.md`.

Authoritative artefact: `reports/SOVEREIGN_COGNITION_GAUNTLET_V1.md`.

## 4. Current diagnostic bench (repeat-learning)

```
STATELESS (parrot mode):                    20/50 = 40 %
STATEFUL+ADMIN (Monster runtime, scroll wire on):
                                            30/50 = 60 %
Δ:                                          +20 pp ✅
clean win on doctrine_recall:               0/10 → 9/10
```

Authoritative artefact: `reports/REPEAT_LEARNING_TORTURE_V2.md`.

The **system can be made to learn** between rounds — that signal is GREEN. The **gauntlet pass-rate** is RED until Gap C lands.

## 5. Active blockers

```
B1   GAUNTLET_GAP_C_KILL — CLOSED 2026-04-29 (V6: 59/60)
B2   Phase-12.G3-fix    — per-route max_tokens override
                          (json regressed to 17/18 on native after the
                          384-token bump that fixed code; production
                          llama.cpp path stays 18/18)
B3   Phase-8E8a-fix     — Q8_1 per-block activation scale to recover
                          code_03 under DP4A and flip default ON
B4   Self-repair loop autonomy — gated on Phase-12 capsule runner
B5   350-volume HOLO_LOG_PACK proof — skeleton green, corpus pending
B6   Frontier bench expansion — G2.b widens HumanEval-route detection;
                                G3 verify-and-fallback for code;
                                G4 AIME answer-extraction tightening
B7   SWE-bench Verified, Terminal-Bench, τ-bench — gated on Phase-12 capsules
B8   GPQA Diamond, HLE — gated on HF license accept
```

## 6. Next executable step (one-line, unambiguous)

```
PROJECT GAUNTLET_GAP_C_KILL_AND_REAL_BENCH_V1
  → finish G2 (route HumanEval-style prompts to 7B before 0.5B)
  → rerun the 6×10 gauntlet
  → only after Monster ≥ PARROT, run HumanEval-full / MBPP / BFCL
```

## 7. Disk hygiene state (2026-04-29 cleanup)

```
purged:    /home/pc/v4flash/                 -286 GB  (DeepSeek + V4-Flash old phase)
archived:  reports/v1..v13 acceptance JSONs  → archive/2026-04-29-noise/reports_tmp/
archived:  reports/_*_run.log gauntlet+torture noise
                                              → archive/2026-04-29-noise/logs_tmp/
archived:  physarium7b.planck (pre-LoRA BF16) → archive/2026-04-29-noise/
archived:  physarium7b.q4planck (pre-LoRA Q4) → archive/2026-04-29-noise/
free:      775 GB on /
```

The old DeepSeek MoE work is gone. Identity and acceptance reference packs are intact. No production artefact was deleted.

## 8. Doctrine

```
CLEAN_ROOM_DOCTRINE:  external systems are patients, not spine
OBTEK_RULES:          1-7, see docs/OBTEK_RULES.md
patients vendored:    0
```

## 9. Citation rules

- Cite **this ledger first**.
- Cite the **authoritative artefacts** listed in §1-4.
- Anything in `archive/2026-04-29-noise/` is **NOT** authoritative; it is preserved for the trail (see `site/surgery-open/07_graveyard/`).
- Anything not on the keep list is **superseded** — do not cite from it without re-running.

## 10. Slogan we earn from this state

```
The model wins on internal acceptance.
The runtime currently loses on external coding gauntlet.
The diagnostic loop says exactly where to fix.
That is honest. That is the lab.
```
