# BD6.8F+ — decoder grid: no sweet spot exists (2026-05-02)

**TL;DR — at the (cuda_rep_penalty, no_repeat_ngram) granularity, NO
decoder spec keeps production at 19/19 AND lifts BD6.5 above 15/19.
The two requirements are mutually exclusive: production needs
cuda_rep ≥ 1.05 to pass anchor; BD6.5 only gains anchors when
cuda_rep ≤ 1.00. The intervals don't overlap. Lever F+ is closed.**

**Next step is BD6.8D (token-weighted CE on BD6.5 dataset shape) —
that's the path to recover MBPP/53 + HE/85 (the two real weight-gap
holdouts) without changing decoder.**

---

## Grid tested

CPU `repetition_penalty` held at 1.15 (no effect on CUDA backend used
by phys05_code_skeleton). Varied `cuda_rep_penalty` × `no_repeat_ngram`:

| cell | cuda_rep | ngram | prod pack | v5 pack | note |
|------|----------|-------|-----------|---------|------|
| C0 (current) | 1.05 | 2 | **19/19** | 15/19 | production decoder, also BD6.5's measurement |
| C1 | 1.05 | 0 | **19/19** | 15/19 | drop ngram only — prod safe, v5 unchanged |
| C2 | 1.02 | 2 | **4/19**  | 15/19 | soften cuda_rep slightly — prod collapses |
| C4 | 1.00 | 2 | **1/19**  | 16/19 | kill cuda_rep, keep ngram — prod cratered, v5 +1 |
| C5 (greedy) | 1.00 | 0 | **1/19**  | 16/19 | pure greedy — prod cratered, v5 +1 |

(Cell C3 = (1.02, 0) was skipped: at cuda_rep=1.02 production already
fell to 4/19 in C2, so dropping ngram on top wouldn't recover prod.)

## Two non-overlapping intervals

```
                             cuda_rep_penalty
       ╔══════ 1.05 ══════╗  ╔═ 1.02 ═╗  ╔══ 1.00 ══╗
prod   ║ 19/19 (safe)     ║  ║ 4/19   ║  ║ 1/19     ║
       ╚══════════════════╝  ╚════════╝  ╚══════════╝
       ╔══════ 1.05 ══════╗  ╔═ 1.02 ═╗  ╔══ 1.00 ══╗
v5     ║ 15/19 (no gain)  ║  ║ 15/19  ║  ║ 16/19    ║
       ╚══════════════════╝  ╚════════╝  ╚══════════╝
                             ↑ no overlap ↑
                             where prod stays ≥19 AND v5 ≥17
```

There is no value of `cuda_rep_penalty` (at the 0.05 step granularity
we tested) that satisfies both constraints simultaneously.

## Why this is decisive

* Production BD6 pass-1 is essentially a "hot mode" of the donor
  weights — the model only behaves correctly when `cuda_rep_penalty`
  is biting hard enough to suppress repetition. Drop it at all and
  the model loops/repeats and fails compile.
* BD6.5 weights have learned to be *less* repetition-prone — they
  generalize the anchor patterns enough that pure greedy works for
  most of them. But the residual 4 holdouts (MBPP/53, HE/34, /45, HE/85)
  fail under the production decoder because `cuda_rep_penalty=1.05`
  perturbs the long-anchor argmax just enough to lose the pattern.
* These are two *different operating regimes*. Production lives in
  the hot regime (rep-pen mandatory); BD6.5 lives in the cool regime
  (rep-pen unhelpful or harmful). One decoder cannot serve both packs.

## Per-row verdict on the 4 BD6.5 holdouts (re-confirmed)

From combined BD6.8F + BD6.8F+ data:

| holdout | C0 prod | C0 v5 | C5 v5 (greedy) | verdict |
|---------|---------|-------|----------------|---------|
| MBPP/53     | ✓ | ✗ | ✗ | **real weight gap** (both decoders fail v5) |
| HumanEval/34| ✓ | ✗ | ✓ | decoder-noise (greedy fixed v5) |
| HumanEval/45| ✓ | ✗ | ✓ | decoder-noise (greedy fixed v5) |
| HumanEval/85| ✓ | ✗ | ✗ | **real weight gap** (both decoders fail v5) |

* HE/34, HE/45 — the LoRA actually has the right weights; only the
  decoder change recovers them. But that decoder change costs
  production.
* MBPP/53, HE/85 — these are the targets BD6.8D (token-weighted CE)
  must address.

## Decision per spec

```
Do not merge v5 unless:
  - production config is safe          ← requires cuda_rep ≥ 1.05
  - v5 anchor ≥ 19/19 OR ≥ 17/19 with explicit user approval
```

Best v5 result at any production-safe spec: **15/19**.
Best v5 result at any spec at all: **16/19** (greedy, prod cratered).
Neither passes the gate. **v5 is NOT merged.** Production stays
unchanged.

## Recommended next step

**BD6.8D — token-weighted CE, no other changes.**

Specifically, modify `tools/surgery/train_code_skeleton_lora_bd6_5.py`
loss step:

```python
# current
total_loss = out.loss

# new (BD6.8D)
seq_len = (labels != -100).sum().clamp(min=1).float()
total_loss = out.loss / seq_len.sqrt()
```

This makes long targets contribute proportionally less per gradient
step, so they don't dominate early training and force the LoRA
into pattern-memorization mode that drifts under poison pressure.

Use BD6.5 dataset shape unchanged (`bd6_5_mixed_train.jsonl`, 525 rows,
53 % anchor share, stratified).

DO NOT add KL (BD6.7 ladder showed it's redundant).
DO NOT increase replication (BD6.6 showed saturation at 53 %).
DO NOT change decoder for production (BD6.8F+ proved no overlap).

## Production state (after BD6.8F+ revert)

* `PHYS05_PACK = physarum05b_code_skeleton.planck` (BD6 pass-1).
* phys05_code_skeleton spec: rep=1.15, ng=2, cuda_rep=1.05 (restored).
* Anchor 19/19 verification scheduled post-revert (see verify line at
  end of this report or grep `[anchor] 19/19 pass` in the run logs).

## Files this probe touched

* `src/organs/organ_manager.cpp` — phys05_code_skeleton add05() spec
  edited 4× during grid, restored to (1.15, 2, 1.05) at end
* `src/organs/organ_manager.cpp::PHYS05_PACK` — flipped prod↔v5 8×, restored to prod
* `/tmp/bd6_8f_plus_grid_summary.txt` — raw per-cell results
* `/tmp/bd6_8f_plus_*.log` — per-cell anchor logs
* `/tmp/bd6_8f_plus_grid.sh`, `/tmp/bd6_8f_plus_cell.sh` — orchestration scripts
* `reports/BD6_8F_PLUS_DECODER_GRID.md` — this file

No data files written, no LoRA produced, no .planck repacked.

## What this proves

* **The (cuda_rep_penalty, no_repeat_ngram) lever is a binary cliff,
  not a spectrum.** cuda_rep=1.05 holds production; cuda_rep<1.05
  drops it instantly. There's no graceful intermediate.
* **BD6.5 has 4 anchor losses with two distinct causes:** 2 are
  decoder-noise (HE/34, HE/45 — would pass under softer decoder if
  prod could tolerate it), 2 are real weight gaps (MBPP/53, HE/85 —
  fail under both decoders). Lever F+ can't help either category at
  the same time.
* **PHYS05_DECODER_LOCKED:** any future runtime change that touches
  cuda_rep_penalty for phys05_code_skeleton risks cratering production.
  Worth flagging in `docs/CURRENT_TRUTH_LEDGER.md`.

The remaining lever for closing MBPP/53 and HE/85 is BD6.8D
(token-weighted CE). Awaiting GO.