BD6.8F+ — decoder grid: no sweet spot exists (2026-05-02)

TL;DR — at the (cuda_rep_penalty, no_repeat_ngram) granularity, NO decoder spec keeps production at 19/19 AND lifts BD6.5 above 15/19. The two requirements are mutually exclusive: production needs cuda_rep ≥ 1.05 to pass anchor; BD6.5 only gains anchors when cuda_rep ≤ 1.00. The intervals don't overlap. Lever F+ is closed.

Next step is BD6.8D (token-weighted CE on the BD6.5 dataset shape): that is the path to recover MBPP/53 and HE/85 (the two real weight-gap holdouts) without changing the decoder.

Grid tested

CPU repetition_penalty held at 1.15 (no effect on CUDA backend used by phys05_code_skeleton). Varied cuda_rep_penalty × no_repeat_ngram:

| cell | cuda_rep | ngram | prod pack | v5 pack | note |
|------|----------|-------|-----------|---------|------|
| C0 (current) | 1.05 | 2 | 19/19 | 15/19 | production decoder, also BD6.5's measurement |
| C1 | 1.05 | 0 | 19/19 | 15/19 | drop ngram only: prod safe, v5 unchanged |
| C2 | 1.02 | 2 | 4/19 | 15/19 | soften cuda_rep slightly: prod collapses |
| C4 | 1.00 | 2 | 1/19 | 16/19 | kill cuda_rep, keep ngram: prod cratered, v5 +1 |
| C5 (greedy) | 1.00 | 0 | 1/19 | 16/19 | pure greedy: prod cratered, v5 +1 |

(Cell C3 = (1.02, 0) was skipped: at cuda_rep=1.02 production already fell to 4/19 in C2, so dropping ngram on top wouldn't recover prod.)

Two non-overlapping intervals

                             cuda_rep_penalty
       ╔══════ 1.05 ══════╗  ╔═ 1.02 ═╗  ╔══ 1.00 ══╗
prod   ║ 19/19 (safe)     ║  ║ 4/19   ║  ║ 1/19     ║
       ╚══════════════════╝  ╚════════╝  ╚══════════╝
       ╔══════ 1.05 ══════╗  ╔═ 1.02 ═╗  ╔══ 1.00 ══╗
v5     ║ 15/19 (no gain)  ║  ║ 15/19  ║  ║ 16/19    ║
       ╚══════════════════╝  ╚════════╝  ╚══════════╝
                             ↑ no overlap ↑
                             where prod stays ≥19 AND v5 ≥17

There is no value of cuda_rep_penalty among the three we tested (1.00, 1.02, 1.05) that satisfies both constraints simultaneously.

Why this is decisive

Production BD6 pass-1 is essentially a "hot mode" of the donor weights: the model only behaves correctly when cuda_rep_penalty is biting hard enough to suppress repetition. Lower it at all (even to 1.02) and the model loops or repeats, and its output fails to compile.

BD6.5 weights have learned to be less repetition-prone: they generalize the anchor patterns enough that pure greedy works for most of them (16/19). But the residual 4 holdouts (MBPP/53, HE/34, HE/45, HE/85) all fail under the production decoder. For HE/34 and HE/45, which pass under greedy, the failure arises because cuda_rep_penalty=1.05 perturbs the long-anchor argmax just enough to lose the pattern. MBPP/53 and HE/85 fail under greedy as well, so the decoder is not the cause there.

These are two different operating regimes. Production lives in the hot regime (rep-pen mandatory); BD6.5 lives in the cool regime (rep-pen unhelpful or harmful). One decoder cannot serve both packs.

Per-row verdict on the 4 BD6.5 holdouts (re-confirmed)

From combined BD6.8F + BD6.8F+ data:

| holdout | C0 prod | C0 v5 | C5 v5 (greedy) | verdict |
|---------|---------|-------|----------------|---------|
| MBPP/53 | ✓ | ✗ | ✗ | real weight gap (both decoders fail v5) |
| HumanEval/34 | ✓ | ✗ | ✓ | decoder-noise (greedy fixed v5) |
| HumanEval/45 | ✓ | ✗ | ✓ | decoder-noise (greedy fixed v5) |
| HumanEval/85 | ✓ | ✗ | ✗ | real weight gap (both decoders fail v5) |

HE/34, HE/45: the LoRA actually has the right weights; only the decoder change recovers them. But that decoder change costs production.

MBPP/53, HE/85: these are the targets BD6.8D (token-weighted CE) must address.
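The verdict rule used in the table can be written down as a small sketch. The function name `classify_holdout` and the dictionary layout are hypothetical (they are not part of the probe scripts); the pass/fail values are copied from the per-row table above.

```python
# Hypothetical helper: classify a BD6.5 holdout from its v5-pack results
# under the production decoder (C0) and pure greedy (C5).

def classify_holdout(c0_v5_pass: bool, c5_v5_pass: bool) -> str:
    """Return a verdict label for one anchor row on the v5 pack."""
    if c0_v5_pass:
        return "not a holdout"      # already passes under the production decoder
    if c5_v5_pass:
        return "decoder-noise"      # fails at C0, recovered by greedy
    return "real weight gap"        # fails under both decoders

# (C0 v5, C5 v5) per holdout, copied from the table
HOLDOUTS = {
    "MBPP/53": (False, False),
    "HumanEval/34": (False, True),
    "HumanEval/45": (False, True),
    "HumanEval/85": (False, False),
}

verdicts = {name: classify_holdout(*result) for name, result in HOLDOUTS.items()}
for name, verdict in verdicts.items():
    print(f"{name:<14} {verdict}")
```

Only two decoders are compared here, so "real weight gap" means "not recovered by either tested decoder", which is how the report uses the term.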

Decision per spec

Do not merge v5 unless:
  - production config is safe          ← requires cuda_rep ≥ 1.05
  - v5 anchor ≥ 19/19 OR ≥ 17/19 with explicit user approval

Best v5 result at any production-safe spec: 15/19. Best v5 result at any spec at all: 16/19 (greedy, prod cratered). Neither passes the gate. v5 is NOT merged. Production stays unchanged.
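As a cross-check, the gate can be applied mechanically to the grid. This is a minimal sketch, not the orchestration script: the `Cell` class and `merge_allowed` are hypothetical names, "production config is safe" is modeled as prod = 19/19 (an assumption), and the numbers are copied from the grid table.

```python
from dataclasses import dataclass

TOTAL = 19  # anchors per pack


@dataclass(frozen=True)
class Cell:
    name: str
    cuda_rep: float
    ngram: int
    prod: int  # production pack anchors passed
    v5: int    # v5 (BD6.5) pack anchors passed


# Results copied from the grid table in this report.
GRID = [
    Cell("C0", 1.05, 2, 19, 15),
    Cell("C1", 1.05, 0, 19, 15),
    Cell("C2", 1.02, 2, 4, 15),
    Cell("C4", 1.00, 2, 1, 16),
    Cell("C5", 1.00, 0, 1, 16),
]


def merge_allowed(cell: Cell, user_approved: bool = False) -> bool:
    """Gate: production safe AND (v5 == 19/19 OR v5 >= 17/19 with approval)."""
    prod_safe = cell.prod == TOTAL  # assumption: "safe" means 19/19
    v5_ok = cell.v5 == TOTAL or (cell.v5 >= 17 and user_approved)
    return prod_safe and v5_ok


best_v5_prod_safe = max(c.v5 for c in GRID if c.prod == TOTAL)
best_v5_any = max(c.v5 for c in GRID)
# Even granting user approval, no cell clears the gate.
passing = [c.name for c in GRID if merge_allowed(c, user_approved=True)]

print(best_v5_prod_safe, best_v5_any, passing)  # 15 16 []
```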

Recommended next step

BD6.8D — token-weighted CE, no other changes.

Specifically, modify the loss step in tools/surgery/train_code_skeleton_lora_bd6_5.py:

# current
total_loss = out.loss

# new (BD6.8D)
seq_len = (labels != -100).sum().clamp(min=1).float()
total_loss = out.loss / seq_len.sqrt()

This scales the loss by 1/sqrt(n), where n is the number of supervised target tokens (labels != -100), so long targets contribute less per gradient step. They then don't dominate early training and force the LoRA into a pattern-memorization mode that drifts under poison pressure.
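To make the size of the effect concrete, here is a small arithmetic sketch of the 1/sqrt(n) multiplier. The helper name `bd6_8d_scale` and the two target lengths are hypothetical, chosen only for illustration; they are not measured from bd6_5_mixed_train.jsonl. The sketch also assumes one sequence per step: `(labels != -100).sum()` counts supervised tokens across the whole labels tensor, so with a larger batch the scale would be per batch rather than per target.

```python
import math


def bd6_8d_scale(num_target_tokens: int) -> float:
    """Loss multiplier 1 / sqrt(max(n, 1)), mirroring clamp(min=1) in the snippet."""
    return 1.0 / math.sqrt(max(num_target_tokens, 1))


# Hypothetical supervised-token counts for a short and a long target.
short_n, long_n = 16, 256

short_scale = bd6_8d_scale(short_n)  # 1 / 4
long_scale = bd6_8d_scale(long_n)    # 1 / 16

# At equal loss, the long target's gradient is scaled down 4x more than the short one's.
ratio = short_scale / long_scale
print(short_scale, long_scale, ratio)  # 0.25 0.0625 4.0
```

A 16x difference in target length therefore becomes a 4x difference in weight, which is the sense in which the square root is a softer correction than dividing by n outright.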

Use BD6.5 dataset shape unchanged (bd6_5_mixed_train.jsonl, 525 rows, 53 % anchor share, stratified).

DO NOT add KL (BD6.7 ladder showed it's redundant). DO NOT increase replication (BD6.6 showed saturation at 53 %). DO NOT change decoder for production (BD6.8F+ proved no overlap).

Production state (after BD6.8F+ revert)

PHYS05_PACK = physarum05b_code_skeleton.planck (BD6 pass-1).
phys05_code_skeleton spec: rep=1.15, ng=2, cuda_rep=1.05 (restored).
Anchor 19/19 verification scheduled post-revert (see verify line at end of this report or grep [anchor] 19/19 pass in the run logs).

Files this probe touched

src/organs/organ_manager.cpp: phys05_code_skeleton add05() spec edited 4× during grid, restored to (1.15, 2, 1.05) at end
src/organs/organ_manager.cpp::PHYS05_PACK: flipped prod↔v5 8×, restored to prod
/tmp/bd6_8f_plus_grid_summary.txt: raw per-cell results
/tmp/bd6_8f_plus_*.log: per-cell anchor logs
/tmp/bd6_8f_plus_grid.sh, /tmp/bd6_8f_plus_cell.sh: orchestration scripts
reports/BD6_8F_PLUS_DECODER_GRID.md: this file

No data files written, no LoRA produced, no .planck repacked.

What this proves

**The (cuda_rep_penalty, no_repeat_ngram) lever is a binary cliff, not a spectrum.** cuda_rep=1.05 holds production at 19/19; the next tested value, 1.02, already drops it to 4/19. No graceful intermediate was observed.

BD6.5 has 4 anchor losses with two distinct causes: 2 are decoder-noise (HE/34, HE/45; they would pass under a softer decoder if prod could tolerate it) and 2 are real weight gaps (MBPP/53, HE/85; they fail under both decoders). Lever F+ cannot fix the first pair without cratering production and cannot fix the second pair at all.

PHYS05_DECODER_LOCKED: any future runtime change that touches cuda_rep_penalty for phys05_code_skeleton risks cratering production. Worth flagging in docs/CURRENT_TRUTH_LEDGER.md.

The remaining lever for closing MBPP/53 and HE/85 is BD6.8D (token-weighted CE). Awaiting GO.