BD6.8F+ — decoder grid: no sweet spot exists (2026-05-02)

TL;DR — at the (cuda_rep_penalty, no_repeat_ngram) granularity, NO decoder spec keeps production at 19/19 AND lifts BD6.5 above 15/19. The two requirements are mutually exclusive: production needs cuda_rep ≥ 1.05 to pass anchor; BD6.5 only gains anchors when cuda_rep ≤ 1.00. The intervals don't overlap. Lever F+ is closed.

Next step is BD6.8D (token-weighted CE on BD6.5 dataset shape) — that's the path to recover MBPP/53 + HE/85 (the two real weight-gap holdouts) without changing decoder.


Grid tested

CPU repetition_penalty held at 1.15 (no effect on CUDA backend used by phys05_code_skeleton). Varied cuda_rep_penalty × no_repeat_ngram:

| cell | cuda_rep | ngram | prod pack | v5 pack | note |
|------|----------|-------|-----------|---------|------|
| C0 (current) | 1.05 | 2 | 19/19 | 15/19 | production decoder, also BD6.5's measurement |
| C1 | 1.05 | 0 | 19/19 | 15/19 | drop ngram only: prod safe, v5 unchanged |
| C2 | 1.02 | 2 | 4/19 | 15/19 | soften cuda_rep slightly: prod collapses |
| C4 | 1.00 | 2 | 1/19 | 16/19 | kill cuda_rep, keep ngram: prod cratered, v5 +1 |
| C5 (greedy) | 1.00 | 0 | 1/19 | 16/19 | pure greedy: prod cratered, v5 +1 |

(Cell C3 = (1.02, 0) was skipped: at cuda_rep=1.02 production already fell to 4/19 in C2, so dropping ngram on top wouldn't recover prod.)

Two non-overlapping intervals

                             cuda_rep_penalty
       ╔══════ 1.05 ══════╗  ╔═ 1.02 ═╗  ╔══ 1.00 ══╗
prod   ║ 19/19 (safe)     ║  ║ 4/19   ║  ║ 1/19     ║
       ╚══════════════════╝  ╚════════╝  ╚══════════╝
       ╔══════ 1.05 ══════╗  ╔═ 1.02 ═╗  ╔══ 1.00 ══╗
v5     ║ 15/19 (no gain)  ║  ║ 15/19  ║  ║ 16/19    ║
       ╚══════════════════╝  ╚════════╝  ╚══════════╝
                             ↑ no overlap ↑
                             where prod stays ≥19 AND v5 ≥17

There is no value of cuda_rep_penalty among those we tested (1.00, 1.02, 1.05) that satisfies both constraints simultaneously.
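As a quick sanity check, the grid can be encoded and filtered mechanically. The sketch below copies the pass counts from the grid table; the `Cell` type and the `feasible_cells` helper are illustrative names, not part of the project code, and the v5 threshold of 17 is taken from the gate in the decision spec.

```python
from dataclasses import dataclass

PACK_SIZE = 19  # both packs have 19 anchors


@dataclass(frozen=True)
class Cell:
    """One decoder spec from the grid (hypothetical container, for illustration)."""
    name: str
    cuda_rep: float
    ngram: int
    prod_pass: int
    v5_pass: int


# Pass counts copied from the grid table (C3 was skipped).
GRID = [
    Cell("C0", 1.05, 2, 19, 15),
    Cell("C1", 1.05, 0, 19, 15),
    Cell("C2", 1.02, 2, 4, 15),
    Cell("C4", 1.00, 2, 1, 16),
    Cell("C5", 1.00, 0, 1, 16),
]


def feasible_cells(grid, prod_min=PACK_SIZE, v5_min=17):
    """Return the names of cells that meet both the prod and the v5 threshold."""
    return [c.name for c in grid if c.prod_pass >= prod_min and c.v5_pass >= v5_min]


# cuda_rep values where production holds 19/19, and where v5 gains over 15/19.
prod_safe = sorted({c.cuda_rep for c in GRID if c.prod_pass == PACK_SIZE})
v5_gain = sorted({c.cuda_rep for c in GRID if c.v5_pass > 15})

print(feasible_cells(GRID))  # empty list: no cell satisfies both constraints
print(prod_safe, v5_gain)    # the two sets of cuda_rep values are disjoint
```

Even with the v5 threshold relaxed to 16, the feasible set stays empty, because the only cells reaching 16/19 are the ones where production falls to 1/19.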

Why this is decisive

weights: the model only behaves correctly when cuda_rep_penalty is biting hard enough to suppress repetition. Lower it at all and the model loops or repeats, and its output fails to compile.

generalize the anchor patterns enough that pure greedy works for most of them. But the residual 4 holdouts (MBPP/53, HE/34, HE/45, HE/85) fail under the production decoder because cuda_rep_penalty=1.05 perturbs the long-anchor argmax just enough to lose the pattern.

the hot regime (rep-pen mandatory); BD6.5 lives in the cool regime (rep-pen unhelpful or harmful). One decoder cannot serve both packs.

Per-row verdict on the 4 BD6.5 holdouts (re-confirmed)

From combined BD6.8F + BD6.8F+ data:

| holdout | C0 prod | C0 v5 | C5 v5 (greedy) | verdict |
|---------|---------|-------|----------------|---------|
| MBPP/53 | ✓ | ✗ | ✗ | real weight gap (both decoders fail v5) |
| HumanEval/34 | ✓ | ✗ | ✓ | decoder-noise (greedy fixed v5) |
| HumanEval/45 | ✓ | ✗ | ✓ | decoder-noise (greedy fixed v5) |
| HumanEval/85 | ✓ | ✗ | ✗ | real weight gap (both decoders fail v5) |

decoder change recovers them. But that decoder change costs production.

must address.
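The verdict column in the per-row table follows a simple two-way rule: a holdout that fails v5 under C0 but passes under greedy (C5) is decoder-noise, and one that fails under both is a real weight gap. A minimal sketch of that rule, with the `verdict` helper and the `HOLDOUTS` mapping as illustrative names only:

```python
# (C0 v5 pass, C5 v5 pass) per holdout, copied from the per-row table.
HOLDOUTS = {
    "MBPP/53": (False, False),
    "HumanEval/34": (False, True),
    "HumanEval/45": (False, True),
    "HumanEval/85": (False, False),
}


def verdict(c0_v5_pass: bool, c5_v5_pass: bool) -> str:
    """Classify a row by whether greedy decoding alone recovers it."""
    if c0_v5_pass:
        return "not a holdout"  # already passes under the production decoder
    return "decoder-noise" if c5_v5_pass else "real weight gap"


verdicts = {name: verdict(*flags) for name, flags in HOLDOUTS.items()}
weight_gaps = sorted(n for n, v in verdicts.items() if v == "real weight gap")
print(weight_gaps)
```

The two rows that land in `weight_gaps` are the ones no decoder change in this grid can recover.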

Decision per spec

Do not merge v5 unless:
  - production config is safe          ← requires cuda_rep ≥ 1.05
  - v5 anchor ≥ 19/19 OR ≥ 17/19 with explicit user approval

Best v5 result at any production-safe spec: 15/19. Best v5 result at any spec at all: 16/19 (greedy, prod cratered). Neither passes the gate. v5 is NOT merged. Production stays unchanged.
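The gate can be written down as a small predicate to make the outcome explicit. This is a sketch under one stated assumption: "production config is safe" is modelled as the production pack passing 19/19, which in this grid only happens at cuda_rep = 1.05. The `merge_allowed` name and its parameters are hypothetical.

```python
def merge_allowed(prod_pass: int, v5_pass: int, pack_size: int = 19,
                  user_approved: bool = False) -> bool:
    """Merge gate: production must be safe AND v5 must clear its anchor bar."""
    # Assumption: "safe" means the production pack passes every anchor.
    prod_safe = prod_pass == pack_size
    # v5 needs a clean sweep, or at least 17/19 with explicit user approval.
    v5_ok = v5_pass == pack_size or (v5_pass >= 17 and user_approved)
    return prod_safe and v5_ok


# Best production-safe cell (C0/C1) and best v5 cell (C4/C5) from the grid.
print(merge_allowed(prod_pass=19, v5_pass=15))                      # False
print(merge_allowed(prod_pass=1, v5_pass=16, user_approved=True))   # False
```

Both of the best available cells fail the gate, one on the v5 side and one on the production side.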

Recommended next step

BD6.8D — token-weighted CE, no other changes.

Specifically, modify the loss step in tools/surgery/train_code_skeleton_lora_bd6_5.py:

# current
total_loss = out.loss

# new (BD6.8D)
seq_len = (labels != -100).sum().clamp(min=1).float()
total_loss = out.loss / seq_len.sqrt()

This makes long targets contribute proportionally less per gradient step, so they don't dominate early training and force the LoRA into pattern-memorization mode that drifts under poison pressure.
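To get a feel for the size of the effect, the scale factor can be tabulated for a few target lengths. This is a worked-arithmetic sketch only: it assumes `out.loss` is already a mean over the supervised (non `-100`) tokens, which is common in causal-LM training loops but is not confirmed here, and `bd6_8d_scale` is an illustrative name.

```python
import math


def bd6_8d_scale(num_target_tokens: int) -> float:
    """Multiplier applied to the loss: 1/sqrt(n), with n clamped to at least 1."""
    n = max(num_target_tokens, 1)  # mirrors .clamp(min=1) in the training step
    return 1.0 / math.sqrt(n)


for n in (16, 64, 256):
    print(n, bd6_8d_scale(n))
# 16 0.25
# 64 0.125
# 256 0.0625
```

Under that assumption, a step with 256 supervised tokens gets a quarter of the weight of a step with 16, so the down-weighting grows with the square root of the length ratio and is not a hard cutoff.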

Use BD6.5 dataset shape unchanged (bd6_5_mixed_train.jsonl, 525 rows, 53 % anchor share, stratified).

DO NOT add KL (BD6.7 ladder showed it's redundant). DO NOT increase replication (BD6.6 showed saturation at 53 %). DO NOT change decoder for production (BD6.8F+ proved no overlap).

Production state (after BD6.8F+ revert)

end of this report or grep [anchor] 19/19 pass in the run logs).
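One caveat when searching logs for that marker: in regex-based tools the brackets in `[anchor]` are read as a character class, so an unescaped pattern does not match the literal text. A minimal sketch of a literal search, assuming the run logs contain the exact string `[anchor] 19/19 pass`; the helper name is hypothetical and no log path is implied.

```python
import re

ANCHOR_PASS = "[anchor] 19/19 pass"  # literal marker quoted in this report


def has_anchor_pass(log_text: str) -> bool:
    """True if any log line contains the literal anchor-pass marker."""
    return any(ANCHOR_PASS in line for line in log_text.splitlines())


# Regex equivalent: escape the marker so '[' and ']' are matched literally.
ANCHOR_PASS_RE = re.compile(re.escape(ANCHOR_PASS))

sample = "step 1200\n[anchor] 19/19 pass\n"
print(has_anchor_pass(sample))                   # True
print(bool(ANCHOR_PASS_RE.search(sample)))       # True
# Unescaped, "[anchor]" matches a single character from {a, n, c, h, o, r},
# so the pattern misses the real marker line.
print(re.search(ANCHOR_PASS, sample) is None)    # True
```

With command-line grep, the fixed-string mode (`-F`) gives the same literal behaviour.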

Files this probe touched

edited 4× during grid, restored to (1.15, 2, 1.05) at end

No data files written, no LoRA produced, no .planck repacked.

What this proves

not a spectrum. cuda_rep=1.05 holds production; cuda_rep<1.05 drops it instantly. There's no graceful intermediate.

decoder-noise (HE/34, HE/45 — would pass under softer decoder if prod could tolerate it), 2 are real weight gaps (MBPP/53, HE/85 — fail under both decoders). Lever F+ can't help either category at the same time.

cuda_rep_penalty for phys05_code_skeleton risks cratering production. Worth flagging in docs/CURRENT_TRUTH_LEDGER.md.

The remaining lever for closing MBPP/53 and HE/85 is BD6.8D (token-weighted CE). Awaiting GO.