# BD6.7 — KL-anchor ladder, no rung beats BD6.5 (2026-05-02)

**TL;DR — KL distillation against the BD6 pass-1 teacher does not help on
this organ.** Ran the full λ ∈ {0.05, 0.10, 0.20} ladder per spec; every
rung produced a *worse* anchor pass-rate than BD6.5's no-KL baseline of
15/19. Best KL-ladder rungs were λ=0.05 and λ=0.10 at **12/19** each
(different regression sets); λ=0.20 was 10/19. Production reverted; all
three v7 packs archived. **BD6.5 (15/19, no KL) remains the local peak.**
The structural reason is now clear: with same-architecture teacher/student
and `anchor_positive.jsonl` already containing teacher outputs as CE
targets, the KL term is redundant with CE on anchors and just dilutes the
poison gradient.

---

## Pipeline (PYTHON_QUARANTINE-compliant)

```
production: physarum05b_code_skeleton.planck  (BD6 pass-1, anchor 19/19)

Trainer:        tools/surgery/train_code_skeleton_lora_bd6_7.py  (new)
Dataset:        bd6_5_mixed_train.jsonl  (UNCHANGED — 525 rows, 53.3 % anchor)
                — strict per spec: do NOT change data shape
Hyperparams:    r=8, alpha=16, lr=3e-5, ep=1, ckpt-step=50
KL settings:    top-K=50 truncation, holdout-mult=4×
                (MBPP/53, HE/34, /45, /85 each get 4× KL weight)
Teacher:        tools/surgery/output/Physarum05B-CodeSkeleton/
                — frozen merged BD6 pass-1 (BF16 in HF; 4-bit for training)
Loss:           total = CE(prompt+target) + λ * w(task) * KL_topk(t || s)
                w(task) = 4.0 if task ∈ HOLDOUTS else 1.0
                KL applied to anchor rows only; poison rows = CE only.

Three runs, λ ∈ {0.10, 0.20, 0.05}, each: train → merge → flip → gate →
revert → archive.
```
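
A minimal pure-Python sketch of the loss line above, for reading purposes
only. It is not the code in `train_code_skeleton_lora_bd6_7.py`: the
function names (`kl_topk`, `row_loss`) are hypothetical, and renormalising
both distributions over the teacher's top-K tokens is one assumed reading
of "top-K truncation".

```python
import math

HOLDOUTS = {"MBPP/53", "HE/34", "HE/45", "HE/85"}  # from the KL settings above
HOLDOUT_MULT = 4.0
TOP_K = 50


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_topk(teacher_logits, student_logits, k=TOP_K):
    """KL(t || s) over the teacher's top-k tokens, renormalised (assumption)."""
    top = sorted(range(len(teacher_logits)),
                 key=lambda i: teacher_logits[i], reverse=True)[:k]
    t_logp = log_softmax([teacher_logits[i] for i in top])
    s_logp = log_softmax([student_logits[i] for i in top])
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_logp, s_logp))


def row_loss(ce, kl, lam, task, is_anchor):
    """total = CE + lam * w(task) * KL; poison rows get CE only."""
    if not is_anchor:
        return ce
    w = HOLDOUT_MULT if task in HOLDOUTS else 1.0
    return ce + lam * w * kl
```

With the 4× multiplier, a holdout anchor at λ=0.20 carries an effective
KL weight of 0.8 relative to CE, versus 0.2 for every other anchor row.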

GPU footprint: student (4-bit, 0.3 GB) + teacher (4-bit, 0.3 GB) + activations
+ optimizer ≈ 1.93 GB / 8 GB. Inside budget on RTX 3060 Ti.

---

## Results — KL ladder vs BD6.5 baseline

| run    | λ    | avg_ce | avg_kl | anchor pass | gate | regressed (vs full 19) |
|--------|------|--------|--------|-------------|------|-------------------------|
| BD6.5  | 0.00 | 0.5354 |  —     | **15 / 19** | REVERT (still <19) | MBPP/53, HE/34, /45, /85 |
| v7  (a)| 0.10 | 0.6696 | 0.5792 | 12 / 19     | REVERT | MBPP/20, /53, /93, /96, HE/27, /45, /85 |
| v7b (b)| 0.20 | 0.6837 | 0.4807 | 10 / 19     | REVERT | MBPP/20, /53, /93, /96, /99, HE/27, /34, /45, /85 |
| v7c (c)| 0.05 | 0.6649 | 0.6653 | 12 / 19     | REVERT | MBPP/53, /96, HE/27, /34, /45, /53, /85 |

KL raises CE on anchors (avg_ce ≈ 0.67 vs BD6.5's 0.54) because the
λ × KL term diverts part of each optimizer step away from CE-on-target.
CE rises with λ (0.6649 → 0.6696 → 0.6837 for λ = 0.05, 0.10, 0.20)
while KL falls (0.6653 → 0.5792 → 0.4807) as the student is pulled
tighter toward the teacher distribution. Both trends are visible in the
averages.

**Holdouts stayed lost.** Each of the four BD6.5 holdouts (MBPP/53,
HE/34, HE/45, HE/85) failed in at least one rung. HE/34 passed only at
λ=0.10 and failed again at λ=0.20 and λ=0.05. **MBPP/53, HE/45 and
HE/85 were never recovered at any λ.**

**Plus collateral:** every rung introduced regressions on short anchors
that were stable at BD6.5 (MBPP/20, /93, /96, /99, HE/27, /53). The
KL pull on those non-holdout anchors fights with the CE pull on poison;
the LoRA can't optimize both, so it drifts.
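
The holdout and collateral claims above can be re-derived from the
"regressed" column of the results table with plain set arithmetic. The
sketch below only restates the table; the variable names are
illustrative, and the `/NN` shorthand in the table is expanded to full
task IDs.

```python
HOLDOUTS = {"MBPP/53", "HE/34", "HE/45", "HE/85"}
TOTAL_ANCHORS = 19

# Regressed anchors per rung, copied from the results table.
regressed = {
    0.05: {"MBPP/53", "MBPP/96", "HE/27", "HE/34", "HE/45", "HE/53", "HE/85"},
    0.10: {"MBPP/20", "MBPP/53", "MBPP/93", "MBPP/96", "HE/27", "HE/45", "HE/85"},
    0.20: {"MBPP/20", "MBPP/53", "MBPP/93", "MBPP/96", "MBPP/99",
           "HE/27", "HE/34", "HE/45", "HE/85"},
}

# Anchor pass count per rung: 12, 12 and 10 out of 19.
pass_counts = {lam: TOTAL_ANCHORS - len(ids) for lam, ids in regressed.items()}

# Holdouts that failed in every rung: MBPP/53, HE/45, HE/85.
never_recovered = HOLDOUTS.intersection(*regressed.values())

# Non-holdout anchors that regressed in at least one rung (the collateral).
collateral = set().union(*regressed.values()) - HOLDOUTS

# Non-holdout anchors that regressed in all three rungs.
collateral_every_rung = set.intersection(*regressed.values()) - HOLDOUTS
```

`collateral_every_rung` comes out as MBPP/96 and HE/27, the two
non-holdout anchors that no λ left intact.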

## Structural reason KL doesn't help here

The teacher (BD6 pass-1 merged) and the student (donor + new LoRA) are
the **same architecture, same size** (Qwen2 0.5B, 24 layers, 14 heads).
And the anchor rows in `anchor_positive.jsonl` already contain the
teacher's actual outputs as `repair_target`. So:

```
anchor row → CE on target   pulls student → teacher's output token sequence
           → KL on logits   pulls student → teacher's logits over top-50
```

Both terms pull in the **same direction**. KL adds redundant pressure on
anchors while poison rows still get only CE, so the anchor side is
*over-constrained* and the poison side is *under-fit*. Two things follow.
Anchor-stable short prompts drift, because the LoRA can't learn poison
without disturbing anchors. Holdout long prompts still fail, because their
failure mode is not a lack of teacher alignment: the teacher's own outputs
*do* pass at training but don't pass at runtime, due to the length ×
poison interaction.

In short: KL distillation is the right tool when teacher >> student in
capacity. **Here teacher = student.** The lever was always wrong.
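
A toy illustration of the "same direction" argument, under stated
assumptions: a 5-token vocabulary, invented logits (not measured from any
BD6.x run), no top-50 truncation, and a single token position. For
softmax outputs the gradient of CE with respect to the student logits is
`softmax(s) - onehot(target)`, and the gradient of KL(t || s) is
`softmax(s) - softmax(t)`. When the target is the token the teacher is
already confident in, the two gradients are nearly parallel.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def ce_grad(student_logits, target):
    """d CE / d logits = softmax(s) - onehot(target)."""
    p = softmax(student_logits)
    return [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]


def kl_grad(student_logits, teacher_logits):
    """d KL(t || s) / d student logits = softmax(s) - softmax(t)."""
    p = softmax(student_logits)
    t = softmax(teacher_logits)
    return [pi - ti for pi, ti in zip(p, t)]


# Toy values: the teacher is confident in token 2, and token 2 is also the
# CE target because the anchor target is the teacher's own output.
teacher_logits = [0.0, 1.0, 6.0, 0.5, -1.0]
student_logits = [0.5, 0.2, 1.0, 0.3, 0.0]
target = 2

alignment = cosine(ce_grad(student_logits, target),
                   kl_grad(student_logits, teacher_logits))
```

On these toy numbers `alignment` is very close to 1, which is the sense
in which the KL term adds little new information on anchor rows.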

## Numbers across BD6 pass-1 and the six follow-up BD6.x passes

| pass        | anchor share | KL λ | post-merge | gate decision | note |
|-------------|--------------|------|------------|---------------|------|
| BD6 pass-1  | 0 %          | —    | 19/19      | KEEP          | defines anchor |
| BD6.2       | 0 %          | —    | not run    | REVERT        | post-bench MBPP regress |
| BD6.3       | 0 %          | —    | 0 / 19     | REVERT        | catastrophic forgetting |
| BD6.4       | 28 %         | —    | 7 / 19     | REVERT        | 5× anchor replication |
| **BD6.5**   | **53 %**     | **—** | **15 / 19** | **REVERT (peak)** | bench-aware repl + stratified |
| BD6.6       | 63 %         | —    | 11 / 19    | REVERT        | over-anchor regression |
| BD6.7a      | 53 %         | 0.10 | 12 / 19    | REVERT        | KL ladder mid-rung |
| BD6.7b      | 53 %         | 0.20 | 10 / 19    | REVERT        | KL ladder high-rung |
| BD6.7c      | 53 %         | 0.05 | 12 / 19    | REVERT        | KL ladder low-rung |

**Local peak unchanged: BD6.5 with 15/19, no KL.**

## What still doesn't work

* **Replication** (BD6.4 → BD6.5 → BD6.6): peaks at 53 % anchor share
  (15/19), then reverses at 63 % (11/19). *Lever closed.*
* **KL anchor** (BD6.7 ladder a/b/c): adds redundant pressure to CE on
  same-size teacher; degrades short-prompt fluency without unlocking
  long holdouts. *Lever closed.*

The 4 holdouts (MBPP/53, HE/34, /45, /85) were not fixed by **any
data-shape lever or teacher-student lever tried** at this
trainer/runtime config.

## What's actually different about the holdouts

Length alone does not explain the failures: MBPP/19 (337 chars)
consistently passes, while HE/85 (366 chars) consistently fails. Token
similarity to short MBPP prompts does not explain them either: HE/27
(267 chars) is within the MBPP-style range but fails.
Common pattern across the 4: **multi-line targets with blank-line
boundaries inside the def-body**. The runtime extractor in
`anchor_eval.py::extract_def` reads a continuous indented block, but
greedy extraction stops at boundary-like sequences when the model emits
slight whitespace drift. So:

* The training-time CE thinks it's reproducing the target (and it
  largely is, at the token level).
* The runtime sampling adds tiny stochastic drift (top-p, repetition
  penalty, no-7B-fallback path) that occasionally inserts/removes a
  blank line.
* The verifier then can't compile what it gets.

This is **not** a teacher-pull problem. It's a **runtime decoding
determinism problem**, possibly compounded by the prompt template's
`Do NOT write 'import' lines` instruction (which forces the model to
emit imports inline-after-def in some HE prompts that need them).

## Levers that remain (recommended order for BD6.8 — pending user GO)

### Lever F — runtime determinism on anchor prompts

Force greedy decoding (top-p=1.0, top-k=1, repeat-penalty=1.0, fixed
seed) for the four holdout prompts. Or for *all* phys05_code_skeleton
calls during the gate. This costs nothing at training time — it's a
runtime-config-only fix.
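
Why these settings remove sampler variance can be seen in a small
pure-Python sampler. This is a sketch, not the phys05 runtime: the
function `sample_token` and the toy logits are hypothetical, and the
repeat penalty is not modelled. With top-k=1 only the argmax token
survives filtering, so the seed and top-p no longer matter.

```python
import math
import random


def sample_token(logits, top_k, top_p, rng):
    """Top-k filtering, then top-p (nucleus) filtering, then one random draw."""
    order = sorted(range(len(logits)),
                   key=lambda i: logits[i], reverse=True)[:top_k]
    m = logits[order[0]]
    weights = [math.exp(logits[i] - m) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept_ids, kept_probs, cum = [], [], 0.0
    for idx, p in zip(order, probs):
        kept_ids.append(idx)
        kept_probs.append(p)
        cum += p
        if cum >= top_p:
            break
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]


# Toy logits with a near-tie between tokens 0 and 1 (a "borderline" output).
logits = [2.0, 1.9, 0.5, -1.0]

# Distinct tokens observed across 100 different seeds.
greedy = {sample_token(logits, top_k=1, top_p=1.0, rng=random.Random(seed))
          for seed in range(100)}
sampled = {sample_token(logits, top_k=4, top_p=0.95, rng=random.Random(seed))
           for seed in range(100)}
```

`greedy` contains only token 0 for every seed, while `sampled` contains
more than one token. That is the kind of borderline flip this lever is
meant to rule out.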

If this alone takes BD6.5 from 15/19 to 19/19, **no further surgery is
needed**: the pack was 19/19 under deterministic eval all along, and the
4 "failures" were sampler noise on borderline outputs.

**Cheapest test of this hypothesis: re-run anchor_eval.py against the
BD6.5 v5 pack with `phys05.greedy=1` env (or the equivalent existing
flag). If anchor jumps from 15/19 to 18-19/19, runtime variance was the
bug.** No training needed. (Production itself already scores 19/19, so
it cannot show the jump.)

### Lever D — token-weighted CE (still on the table)

If runtime is already deterministic and the holdouts still fail, scale
each row's CE loss by `1 / sqrt(target_token_count)` so long targets
don't dominate the gradient. Keep the BD6.5 dataset shape unchanged. This
is a ~10-line change to the bd6_5 trainer's loss step. **No KL.**
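
A minimal sketch of the weighting, assuming the per-row CE is *summed*
over target tokens (the report does not say whether the bd6_5 trainer
sums or averages). The helper names are hypothetical, and dividing by the
sum of weights is a design choice of this sketch to keep the loss scale
comparable across batches.

```python
import math


def row_weight(target_token_count):
    """Lever D weight: 1 / sqrt(number of target tokens)."""
    if target_token_count <= 0:
        raise ValueError("target_token_count must be positive")
    return 1.0 / math.sqrt(target_token_count)


def weighted_batch_loss(rows):
    """rows: list of (summed_ce_over_target_tokens, target_token_count)."""
    weights = [row_weight(n) for _, n in rows]
    weighted = [w * ce for w, (ce, _) in zip(weights, rows)]
    return sum(weighted) / sum(weights)


# Two rows with the same per-token CE of 0.5 but different lengths.
short_row = (12.5, 25)    # 25 target tokens
long_row = (50.0, 100)    # 100 target tokens

# Unweighted, the long row contributes 4x the short row's loss;
# with 1/sqrt(n) weighting the ratio drops to 2x.
ratio_before = long_row[0] / short_row[0]
ratio_after = (row_weight(100) * long_row[0]) / (row_weight(25) * short_row[0])
```

If the trainer instead averages CE per token, the same weight would
down-weight long rows rather than merely damping their dominance, so the
sum-versus-mean question should be settled before applying this lever.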

### Lever G — extractor relaxation in `anchor_eval.py`

The verifier's def-extractor stops at blank lines inside the body. Make
it tolerate up to 1 blank line as long as indentation continues. ~5
lines. Test against frozen production: if production still passes 19/19
and the BD6.5 v5 pack also jumps from 15/19 → 18/19, the gate was
overstrict for these specific holdouts.
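
A sketch of the relaxation, written from the description above rather
than from the real `anchor_eval.py::extract_def` (whose source is not
shown in this report). The `max_blank_lines` parameter and the sample
output are hypothetical. `max_blank_lines=0` mimics the strict behaviour;
`max_blank_lines=1` is the proposed tolerance.

```python
def extract_def(text, max_blank_lines=0):
    """Return the first `def` block found in model output."""
    lines = text.splitlines()
    start = next((i for i, ln in enumerate(lines) if ln.startswith("def ")), None)
    if start is None:
        return ""
    out = [lines[start]]
    pending_blanks = []
    for ln in lines[start + 1:]:
        if not ln.strip():
            pending_blanks.append(ln)
            if len(pending_blanks) > max_blank_lines:
                break  # too many consecutive blank lines: treat as end of body
            continue
        if not ln.startswith((" ", "\t")):
            break  # dedent to column 0 ends the def body
        out.extend(pending_blanks)  # the blank run was inside the body; keep it
        pending_blanks = []
        out.append(ln)
    return "\n".join(out)


sample = (
    "def add_even(xs):\n"
    "    total = 0\n"
    "\n"
    "    for x in xs:\n"
    "        if x % 2 == 0:\n"
    "            total += x\n"
    "    return total\n"
    "\n"
    "print('trailing chatter')\n"
)

strict = extract_def(sample, max_blank_lines=0)
relaxed = extract_def(sample, max_blank_lines=1)
```

On this sample the strict variant keeps only the first two lines, so the
extracted function still compiles but returns `None`. The relaxed variant
keeps the whole body and drops the trailing non-indented line.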

**Recommended order: Lever F first** (zero risk, zero training, tests a
different hypothesis), then G, then D. **Do not run KL again.**

## Production state (after BD6.7 ladder revert)

* `PHYS05_PACK = physarum05b_code_skeleton.planck` (BD6 pass-1).
* MBPP B = 13/100, HE B = 6/164, LCB B = 0/50, anchor 19/19.
* Archives:
  * `physarum05b_code_skeleton_v7_lambda005.planck`
  * `physarum05b_code_skeleton_v7_lambda010.planck`
  * `physarum05b_code_skeleton_v7_lambda020.planck`
  * matching `tools/surgery/output/code_skeleton_lora_v7_lambda{005,010,020}/`
  * matching merged HF dirs

## Files this pass touched

* `tools/surgery/train_code_skeleton_lora_bd6_7.py` — new, KL-anchor trainer
* `physarum05b_code_skeleton_v7_lambda{005,010,020}.planck` — repacked, rejected, archived
* `tools/surgery/output/code_skeleton_lora_v7_lambda{005,010,020}/` — adapters + ckpts (rejected)
* `tools/surgery/output/Physarum05B-CodeSkeleton-v7_lambda{005,010,020}/` — merged HF dirs (rejected)
* `src/organs/organ_manager.cpp::PHYS05_PACK` — flipped 3× then back to v1
* `reports/BD6_7_KL_ANCHOR_LADDER.md` — this file

## What this proves

* **Same-size teacher/student KL distillation does not work** when the
  CE targets are already the teacher's own outputs. Two redundant
  signals do not equal one stronger signal — they equal one diluted
  signal.
* **The holdouts are not a training-data problem.** Five separate
  training-side levers tried (BD6.3/4/5/6/7-a/b/c) — none unlock all
  four. The remaining levers are at the runtime/verifier boundary.
* **The strict gate keeps doing its job.** Six rejections in a row,
  production stayed at 19/19 the whole time. The whole BD6.x cycle
  has not introduced a single regression to production MBPP B or HE B.

The lever is no longer in the trainer. It's in the runtime sampler and
the verifier extractor. Awaiting user decision on Lever F / G / D order.