
BD6.7 — KL-anchor ladder, no rung beats BD6.5 (2026-05-02)


TL;DR — KL distillation against the BD6 pass-1 teacher does not help on this organ. Ran the full λ ∈ {0.05, 0.10, 0.20} ladder per spec; every rung produced a worse anchor pass-rate than BD6.5's no-KL baseline of 15/19. Best KL-ladder rungs were λ=0.05 and λ=0.10 at 12/19 each (different regression sets); λ=0.20 was 10/19. Production reverted; all three v7 packs archived. BD6.5 (15/19, no KL) remains the local peak. The structural reason is now clear: with same-architecture teacher/student and anchor_positive.jsonl already containing teacher outputs as CE targets, the KL term is redundant with CE on anchors and just dilutes the poison gradient.


Pipeline (PYTHON_QUARANTINE-compliant)

production: physarum05b_code_skeleton.planck  (BD6 pass-1, anchor 19/19)

Trainer:        tools/surgery/train_code_skeleton_lora_bd6_7.py  (new)
Dataset:        bd6_5_mixed_train.jsonl  (UNCHANGED — 525 rows, 53.3 % anchor)
                — strict per spec: do NOT change data shape
Hyperparams:    r=8, alpha=16, lr=3e-5, ep=1, ckpt-step=50
KL settings:    top-K=50 truncation, holdout-mult=4×
                (MBPP/53, HE/34, /45, /85 each get 4× KL weight)
Teacher:        tools/surgery/output/Physarum05B-CodeSkeleton/
                — frozen merged BD6 pass-1 (BF16 in HF; 4-bit for training)
Loss:           total = CE(prompt+target) + λ * w(task) * KL_topk(t || s)
                w(task) = 4.0 if task ∈ HOLDOUTS else 1.0
                KL applied to anchor rows only; poison rows = CE only.

Three runs, λ ∈ {0.10, 0.20, 0.05}, each: train → merge → flip → gate →
revert → archive.

GPU footprint: student (4-bit, 0.3 GB) + teacher (4-bit, 0.3 GB) + activations
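
The loss line above can be read as the following minimal pure-Python sketch for a single token position. It is not the code in train_code_skeleton_lora_bd6_7.py: the function names are hypothetical, and renormalising both distributions over the teacher's top-K tokens is an assumption about how the truncation is done.

```python
import math

# Holdout tasks named in the report; they receive the 4x KL weight.
HOLDOUTS = {"MBPP/53", "HE/34", "HE/45", "HE/85"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_topk(teacher_logits, student_logits, k=50):
    """KL(t || s) restricted to the teacher's top-k tokens.

    Assumption: both distributions are renormalised over the kept tokens.
    """
    t = softmax(teacher_logits)
    s = softmax(student_logits)
    keep = sorted(range(len(t)), key=lambda i: t[i], reverse=True)[:k]
    t_mass = sum(t[i] for i in keep)
    s_mass = sum(s[i] for i in keep)
    return sum(
        (t[i] / t_mass) * math.log((t[i] / t_mass) / (s[i] / s_mass))
        for i in keep
        if t[i] > 0.0  # a zero-probability teacher token contributes nothing
    )


def total_loss(ce, kl, task, lam, is_anchor):
    """CE + lam * w(task) * KL on anchor rows; CE only on poison rows."""
    if not is_anchor:
        return ce
    w = 4.0 if task in HOLDOUTS else 1.0
    return ce + lam * w * kl
```

Under these assumptions a holdout anchor row at λ=0.20 carries an effective KL coefficient of 0.8, while a poison row carries none.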


Results — KL ladder vs BD6.5 baseline

| run | λ | avg_ce | avg_kl | anchor pass | gate | regressed (vs full 19) |
|---------|------|--------|--------|-------------|--------------------|-------------------------|
| BD6.5 | 0.00 | 0.5354 | — | 15 / 19 | REVERT (still <19) | MBPP/53, HE/34, /45, /85 |
| v7 (a) | 0.10 | 0.6696 | 0.5792 | 12 / 19 | REVERT | MBPP/20, /53, /93, /96, HE/27, /45, /85 |
| v7b (b) | 0.20 | 0.6837 | 0.4807 | 10 / 19 | REVERT | MBPP/20, /53, /93, /96, /99, HE/27, /34, /45, /85 |
| v7c (c) | 0.05 | 0.6649 | 0.6653 | 12 / 19 | REVERT | MBPP/53, /96, HE/27, /34, /45, /53, /85 |

KL raises CE on anchors (CE ≈ 0.67 vs BD6.5's 0.54) because the λ × KL term diverts part of each optimizer step away from CE-on-target. CE rises as λ rises (0.6649 → 0.6696 → 0.6837 for λ = 0.05, 0.10, 0.20), while KL falls (0.6653 → 0.5792 → 0.4807) as the student is pulled tighter toward the teacher distribution. Both trends are visible in the averages.

Holdouts always lost. No rung recovered all four BD6.5 holdouts (MBPP/53, HE/34, /45, /85). The KL term recovered HE/34 at λ=0.10 only; it failed at λ=0.05 and again at λ=0.20. MBPP/53, HE/45, and HE/85 were never recovered at any λ.

Plus collateral: every rung introduced regressions on short anchors that were stable at BD6.5 (MBPP/20, /93, /96, /99, HE/27, /53). The KL pull on those non-holdout anchors fights with the CE pull on poison; the LoRA can't optimize both, so it drifts.
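
The holdout and collateral claims can be re-derived from the "regressed" column with plain set arithmetic. This is a small sketch, assuming the shorthand "/53" in the table expands to the most recent prefix on its row (so "/53" after HE/45 means HE/53); the `summarize` helper is hypothetical.

```python
TOTAL = 19
HOLDOUTS = {"MBPP/53", "HE/34", "HE/45", "HE/85"}  # BD6.5 failures

# Regressed anchors per rung, copied from the results table.
REGRESSED = {
    0.05: {"MBPP/53", "MBPP/96", "HE/27", "HE/34", "HE/45", "HE/53", "HE/85"},
    0.10: {"MBPP/20", "MBPP/53", "MBPP/93", "MBPP/96",
           "HE/27", "HE/45", "HE/85"},
    0.20: {"MBPP/20", "MBPP/53", "MBPP/93", "MBPP/96", "MBPP/99",
           "HE/27", "HE/34", "HE/45", "HE/85"},
}


def summarize(regressed):
    """Split one rung's regressions into holdout vs collateral anchors."""
    return {
        "passed": TOTAL - len(regressed),
        "holdouts_recovered": sorted(HOLDOUTS - regressed),
        "collateral": sorted(regressed - HOLDOUTS),
    }


for lam in sorted(REGRESSED):
    print(lam, summarize(REGRESSED[lam]))

# Anchors that were stable at BD6.5 but regressed in at least one rung.
all_collateral = sorted(set().union(*REGRESSED.values()) - HOLDOUTS)
print(all_collateral)
```

The pass counts come out as 12, 12, and 10, matching the table, and the union of collateral anchors is the six-item list quoted above.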

Structural reason KL doesn't help here

The teacher (BD6 pass-1 merged) and the student (donor + new LoRA) are the same architecture, same size (Qwen2 0.5B, 24 layers, 14 heads). And the anchor rows in anchor_positive.jsonl already contain the teacher's actual outputs as repair_target. So:

anchor row → CE on target  pulls student → teacher's output token sequence
           → KL on logits  pulls student → teacher's logits over top-50

These pull in the same direction. KL adds redundant pressure on anchors while poison rows still get only CE, so the anchor side is over-constrained and the poison side is under-fit. The result: anchor-stable short prompts drift (because the LoRA can't learn poison without disturbing anchors), and long holdout prompts still fail (because the failure mode is not a lack of teacher alignment: the teacher's own outputs pass at training but not at runtime, due to a length × poison interaction).
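
The redundancy can be made concrete with the standard per-position gradients. This is a sketch for the untruncated case, with symbols that are not in the report: $z$ is the student logit vector, $s = \mathrm{softmax}(z)$, $t$ is the teacher distribution, $y$ is the CE target token, and $w$ is the task weight $w(\text{task})$ from the loss definition.

$$
\begin{aligned}
\frac{\partial\,\mathrm{CE}}{\partial z_i} &= s_i - \mathbb{1}[i = y],\\
\frac{\partial\,\mathrm{KL}(t\,\|\,s)}{\partial z_i} &= s_i - t_i,\\
\frac{\partial\,\mathrm{total}}{\partial z_i} &= (1 + \lambda w)\,s_i - \mathbb{1}[i = y] - \lambda w\,t_i .
\end{aligned}
$$

To the extent the teacher is peaked on its own output token, $t_i \approx \mathbb{1}[i = y]$ and the total collapses to $(1 + \lambda w)\,(s_i - \mathbb{1}[i = y])$: the same direction as CE, only scaled. For a holdout anchor ($w = 4$) at $\lambda = 0.20$ that scale is $1.8\times$, and for an ordinary anchor ($w = 1$) at $\lambda = 0.05$ it is $1.05\times$, while poison rows stay at $1\times$.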

In short: KL distillation is the right tool when teacher >> student in capacity. Here teacher = student. The lever was always wrong.

Numbers across all BD6.x passes

| pass | anchor share | KL λ | post-merge | gate decision | note |
|-------------|--------------|------|------------|---------------|------|
| BD6 pass-1 | 0 % | — | 19/19 | KEEP | defines anchor |
| BD6.2 | 0 % | — | not run | REVERT | post-bench MBPP regress |
| BD6.3 | 0 % | — | 0 / 19 | REVERT | catastrophic forgetting |
| BD6.4 | 28 % | — | 7 / 19 | REVERT | 5× anchor replication |
| BD6.5 | 53 % | — | 15 / 19 | REVERT (peak) | bench-aware repl + stratified |
| BD6.6 | 63 % | — | 11 / 19 | REVERT | over-anchor regression |
| BD6.7a | 53 % | 0.10 | 12 / 19 | REVERT | KL ladder mid-rung |
| BD6.7b | 53 % | 0.20 | 10 / 19 | REVERT | KL ladder high-rung |
| BD6.7c | 53 % | 0.05 | 12 / 19 | REVERT | KL ladder low-rung |

Local peak unchanged: BD6.5 with 15/19, no KL.

What still doesn't work

reverses. Lever closed.

same-size teacher; degrades short-prompt fluency without unlocking long holdouts. Lever closed.

The four holdouts (MBPP/53, HE/34, /45, /85) were not fixed by any data-shape lever or any teacher-student lever tried at this trainer/runtime config.

What's actually different about the holdouts

Length isn't sufficient: MBPP/19 (337 chars) consistently passes; HE/85 (366 chars) consistently fails. Token similarity to short MBPP isn't sufficient: HE/27 (267 chars) is within MBPP-style range but fails. Common pattern across the 4: multi-line targets with blank-line boundaries inside the def-body. The runtime extractor in anchor_eval.py::extract_def reads a continuous indented block, but greedy extraction stops at boundary-like sequences when the model emits slight whitespace drift. So:

largely is, at the token level).

penalty, no-7B-fallback path) that occasionally inserts/removes a blank line.

This is not a teacher-pull problem. It's a runtime decoding determinism problem, possibly compounded by the prompt template's "Do NOT write 'import' lines" instruction (which forces the model to emit imports inline, after the def, in some HE prompts that need them).

Levers that remain (recommended order for BD6.8 — pending user GO)

Lever F — runtime determinism on anchor prompts

Force greedy decoding (top-p=1.0, top-k=1, repeat-penalty=1.0, fixed seed) for the four holdout prompts, or for all phys05_code_skeleton calls during the gate. This costs nothing at training time; it is a runtime-config-only fix.

If this alone takes BD6.5 from 15/19 to 19/19, no further surgery is needed: the 15/19 was really 19/19 under deterministic eval, and the 4 "failures" were sampler noise on borderline outputs.

Cheapest test of this hypothesis: re-run anchor_eval.py against the production pack with phys05.greedy=1 env (or the equivalent existing flag). If anchor jumps to 18-19/19, runtime variance was the bug. No training needed.
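
One way to separate sampler noise from real failures is to repeat the gate over several seeds and classify each anchor. This is a hedged sketch: `classify_stability` and the `run_once(task, seed)` callable are hypothetical stand-ins for whatever actually invokes anchor_eval.py, and the task ids and outcomes in `fake_run` are invented purely to exercise the function.

```python
from typing import Callable, Dict, Iterable


def classify_stability(
    task_ids: Iterable[str],
    run_once: Callable[[str, int], bool],
    seeds: Iterable[int] = range(5),
) -> Dict[str, str]:
    """Label each task stable-pass, stable-fail, or flaky across seeds."""
    seed_list = list(seeds)
    verdicts = {}
    for task in task_ids:
        passes = sum(1 for seed in seed_list if run_once(task, seed))
        if passes == len(seed_list):
            verdicts[task] = "stable-pass"
        elif passes == 0:
            verdicts[task] = "stable-fail"
        else:
            verdicts[task] = "flaky"
    return verdicts


def fake_run(task: str, seed: int) -> bool:
    """Invented outcomes for illustration only; not real gate results."""
    if task == "task-c":
        return seed % 2 == 0   # flips with the seed
    return task == "task-a"    # task-a always passes, task-b never does


print(classify_stability(["task-a", "task-b", "task-c"], fake_run, range(4)))
```

A holdout that comes back "flaky" under sampling supports the sampler-noise hypothesis; one that is "stable-fail" even under greedy decoding points toward Lever G or D instead.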

Lever D — token-weighted CE (still on the table)

If runtime is already deterministic and the holdouts still fail, scale per-row CE loss by 1 / sqrt(target_token_count) so long targets don't dominate gradient. Use BD6.5 dataset shape unchanged. ~10-line change to the bd6_5 trainer's loss step. No KL.
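
A minimal sketch of the weighting, assuming each row's CE is summed (not averaged) over its target tokens; if the trainer already averages per token, the scaling would need to be rethought. The function names are hypothetical and this is not the bd6_5 trainer's loss step.

```python
import math


def token_weight(target_token_count: int) -> float:
    """Per-row scale factor 1 / sqrt(target_token_count)."""
    if target_token_count < 1:
        raise ValueError("target_token_count must be >= 1")
    return 1.0 / math.sqrt(target_token_count)


def weighted_batch_loss(rows):
    """Mean of length-weighted row losses.

    rows: iterable of (summed_ce, target_token_count) pairs.
    """
    rows = list(rows)
    return sum(ce * token_weight(n) for ce, n in rows) / len(rows)


# Two rows with the same 0.5 CE per token but different target lengths.
short_row = (8.0, 16)    # 16 tokens  -> weighted 8.0 / 4  = 2.0
long_row = (50.0, 100)   # 100 tokens -> weighted 50.0 / 10 = 5.0
print(weighted_batch_loss([short_row, long_row]))
```

In this example the long row's contribution drops from 6.25× the short row's (50.0 vs 8.0) to 2.5× (5.0 vs 2.0), which is the intended damping.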

Lever G — extractor relaxation in anchor_eval.py

The verifier's def-extractor stops at blank lines inside the body. Make it tolerate up to 1 blank line as long as indentation continues (~5 lines). Test against frozen production: if production still passes 19/19 and the BD6.5 v5 pack jumps from 15/19 to 18/19, the gate was overstrict for these specific holdouts.
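
The relaxation could look roughly like the sketch below. It is not the real anchor_eval.py::extract_def, whose code is not shown in this report; the signature, the `max_blank` parameter, and the reading of "up to 1 blank line" as "at most one consecutive blank line" are all assumptions.

```python
def extract_def(text: str, max_blank: int = 1) -> str:
    """Return the first def and its indented body.

    Up to max_blank consecutive blank lines are tolerated inside the body,
    provided an indented line follows them. max_blank=0 gives the strict
    behaviour of stopping at the first blank line.
    """
    lines = text.splitlines()
    start = next(
        (i for i, ln in enumerate(lines) if ln.lstrip().startswith("def ")),
        None,
    )
    if start is None:
        return ""
    def_indent = len(lines[start]) - len(lines[start].lstrip())
    kept = [lines[start]]
    pending_blanks = []  # blank lines not yet confirmed as part of the body
    for ln in lines[start + 1:]:
        if not ln.strip():
            pending_blanks.append(ln)
            if len(pending_blanks) > max_blank:
                break
            continue
        indent = len(ln) - len(ln.lstrip())
        if indent <= def_indent:
            break  # dedent: the body has ended
        kept.extend(pending_blanks)  # blanks were interior, so keep them
        pending_blanks = []
        kept.append(ln)
    return "\n".join(kept)


sample = "def f(x):\n    y = x + 1\n\n    return y\nprint(f(1))\n"
print(extract_def(sample, max_blank=0))
print(extract_def(sample, max_blank=1))
```

With `max_blank=0` the sample is cut off before `return y`; with `max_blank=1` the full body survives. Trailing blank lines are never kept, so the change only affects blank lines that sit between indented body lines.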

Recommended order: Lever F first (zero risk, zero training, tests a different hypothesis), then G, then D. Do not run KL again.

Production state (after BD6.7 ladder revert)

Files this pass touched

What this proves

CE targets are already the teacher's own outputs. Two redundant signals do not equal one stronger signal — they equal one diluted signal.

training-side levers tried (BD6.3/4/5/6/7-a/b/c) — none unlock all four. The remaining levers are at the runtime/verifier boundary.

production stayed at 19/19 the whole time. The whole BD6.x cycle has not introduced a single regression to MBPP B or HE B.

The lever is no longer in the trainer. It's in the runtime sampler and the verifier extractor. Awaiting user decision on Lever F / G / D order.