BD6.7 — KL-anchor ladder, no rung beats BD6.5 (2026-05-02)
TL;DR — KL distillation against the BD6 pass-1 teacher does not help on this organ. Ran the full λ ∈ {0.05, 0.10, 0.20} ladder per spec; every rung produced a worse anchor pass-rate than BD6.5's no-KL baseline of 15/19. Best KL-ladder rungs were λ=0.05 and λ=0.10 at 12/19 each (different regression sets); λ=0.20 was 10/19. Production reverted; all three v7 packs archived. BD6.5 (15/19, no KL) remains the local peak. The structural reason is now clear: with same-architecture teacher/student and anchor_positive.jsonl already containing teacher outputs as CE targets, the KL term is redundant with CE on anchors and just dilutes the poison gradient.
Pipeline (PYTHON_QUARANTINE-compliant)
production: physarum05b_code_skeleton.planck (BD6 pass-1, anchor 19/19)
Trainer: tools/surgery/train_code_skeleton_lora_bd6_7.py (new)
Dataset: bd6_5_mixed_train.jsonl (UNCHANGED — 525 rows, 53.3 % anchor)
— strict per spec: do NOT change data shape
Hyperparams: r=8, alpha=16, lr=3e-5, ep=1, ckpt-step=50
KL settings: top-K=50 truncation, holdout-mult=4×
(MBPP/53, HE/34, /45, /85 each get 4× KL weight)
Teacher: tools/surgery/output/Physarum05B-CodeSkeleton/
— frozen merged BD6 pass-1 (BF16 in HF; 4-bit for training)
Loss: total = CE(prompt+target) + λ * w(task) * KL_topk(t || s)
w(task) = 4.0 if task ∈ HOLDOUTS else 1.0
KL applied to anchor rows only; poison rows = CE only.
Three runs, λ ∈ {0.10, 0.20, 0.05}, each: train → merge → flip → gate →
revert → archive.
GPU footprint: student (4-bit, 0.3 GB) + teacher (4-bit, 0.3 GB) + activations
+ optimizer ≈ 1.93 GB / 8 GB. Inside budget on RTX 3060 Ti.
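The loss line in the pipeline above can be sketched for a single token position as follows. This is a minimal illustration, not the code in train_code_skeleton_lora_bd6_7.py: the function names are hypothetical, and it assumes the top-K truncation renormalizes both distributions over the teacher's top-K tokens, which the spec line does not state.

```python
import math

HOLDOUTS = {"MBPP/53", "HE/34", "HE/45", "HE/85"}  # tasks that get the 4x KL weight

def log_softmax(logits):
    # Numerically stable log-softmax over a plain list of floats.
    peak = max(logits)
    lse = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - lse for x in logits]

def kl_topk(teacher_logits, student_logits, k=50):
    """KL(t || s) over the teacher's top-k tokens (assumption: both renormalized)."""
    top = sorted(range(len(teacher_logits)), key=lambda i: -teacher_logits[i])[:k]
    log_t = log_softmax([teacher_logits[i] for i in top])
    log_s = log_softmax([student_logits[i] for i in top])
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))

def row_loss(ce, task, is_anchor, lam, teacher_logits=None, student_logits=None):
    """total = CE + lam * w(task) * KL_topk(t || s); poison rows are CE only."""
    if not is_anchor:
        return ce
    w = 4.0 if task in HOLDOUTS else 1.0
    return ce + lam * w * kl_topk(teacher_logits, student_logits)
```

A real trainer would average the KL term over all target positions of the row; the single-position form only shows how λ and w(task) combine.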
Results — KL ladder vs BD6.5 baseline
| run | λ | avg_ce | avg_kl | anchor pass | gate | regressed (vs full 19) |
|---------|------|--------|--------|-------------|------|-------------------------|
| BD6.5 | 0.00 | 0.5354 | n/a | 15 / 19 | REVERT (still <19) | MBPP/53, HE/34, /45, /85 |
| v7 (a) | 0.10 | 0.6696 | 0.5792 | 12 / 19 | REVERT | MBPP/20, /53, /93, /96, HE/27, /45, /85 |
| v7b (b) | 0.20 | 0.6837 | 0.4807 | 10 / 19 | REVERT | MBPP/20, /53, /93, /96, /99, HE/27, /34, /45, /85 |
| v7c (c) | 0.05 | 0.6649 | 0.6653 | 12 / 19 | REVERT | MBPP/53, /96, HE/27, /34, /45, /53, /85 |
Adding KL raises CE on anchors (≈ 0.67 vs BD6.5's 0.54) because the λ × KL term diverts part of each optimizer step away from CE-on-target. CE rises as λ rises (0.6649 → 0.6696 → 0.6837) while KL falls (0.6653 → 0.5792 → 0.4807), as the student is pulled tighter toward the teacher distribution. Both directions are visible in the avg numbers.
Holdouts never fully recovered. Across the three rungs, each of the four BD6.5 holdouts (MBPP/53, HE/34, /45, /85) failed at least once. The KL term recovered HE/34 at λ=0.10 only; it failed at λ=0.05 and again at λ=0.20. MBPP/53, HE/45 and HE/85 never recovered at any λ.
Plus collateral: every rung introduced regressions on short anchors that were stable at BD6.5 (MBPP/20, /93, /96, /99, HE/27, /53). The KL pull on those non-holdout anchors fights with the CE pull on poison; the LoRA can't optimize both, so it drifts.
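The holdout and collateral claims above can be checked directly against the results table. The sketch below is illustrative: the task lists are copied from the table, and the helper name is hypothetical.

```python
HOLDOUTS = {"MBPP/53", "HE/34", "HE/45", "HE/85"}  # the four BD6.5 failures

# Regressed anchors per rung (keyed by lambda), copied from the results table.
REGRESSED = {
    0.05: {"MBPP/53", "MBPP/96", "HE/27", "HE/34", "HE/45", "HE/53", "HE/85"},
    0.10: {"MBPP/20", "MBPP/53", "MBPP/93", "MBPP/96", "HE/27", "HE/45", "HE/85"},
    0.20: {"MBPP/20", "MBPP/53", "MBPP/93", "MBPP/96", "MBPP/99",
           "HE/27", "HE/34", "HE/45", "HE/85"},
}

def split_regressions(regressed, holdouts=HOLDOUTS):
    """Return (holdouts that now pass, non-holdout anchors that newly fail)."""
    recovered = holdouts - regressed
    collateral = regressed - holdouts
    return recovered, collateral

# Holdouts that fail on every rung, and every non-holdout anchor lost on any rung.
never_recovered = set.intersection(*REGRESSED.values()) & HOLDOUTS
all_collateral = set.union(*REGRESSED.values()) - HOLDOUTS
```

Calling `split_regressions(REGRESSED[0.10])` separates the one recovered holdout at that rung from the four short anchors it cost.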
Structural reason KL doesn't help here
The teacher (BD6 pass-1 merged) and the student (donor + new LoRA) are the same architecture, same size (Qwen2 0.5B, 24 layers, 14 heads). And the anchor rows in anchor_positive.jsonl already contain the teacher's actual outputs as repair_target. So:
anchor row → CE on target pulls student → teacher's output token sequence
→ KL on logits pulls student → teacher's logits over top-50
These are the same direction. KL adds redundant pressure on anchors; meanwhile poison rows still have only CE; net effect is anchor side is over-constrained and poison side is under-fit. Result: anchor-stable short prompts drift (because LoRA can't even learn poison without disturbing anchor), and holdout long prompts still fail (because the failure mode is not lack of teacher-alignment — the teacher's own outputs do pass at training but don't pass at runtime due to length × poison interaction).
In short: KL distillation is the right tool when teacher >> student in capacity. Here teacher = student. The lever was always wrong.
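The "same direction" argument can be made concrete with the standard gradients with respect to the student logits at one position: CE on a hard target gives softmax(s) - onehot(y), and forward KL(t || s) gives softmax(s) - softmax(t). The sketch below uses made-up logits and assumes a full-vocabulary KL and a target equal to the teacher's most likely token; when the teacher is confident, the two gradients are nearly parallel.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def ce_and_kl_grads(student_logits, teacher_logits):
    """Gradients w.r.t. the student logits at one token position.

    CE on the teacher's top token: softmax(s) - onehot(argmax t)
    Forward KL(t || s):            softmax(s) - softmax(t)
    """
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    target = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    ce_grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(p_s)]
    kl_grad = [ps - pt for ps, pt in zip(p_s, p_t)]
    return ce_grad, kl_grad

# Made-up logits: a confident teacher and a near-uniform student.
ce_grad, kl_grad = ce_and_kl_grads([0.2, -0.1, 0.0, 0.3], [5.0, 1.0, 0.0, -1.0])
alignment = cosine(ce_grad, kl_grad)  # close to 1.0 when the teacher is confident
```

A less confident teacher lowers the alignment, which is the regime where a KL term carries information that the hard target does not.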
Numbers across all six BD6.x passes
| pass | anchor share | KL λ | post-merge | gate decision | note |
|------------|--------------|------|------------|---------------|------|
| BD6 pass-1 | 0 % | n/a | 19/19 | KEEP | defines anchor |
| BD6.2 | 0 % | n/a | not run | REVERT | post-bench MBPP regress |
| BD6.3 | 0 % | n/a | 0 / 19 | REVERT | catastrophic forgetting |
| BD6.4 | 28 % | n/a | 7 / 19 | REVERT | 5× anchor replication |
| BD6.5 | 53 % | n/a | 15 / 19 | REVERT (peak) | bench-aware repl + stratified |
| BD6.6 | 63 % | n/a | 11 / 19 | REVERT | over-anchor regression |
| BD6.7a | 53 % | 0.10 | 12 / 19 | REVERT | KL ladder mid-rung |
| BD6.7b | 53 % | 0.20 | 10 / 19 | REVERT | KL ladder high-rung |
| BD6.7c | 53 % | 0.05 | 12 / 19 | REVERT | KL ladder low-rung |
Local peak unchanged: BD6.5 with 15/19, no KL.
What still doesn't work
- Replication (BD6.4 → BD6.5 → BD6.6): saturates at 53 %, then
reverses. Lever closed.
- KL anchor (BD6.7 ladder a/b/c): adds redundant pressure to CE on
same-size teacher; degrades short-prompt fluency without unlocking long holdouts. Lever closed.
The 4 holdouts (MBPP/53, HE/34, /45, /85) are robustly unfixable with any data-shape lever and any teacher-student lever at this trainer/runtime config.
What's actually different about the holdouts
Length isn't sufficient: MBPP/19 (337 chars) consistently passes; HE/85 (366 chars) consistently fails. Token similarity to short MBPP isn't sufficient: HE/27 (267 chars) is within MBPP-style range but fails. Common pattern across the 4: multi-line targets with blank-line boundaries inside the def-body. The runtime extractor in anchor_eval.py::extract_def reads a continuous indented block, but greedy extraction stops at boundary-like sequences when the model emits slight whitespace drift. So:
- The training-time CE thinks it's reproducing the target (and it
largely is, at the token level).
- The runtime sampling adds tiny stochastic drift (top-p, repetition
penalty, no-7B-fallback path) that occasionally inserts/removes a blank line.
- The verifier then can't compile what it gets.
This is not a teacher-pull problem. It's a runtime decoding determinism problem, possibly compounded by the prompt template's Do NOT write 'import' lines instruction (which forces the model to emit imports inline-after-def in some HE prompts that need them).
Levers that remain (recommended order for BD6.8 — pending user GO)
Lever F — runtime determinism on anchor prompts
Force greedy decoding (top-p=1.0, top-k=1, repeat-penalty=1.0, fixed seed) for the four holdout prompts. Or for all phys05_code_skeleton calls during the gate. This costs nothing at training time — it's a runtime-config-only fix.
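Why top-k=1 removes sampler variance can be seen in a toy sampler. This is an illustrative sketch only, not the runtime's sampler: the function is hypothetical and it ignores repetition penalty and temperature.

```python
import math
import random

def sample_token(logits, top_k, top_p, rng):
    """Toy top-k / top-p sampler over a list of logits (illustrative only)."""
    # Rank token ids by logit, highest first; ties go to the lower id.
    order = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:top_k]
    peak = logits[order[0]]
    weights = [math.exp(logits[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Nucleus cut: keep the smallest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in zip(order, probs):
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    toks, ps = zip(*kept)
    return rng.choices(toks, weights=ps, k=1)[0]

logits = [2.0, 1.9, 0.1]  # two near-tied candidates, e.g. newline vs. indent
greedy = {sample_token(logits, 1, 1.0, random.Random(seed)) for seed in range(20)}
sampled = {sample_token(logits, 3, 1.0, random.Random(seed)) for seed in range(50)}
```

With `top_k=1` only the top-ranked token survives the cut, so `greedy` holds a single token id whatever the seed, while `sampled` varies with the seed whenever two candidates are close.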
If greedy decoding alone takes BD6.5 from 15/19 to 19/19, no further surgery is needed: the pack was always 19/19 under deterministic eval, and the 4 "failures" were sampler noise on borderline outputs.
Cheapest test of this hypothesis: re-run anchor_eval.py against the BD6.5 v5 pack (production already sits at 19/19, so it cannot show a jump) with phys05.greedy=1 env (or the equivalent existing flag). If anchor jumps to 18-19/19, runtime variance was the bug. No training needed.
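A complementary check is to repeat the gate several times under the current sampler settings and see which anchors flip between runs. The helper below is a hypothetical sketch: it assumes each run can be reduced to a mapping from task id to pass/fail, which is not a documented output format of anchor_eval.py.

```python
def flaky_tasks(runs):
    """Split task ids into always-pass, always-fail, and flaky across repeated runs.

    `runs` is a list of dicts mapping task id -> bool (passed), one dict per run.
    A task missing from a run counts as a failure in that run.
    """
    tasks = set().union(*(r.keys() for r in runs))
    always_pass = {t for t in tasks if all(r.get(t, False) for r in runs)}
    always_fail = {t for t in tasks if not any(r.get(t, False) for r in runs)}
    flaky = tasks - always_pass - always_fail
    return always_pass, always_fail, flaky

# Placeholder task ids and outcomes, not real results.
example_runs = [
    {"T1": True, "T2": False, "T3": True},
    {"T1": True, "T2": False, "T3": False},
]
stable_pass, stable_fail, flaky = flaky_tasks(example_runs)
```

If the four holdouts land in the flaky set, that supports the sampler-noise hypothesis; if they land in always-fail, determinism alone is unlikely to fix them.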
Lever D — token-weighted CE (still on the table)
If runtime is already deterministic and the holdouts still fail, scale per-row CE loss by 1 / sqrt(target_token_count) so long targets don't dominate gradient. Use BD6.5 dataset shape unchanged. ~10-line change to the bd6_5 trainer's loss step. No KL.
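The weighting in Lever D can be sketched as below. It assumes the per-row CE is a sum over target tokens (which is what lets long targets dominate); if the trainer already averages per token, the scaling would need to be adapted. The function name is hypothetical.

```python
import math

def weighted_row_loss(token_losses):
    """Sum per-token CE over the target, then scale by 1 / sqrt(token count)."""
    n = len(token_losses)
    if n == 0:
        return 0.0
    return sum(token_losses) / math.sqrt(n)

# With a flat per-token loss of 0.5, a 64-token target now contributes
# 2x a 16-token target instead of 4x.
short_row = weighted_row_loss([0.5] * 16)
long_row = weighted_row_loss([0.5] * 64)
```

The square root is a middle ground between summing (long rows dominate) and averaging (every row counts the same regardless of length).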
Lever G — extractor relaxation in anchor_eval.py
The verifier's def-extractor stops at blank lines inside the body. Make it tolerate up to 1 blank line as long as indentation continues. ~5 lines. Test against frozen production: if production still passes 19/19 and the BD6.5 v5 pack jumps from 15/19 to 18/19, the gate was overstrict for these specific holdouts.
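One way the relaxed extractor could look is sketched below. It is not the actual anchor_eval.py::extract_def: the signature and the `max_blank_run` parameter are assumptions, and it only handles a top-level `def` whose body ends at the first dedent to column 0.

```python
def extract_def(text: str, max_blank_run: int = 1) -> str:
    """Return the first top-level `def` block in `text`.

    Blank lines inside the body are kept when the run is at most
    `max_blank_run` lines long and an indented line follows it.
    """
    lines = text.splitlines()
    # Find the first line that starts a function definition at column 0.
    start = next((i for i, ln in enumerate(lines) if ln.startswith("def ")), None)
    if start is None:
        return ""
    kept = [lines[start]]
    pending_blanks = []  # blank lines seen but not yet committed to the body
    for ln in lines[start + 1:]:
        if not ln.strip():
            pending_blanks.append(ln)
            if len(pending_blanks) > max_blank_run:
                break  # too many consecutive blank lines: treat the body as over
            continue
        if ln[0] in " \t":
            # Indentation continues, so commit any tolerated blank lines.
            kept.extend(pending_blanks)
            pending_blanks = []
            kept.append(ln)
        else:
            break  # dedent to column 0 ends the body
    return "\n".join(kept)

SAMPLE = "def f(x):\n    a = x + 1\n\n    return a\n\n\nprint(f(1))\n"
relaxed = extract_def(SAMPLE)                  # keeps the single blank line
strict = extract_def(SAMPLE, max_blank_run=0)  # stops at the first blank line
```

Setting `max_blank_run=0` reproduces the strict behaviour described above, so both modes can be compared on the same model outputs.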
Recommended order: Lever F first (zero risk, zero training, tests a different hypothesis), then G, then D. Do not run KL again.
Production state (after BD6.7 ladder revert)
- PHYS05_PACK = physarum05b_code_skeleton.planck (BD6 pass-1).
- MBPP B = 13/100, HE B = 6/164, LCB B = 0/50, anchor 19/19.
- Archives:
  - physarum05b_code_skeleton_v7_lambda005.planck
  - physarum05b_code_skeleton_v7_lambda010.planck
  - physarum05b_code_skeleton_v7_lambda020.planck
  - matching tools/surgery/output/code_skeleton_lora_v7_lambda{005,010,020}/
  - matching merged HF dirs
Files this pass touched
- tools/surgery/train_code_skeleton_lora_bd6_7.py: new, KL-anchor trainer
- physarum05b_code_skeleton_v7_lambda{005,010,020}.planck: repacked, rejected, archived
- tools/surgery/output/code_skeleton_lora_v7_lambda{005,010,020}/: adapters + ckpts (rejected)
- tools/surgery/output/Physarum05B-CodeSkeleton-v7_lambda{005,010,020}/: merged HF dirs (rejected)
- src/organs/organ_manager.cpp::PHYS05_PACK: flipped 3× then back to v1
- reports/BD6_7_KL_ANCHOR_LADDER.md: this file
What this proves
- Same-size teacher/student KL distillation does not work when the
CE targets are already the teacher's own outputs. Two redundant signals do not equal one stronger signal — they equal one diluted signal.
- The holdouts are not a training-data problem. Five separate
training-side levers tried (BD6.3/4/5/6/7-a/b/c) — none unlock all four. The remaining levers are at the runtime/verifier boundary.
- The strict gate keeps doing its job. Six rejections in a row,
production stayed at 19/19 the whole time. The whole BD6.x cycle has not introduced a single regression to MBPP B or HE B.
The lever is no longer in the trainer. It's in the runtime sampler and the verifier extractor. Awaiting user decision on Lever F / G / D order.