BD6.6 — over-anchor regression, 11/19 (down from BD6.5's 15/19) (2026-05-02)

TL;DR — over-replication on holdouts backfires. Pushed 4 holdouts × 50, plus other-HE × 25, MBPP-long × 25, MBPP-short × 10 against capped poison (36.6 %). Final anchor gate dropped from 15/19 to 11/19: all 4 holdouts still failed AND 4 previously-kept short-target anchors regressed. Loss curve was tighter (avg 0.5085 vs BD6.5's 0.5354), but tighter loss on the holdout subset produced mode collapse, not memorization. Production reverted; v6 archived; BD6.5 stratified shape stands as the local optimum for replication-only levers. Real fix is no longer replication — it's the loss function itself (Lever D: token-weighted) or KL anchor (Lever E).


Pipeline (PYTHON_QUARANTINE-compliant)

production: physarum05b_code_skeleton.planck   (BD6 pass-1, anchor 19/19)
anchor_positive.jsonl: 19 captured pass-1 outputs (unchanged from BD6.4)

bd6_6_mixed_train.jsonl  (525 → 670 rows, anchor share 53 → 63 %)
  holdout    × 50 = 4 × 50 = 200 rows  (MBPP/53, HE/34, /45, /85)
  he_other   × 25 = 3 × 25 =  75 rows  (HE/23, /27, /53)
  mbpp_long  × 25 = 2 × 25 =  50 rows  (MBPP/19, /90, threshold ≥150 char)
  mbpp_short × 10 = 10 × 10 = 100 rows
  → 425 anchor rows
+ poison_train.jsonl 245 (with refs, all)
= 670 rows total, 63.4 % anchor / 36.6 % poison
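The row counts and shares above can be re-derived from the replication factors. This is a minimal sketch for checking the arithmetic; `ANCHOR_GROUPS`, `POISON_ROWS` and `mix_summary` are hypothetical names, not part of the trainer or dataset tooling.

```python
# Anchor groups as listed above: (number of prompts, replication factor).
# All names here are illustrative, not taken from the real pipeline.
ANCHOR_GROUPS = {
    "holdout": (4, 50),
    "he_other": (3, 25),
    "mbpp_long": (2, 25),
    "mbpp_short": (10, 10),
}
POISON_ROWS = 245  # poison_train.jsonl, with refs


def mix_summary(groups, poison_rows):
    """Return row counts and anchor/poison shares for a mixed training set."""
    anchor_rows = sum(prompts * repl for prompts, repl in groups.values())
    total_rows = anchor_rows + poison_rows
    return {
        "anchor_rows": anchor_rows,
        "total_rows": total_rows,
        "anchor_share": anchor_rows / total_rows,
        "poison_share": poison_rows / total_rows,
    }


summary = mix_summary(ANCHOR_GROUPS, POISON_ROWS)
print(summary["anchor_rows"], summary["total_rows"])  # 425 670
print(f"{summary['anchor_share']:.1%} anchor / {summary['poison_share']:.1%} poison")
```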

trainer: tools/surgery/train_code_skeleton_lora_bd6_5.py (unchanged)
  --rank 8 --alpha 16 --lr 3e-5 --epochs 1 --checkpoint-steps 50

trainable params = 1.08 M / 495 M = 0.22 %
loss curve (samples):
  step  50  loss 0.30
  step 100  loss 0.05  (anchor memorization fast under heavy replication)
  step 200  loss 0.06
  step 300  loss 0.09
  step 350  loss 0.98  (poison spike)
  step 400  loss 0.25
  step 500  loss 0.29
  step 600  loss 0.06
  step 650  loss 0.02
  step 670  final (epoch end)
  epoch avg 0.5085

merge → physarum05b_code_skeleton_v6.planck
flip → rebuild → anchor_eval

anchor gate (final adapter):
  ▼
  11 / 19 PASS  (rate 57.9 %)        organ_leaks=0
  ▼
  threshold 85 % → REJECT
  ▼
REVERT: PHYS05_PACK = physarum05b_code_skeleton.planck
REBUILD
VERIFY: anchor 19/19 ✅ production safe
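The gate decision in the flow above reduces to a pass-rate comparison against the 85 % threshold. The sketch below is illustrative only: `anchor_gate` is a hypothetical helper, and treating a rate exactly at the threshold as a pass is an assumption the report does not state.

```python
from fractions import Fraction

# Threshold taken from the report; the ">=" comparison is an assumption.
GATE_THRESHOLD = Fraction(85, 100)


def anchor_gate(passed, total, threshold=GATE_THRESHOLD):
    """Return (decision, pass rate in percent) for an anchor evaluation."""
    rate = Fraction(passed, total)
    decision = "KEEP" if rate >= threshold else "REJECT"
    return decision, round(float(rate) * 100, 1)


print(anchor_gate(11, 19))  # BD6.6 final adapter
print(anchor_gate(19, 19))  # production pack after revert
```

Exact fractions are used so that a rate sitting on the threshold is not misjudged by floating-point rounding.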

Numbers across all five BD6.x passes

| pass | dataset shape | r | lr | ep | anchor share | anchor post-merge | gate |
|------|---------------|---|----|----|--------------|-------------------|------|
| BD6 pass-1 | poison v1 (256 with refs) | 16 | 2e-4 | 3 | 0 % | 19/19 (defines anchor) | KEEP |
| BD6.2 | union v1∪v2 (260) | 16 | 2e-4 | 4 | 0 % | not run | REVERT (post-bench MBPP regress) |
| BD6.3 | fresh-only (245) | 8 | 1e-4 | 1 | 0 % | 0 / 19 | REVERT |
| BD6.4 | fresh + 5× anchor (340) | 8 | 1e-4 | 1 | 28 % | 7 / 19 | REVERT |
| BD6.5 | fresh + bench-aware-anchor (525) | 8 | 5e-5 | 1 | 53 % | 15 / 19 | REVERT (still < 19/19) |
| BD6.6 | + holdout×50 + cap poison (670) | 8 | 3e-5 | 1 | 63 % | 11 / 19 | REVERT (over-anchor regression) |

Trajectory: 0 → 7 → 15 → 11. The curve bent backward at 63 % anchor share.

Per-row anchor result on v6 final

KEPT (11):
  MBPP/17, MBPP/19, MBPP/41, MBPP/51, MBPP/52,
  MBPP/64, MBPP/90, MBPP/96, MBPP/99, MBPP/105,
  HumanEval/23

LOST (8):
  MBPP/20        (66  chars, short)        — was KEPT in BD6.5, now LOST
  MBPP/53        (256 chars, holdout LONG) — still LOST (50× repl no help)
  MBPP/93        (52  chars, short)        — was KEPT in BD6.5, now LOST
  HumanEval/27   (267 chars, other_he 25×) — was KEPT in BD6.5, now LOST
  HumanEval/34   (182 chars, holdout LONG) — still LOST
  HumanEval/45   (249 chars, holdout LONG) — still LOST
  HumanEval/53   (101 chars, other_he 25×) — was KEPT in BD6.5, now LOST
  HumanEval/85   (366 chars, holdout LONG) — still LOST
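The "still LOST" versus "was KEPT, now LOST" split can be reproduced with set operations on the two passes' lost lists. This is a sketch, not the real anchor_eval tooling; the variable names are hypothetical, and the BD6.5 lost set is the 4 holdouts this report says BD6.5 already failed on.

```python
# Anchors that failed the verifier in each pass (IDs as listed in this report).
LOST_BD6_5 = {"MBPP/53", "HumanEval/34", "HumanEval/45", "HumanEval/85"}
LOST_BD6_6 = {
    "MBPP/20", "MBPP/53", "MBPP/93",
    "HumanEval/27", "HumanEval/34", "HumanEval/45",
    "HumanEval/53", "HumanEval/85",
}

still_lost = sorted(LOST_BD6_5 & LOST_BD6_6)  # failed in both passes
regressed = sorted(LOST_BD6_6 - LOST_BD6_5)   # kept in BD6.5, lost in BD6.6
recovered = sorted(LOST_BD6_5 - LOST_BD6_6)   # lost in BD6.5, kept in BD6.6

print("still lost:", still_lost)
print("regressed: ", regressed)
print("recovered: ", recovered)
```

An empty `recovered` list alongside a non-empty `regressed` list is the signature of this pass: nothing gained, four anchors given up.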

Two failure modes co-occurred:

1. Holdout-replication didn't unlock the holdouts. All 4 of MBPP/53, HE/34, HE/45, HE/85 still failed despite 50× replication in the training mix. The LoRA could memorize their ChatML→def pattern at the token level (loss 0.05 on these batches by step 100), but at inference time the runtime's actual generation drifted away, the same failure mode BD6.5 had on these exact 4 prompts. Replication does not fix what is fundamentally a length-vs-poison interaction.

2. Mode collapse on short prompts. 4 short-target anchors that were stable in BD6.5 (MBPP/20, MBPP/93, HE/27, HE/53) regressed in BD6.6. Heavy holdout replication pushed the LoRA into a long-pattern hot region; on short prompts the model now over-emits long-style content and trips the verifier.

MBPP/19 (337 chars, long) stayed kept, not because it resembles the holdouts but because its target is structurally close to short MBPP. Length alone isn't the lever; token-distribution similarity is.

What this proves

BD6.4 (28 %) → BD6.5 (53 %) was monotone good (7 → 15). BD6.5 (53 %) → BD6.6 (63 %, holdout-skewed) reverses (15 → 11). The data points trace a parabola; 53 % stratified is the peak for the replication-only lever.

The 4 holdouts each appeared 50 times in training; the LoRA still failed to reproduce them at inference. The signal needed is not more exposure but a different gradient shape.

BD6.6 was worse than the BD6.5 attempt; the gate caught it cleanly, and production was reverted before any user-visible damage. That makes four strict-gate rejects in a row (BD6.3, .4, .5, .6), and production stayed at 19/19 the whole time.

What's actually needed for BD6.7

The replication knob is exhausted. Two real levers remain — both were identified in the BD6.5 report (Lever D, Lever E):

Lever D — token-weighted loss (correctness fix at training time)

Currently every row's cross-entropy enters the loss unweighted. Long targets carry 2-3× more tokens, so their per-row loss looks bigger to the optimizer; the LoRA "tries hard" on them in early steps, then drifts under continued poison gradient. Fix: scale per-row loss by 1 / sqrt(target_token_count) so long targets don't dominate the gradient. This is what the loss curve mathematically asks for. About 10 lines in the trainer's loss step. No data-shape change. Use the BD6.5 mix (53 % anchor), which was the peak.
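To make the proposed scaling concrete, here is a framework-agnostic sketch of the arithmetic. It assumes the per-row loss is the sum of per-token cross-entropy values, which the report does not confirm. The function names are hypothetical; in the trainer the same computation would be done on tensors.

```python
import math


def row_loss_unweighted(token_losses):
    """Assumed current behaviour: per-row loss is the sum over target tokens."""
    return sum(token_losses)


def row_loss_sqrt_scaled(token_losses):
    """Lever D sketch: divide the summed row loss by sqrt(target_token_count)."""
    count = len(token_losses)
    if count == 0:
        return 0.0
    return sum(token_losses) / math.sqrt(count)


# Two rows with the same per-token loss but a 4x difference in target length.
short_row = [0.5] * 25
long_row = [0.5] * 100

print(row_loss_unweighted(long_row) / row_loss_unweighted(short_row))    # 4.0
print(row_loss_sqrt_scaled(long_row) / row_loss_sqrt_scaled(short_row))  # 2.0
```

Under this assumption a target 4× longer pulls 2× harder instead of 4×. That is a middle ground between summing (length dominates) and a plain per-token mean (length is ignored).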

Lever E — KL anchor on the 4 holdouts only

Run inference on [MBPP/53, HE/34, /45, /85] against the frozen pass-1 base, capture top-k logits per token, add a KL term to the LoRA training step that pulls the student toward those reference logits on those four prompts. Canonical anti-forgetting fix; principled. ~30 lines extra; PEFT pattern.
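The KL term described above can be sketched in plain Python. Everything here is an assumption for illustration: the function names, the KL direction (reference to student), renormalising over only the captured top-k token ids, and the weight `beta`. A real version would operate on logit tensors inside the training step.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def topk_kl(ref_logits, student_logits):
    """KL(ref || student) for one token position.

    Both lists hold logits for the same top-k token ids, in the same order.
    """
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def anchored_loss(ce_loss, kl_per_token, beta=0.1):
    """Cross-entropy plus a weighted mean KL penalty (beta is hypothetical)."""
    mean_kl = sum(kl_per_token) / len(kl_per_token) if kl_per_token else 0.0
    return ce_loss + beta * mean_kl


ref = [2.0, 1.0, 0.0]       # captured from the frozen pass-1 base
drifted = [0.0, 1.0, 2.0]   # student that has moved away from the reference
print(topk_kl(ref, ref))      # ~0.0: no penalty when the student matches
print(topk_kl(ref, drifted))  # positive: penalty grows with drift
```

The penalty would apply only to rows from the four holdout prompts. Poison rows would keep the plain cross-entropy objective.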

Recommended order: D first (cheap, well-targeted, matches the diagnostic). If D still leaves any holdout failing, add E.

Do NOT increase replication further. That door is closed.

Production state (after BD6.6 revert)

Files this pass touched

Reading the trajectory

0 % anchor     → 0/19   (BD6.3, lr=1e-4)
28 % anchor    → 7/19   (BD6.4, lr=1e-4, 5× repl)
53 % anchor    → 15/19  (BD6.5, lr=5e-5, bench-aware repl, stratified)
63 % anchor    → 11/19  (BD6.6, lr=3e-5, holdout×50, capped poison)

The curve says: stay at 53 %, change the loss function, not the data weighting. BD6.7 = train_code_skeleton_lora_bd6_5.py + 1 small loss modification, same dataset shape as BD6.5. No new experiments before that change is in place.
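The trajectory can also be tabulated programmatically to locate the peak and the step-to-step change. This is a small sketch using only the numbers in this report; the names are hypothetical. The learning rate also changed between passes, so anchor share is not the only variable behind these deltas.

```python
# (pass, anchor share in %, anchors kept out of 19), from the table in this report.
TRAJECTORY = [
    ("BD6.3", 0, 0),
    ("BD6.4", 28, 7),
    ("BD6.5", 53, 15),
    ("BD6.6", 63, 11),
]

# Pass with the most anchors kept.
best = max(TRAJECTORY, key=lambda row: row[2])

# Change in kept anchors relative to the previous pass.
deltas = [
    (cur[0], cur[2] - prev[2])
    for prev, cur in zip(TRAJECTORY, TRAJECTORY[1:])
]

print("peak:", best)
print("deltas:", deltas)
```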