
BD6.3 — anchor gate FAILED (2026-05-01)



TL;DR — BD6.3 trained safely (r=8, lr=1e-4, 1 epoch, fresh poison only — exactly the user's spec) but the anti-forgetting anchor gate came back 0/19. Even the smallest safe training pass on this poison set destroys all BD6 pass-1 wins. Pack reverted to BD6 pass-1. Surgery on this organ + this data shape has hit a hard wall — the fix is not "tune hyperparams more" but "different data shape."


Pipeline (PYTHON_QUARANTINE-compliant)

fresh post-BD6 failures (245 rows w/ refs, 87 MBPP + 158 HE)
  → tools/surgery/train_code_skeleton_lora.py     [Python, GPU]
    --rank 8 --alpha 16 --lr 1e-4 --epochs 1
    avg_loss epoch 0 = 0.4484, trainable params 1.08 M / 495 M (0.22 %)
  → tools/surgery/output/code_skeleton_lora_v3/    [PEFT adapter]
    → tools/surgery/merge_code_skeleton_lora.py    [Python, merge + planck7b_tool]
      → physarum05b_code_skeleton_v3.planck        [BF16 pack]
        → src/organs/organ_manager.cpp PHYS05_PACK retargeted
          → make -j4
            → tools/surgery/anchor_eval.py         [GATE]
              19 anchor prompts (BD6 pass-1 wins)
              ▼
              0/19 PASS — REGRESSION
              ▼
              REVERT: PHYS05_PACK = physarum05b_code_skeleton.planck
              REBUILD
              VERIFY: 19/19 anchor pass

The capsule never escaped. PYTHON_QUARANTINE held: the runtime is back on the v1 pack with no Python in the request path.
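The gate decision at the end of the pipeline can be sketched as below. This is a minimal illustration, not the contents of tools/surgery/anchor_eval.py: the names `GateResult` and `anchor_gate` are hypothetical, and the all-or-nothing threshold is an assumption drawn from the spec line "no merge if pass-1 wins regress."

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: int
    total: int
    action: str  # "SHIP" or "REVERT"


def anchor_gate(results: dict[str, bool]) -> GateResult:
    """Ship only if every anchor prompt still passes (assumed threshold)."""
    passed = sum(results.values())
    total = len(results)
    # An empty anchor set proves nothing, so it is treated as a failure.
    action = "SHIP" if total > 0 and passed == total else "REVERT"
    return GateResult(passed, total, action)


# Post-merge outcome reported above: all 19 anchors fail.
anchors = [f"anchor_{i}" for i in range(19)]  # placeholder ids
post_merge = anchor_gate({a: False for a in anchors})
print(f"{post_merge.passed}/{post_merge.total} -> {post_merge.action}")
# prints "0/19 -> REVERT"
```

Treating an empty result set as REVERT keeps a misconfigured eval (no anchors loaded) from silently shipping a pack.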


Settings vs. previous attempts

| param | BD6 pass-1 (kept) | BD6.2 (reverted) | BD6.3 (reverted) |
|----------------|--------------------|------------------|-------------------|
| dataset | poison v1 (256 rows w/ refs) | poison v1∪v2 (260 rows) | fresh-only (245 rows) |
| rank | 16 | 16 | 8 |
| alpha | 32 | 32 | 16 |
| lr | 2e-4 | 2e-4 | 1e-4 |
| epochs | 3 | 4 | 1 |
| trainable % | 0.44 | 0.44 | 0.22 |
| final loss | 0.21 | 0.18 | 0.45 |
| anchor 19 | (defines anchor) | not run pre-merge | 0 / 19 ← gate failed |
| MBPP B | 13 | 6 | not measured |
| HumanEval B | 6 | 8 | not measured |
| status | PRODUCTION | archived | archived |

User's BD6.3 spec was followed exactly (poison-only, anchor eval, lr 1e-4, r=8, 1 epoch, no merge if pass-1 wins regress). Gate fired correctly. Surgery did not ship.
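The "trainable %" row follows from how LoRA parameters scale. A rough check, assuming the same target modules at both ranks (the report does not list them; the layer shapes below are hypothetical placeholders):

```python
def trainable_pct(trainable: float, total: float) -> float:
    """Percentage of parameters that receive gradients."""
    return 100.0 * trainable / total


def lora_param_count(rank: int, shapes: list[tuple[int, int]]) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted matrix."""
    return sum(rank * (d_in + d_out) for (d_out, d_in) in shapes)


# Figures from the pipeline log: 1.08 M trainable of 495 M at rank 8.
r8_pct = trainable_pct(1.08e6, 495e6)
# Doubling the rank doubles the adapter size if the targets are unchanged.
r16_pct = trainable_pct(2 * 1.08e6, 495e6)
print(f"r=8: {r8_pct:.2f} %  r=16: {r16_pct:.2f} %")
# prints "r=8: 0.22 %  r=16: 0.44 %"

# Hypothetical (d_out, d_in) shapes, only to show the linear scaling in rank.
shapes = [(896, 896), (128, 896)]
assert lora_param_count(16, shapes) == 2 * lora_param_count(8, shapes)
```

So BD6.3 halved adapter capacity relative to pass-1 and BD6.2, and the anchors still regressed.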

Anchor list (19 BD6 pass-1 wins)

MBPP/17, MBPP/19, MBPP/20, MBPP/41, MBPP/51, MBPP/52, MBPP/53,
MBPP/64, MBPP/90, MBPP/93, MBPP/96, MBPP/99, MBPP/105
HumanEval/23, HumanEval/27, HumanEval/34, HumanEval/45,
HumanEval/53, HumanEval/85

These are derived from data/organ_surgery/phys05_code_skeleton/anchor_eval.jsonl as the set difference (official_test_split) \ (current Mode-B failures), i.e. the official test-split prompts that Mode B does not currently fail.

Pre-merge against BD6 pass-1 pack: 19/19 pass. Post-merge against BD6.3 v3 pack: 0/19 pass. Post-revert against BD6 pass-1 pack: 19/19 pass (production safe).
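The anchor derivation is a plain set difference. A minimal sketch, assuming task ids of the form "MBPP/17"; the function name and the sample failure ids are illustrative and are not the real failure list:

```python
def derive_anchors(test_split_ids, mode_b_failure_ids):
    """Anchors = official test-split ids minus current Mode-B failures."""
    anchors = set(test_split_ids) - set(mode_b_failure_ids)

    def sort_key(task_id: str):
        # Sort by benchmark name, then numerically by task number.
        bench, num = task_id.split("/")
        return (bench, int(num))

    return sorted(anchors, key=sort_key)


# Tiny illustrative input; the failure ids here are made up.
test_split = ["MBPP/17", "MBPP/18", "MBPP/19", "HumanEval/23", "HumanEval/24"]
failures = ["MBPP/18", "HumanEval/24"]
print(derive_anchors(test_split, failures))
# prints "['HumanEval/23', 'MBPP/17', 'MBPP/19']"
```

Recomputing the list this way before each surgery pass keeps the gate aligned with what production currently solves.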

Why even this pass overfit

The diagnosis from BD6.2 was "union dataset + 4 epochs over-specialized." The fix was "fresh-only + 1 epoch + smaller adapter." That fix was applied exactly as specified, and the model still lost the wins. So the data shape itself is the problem, not the training schedule:

[…] reference solutions."

[…] list comprehensions over enumerated tuples, multi-clause early returns) that are different from the simple-return n*n-style the 0.5B nailed pre-surgery.

[…] output distribution toward those harder idioms, and the easier prompts now also get the harder idioms, which over-engineer simple cases and break verifier match.

In other words: the wins were 19 specific easy prompts (13 MBPP + 6 HumanEval) the base 0.5B could already solve cleanly. Any training on harder material shifts that distribution. There is no safe "add hard wins without losing easy wins" with this dataset alone.

What's needed before BD6.4

Two paths, neither is hyperparameter tuning:

Path A — anchor-positive training (mixed-data SFT)

Train on both poison rows AND a same-size set of pass-1 win "keep working" rows. Each epoch sees an equal mix. The model must keep producing the easy outputs while learning the hard ones.

Concretely: write each anchor row as {task: <prompt>, repair_target: <the function the 0.5B already produces correctly>}. Ground truth = current 0.5B output, captured before training. Then poison + anchor form a balanced curriculum.

This is what proper supervised fine-tuning of small adapters needs; poison-only is a known footgun.
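A minimal sketch of the Path A mix. Only 19 anchor rows exist against 245 poison rows, so this sketch oversamples the anchors to reach "same size"; that is an assumption, not something the spec states. The `kind` field and the function name are hypothetical; `task` and `repair_target` follow the row shape described above.

```python
import random


def build_balanced_mix(poison_rows, anchor_rows, seed=0):
    """Return poison rows plus an equal number of (oversampled) anchor rows."""
    if not anchor_rows:
        raise ValueError("need at least one anchor row")
    rng = random.Random(seed)
    # Repeat the full anchor set, then top up with a random subset.
    reps, rem = divmod(len(poison_rows), len(anchor_rows))
    anchors = anchor_rows * reps + rng.sample(anchor_rows, rem)
    mixed = list(poison_rows) + anchors
    rng.shuffle(mixed)  # interleave so each batch sees both kinds
    return mixed


# Placeholder rows sized like this report: 245 poison, 19 anchors.
poison = [{"task": f"poison_{i}", "repair_target": "...", "kind": "poison"}
          for i in range(245)]
anchor = [{"task": f"anchor_{i}", "repair_target": "...", "kind": "anchor"}
          for i in range(19)]

mixed = build_balanced_mix(poison, anchor)
n_anchor = sum(row["kind"] == "anchor" for row in mixed)
print(len(mixed), n_anchor)
# prints "490 245"
```

Each anchor then appears 12 or 13 times per epoch, so the anchor rows could be memorized; whether that matters here is an open question for BD6.4.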

Path B — KL-anchored LoRA / regularization

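Add a KL-divergence term against the frozen base on a subset of prompts, so the LoRA can't drift far on those. PEFT does not provide this loss out of the box, but the frozen-base logits are available from the same model by running a forward pass inside PeftModel's disable_adapter() context, so the term needs ~30 lines extra in the trainer.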
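The shape of the Path B loss, sketched in dependency-free Python for a single next-token distribution. The weight `beta` and the KL direction (base relative to tuned) are assumptions to be tuned; a real trainer would compute this over batched logits on the GPU.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(base_logits, tuned_logits):
    """KL(base || tuned) for one next-token distribution."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(tuned_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def anchored_loss(task_loss, base_logits, tuned_logits, beta=0.1):
    """Task loss plus a penalty for drifting from the frozen base."""
    return task_loss + beta * kl_divergence(base_logits, tuned_logits)


base = [2.0, 0.5, -1.0]
same = anchored_loss(0.45, base, base)                # no drift: penalty ~ 0
drifted = anchored_loss(0.45, base, [0.0, 2.5, -1.0])  # drift: penalty > 0
print(same < drifted)
# prints "True"
```

The penalty would be applied only on the anchor subset, so the poison rows stay free to move the distribution where the base was failing.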

Both A and B are dataset / loss-function problems, not "more epochs" or "different lr". BD6.3 closes the easy-knob territory.

Production state (after BD6.3 revert)

Files this pass touched

The runtime is back on the production pack. The gate did its job. The surgery recipe needs anchor-positive data before BD6.4.