CyberdyneLabs · Reports · BD6_POST_SURGERY_DELTA

BD6 — phys05_code_skeleton surgery delta vs frozen baseline

reports/BD6_POST_SURGERY_DELTA.md 907 words raw markdown ↗

BD6 — phys05_code_skeleton surgery delta vs frozen baseline

_Last verified: 2026-05-03 (data: MBPP_HE @ 2026-05-01 22:26, LCB @ 2026-05-01 20:23; baseline frozen in BENCH_CLEANUP_AND_OFFICIAL_RUN.md)._

The first organ-only surgery (BD6 pass-1) measured against the frozen BENCH_CLEANUP_AND_OFFICIAL_RUN baseline. BD6.2 follow-up overtrained and was reverted — current production pack is BD6 pass-1.


Pipeline (BD6 pass-1)

  1. Poison harvesttools/surgery/bench_to_poison_dataset.py reads Mode-B failure rows from reports/MBPP_HE_3MODE_V1.json and reports/LIVECODEBENCH_3MODE_V1.json, joins with official MBPP / HumanEval / LCB reference solutions. Result: 306 poison rows, 256 with reference targets (LCB ships no canonical solutions). Saved to data/organ_surgery/phys05_code_skeleton/poison_train.jsonl.
  1. QLoRA trainingtools/surgery/train_code_skeleton_lora.py. Base = /home/pc/gigachad/qwen05/physarum (BF16 0.5B HF dir). r=16, α=32, lr=2e-4, 3 epochs, batch=1, max-len=1024, target_modules=q/k/v/o_proj. 256 rows × 3 epochs = 768 steps. Avg loss: 0.44 → 0.30 → 0.21. Trainable params 2.16 M / 496 M (0.44 %).
  1. Merge + repacktools/surgery/merge_code_skeleton_lora.py. PEFT merge_and_unload → BF16 HF Physarum05B-CodeSkeleton/build/planck7b_tool buildphysarum05b_code_skeleton.planck (988 MB BF16, same shape as baseline pack).
  1. Pack flip + rebuildsrc/organs/organ_manager.cpp:31 PHYS05_PACK retargeted; make -j4 rebuilds gigachad_native.
  1. Mode-B re-runmbpp_he_3mode.py --modes B and livecodebench_3mode.py --modes B. Same prompts, same harness, same NO_7B_FALLBACK gate.

Headline delta (current production pack vs frozen baseline)

| bench | n | baseline B | post-surgery B | Δ abs | Δ rel | wall before | wall after | |---------------|-----|------------|----------------|-------|----------|-------------|------------| | MBPP | 100 | 6/100 | 13/100 | +7 | +117 % | 550 s | 286.6 s | | HumanEval | 164 | 2/164 | 6/164 | +4 | +200 % | 1083 s | 570.4 s | | LiveCodeBench | 50 | 0/50 | 0/50 | 0 | 0 % | 7 s | 431.4 s |

MBPP doubled. HumanEval tripled. LCB unchanged (LCB prompts route through ARIZ → unsupported under NO_7B_FALLBACK=1; surgery doesn't touch that lane — confirmed by organs=phys05_triz_contradiction rather than phys05_code_skeleton in LCB-B logs).

organs_used_set for both improved benches: {phys05_code_skeleton} only. fallback_count for both: 0 — no 7B leaked.

bd_signal_count (rows that wrote DAG envelope with food/poison/conductance): MBPP 88, HumanEval 117, LCB 32 — the BD3 poison stream for the next surgery pass is alive.


BD6.2 cycle summary (snapshot retained, pack reverted)

A second pass tried to ride the same recipe on the post-BD6 failures (310 union rows, 4 epochs):

| bench | baseline B | BD6.2 snapshot | vs BD6 pass-1 | |---------------|------------|----------------|---------------| | MBPP | 6/100 | 6/100 | −7 (regress) | | HumanEval | 2/164 | 8/164 | +2 (modest) | | LCB | 0/50 | 0/50 | 0 |

Net: net regression on the bench we cared about most (MBPP). Pack reverted to BD6 pass-1; snapshot retained at reports/MBPP_HE_3MODE_V1_bd6_2_snapshot.json for autopsy. Full write-up in reports/BD6_2_OVERTRAIN_DELTA.md.


TASK 5 constraints check (current production = BD6 pass-1)

| constraint | status | |---------------------------------------------|--------------| | 0.5B organs used | ✅ phys05_code_skeleton only | | BD written | ✅ MBPP-B 88, HE-B 117, LCB-B 32 envelopes carry food/poison/conductance | | no route falls through wrong handler | ✅ route=code_fast for in-lane MBPP/HE prompts | | no json_repair → ariz_e2e | ✅ unchanged from TASK 1 | | benchmark not hand-made easy subset | ✅ MBPP n=100 official, HumanEval n=164 full official, LCB official easy n=50 | | fallback_count visible | ✅ 0/0/0 | | B mode not skipped | ✅ ran on all 314 prompts |


What this proves


Targets (per user spec)

| target | hit? | |-------------------------------------|-------| | MBPP B-mode 6/100 → 25/100 first | partial: 13/100 (+7 of +19 needed). | | HumanEval B-mode 2/164 → 20/164 first | partial: 6/164 (+4 of +18 needed). | | LCB B-mode 0/50 → 5/50 first | not hit: 0/50 — dispatcher routing fix is prerequisite, not more surgery. |


Files written / changed (BD6 pass-1)