BD9 — phys05_json_repair surgery, 10/10 GREEN (2026-05-04)
TL;DR — phys05_json_repair was a placeholder organ riding on the default base 0.5B pack with no LoRA. BD9 forged a 280-row synthetic training corpus covering the 10 broken-JSON shapes the runtime actually sees, trained 6-epoch LoRA (loss 0.055 → 0.0003), merged + repacked, flipped pack, and the organ now repairs all 10 production failure modes end-to-end via --organ-probe smoke. First organ under the doctrine with a clean 100% on a real failure-catalog bench.
Headline
10 broken-JSON cases × phys05_json_repair (BD9 v1 pack):
case                    result  organ output
--------------------------------------------------------------------------
missing_close_brace     OK      {"a":1,"b":2}
trailing_comma          OK      {"a":1,"b":2}
unquoted_keys           OK      {"name": "alice", "age": 30}
single_quoted           OK      {"food": 1, "poison": 0}
missing_colon           OK      {"a": 1, "b": 2}
comma_as_colon          OK      {"key": "value", "k2": 5}
markdown_fence          OK      {"x":1,"y":2}
leading_prose           OK      {"x":1,"y":2}
trailing_prose          OK      {"x":1,"y":2}
missing_close_bracket   OK      {"items":[1,2,3]}
== 10/10 pass
comma_as_colon is exactly the wound v2 quirk that BD8 had to repair with a Python-side shim. The fresh json_repair organ handles it natively.
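The pass/fail column above can be reproduced with a small scoring helper. This is a minimal sketch, not the actual smoke harness: the function name `score_repairs` and the tuple layout are hypothetical, and "correct repair" is assumed to mean that the organ output parses as JSON and equals the expected object.

```python
import json

def score_repairs(cases):
    """Score (name, organ_output, expected_obj) tuples.

    A case passes only if the output parses as JSON and the parsed
    value equals the expected object (assumed pass criterion).
    """
    report = []
    for name, output, expected in cases:
        try:
            ok = json.loads(output) == expected
        except json.JSONDecodeError:
            ok = False
        report.append((name, "OK" if ok else "FAIL"))
    passed = sum(1 for _, status in report if status == "OK")
    return passed, report

# Three rows taken from the headline table.
cases = [
    ("trailing_comma", '{"a":1,"b":2}', {"a": 1, "b": 2}),
    ("comma_as_colon", '{"key": "value", "k2": 5}', {"key": "value", "k2": 5}),
    ("missing_close_bracket", '{"items":[1,2,3]}', {"items": [1, 2, 3]}),
]
passed, report = score_repairs(cases)
for name, status in report:
    print(f"{name:<24}{status}")
print(f"== {passed}/{len(cases)} pass")
```

Comparing parsed objects rather than raw strings keeps the check insensitive to whitespace differences such as `{"a":1}` versus `{"a": 1}`, both of which appear in the table.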
Pipeline
- Forge synthetic data: tools/surgery/build_json_repair_dataset.py takes a 30-seed pool of realistic JSON shapes and mutates each through 11 break-classes from the runtime failure catalog. Output: 280 rows in data/organ_surgery/phys05_json_repair/json_repair_train_v1.jsonl.
- Train LoRA: tools/surgery/train_json_repair_lora.py (clone of the BD7 trainer, adapted for the json_repair prompt template). r=8, α=16, lr=5e-5, 6 epochs, max-len=1024.
```
epoch 0 avg_loss=0.0547
epoch 1 avg_loss=0.0082
epoch 2 avg_loss=0.0040
epoch 3 avg_loss=0.0019
epoch 4 avg_loss=0.0005
epoch 5 avg_loss=0.0003
```
- Merge + repack: merge_code_skeleton_lora.py (reused script). Output: physarum05b_json_repair.planck (988 MB BF16).
- Wire: src/organs/organ_manager.cpp:
  - new constant PHYS05_JSON_REPAIR_PACK
  - specs_["phys05_json_repair"].pack_path = PHYS05_JSON_REPAIR_PACK
  - max_tokens 96 → 256 (repaired JSON often >100 tokens)
  - repetition_penalty 1.03, cuda_repetition_penalty 1.00
  - json_output = true
- Smoke: 10 hand-curated broken JSONs from the runtime catalog, probed via ./build/gigachad_native --organ-probe phys05_json_repair. All 10 returned valid JSON with the correct repair.
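The forge step can be pictured as a seed pool crossed with a set of break functions. The sketch below is an illustration under stated assumptions, not the contents of build_json_repair_dataset.py: it implements only five of the break-classes named in the headline table, and the row fields `break_class`, `broken`, and `fixed` are hypothetical.

```python
import json
import random

FENCE = "`" * 3  # markdown code-fence marker, built here to keep this block clean

def drop_close_brace(s):
    return s[:-1] if s.endswith("}") else s

def add_trailing_comma(s):
    return s[:-1] + ",}" if s.endswith("}") else s

def single_quote(s):
    return s.replace('"', "'")

def wrap_markdown_fence(s):
    return f"{FENCE}json\n{s}\n{FENCE}"

def add_leading_prose(s):
    return "Here is the JSON you asked for: " + s

# Hypothetical subset of the break-classes; names follow the headline table.
BREAK_CLASSES = {
    "missing_close_brace": drop_close_brace,
    "trailing_comma": add_trailing_comma,
    "single_quoted": single_quote,
    "markdown_fence": wrap_markdown_fence,
    "leading_prose": add_leading_prose,
}

def forge_rows(seeds, rng):
    """Cross every seed object with every break-class to get (broken, fixed) pairs."""
    rows = []
    for seed in seeds:
        fixed = json.dumps(seed, separators=(",", ":"))
        for name, mutate in BREAK_CLASSES.items():
            broken = mutate(fixed)
            if broken == fixed:
                continue  # mutation did not apply to this seed; skip the row
            rows.append({"break_class": name, "broken": broken, "fixed": fixed})
    rng.shuffle(rows)  # avoid long runs of the same seed during training
    return rows

seeds = [{"a": 1, "b": 2}, {"items": [1, 2, 3]}]
rows = forge_rows(seeds, random.Random(0))
for row in rows[:3]:
    print(json.dumps(row))
```

Each printed row is one JSONL line. Because the target is always the canonical serialization of the seed, every break-class for a given seed maps back to the same fixed string.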
Why this trained without overfitting (vs BD7.3)
BD7.3 (TRIZ 9-epoch) regressed because:
- 80 training rows × 9 epochs of free-form prose JSON → memorization of
specific token sequences in long fields.
- Loss kept descending past the generalization point.
BD9 (json_repair 6-epoch) didn't regress because:
- 280 training rows × 6 epochs (more data, less time per row).
- Each row is short and structurally regular (broken pattern → fixed
pattern), so the LoRA learns transformations, not specific tokens.
- 30 distinct seeds × 11 mutation classes = 330 possible seed/mutation combinations (the forged corpus contains 280 rows). No single sequence dominates the gradient.
The same 6-epoch sweet-spot from BD7 applies; the data shape is what made loss-near-zero safe to ship.
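The row and epoch counts quoted above give a quick back-of-envelope comparison. This sketch counts row exposures only; batch size and sequence length are not stated in this report, so the numbers are not optimizer-step counts. The helper name `row_exposures` is hypothetical.

```python
def row_exposures(rows: int, epochs: int) -> int:
    """Total number of times a training row is presented to the model."""
    return rows * epochs

# (rows, epochs) as quoted in the two bullet lists above.
runs = {"BD7.3": (80, 9), "BD9": (280, 6)}
for name, (rows, epochs) in runs.items():
    total = row_exposures(rows, epochs)
    print(f"{name}: {rows} rows, {epochs} passes per row, {total} exposures")

# Size of the seed x mutation-class space for the BD9 forge.
seed_mutation_space = 30 * 11
print(f"seed/mutation combinations: {seed_mutation_space}")
```

BD9 presents more rows overall, but each individual row is seen 6 times rather than 9, which fits the "more data, less time per row" point.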
Production state (after BD9)
PHYS05_PACK              physarum05b_code_skeleton.planck    MBPP B 13/100 · HE B 6/164 · LCB 0/50 · anchor 19/19
PHYS05_TRIZ_PACK         physarum05b_triz_contradiction_v2   ARIZ 88/100 strict · fb=0
PHYS05_CRITIC_PACK       physarum05b_critic_lite_v2.planck   out of ARIZ rescue path
PHYS05_WOUND_PACK        physarum05b_wound_v2.planck         live in --chat ARIZ rescue (BD8 V9 + TRACK 2 C++ port)
PHYS05_JSON_REPAIR_PACK  physarum05b_json_repair.planck      NEW · 10/10 on broken-JSON catalog
PHYS7B_PACK              physarium7b_identity.q4planck       Q4 · 11.16 tok/s · identity 14/14
acceptance bench 18/18 · leaks 0
Files this surgery produced
tools/surgery/build_json_repair_dataset.py         280-row forge
tools/surgery/train_json_repair_lora.py            BD9 trainer
tools/surgery/output/json_repair_lora_v1/          PEFT adapter
tools/surgery/output/Physarum05B-JsonRepair/       merged BF16 HF dir
data/organ_surgery/phys05_json_repair/json_repair_train_v1.jsonl
physarum05b_json_repair.planck                     988 MB pack
src/organs/organ_manager.cpp                       PHYS05_JSON_REPAIR_PACK + spec overrides
reports/BD9_JSON_REPAIR_FINAL.md                   this file
Engineering takeaways
- Synthetic mutation forge beats sparse real poison: 280 forged rows from a 30-seed pool covered 10/10 production failure modes. Real DAG poison had only 1 row; a LoRA trained on it would have learned nothing useful.
- Mechanical-repair tasks tolerate 6 epochs without overtraining. The BD7.3 trap was specific to long prose targets, not short structured transformations.
- The wound v2's "key", "value" quirk is a json_repair concern. Now that json_repair handles it, the wound rescue chain in run_chat_ariz_organ_first could optionally invoke json_repair as a final cleanup step; this is queued as BD9.1.
- The same 5-step template (forge → train → merge → flip → smoke) is now proven across phys05_code_skeleton (BD6), phys05_triz_contradiction (BD7), critic_lite + wound (BD8), and phys05_json_repair (BD9). It is the project's universal organ-surgery loop.
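The "final cleanup step" idea queued as BD9.1 could look roughly like the following. This is a sketch under assumptions, not the planned implementation: the real chain lives in C++ (run_chat_ariz_organ_first), the function name `parse_with_repair` is hypothetical, and the `repair` callable stands in for a call to the json_repair organ. The broken input string is also an assumed example of the comma-as-colon quirk.

```python
import json
from typing import Any, Callable, Optional

def parse_with_repair(raw: str, repair: Callable[[str], str]) -> Optional[Any]:
    """Parse raw output; on failure, try once more after a repair pass.

    Returns the parsed value, or None if the text is still invalid
    JSON after repair.
    """
    try:
        return json.loads(raw)  # fast path: output is already valid
    except json.JSONDecodeError:
        pass
    try:
        return json.loads(repair(raw))  # single repair attempt, then re-validate
    except json.JSONDecodeError:
        return None

def toy_repair(text: str) -> str:
    """Stand-in for the organ: fixes only the one comma-as-colon example."""
    return text.replace('"key", "value"', '"key": "value"')

wound_output = '{"key", "value", "k2": 5}'  # assumed shape of the wound v2 quirk
print(parse_with_repair(wound_output, toy_repair))
```

Re-validating the repaired text with `json.loads` means a bad repair degrades to `None` instead of passing malformed JSON downstream.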
Queued next (per same template)
- phys05_renderer — needs (broken_render → fixed_render) pairs
- phys05_test_writer — needs (function → unit_test) pairs
- phys05_cache_matcher — needs (input_hash → cache_decision) pairs
- phys05_claim_extractor — needs (text → structured_claims) pairs
- BD8.2 wound v3 — re-train wound on broader quirk catalog
- BD8.1 critic_lite v3 — re-train critic on ARIZ schema failures (not stderr)
Each is the same forge → 6-epoch QLoRA → merge → flip → smoke loop.