BD9 — phys05_json_repair surgery, **10/10 GREEN** (2026-05-04)

TL;DR: phys05_json_repair was a placeholder organ riding on the default base 0.5B pack with no LoRA. BD9 forged a 280-row synthetic training corpus covering the 10 broken-JSON shapes the runtime actually sees, trained a 6-epoch LoRA (avg loss 0.055 → 0.0003), merged and repacked it, and flipped the pack. The organ now repairs all 10 production failure modes end-to-end in the --organ-probe smoke test. It is the first organ under the doctrine with a clean 100% on a real failure-catalog bench.

Headline

10 broken-JSON cases × phys05_json_repair (BD9 v1 pack):

case                       result   organ output
--------------------------------------------------------------------------
missing_close_brace        OK       {"a":1,"b":2}
trailing_comma             OK       {"a":1,"b":2}
unquoted_keys              OK       {"name": "alice", "age": 30}
single_quoted              OK       {"food": 1, "poison": 0}
missing_colon              OK       {"a": 1, "b": 2}
comma_as_colon             OK       {"key": "value", "k2": 5}
markdown_fence             OK       {"x":1,"y":2}
leading_prose              OK       {"x":1,"y":2}
trailing_prose             OK       {"x":1,"y":2}
missing_close_bracket      OK       {"items":[1,2,3]}

== 10/10 pass

comma_as_colon is exactly the wound v2 quirk that BD8 had to repair with a Python-side shim. The fresh json_repair organ handles it natively.
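
The table can be sanity-checked mechanically. The sketch below is not the project's bench harness, and the report does not spell out its pass criterion. It assumes only the weakest part of that criterion, that each organ output parses as JSON, and applies it to the outputs copied from the table. The names `ORGAN_OUTPUTS`, `is_valid_json`, and `score` are illustrative.

```python
import json

# Organ outputs copied verbatim from the results table above.
ORGAN_OUTPUTS = {
    "missing_close_brace": '{"a":1,"b":2}',
    "trailing_comma": '{"a":1,"b":2}',
    "unquoted_keys": '{"name": "alice", "age": 30}',
    "single_quoted": '{"food": 1, "poison": 0}',
    "missing_colon": '{"a": 1, "b": 2}',
    "comma_as_colon": '{"key": "value", "k2": 5}',
    "markdown_fence": '{"x":1,"y":2}',
    "leading_prose": '{"x":1,"y":2}',
    "trailing_prose": '{"x":1,"y":2}',
    "missing_close_bracket": '{"items":[1,2,3]}',
}


def is_valid_json(text: str) -> bool:
    """Return True when `text` parses as a complete JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def score(outputs: dict[str, str]) -> tuple[int, int]:
    """Count outputs that parse; returns (passed, total)."""
    passed = sum(is_valid_json(text) for text in outputs.values())
    return passed, len(outputs)


if __name__ == "__main__":
    passed, total = score(ORGAN_OUTPUTS)
    print(f"== {passed}/{total} pass")
```

Validity alone does not prove the repair is the intended one; a stricter check would also compare each parsed object against a hand-written expected value per case.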

Pipeline

  1. Forge synthetic data (tools/surgery/build_json_repair_dataset.py): takes a 30-seed pool of realistic JSON shapes and mutates each through 11 break-classes from the runtime failure catalog. Output: 280 rows in data/organ_surgery/phys05_json_repair/json_repair_train_v1.jsonl.

  2. Train LoRA (tools/surgery/train_json_repair_lora.py, a clone of the BD7 trainer adapted for the json_repair prompt template). r=8 α=16 lr=5e-5 6 epochs, max-len=1024.

     epoch 0 avg_loss=0.0547
     epoch 1 avg_loss=0.0082
     epoch 2 avg_loss=0.0040
     epoch 3 avg_loss=0.0019
     epoch 4 avg_loss=0.0005
     epoch 5 avg_loss=0.0003

  3. Merge + repack: merge_code_skeleton_lora.py (reused script). Output: physarum05b_json_repair.planck (988 MB BF16).

  4. Wire: src/organs/organ_manager.cpp (PHYS05_JSON_REPAIR_PACK + spec overrides).

  5. Smoke: 10 hand-curated broken JSONs from the runtime catalog, probed via ./build/gigachad_native --organ-probe phys05_json_repair. All 10 returned valid JSON with the correct repair.
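
The forge step can be pictured with a small sketch. This is not the contents of build_json_repair_dataset.py. It is a minimal illustration of the mutate-a-seed idea, with a made-up 3-seed pool, six of the break shapes named in the results table, and hypothetical row fields (`break_class`, `broken`, `repaired`). The real seed pool, break-class list, and row schema are not shown in this report.

```python
import json
import random

# Hypothetical seed pool (the real forge uses 30 seeds).
SEEDS = [
    {"a": 1, "b": 2},
    {"name": "alice", "age": 30},
    {"items": [1, 2, 3]},
]

FENCE = "`" * 3  # markdown code fence, built here to keep this listing readable


def trailing_comma(s: str) -> str:
    return s[:-1] + "," + s[-1]  # insert a comma before the final brace


def missing_close_brace(s: str) -> str:
    return s[:-1]  # drop the final closing brace


def single_quoted(s: str) -> str:
    return s.replace('"', "'")


def markdown_fence(s: str) -> str:
    return f"{FENCE}json\n{s}\n{FENCE}"


def leading_prose(s: str) -> str:
    return "Here is the JSON you asked for: " + s


def trailing_prose(s: str) -> str:
    return s + " Hope that helps!"


BREAKS = {
    fn.__name__: fn
    for fn in (trailing_comma, missing_close_brace, single_quoted,
               markdown_fence, leading_prose, trailing_prose)
}


def parses(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def forge(seeds, breaks, rng: random.Random) -> list[dict]:
    """Apply every break-class to every seed; keep only genuinely broken rows."""
    rows = []
    for seed in seeds:
        clean = json.dumps(seed)
        for name, mutate in breaks.items():
            broken = mutate(clean)
            if not parses(broken):  # guard: a mutation that still parses is useless
                rows.append({"break_class": name, "broken": broken, "repaired": clean})
    rng.shuffle(rows)
    return rows


def to_jsonl(rows: list[dict]) -> str:
    """Serialize rows as one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"


if __name__ == "__main__":
    print(to_jsonl(forge(SEEDS, BREAKS, random.Random(0))), end="")
```

Because every seed passes through every break-class, no single target string dominates the corpus, which fits the report's argument that the adapter learns transformations rather than specific tokens.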

Why this trained without overfitting (vs BD7.3)

BD7.3 (TRIZ 9-epoch) regressed because:

specific token sequences in long fields.

BD9 (json_repair 6-epoch) didn't regress because:

pattern), so the LoRA learns transformations, not specific tokens.

No single sequence dominates the gradient.

The same 6-epoch sweet spot from BD7 applies; the data shape is what made a near-zero loss safe to ship.

Production state (after BD9)

PHYS05_PACK              physarum05b_code_skeleton.planck       MBPP B 13/100 · HE B 6/164 · LCB 0/50 · anchor 19/19
PHYS05_TRIZ_PACK         physarum05b_triz_contradiction_v2      ARIZ 88/100 strict · fb=0
PHYS05_CRITIC_PACK       physarum05b_critic_lite_v2.planck      out of ARIZ rescue path
PHYS05_WOUND_PACK        physarum05b_wound_v2.planck            live in --chat ARIZ rescue (BD8 V9 + TRACK 2 C++ port)
PHYS05_JSON_REPAIR_PACK  physarum05b_json_repair.planck         NEW · 10/10 on broken-JSON catalog
PHYS7B_PACK              physarium7b_identity.q4planck          Q4 · 11.16 tok/s · identity 14/14
acceptance bench         18/18 · leaks 0

Files this surgery produced

tools/surgery/build_json_repair_dataset.py                     280-row forge
tools/surgery/train_json_repair_lora.py                        BD9 trainer
tools/surgery/output/json_repair_lora_v1/                      PEFT adapter
tools/surgery/output/Physarum05B-JsonRepair/                   merged BF16 HF dir
data/organ_surgery/phys05_json_repair/json_repair_train_v1.jsonl
physarum05b_json_repair.planck                                 988 MB pack
src/organs/organ_manager.cpp                                   PHYS05_JSON_REPAIR_PACK + spec overrides
reports/BD9_JSON_REPAIR_FINAL.md                               this file

Engineering takeaways

  1. Synthetic mutation forge beats sparse real poison: 280 forged rows from a 30-seed pool covered 10/10 production failure modes. The real DAG poison data had only 1 row, which would have taught the LoRA nothing useful.

  2. Mechanical-repair tasks tolerate 6 epochs without overtraining. The BD7.3 trap was specific to long prose targets, not short structured transformations.

  3. The wound v2's "key", "value" quirk is a json_repair concern. Now that json_repair handles it, the wound rescue chain in run_chat_ariz_organ_first could optionally invoke json_repair as a final cleanup step (queued as BD9.1).

  4. The same 5-step template (forge → train → merge → flip → smoke) is now proven across phys05_code_skeleton (BD6), phys05_triz_contradiction (BD7), critic_lite + wound (BD8), and phys05_json_repair (BD9). It is the project's universal organ-surgery loop.
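
The "final cleanup step" idea queued as BD9.1 could look roughly like the sketch below. It is hypothetical. The report does not show how run_chat_ariz_organ_first invokes organs, so the organ call is modeled as a plain callable, and `parse_with_repair` and `fake_repair` are illustrative names, not project code.

```python
import json
from typing import Any, Callable, Optional


def parse_with_repair(raw: str, repair: Callable[[str], str]) -> Optional[Any]:
    """Parse `raw` as JSON; on failure, ask `repair` for a fixed string and retry once.

    Returns None when both attempts fail. (A JSON document that is literally
    `null` also yields None; a real integration would need to tell these apart.)
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        return json.loads(repair(raw))
    except json.JSONDecodeError:
        return None


def fake_repair(raw: str) -> str:
    """Stand-in for the json_repair organ (assumption, for demonstration only).

    Handles just the comma-as-colon shape by turning the first `", ` into `": `.
    """
    return raw.replace('", ', '": ', 1)


if __name__ == "__main__":
    print(parse_with_repair('{"key", "value"}', fake_repair))
```

Trying the plain parse first keeps the organ off the hot path: it is only invoked when the upstream output is actually malformed.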

Queued next (per same template)

Each is the same forge → 6-epoch QLoRA → merge → flip → smoke loop.