BD7 — teacher hand-fix to 100/100 (2026-05-02)
TL;DR: added 24 hand-written ideal TRIZ targets for the v2 losers and replaced 2 confirmed-weak v1 winners (ARIZ/44 inverted direction, ARIZ/82 generic IFR). All 100 rows now pass the strict 6-field validator. A spot-check of 20 random rows confirms that they are schema-correct and that their content is defensible. The output is ready for the QLoRA train/eval split and surgery.
Numbers
| stage | rows from teacher | rows hand-written/replaced | total strict-pass |
|-----------------|-------------------|----------------------------|--------------------|
| forge v1 (200 prompts, 2 variants) | 70 winners | - | 70/100 |
| retry v2 (90 prompts, 3 variants on 30 losers) | +6 | - | 76/100 |
| hand-fix v3 | 74 (kept v2 winners minus ARIZ/44, /82) | +24 losers + 2 weak | 100/100 |
Source distribution
| source | count |
|------------------------|-------|
| 7B teacher (variant A) | ~40 |
| 7B teacher (variant B) | ~30 |
| 7B teacher (retry C) | ~6 |
| hand_v3 | 26 |
hand_v3 covers all of the 24 v2 still-losers (ARIZ/01, /03, /07, /18, /30, /32, /34, /41, /43, /48, /61, /67, /68, /72, /77, /79, /81, /86, /88, /93, /94, /99, /02, /06) plus ARIZ/44 and ARIZ/82 replacements.
Validation result
[handfix] strict-validated 100/100
[handfix] hand-written / replaced: 26
All 100 rows have:
- technical_contradiction (str, non-empty)
- physical_contradiction (str, non-empty)
- ifr (str, non-empty)
- resources (list len ≥ 2)
- triz_operators (list len ≥ 2, TRIZ-40 names)
- candidate_moves (list len ≥ 2, concrete interventions)
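The six checks above can be sketched as a small validator. This is a minimal illustration, not the code in build_triz_teacher_handfix_v3.py: the function names `strict_errors` and `count_strict_pass` are hypothetical, the requirement that list items be non-empty strings is an assumption, and the TRIZ-40 name lookup is left out because the name table is not given here.

```python
import json

# Field names come from the list above; the exact checks are an assumption.
STRING_FIELDS = ("technical_contradiction", "physical_contradiction", "ifr")
LIST_FIELDS = ("resources", "triz_operators", "candidate_moves")


def strict_errors(row):
    """Return a list of problems for one row; an empty list means strict-pass."""
    errors = []
    for name in STRING_FIELDS:
        value = row.get(name)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{name}: expected non-empty str")
    for name in LIST_FIELDS:
        value = row.get(name)
        if not isinstance(value, list) or len(value) < 2:
            errors.append(f"{name}: expected list with at least 2 items")
        elif not all(isinstance(item, str) and item.strip() for item in value):
            # Assumption: list items are plain non-empty strings.
            errors.append(f"{name}: every item must be a non-empty str")
    return errors


def count_strict_pass(path):
    """Return (passing, total) for the rows of a JSONL file."""
    passing = total = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate blank lines between rows
            total += 1
            if not strict_errors(json.loads(line)):
                passing += 1
    return passing, total
```

Calling `count_strict_pass` on the targets file would return a `(passing, total)` pair comparable to the 100/100 figure reported above, assuming the real validator applies the same rules.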
Spot-check 20 random rows (seed=7)
All 20 sampled task_ids show proper "X improves while Y worsens" TC form. Mix of teacher and hand sources. No empty fields, no markdown.
Mild caveat: 21/100 IFR strings contain generic engineering phrases like "optimal", "efficient", "without compromising". Most are context-appropriate (e.g. "optimal heat exchange") rather than vacuous fluff. The two genuinely weak rows spotted in the v1 review (ARIZ/82, ARIZ/44) were replaced.
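A seeded spot-check and the generic-phrase count could be reproduced along these lines. This is a sketch under assumptions: the helper names are hypothetical, the phrase tuple holds only the three examples quoted above (the full list used for the 21/100 figure is not given), and the original sampling method may differ from `random.Random(seed).sample`.

```python
import random

# Only the three phrases quoted in the caveat; the real list may be longer.
GENERIC_PHRASES = ("optimal", "efficient", "without compromising")


def sample_rows(rows, k=20, seed=7):
    """Draw k rows reproducibly using a private, seeded RNG."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


def rows_with_generic_ifr(rows, phrases=GENERIC_PHRASES):
    """Return the task_ids whose IFR text contains any generic phrase."""
    flagged = []
    for row in rows:
        ifr = row.get("ifr", "").lower()
        if any(phrase in ifr for phrase in phrases):
            flagged.append(row["task_id"])
    return flagged
```

A substring match like this only flags candidates; deciding whether a flagged IFR is context-appropriate still needs a manual read.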
Files
- data/organ_surgery/phys05_triz_contradiction/teacher_targets_v3_100.jsonl: 100 strict-validated rows, ordered by task_id
- tools/surgery/build_triz_teacher_handfix_v3.py: script with 26 inline hand-written TRIZ analyses
- reports/BD7_TEACHER_HAND_FIX_V3.md: this file
Ready for next phase
Per BD7 spec step 7: train_set: 80, eval_set: 20, +10 anchor rows. With 100 strict targets:
- 80 train + 20 eval = full split possible
- 10 anchor (cleanest, broadest domain coverage) drawn from hand-written rows + strongest 7B rows
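The 80/20 split can be done with a deterministic shuffle. The sketch below is illustrative only: `split_train_eval` and its `seed` default are hypothetical, and it does not stratify by source or select the anchor rows, since the BD7 spec's rules for those are not stated here.

```python
import random


def split_train_eval(rows, n_eval=20, seed=0):
    """Shuffle a copy of rows deterministically and hold out n_eval for eval."""
    if n_eval >= len(rows):
        raise ValueError("n_eval must be smaller than the number of rows")
    shuffled = list(rows)  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    # First n_eval shuffled rows become eval, the remainder train.
    return shuffled[n_eval:], shuffled[:n_eval]
```

With the 100 strict targets loaded as a list, `train, eval_rows = split_train_eval(rows)` would give 80 and 20 rows with no overlap, and rerunning with the same seed reproduces the same split.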
Per BD7 spec step 8: QLoRA surgery → physarum05b_triz_contradiction.planck. The trainer can be cloned from train_code_skeleton_lora_bd6_5.py (stratified anchor/poison), but BD7 has no "poison" pool: this is supervised fine-tuning, not anti-forgetting, so a simpler trainer should suffice.
Per BD7 spec step 9: PHYS05_TRIZ_PACK is already wired into organ_manager (it currently falls back to code_skeleton.planck; flip it to the new triz .planck after train+gate).
Awaiting GO for Phase 4 (train/eval split) or Phase 5 (QLoRA surgery with the chosen split).