LCB_CODE_ROUTE_FIX (2026-05-01)
LCB prompts had been mis-routed to phys05_triz_contradiction / run_chat_ariz_organ_first because the dispatcher classified "Solve this competitive programming problem..." as an ARIZ / contradiction prompt. The 0.5B code organ never saw them, so BD6 surgery couldn't help LCB even in principle.
This task widens the code-route classifier and fixes the fall-through envelope so LCB Mode B now actually exercises phys05_code_skeleton.
Code change (single file, runtime)
src/main.cpp::looks_like_humaneval — add an LCB regex branch:
```cpp
// LCB_CODE_ROUTE_FIX (2026-05-01): LiveCodeBench competitive-programming
// prompts. Without this, "Solve this competitive programming problem. ..."
// gets misclassified as ARIZ contradiction-extraction (the word "problem"
// hits the ariz dispatcher).
// Each piece is its own raw string R"(...)"; the compiler concatenates the
// adjacent literals into one alternation pattern.
static const std::regex rx_lcb(
    R"(competitive\s+programming|complete\s+python\s+program|)"
    R"(reads\s+stdin.*writes\s+stdout|stdin.*stdout|)"
    R"(\batcoder\b|\bcodeforces\b)",
    std::regex::icase);
if (std::regex_search(s, rx_lcb)) return true;
```
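To check the pattern in isolation, here is a minimal standalone sketch. `looks_like_lcb` and `route_for` are hypothetical names, and the `problem` keyword test only stands in for the real ARIZ trigger (which this note does not show). It illustrates the assumption that the fix works because the code-route check runs before the ARIZ check.

```cpp
#include <regex>
#include <string>

// Hypothetical standalone copy of the LCB branch of looks_like_humaneval().
bool looks_like_lcb(const std::string& s) {
    static const std::regex rx_lcb(
        R"(competitive\s+programming|complete\s+python\s+program|)"
        R"(reads\s+stdin.*writes\s+stdout|stdin.*stdout|)"
        R"(\batcoder\b|\bcodeforces\b)",
        std::regex::icase);
    return std::regex_search(s, rx_lcb);
}

// Hypothetical dispatcher order: the code-route check runs first, so the
// word "problem" in an LCB prompt never reaches the (stand-in) ARIZ test.
std::string route_for(const std::string& prompt) {
    if (looks_like_lcb(prompt)) return "code";
    if (prompt.find("problem") != std::string::npos) return "ariz";
    return "unsupported";
}
```

Two properties of the pattern are worth knowing. Under the default ECMAScript grammar, `.` does not match line terminators, so `stdin.*stdout` only fires when both words sit on the same line. The `reads\s+stdin.*writes\s+stdout` alternative is also subsumed by `stdin.*stdout`; it is redundant but harmless.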
src/main.cpp::run_chat_organ_route: when the 0.5B organ fails AND NO_7B_FALLBACK=1, emit a code_fast_no_7b envelope with organs_used = [phys05_code_skeleton] so the bench attributes the failure to the organ that actually ran. Without this, the fall-through hands the prompt back to the dispatcher, which routes it to ARIZ / unsupported and hides which organ ran.
The envelope still writes a Black-Dog signal (food=0, poison=1, conductance update) so the failure feeds the next surgery cycle.
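The envelope can be sketched roughly as follows. All names here (BlackDogSignal, Envelope, no_7b_fallback_enabled, code_fast_no_7b_envelope, to_json) are hypothetical, the `bd` JSON field is an assumed layout, and the conductance update is omitted because its formula is not given in this note.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical Black-Dog signal (conductance update intentionally omitted).
struct BlackDogSignal {
    int food;    // 0 for a failed run
    int poison;  // 1 for a failed run
};

// Hypothetical envelope shape; the real one in src/main.cpp may carry more fields.
struct Envelope {
    std::string route;
    std::vector<std::string> organs_used;
    bool verifier_ok;
    BlackDogSignal bd;
};

// True only when NO_7B_FALLBACK=1 is set in the environment.
bool no_7b_fallback_enabled() {
    const char* v = std::getenv("NO_7B_FALLBACK");
    return v != nullptr && std::strcmp(v, "1") == 0;
}

// Envelope for a failed 0.5B code-organ run with the 7B fallback disabled:
// it names the organ that ran instead of handing the prompt back to the dispatcher.
Envelope code_fast_no_7b_envelope(const std::string& organ) {
    return Envelope{"code_fast_no_7b", {organ}, false, BlackDogSignal{0, 1}};
}

// Minimal JSON rendering; no escaping, so it assumes identifier-like strings.
std::string to_json(const Envelope& e) {
    std::ostringstream out;
    out << "{\"route\": \"" << e.route << "\", \"organs_used\": [";
    for (std::size_t i = 0; i < e.organs_used.size(); ++i) {
        if (i != 0) out << ", ";
        out << '"' << e.organs_used[i] << '"';
    }
    out << "], \"verifier_ok\": " << (e.verifier_ok ? "true" : "false")
        << ", \"bd\": {\"food\": " << e.bd.food
        << ", \"poison\": " << e.bd.poison << "}}";
    return out.str();
}
```

In this sketch the route handler would check `!organ_ok && no_7b_fallback_enabled()` and emit `to_json(code_fast_no_7b_envelope("phys05_code_skeleton"))` instead of returning to the dispatcher.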
PYTHON_QUARANTINE compliance: zero Python in the runtime; the change touches only the C++ classifier and the C++ envelope-emit path. The bench harness still calls --chat once per prompt.
Smoke (single LCB-shape prompt, NO_7B_FALLBACK=1)
```
$ NO_7B_FALLBACK=1 ORGAN_FIRST=1 ./build/gigachad_native --chat \
    'Solve this competitive programming problem. Output ONLY a complete Python program (reads stdin, writes stdout): ...A+B...'
  "route": "code_fast_no_7b",
  "organs_used": ["phys05_code_skeleton"],
  "verifier_ok": false,
```
Before the fix, the same prompt landed on non-code routes (route: unsupported_no_7b_fallback with organs_used: [], or route: ariz_organ_first, etc.). After the fix it lands on the code organ.
Full LCB Mode B (n=50 easy, post-route)
| metric | pre-fix BD6 pass-1 | post-fix |
|-----------------|-------------------------------------------|---------------------------|
| pass | 0/50 | 0/50 |
| organs_used_set | {phys05_triz_contradiction} (wrong route) | {phys05_code_skeleton} ✅ |
| fallback_count | 0 (cause: empty envelope) | 0 ✅ |
| route | ariz_organ_first | code_fast_no_7b ✅ |
| wall time | 7 s (early-exit) | 431 s (real organ run) |
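The metrics in this table can be read as simple aggregates over the per-prompt envelopes. The sketch below is an assumption about how such a summary could be computed, not the bench harness's actual code: RunRecord, ModeBSummary and summarize are hypothetical, and the used_7b_fallback flag is an assumed field.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical per-prompt record parsed from one --chat envelope.
struct RunRecord {
    std::string route;
    std::vector<std::string> organs_used;
    bool verifier_ok;
    bool used_7b_fallback;  // assumed flag; the real envelope field may differ
};

struct ModeBSummary {
    int pass = 0;
    int total = 0;
    int fallback_count = 0;
    std::set<std::string> organs_used_set;
    std::set<std::string> routes;
};

// Folds every envelope into the table's metrics.
ModeBSummary summarize(const std::vector<RunRecord>& runs) {
    ModeBSummary s;
    for (const auto& r : runs) {
        ++s.total;
        if (r.verifier_ok) ++s.pass;
        if (r.used_7b_fallback) ++s.fallback_count;
        s.organs_used_set.insert(r.organs_used.begin(), r.organs_used.end());
        s.routes.insert(r.route);
    }
    return s;
}
```

Because a summary like this can only count what each envelope reports, an empty pre-fix envelope yields fallback_count = 0 no matter what happened. That is why the pre-fix 0 carries the "empty envelope" annotation, while the post-fix 0 is a real reading.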
Pass rate stays 0/50. That is the honest measurement of the 0.5B on competitive programming, not a routing artefact. The 0.5B produces template-leaking output (e.g. "def You are phys05_code_skeleton.") because the code-skeleton organ, prompted with the template at organs/prompts/phys05_code_skeleton.txt, was never trained on stdin/stdout-shape problems. The next surgery pass (BD6.3) gets these 50 LCB poison rows for free now that they hit the right organ.
TASK 5 constraints (post-route Mode B on LCB)
| constraint | check |
|---------------------------------------------|-------|
| 0.5B organ used | ✅ phys05_code_skeleton only |
| BD written | ✅ poison row per LCB prompt now hits the code-route BD store |
| no route falls through to the wrong handler | ✅ no more triz / ariz / unsupported on LCB |
| no json_repair → ariz_e2e | ✅ unchanged |
| benchmark not hand-made | ✅ official livecodebench/code_generation (easy n=50) |
| fallback_count visible | ✅ 0/50 |
| Mode B not skipped | ✅ |
GREEN on every constraint. The 0/50 pass rate is now the true 0.5B-on-LCB floor, not a router bug.
Production state
- PHYS05_PACK = physarum05b_code_skeleton.planck (BD6 pass-1, unchanged).
- looks_like_humaneval() now also catches LCB.
- run_chat_organ_route emits a *_fast_no_7b envelope on 0.5B failure under NO_7B_FALLBACK=1 instead of falling through.
The MBPP / HumanEval Mode B numbers are unaffected by this change (the LCB regex doesn't match those prompts), so BD6 pass-1's 13/100 + 6/164 baseline still stands.
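Assuming BD6 writes one poison row per failed prompt, this baseline accounts for the MBPP+HE row count that BD6.3 will see:

$$
\underbrace{(100 - 13)}_{\text{MBPP fails}} + \underbrace{(164 - 6)}_{\text{HumanEval fails}} = 87 + 158 = 245,
\qquad 245 + \underbrace{50}_{\text{LCB}} = 295 \text{ poison rows}
$$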
Why the count stayed 0
The 0.5B base was never trained on:
- stdin / stdout I/O patterns,
- int(input()) / input().split() parsing,
- competitive-programming idioms (binary search on N, prefix sums, etc.).
Surgery pass BD6.3 will see 50 fresh LCB poison rows (without reference solutions — LCB doesn't ship them) plus the 245 MBPP+HE rows. The 0.5B can't learn LCB without ground-truth programs, so this branch will need either:
- A public competitive-programming dataset with solutions (Codeforces problem-solution pairs, AtCoder editorial scrapes), which would have to be downloaded and aligned to LCB-shape prompts.
- Or acceptance that LCB stays at 0 in Mode B until such a dataset is added; LCB is a stretch goal, not a primary surgery target.
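Aligning an external pair to LCB-shape prompts could be as simple as wrapping the statement in the same preamble the smoke prompt uses, so the row routes to phys05_code_skeleton like a real LCB prompt. ProblemSolutionPair, SurgeryRow and align_to_lcb_shape are hypothetical; the real surgery-row format is not described in this note.

```cpp
#include <string>

// Hypothetical external record, e.g. a Codeforces problem with an accepted Python solution.
struct ProblemSolutionPair {
    std::string statement;        // full problem text, including the I/O format
    std::string python_solution;  // reference program that reads stdin, writes stdout
};

// Hypothetical surgery row: LCB-shape prompt plus its ground-truth target.
struct SurgeryRow {
    std::string prompt;
    std::string target;
};

// Prefixes the statement with the LCB smoke-test preamble and keeps the
// reference program as the training target.
SurgeryRow align_to_lcb_shape(const ProblemSolutionPair& p) {
    static const std::string kPreamble =
        "Solve this competitive programming problem. Output ONLY a complete "
        "Python program (reads stdin, writes stdout): ";
    return SurgeryRow{kPreamble + p.statement, p.python_solution};
}
```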
The runtime route is now correct. The model quality on LCB is the remaining gap, and that gap is a dataset problem, not a router problem.