
LCB_CODE_ROUTE_FIX (2026-05-01)



LiveCodeBench (LCB) prompts had been mis-routed to phys05_triz_contradiction / run_chat_ariz_organ_first because the dispatcher classified "Solve this competitive programming problem..." as an ARIZ / contradiction prompt. The 0.5B code organ never saw them, so BD6 surgery could not have helped LCB even in principle.

This task widens the code-route classifier and fixes the fall-through envelope so LCB Mode B now actually exercises phys05_code_skeleton.


Code change (single file, runtime)

src/main.cpp::looks_like_humaneval — add an LCB regex branch:

// LCB_CODE_ROUTE_FIX (2026-05-01): LiveCodeBench competitive-programming
// prompts. Without this, "Solve this competitive programming problem. ..."
// gets misclassified as ARIZ contradiction-extraction (the word "problem"
// hits the ariz dispatcher).
// Each raw-string segment must close with )" so the adjacent literals
// concatenate into one pattern; an unclosed segment leaks R"( into the regex.
static const std::regex rx_lcb(
    R"(competitive\s+programming|complete\s+python\s+program|)"
    R"(reads\s+stdin.*writes\s+stdout|stdin.*stdout|)"
    R"(\batcoder\b|\bcodeforces\b)",
    std::regex::icase);
if (std::regex_search(s, rx_lcb)) return true;
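To check the branch in isolation, the pattern can be lifted into a small stand-alone helper. This is a minimal sketch, not the code in src/main.cpp: the function name `looks_like_lcb` is hypothetical (the report adds the branch inside `looks_like_humaneval`). Each raw-string segment is closed with `)"` so the three adjacent literals concatenate into a single alternation.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-alone wrapper around the LCB pattern from the report.
// Returns true when the prompt looks like a LiveCodeBench-style
// competitive-programming task (stdin/stdout program, AtCoder, Codeforces).
bool looks_like_lcb(const std::string& s) {
    static const std::regex rx_lcb(
        R"(competitive\s+programming|complete\s+python\s+program|)"
        R"(reads\s+stdin.*writes\s+stdout|stdin.*stdout|)"
        R"(\batcoder\b|\bcodeforces\b)",
        std::regex::icase);
    return std::regex_search(s, rx_lcb);
}
```

The smoke prompt ("Solve this competitive programming problem. Output ONLY a complete Python program (reads stdin, writes stdout): ...") matches on several alternatives at once. A plain function-completion prompt such as "Write a function to find the maximum of two numbers." matches none of them.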

src/main.cpp::run_chat_organ_route — when 0.5B fails AND NO_7B_FALLBACK=1, emit a code_fast_no_7b envelope showing organs_used = [phys05_code_skeleton] so the bench can attribute the failure correctly. Without this, fall-through hands the prompt to the dispatcher → ARIZ / unsupported, which obscures who actually ran.

The envelope still writes a Black-Dog signal (food=0, poison=1, conductance update) so the failure feeds the next surgery cycle.
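The failure path described above can be sketched as follows. Everything here is illustrative: the `Envelope` and `BlackDogSignal` types and all function names are hypothetical, only the field names and values (`route`, `organs_used`, `verifier_ok`, food=0, poison=1) come from this report, and the conductance update is omitted because its form is not given.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical envelope type; field names mirror the smoke output.
struct Envelope {
    std::string route;
    std::vector<std::string> organs_used;
    bool verifier_ok;
};

// Hypothetical Black-Dog signal; the conductance update is not modelled.
struct BlackDogSignal {
    int food;
    int poison;
};

// Emit the dedicated envelope only when the 0.5B organ failed and the
// 7B fallback is disabled (NO_7B_FALLBACK=1).
bool should_emit_no_7b_envelope(bool organ_ok, bool no_7b_fallback) {
    return !organ_ok && no_7b_fallback;
}

// Envelope that attributes the failure to the code organ.
Envelope make_no_7b_envelope() {
    return Envelope{"code_fast_no_7b", {"phys05_code_skeleton"}, false};
}

// A failed run is recorded as poison so it feeds the next surgery cycle.
BlackDogSignal failure_signal() {
    return BlackDogSignal{0, 1};
}

// Minimal JSON rendering; assumes the strings need no escaping.
std::string to_json(const Envelope& e) {
    std::string out = "{\"route\": \"" + e.route + "\", \"organs_used\": [";
    for (std::size_t i = 0; i < e.organs_used.size(); ++i) {
        if (i > 0) out += ", ";
        out += "\"" + e.organs_used[i] + "\"";
    }
    out += "], \"verifier_ok\": ";
    out += e.verifier_ok ? "true" : "false";
    out += "}";
    return out;
}
```

The point of the design is attribution: the envelope names the organ that actually ran, so the bench does not have to infer it from whichever handler the prompt happened to fall through to.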

PYTHON_QUARANTINE compliance: zero Python in the runtime; only the C++ classifier and the C++ envelope-emit path. Bench harness still calls --chat once per prompt.

Smoke (single LCB-shape prompt, NO_7B_FALLBACK=1)

$ NO_7B_FALLBACK=1 ORGAN_FIRST=1 ./build/gigachad_native --chat \
  'Solve this competitive programming problem. Output ONLY a complete Python program (reads stdin, writes stdout): ...A+B...'

  "route": "code_fast_no_7b",
  "organs_used": ["phys05_code_skeleton"],
  "verifier_ok": false,

Pre-fix, the same prompt came back with route: unsupported_no_7b_fallback and organs_used: [], or with route: ariz_organ_first. Post-fix it lands on the code organ.

Full LCB Mode B (n=50 easy, post-route)

| metric | pre-fix BD6 pass-1 | post-fix |
|-----------------------|-------------------|----------|
| pass | 0/50 | 0/50 |
| organs_used_set | {phys05_triz_contradiction} (wrong route) | {phys05_code_skeleton} ✅ |
| fallback_count | 0 (cause: empty envelope) | 0 ✅ |
| route | ariz_organ_first | code_fast_no_7b ✅ |
| wall | 7 s (early-exit) | 431 s (real organ run) |
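The table's aggregate rows can be derived from per-prompt results. The sketch below assumes one record per `--chat` call; the `RunRecord` and `Summary` types and the `summarize` function are hypothetical and are not the bench harness itself.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical per-prompt record built from one envelope.
struct RunRecord {
    std::string route;
    std::vector<std::string> organs_used;
    bool passed;
    bool used_fallback;
};

// Aggregates corresponding to the table rows.
struct Summary {
    int total = 0;
    int pass = 0;
    int fallback_count = 0;
    std::set<std::string> organs_used_set;
    std::set<std::string> routes;
};

// Fold all records into one summary: counts plus the sets of organs and
// routes that appeared anywhere in the run.
Summary summarize(const std::vector<RunRecord>& records) {
    Summary s;
    for (const RunRecord& r : records) {
        ++s.total;
        if (r.passed) ++s.pass;
        if (r.used_fallback) ++s.fallback_count;
        s.routes.insert(r.route);
        s.organs_used_set.insert(r.organs_used.begin(), r.organs_used.end());
    }
    return s;
}
```

Keeping `organs_used_set` and `routes` as sets makes a mis-route visible at a glance: any prompt that lands on another organ or handler adds a second element.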

Pass-rate stays 0/50. That is the honest measurement of the 0.5B on competitive programming, not a routing artefact. The 0.5B produces template-leaking output ("def You are phys05_code_skeleton.") because the code-skeleton template at organs/prompts/phys05_code_skeleton.txt was never trained on stdin/stdout-shape problems. The next surgery pass (BD6.3) gets these 50 LCB poison rows for free now that they hit the right organ.

TASK 5 constraints (post-route Mode B on LCB)

| constraint | check |
|--------------------------------------|-------|
| 0.5B organ used | ✅ phys05_code_skeleton only |
| BD written | ✅ poison row per LCB prompt now hits the code-route BD store |
| no route falls through wrong handler | ✅ no more triz / ariz / unsupported on LCB |
| no json_repair → ariz_e2e | ✅ unchanged |
| benchmark not hand-made | ✅ official livecodebench/code_generation (easy n=50) |
| fallback_count visible | ✅ 0/50 |
| B mode not skipped | ✅ |

GREEN on every constraint. The 0/50 pass rate is now the true 0.5B-on-LCB floor, not a router bug.

Production state

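LCB-shape prompts now route to phys05_code_skeleton, and a 0.5B failure emits the code_fast_no_7b envelope under NO_7B_FALLBACK=1 instead of falling through.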

The MBPP / HumanEval Mode B numbers are unaffected by this change (the LCB regex doesn't match those prompts), so BD6 pass-1's 13/100 + 6/164 baseline still stands.

Why the count stayed 0

The 0.5B base was never trained on stdin/stdout-shape problems of the kind LCB poses.

Surgery pass BD6.3 will see 50 fresh LCB poison rows (without reference solutions — LCB doesn't ship them) plus the 245 MBPP+HE rows. The 0.5B can't learn LCB without ground-truth programs, so this branch will need either:

  1. A public competitive-programming dataset with solutions (codeforces problem-solution pairs, atcoder editorial scrapes), which would have to be downloaded and aligned to LCB-shape prompts.

  2. Or, accept that LCB stays at 0 in B-mode until we add such a dataset; LCB is a stretch goal, not a primary surgery target.
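The row counts quoted for BD6.3 can be cross-checked against the pass rates. The sketch assumes one poison row per failed prompt, and that 13/100 is the MBPP result and 6/164 the HumanEval result; neither assumption is stated explicitly in this report, and all names are hypothetical.

```cpp
// Failed prompts = total prompts minus passes.
// Assumption: each failed prompt yields exactly one poison row.
constexpr int failed(int total, int passed) { return total - passed; }

constexpr int mbpp_poison = failed(100, 13);  // assumed MBPP: 13/100 passed
constexpr int he_poison   = failed(164, 6);   // assumed HumanEval: 6/164 passed
constexpr int lcb_poison  = failed(50, 0);    // LCB easy: 0/50 passed

// Compile-time checks against the figures in the text.
static_assert(mbpp_poison + he_poison == 245,
              "matches the 245 MBPP+HE rows");
static_assert(mbpp_poison + he_poison + lcb_poison == 295,
              "total poison rows visible to BD6.3 under these assumptions");
```

Under these assumptions the 245 MBPP+HE figure is simply the failures behind the 13/100 + 6/164 baseline, and the 50 LCB rows bring the total to 295.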

The runtime route is now correct. The model quality on LCB is the remaining gap, and that gap is a dataset problem, not a router problem.