# X100_SCOREBOARD — sprint progress (one truth file)

Single file. New numbers only — no philosophy, no tied placeholders.
Every axis updated when its proving bench produces a score.

**RULE (rebased 2026-04-30):** `sha256(input)` exact-input cache hits are
**utility hygiene**, NOT a strategic x100 advantage. Real x100 counts only
when the input is non-identical, the system extracts a form pattern + delta
params and replays an action template, the verifier passes, and the model
is NOT called.
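
The RULE can be read as a predicate over one replay attempt. A minimal sketch, assuming a per-attempt record: only `model_called` appears in this file; every other field name, and the `classify` helper itself, is hypothetical. It also assumes an exact-input hit is classed as utility without consulting the verifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplayRecord:
    """One replay attempt. Field names other than model_called are hypothetical."""
    input_identical: bool    # sha256(input) matched a cached input exactly
    form_matched: bool       # a form pattern + delta params were extracted
    template_replayed: bool  # an action template was replayed
    verifier_passed: bool    # the verifier accepted the replayed result
    model_called: bool       # the model was invoked at any point


def classify(rec: ReplayRecord) -> str:
    """Return 'x100', 'utility', or 'none' for one attempt."""
    if rec.model_called:
        return "none"      # any model call disqualifies the attempt
    if rec.input_identical:
        return "utility"   # exact-input cache hit: hygiene only
    if rec.form_matched and rec.template_replayed and rec.verifier_passed:
        return "x100"      # non-identical input, replayed, verified, no model
    return "none"
```

Under this reading, an attempt scores toward x100 only when every clause of the RULE holds at once; an exact repeat never does, however fast it is.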

| Axis                     | current               | target                | status |
|--------------------------|-----------------------|-----------------------|--------|
| Raw decode (Q4 7B)       | 41.7 tok/s (DP4A)     | 100+ tok/s            | YELLOW |
| llama.cpp raw            | 80+ tok/s             | 100+ tok/s            | YELLOW |
| MBPP (n=100, native)     | **B=13/100 (production), A=60, C=60.** BD6.2 union/4ep regressed to 6, reverted. BD6.3 anchor-gate 0/19, rejected at gate. | B ≥ 25 first | YELLOW — production stable at 13; BD6.4 needs anchor-positive curriculum |
| HumanEval (n=164, full)  | **B=6/164 (production), A=81, C=81.** Same BD6.2/BD6.3 history. | B ≥ 20 first | YELLOW — production stable at 6; BD6.4 next |
| LCB easy (n=50)          | **B=0/50 (post-route-fix), organs={phys05_code_skeleton}, fb=0.** Pre-fix was routing artefact (triz). | B ≥ 3 after BD6.4 | YELLOW — route fixed; data needs competitive-programming refs |
| Anchor (BD6 pass-1 wins) | **19/19** verified on production pack | gate must stay 19/19 before any pack flip | GREEN |
| Terminal NanoOS V1 (10)  | PARROT 7 / MONSTER 8 stable; 9 in best-of-N (+1 / +2) | MONSTER ≥9 | YELLOW |
| Terminal NanoOS V2 (30)  | PARROT 20 / MONSTER 22 (+2), wall ratio 2.35× | MONSTER − PARROT ≥ +8 | YELLOW |
| Exact replay cache       | 5/5 exact repeats: ~2ms warm, 140-403× speedup | utility cache hygiene | GREEN (UTILITY) |
| Holographic form replay  | 20/20 non-identical variants, all <100ms, all model_called:false | ≥15/20 non-identical variants pass with no 7B call | GREEN |
| **Organ farm liveness**  | **9/9 organs fired with food=1; 5 dead organs revived; phys05_wound born and live** | every `--chat` DAG carries multi-organ chain | YELLOW (alive but not yet routed in --chat) |
| **Black-Dog reinforcement** | **9/9 organs have BD-moved conductance (0.000 → 0.20-0.67 in 5 prompts each)** | every --chat DAG has real food/poison/cond_before/cond_after; conductance influences routing | GREEN (signal layer) / YELLOW (router not yet using cond) |
| **ARIZ/TRIZ organ-first chain** | smoke green: triz fail → 7B-chat synth pass; verifier "TC+PC both filled" | 100 ARIZ task probe; triz solo pass-rate ≥40%; chain pass-rate ≥85% | YELLOW (architecturally GREEN, model-quality YELLOW) |
| **MONSTER_INTEGRATION_V1** | **8/8 routes landed, 6/6 active non-cache routes have BD signal in DAG** (json_repair fall-through closed 2026-05-01) | every route lands + has BD signal | GREEN (architecturally one body, every lane proven) |
| **Organ baseline probe** | **45/45 probes ok=true; 100 % pass-rate at sane_nonempty verifier; per-organ BD curves in `reports/ORGAN_BASELINE_PROBE.md`** | per-organ STRICT verifier (json_strict / TC-PC / etc) — BD4 work | GREEN |
| Prefix-cache retry       | not implemented       | MBPP wall <100s @ ≥73 | RED    |
| Runtime-owned schemas    | not implemented       | tokens −40 %          | RED    |
| SWE micro 10             | none                  | MONSTER − PARROT ≥ +4 | RED    |
| Memory 350 lookup        | smoke only            | <1ms, 100/100         | RED    |
| Self-repair runtime      | partial (templates)   | 3 auto repairs        | RED    |
| Official bench queue     | partial               | one status doc        | YELLOW |

Last updated: 2026-05-01. Owner: agent (auto-update after each numbered TASK).

## BENCH_CLEANUP_AND_OFFICIAL_RUN status

* **TASK 1** (json_repair fall-through fix) — **DONE 2026-05-01.**
  Route 8/8, BD signal on 6/6 active non-cache routes.
  See `reports/MONSTER_INTEGRATION_V1.md`.
* **TASK 2** (MBPP/HE × A/B/C) — **HARNESS READY.**
  `NO_7B_FALLBACK` env gate landed in `run_chat_organ_route` and
  `run_chat` HumanEval branch. Bench: `tools/bench/mbpp_he_3mode.py`.
  Mode A = MONSTER_FORCE_7B (or external llama-server when reachable).
  Mode B = ORGAN_FIRST=1 + NO_7B_FALLBACK=1 (raw 0.5B only).
  Mode C = ORGAN_FIRST=1 + MONSTER_NATIVE_RETRY=1 (organ + 7B fallback).
* **TASK 3** (LiveCodeBench / BFCL official / GPQA Diamond) — pending.
* **TASK 4** (single unified table) — pending.
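
The three modes in TASK 2 differ only in environment flags. A minimal sketch of building a clean environment per mode: the flag names come from the list above, while the value `"1"` for `MONSTER_FORCE_7B` and the `env_for_mode` helper are assumptions, not the harness's actual code.

```python
import os

# Flag names are taken from the TASK 2 description; "1" for
# MONSTER_FORCE_7B is an assumed value.
MODE_ENV = {
    "A": {"MONSTER_FORCE_7B": "1"},
    "B": {"ORGAN_FIRST": "1", "NO_7B_FALLBACK": "1"},
    "C": {"ORGAN_FIRST": "1", "MONSTER_NATIVE_RETRY": "1"},
}
ALL_FLAGS = sorted({flag for env in MODE_ENV.values() for flag in env})


def env_for_mode(mode, base=None):
    """Return a copy of the base environment with exactly one mode's flags set."""
    env = dict(os.environ if base is None else base)
    for flag in ALL_FLAGS:
        env.pop(flag, None)  # clear flags left over from another mode
    env.update(MODE_ENV[mode])
    return env
```

The returned dict can be passed as `env=` to `subprocess.run` when launching `tools/bench/mbpp_he_3mode.py`; that script's command-line arguments are not shown here, so none are assumed. Clearing all flags first keeps a stale `MONSTER_FORCE_7B` from leaking into a Mode B run.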

---

## Why exact-cache is GREEN-UTILITY but NOT an x100 advantage

`HOLOGRAM_REPLAY_X100` (5/5 workflows, 140-403× speedup) was renamed to
`EXACT_REPLAY_CACHE_V1`. It proves a memoization layer: same exact input
→ same exact output, no model call. Useful for any workflow that
genuinely repeats verbatim. But it is **not** strategic intelligence:
two prompts that differ in a single character bypass it entirely.
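
The single-character bypass follows directly from keying on `sha256(input)`. A minimal sketch of such a memoization layer, assuming an in-memory dict; the `ExactReplayCache` class and the sample prompts are illustrative, not the project's implementation.

```python
import hashlib


class ExactReplayCache:
    """Memoize outputs by the SHA-256 digest of the exact input text."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str):
        # Returns None on a miss; the caller would then run the model.
        return self._store.get(self.key(prompt))

    def put(self, prompt: str, output: str) -> None:
        self._store[self.key(prompt)] = output


cache = ExactReplayCache()
cache.put("list files in /tmp", "ls /tmp")
hit = cache.get("list files in /tmp")    # verbatim repeat: served from cache
miss = cache.get("list files in /tmp/")  # one extra character: different digest
```

Here `hit` is the stored output and `miss` is `None`, because the trailing slash changes the digest and therefore the key.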

Real x100 advantage requires the system to recognize the FORM of a task
across surface variations and replay the action template with new
parameters. That is `HOLOGRAPHIC_FORM_REPLAY_V1`, item 4 in the sprint
order below.
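
To make "form + delta params + action template" concrete, here is a minimal sketch using a regular expression as the form matcher. The form, the template, and the `replay` helper are all hypothetical; the real extractor is not described in this file, and the verifier step is left out.

```python
import re
from typing import Optional

# Hypothetical form: "count lines in <path>" across a few surface variations.
FORM = re.compile(
    r"^\s*(?:please\s+)?count\s+(?:the\s+)?lines\s+in\s+(?P<path>\S+)\s*$",
    re.IGNORECASE,
)
# Hypothetical action template; the delta param fills the placeholder.
TEMPLATE = "wc -l {path}"


def replay(prompt: str) -> Optional[dict]:
    """Replay the template when the prompt matches the form, else return None."""
    match = FORM.match(prompt)
    if match is None:
        return None  # unknown form: the caller would fall back to the model
    return {
        "action": TEMPLATE.format(**match.groupdict()),
        "model_called": False,
    }
```

Unlike the exact cache, `replay("count lines in a.txt")` and `replay("Please count the lines in /var/log/syslog")` both succeed despite different text, each with its own `path` filled into the same template.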

---

## Sprint order (no choice; do in order)

1. ✅ PHASE_12_TR_HEREDOC_AWARE         → Terminal V1 +1 stable / +2 best
2. ✅ TERMINAL_NANOOS_30                 → +2 (target +8 model-class gated)
3. ✅ EXACT_REPLAY_CACHE_V1              → utility hygiene GREEN, not x100
4. ⏳ HOLOGRAPHIC_FORM_REPLAY_V1         → real x100: ≥15/20 form variants
5. PREFIX_CACHE_RETRY                    → MBPP wall <100s
6. RUNTIME_OWNED_OUTPUTS                 → tokens −40 %
7. SWE_MICRO_CAPSULE_V1                  → +4 vs PARROT
8. MEMORY_350_PROOF_V1                   → exact lookup <1ms
9. SELF_REPAIR_RUNTIME_V1                → 3 auto repairs
10. OFFICIAL_BENCH_QUEUE_STATUS          → status doc

A pass without a new number does not count.
