# SOVEREIGN_COGNITION_GAUNTLET_V1

**Big Tech-style coding bench × repeat-learning axis.**

_6 problems × 10 rounds × 2 backends. Both backends use the same Physarium-7B Q4 weights via llama.cpp. PARROT = pure HTTP call. MONSTER_LEARNING = full `--chat` runtime with scroll injection and admin-self-seed-on-fail._


## Pass-rate per round

| round | PARROT | MONSTER_LEARNING |
|---|---|---|
|   1 | 6/6 | 5/6 |
|   2 | 6/6 | 6/6 |
|   3 | 6/6 | 6/6 |
|   4 | 6/6 | 6/6 |
|   5 | 6/6 | 6/6 |
|   6 | 6/6 | 6/6 |
|   7 | 6/6 | 6/6 |
|   8 | 6/6 | 6/6 |
|   9 | 6/6 | 6/6 |
|  10 | 6/6 | 6/6 |

**PARROT total:** 60/60 = 100%
**MONSTER_LEARNING total:** 59/60 = 98.3%
**Δ (MONSTER_LEARNING minus PARROT):** -1 pass (-1.7 pp)
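
The totals and the delta follow directly from the per-round table. The snippet below is a minimal sketch of that arithmetic; the per-round counts are copied from the table, while the names `summarize`, `parrot`, and `monster` are illustrative and not taken from the gauntlet's own code.

```python
# Per-round pass counts copied from the pass-rate table (6 problems per round).
PROBLEMS_PER_ROUND = 6
parrot = [6] * 10
monster = [5] + [6] * 9


def summarize(passes_per_round, problems_per_round=PROBLEMS_PER_ROUND):
    """Return (passed, attempted, pass rate in percent)."""
    passed = sum(passes_per_round)
    attempted = problems_per_round * len(passes_per_round)
    return passed, attempted, 100 * passed / attempted


p_pass, p_total, p_rate = summarize(parrot)
m_pass, m_total, m_rate = summarize(monster)

# Delta is MONSTER_LEARNING minus PARROT, in passes and in percentage points.
delta_passes = m_pass - p_pass
delta_pp = m_rate - p_rate

print(f"PARROT: {p_pass}/{p_total} = {p_rate:.1f}%")
print(f"MONSTER_LEARNING: {m_pass}/{m_total} = {m_rate:.1f}%")
# The "+" format flag prints the sign exactly once, so a negative delta
# is rendered as "-1" rather than "+-1".
print(f"delta: {delta_passes:+d} passes ({delta_pp:+.1f} pp)")
```

Formatting the delta with a signed format spec, rather than prepending a literal `+`, keeps negative deltas readable.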


## Per-problem first-pass round

| problem | PARROT first-pass round | MONSTER_LEARNING first-pass round | MONSTER_LEARNING post-seed passes | seed injected after round |
|---|---|---|---|---|
| is_prime | 1 | 1 | — | — |
| fizzbuzz_list | 1 | 1 | — | — |
| roman_to_int | 1 | 1 | — | — |
| is_balanced | 1 | 1 | — | — |
| count_unique_chars | 1 | 2 | 9/9 | 1 |
| flatten_nested | 1 | 1 | — | — |
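
As a rough illustration of how these columns can be derived, the sketch below computes the first-pass round and the post-seed pass count from a per-round pass/fail list. The function names are hypothetical, the per-round lists are reconstructed from the two tables rather than read from the raw JSON (whose schema is not shown here), and the code assumes the seed is injected immediately after the first failing round.

```python
from typing import List, Optional, Tuple


def first_pass_round(results: List[bool]) -> Optional[int]:
    """1-based index of the first passing round, or None if no round passed."""
    for round_no, passed in enumerate(results, start=1):
        if passed:
            return round_no
    return None


def post_seed_passes(results: List[bool]) -> Optional[Tuple[int, int]]:
    """(passes, rounds) counted after the first failing round.

    Assumption: the seed is injected right after the first failure.
    Returns None when nothing failed or when the failure was the last round.
    """
    if False not in results:
        return None
    first_fail = results.index(False)
    after = results[first_fail + 1:]
    if not after:
        return None
    return sum(after), len(after)


# Reconstructed from the tables: one failure in round 1, then nine passes.
count_unique_chars = [False] + [True] * 9
is_prime = [True] * 10

print(first_pass_round(count_unique_chars), post_seed_passes(count_unique_chars))
print(first_pass_round(is_prime), post_seed_passes(is_prime))
```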

## DOD

**RED**: MONSTER_LEARNING scored worse than PARROT (59/60 vs. 60/60).

## Why this differs from HumanEval / MBPP / SWE-bench

Standard benches under temp=0 produce one pass-rate number per problem. Our gauntlet runs the same problem 10× and asks: *if the system fails, can it be made to learn?* When PARROT fails under temp=0, it gives the same wrong answer 10×, which is a flat fail line. MONSTER_LEARNING gets an admin-written exemplar between rounds (the "self-LoRA-with-admin-rights" loop in cheap form), so its curve can rise after a failure. In this run PARROT never failed, so there is no flat fail line to compare against; MONSTER_LEARNING's only failure (count_unique_chars, round 1) was followed by 9/9 passes. A standard bench cannot express this question.
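
The repeat-learning axis can be sketched as a small loop. This is a toy model under stated assumptions, not the gauntlet's implementation: `run_rounds`, `toy_solver`, `is_correct`, and the `exemplar` string are all hypothetical stand-ins, and the real backends call the model rather than a stub.

```python
from typing import Callable, List, Optional

# A solver takes (problem name, optional seed exemplar) and returns an answer.
Solver = Callable[[str, Optional[str]], str]


def run_rounds(
    problem: str,
    solver: Solver,
    check: Callable[[str], bool],
    exemplar: str,
    rounds: int = 10,
    seed_on_fail: bool = False,
) -> List[bool]:
    """Run one problem for several rounds and record pass/fail per round."""
    seed: Optional[str] = None
    history: List[bool] = []
    for _ in range(rounds):
        passed = check(solver(problem, seed))
        history.append(passed)
        # Learning mode only: after a failure, make the exemplar available
        # to every later round.
        if not passed and seed_on_fail:
            seed = exemplar
    return history


def toy_solver(problem: str, seed: Optional[str]) -> str:
    # Deterministic stand-in for a temp=0 model: always wrong unless seeded.
    return seed if seed is not None else "wrong"


def is_correct(answer: str) -> bool:
    return answer == "right"


flat = run_rounds("demo", toy_solver, is_correct, exemplar="right")
rising = run_rounds("demo", toy_solver, is_correct, exemplar="right",
                    seed_on_fail=True)
print(flat)
print(rising)
```

Without seeding, the deterministic stub fails every round; with seed-on-fail, it fails once and passes afterwards. A single-shot pass rate cannot show that distinction.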

_Raw: `reports/SOVEREIGN_COGNITION_GAUNTLET_V1.json`_