# TERMINAL_NANOOS_MINI_V1

Phase-12 NanoOS Capsule Substrate × a Terminal-Bench-style suite (10 tasks), run in two modes:

* **PARROT**: one llama.cpp call -> extract the bash block -> ONE capsule run.
* **MONSTER**: envelope `--chat` -> the C++ runtime drives up to three rounds (k = 1..3) of stderr-feedback retry via `tools/capsule/shell_capsule.py`.

Both modes use the same capsule and the same verifier for each task.
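PARROT's "extract bash" step amounts to pulling the first fenced shell block out of the model reply. The sketch below is an assumption about how that could look, not the suite's actual extractor: the regex, the fallback to the raw reply when no fence is found, and the name `extract_bash` are all hypothetical.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime so this snippet nests inside markdown

# Hypothetical pattern: an opening fence tagged bash, sh, or nothing, followed by a
# lazily matched body up to the next closing fence.
BASH_BLOCK = re.compile(FENCE + r"(?:bash|sh)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_bash(reply: str) -> str:
    """Return the first fenced shell block in a model reply, else the whole reply."""
    match = BASH_BLOCK.search(reply)
    body = match.group(1) if match else reply
    return body.strip()
```

Falling back to the raw reply keeps a one-shot run alive when the model answers with a bare command; a stricter extractor could instead treat "no fence" as a failed round.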

Both modes record an Evidence object at `dag/capsules/cap_*.json` containing `capsule_id`, exit codes per command, sha256 of every artifact file, and a `replay_recipe` (the full spec to re-execute deterministically). MONSTER's evidence is the survivor of the feedback loop; PARROT's is one-shot.
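A minimal sketch of recording and replaying such an Evidence object, assuming a simplified schema. Only `capsule_id`, `replay_recipe`, `spec_inline`, and the `cap_*.json` naming come from this report; the `exit_codes` and `artifacts` keys, the `commands`/`artifacts` layout of the spec, and the helpers `run_capsule`, `write_evidence`, and `replay_matches` are hypothetical. The spec is also assumed to contain any setup commands, since replay here starts from an empty directory.

```python
import hashlib
import json
import shutil
import subprocess
import tempfile
from pathlib import Path

SHELL = shutil.which("bash") or "/bin/sh"  # assumption: prefer bash, fall back to sh


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def run_capsule(capsule_id: str, spec: dict, workdir: Path) -> dict:
    """Run spec["commands"] in workdir and return an Evidence-like dict (hypothetical schema)."""
    exit_codes = [
        subprocess.run([SHELL, "-c", cmd], cwd=workdir, capture_output=True).returncode
        for cmd in spec["commands"]
    ]
    hashes = {
        name: sha256_file(workdir / name)
        for name in spec["artifacts"]
        if (workdir / name).is_file()  # missing artifacts are simply not recorded
    }
    return {
        "capsule_id": capsule_id,
        "exit_codes": exit_codes,  # one exit code per command
        "artifacts": hashes,  # artifact file -> sha256 hex digest
        "replay_recipe": {"spec_inline": spec},  # everything needed to re-run
    }


def write_evidence(evidence: dict, dag_dir: Path) -> Path:
    """Persist evidence as <dag_dir>/cap_<capsule_id>.json."""
    dag_dir.mkdir(parents=True, exist_ok=True)
    out = dag_dir / f"cap_{evidence['capsule_id']}.json"
    out.write_text(json.dumps(evidence, indent=2, sort_keys=True))
    return out


def replay_matches(evidence_path: Path) -> bool:
    """Re-run replay_recipe.spec_inline in a fresh directory and compare exit codes and hashes."""
    evidence = json.loads(evidence_path.read_text())
    spec = evidence["replay_recipe"]["spec_inline"]
    with tempfile.TemporaryDirectory() as fresh:
        again = run_capsule(evidence["capsule_id"], spec, Path(fresh))
    return again["exit_codes"] == evidence["exit_codes"] and again["artifacts"] == evidence["artifacts"]
```

Comparing artifact digests rather than file contents keeps the evidence file small while still detecting any byte-level divergence on replay.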

| mode | pass | total | rate | wall | VWS |
|---|---|---|---|---|---|
| PARROT  | 7  | 10 | 70 % | 3.1s | 2.26 |
| MONSTER | 8  | 10 | 80 % | 7.5s | 1.07 |
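The report does not spell out VWS. Its two values match pass count divided by wall time (7 / 3.1 ≈ 2.26, 8 / 7.5 ≈ 1.07), so the check below assumes that reading; treat it as an inference from the numbers, not the suite's definition.

```python
# Rows transcribed from the summary table: (mode, passes, total, wall seconds).
ROWS = [("PARROT", 7, 10, 3.1), ("MONSTER", 8, 10, 7.5)]


def summarize(passes: int, total: int, wall_s: float) -> tuple[float, float]:
    """Return (pass rate in percent, passes per wall-clock second)."""
    return 100.0 * passes / total, passes / wall_s


for mode, passes, total, wall in ROWS:
    rate, per_sec = summarize(passes, total, wall)
    print(f"{mode:8s} rate={rate:.0f}% passes/s={per_sec:.2f}")
# PARROT: 70%, 2.26; MONSTER: 80%, 1.07
```

Under this reading, MONSTER buys its extra pass at roughly half the passes per second, which is the expected cost of running up to three capsule rounds.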

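**Δ MONSTER − PARROT: +1** (net: MONSTER gains `sed_transform` and `compile_cpp_missing_include`, and regresses on `fix_failing_test`)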

## Per-task

| task | difficulty | PARROT | MONSTER | rounds (M) | capsule_id (M) | artifact sha256 prefixes (M) |
|---|---|---|---|---|---|---|
| create_file_exact | easy | OK | OK | 1 | 30_184916_dca840 | a948904f2f0f479b |
| run_python_print_42 | easy | OK | OK | 1 | 30_184916_91e4c3 | b2101d5826aa11de |
| fix_failing_test | hard | OK | X | 3 | 30_184918_311478 | d9ae93c739f49feb, e1a894022d1a0829, c24be9d84c1462cf |
| parse_json | easy | OK | OK | 1 | 30_184918_43072a | 36bd4ed657a57ace |
| sed_transform | medium | X | OK | 2 | 30_184919_1376a2 | ac914dfa543e017c, ac914dfa543e017c |
| compile_cpp_missing_include | hard | X | OK | 2 | 30_184920_4f9f8e | 9d648401c7b6fb42, 3433be63e64588d1 |
| chmod_run_executable | medium | OK | OK | 1 | 30_184921_5243b5 | f2e35109e0f7bfa8 |
| find_bug_from_stderr | hard | X | X | 3 | 30_184922_950abf | ede1fb2f0d44a773 |
| produce_patch | medium | OK | OK | 1 | 30_184922_24179c | 470450d1505afcca, 4fdbc441ea7b5461, 636b5bae55f8b988 |
| verify_output_hash | medium | OK | OK | 1 | 30_184923_d28b3d | 5fc4ae2d24d613fb, 8bd186b55ecb5d98 |
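A quick consistency check of the headline numbers against the per-task rows. The dictionary is transcribed from the table above; nothing else is assumed.

```python
# Per-task outcomes from the table: task -> (PARROT passed?, MONSTER passed?)
RESULTS = {
    "create_file_exact": (True, True),
    "run_python_print_42": (True, True),
    "fix_failing_test": (True, False),
    "parse_json": (True, True),
    "sed_transform": (False, True),
    "compile_cpp_missing_include": (False, True),
    "chmod_run_executable": (True, True),
    "find_bug_from_stderr": (False, False),
    "produce_patch": (True, True),
    "verify_output_hash": (True, True),
}

parrot = sum(p for p, _ in RESULTS.values())  # bools sum as 0/1
monster = sum(m for _, m in RESULTS.values())
gained = sorted(t for t, (p, m) in RESULTS.items() if m and not p)
lost = sorted(t for t, (p, m) in RESULTS.items() if p and not m)

print(parrot, monster, monster - parrot)  # 7 8 1
print("gained:", gained)
print("lost:", lost)
```

The +1 is therefore two gains minus one regression, not a strict superset of PARROT's passes.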

## Architectural witness

* PARROT pass rows still produce a capsule and a DAG entry; the difference is that MONSTER's evidence is whatever survived the retry loop, while PARROT's comes from a single shot.
* Every MONSTER pass row has a non-empty `capsule_id` and ≥1 artifact sha256. Replay any of them by running the capsule with the spec dumped inside `replay_recipe.spec_inline`.
* For MONSTER rows where `rounds > 1`, the C++ runtime fed the stderr and exit codes of round k−1's capsule into the round-k prompt; that is the execution evidence the doctrine demands.
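The retry control flow can be sketched in Python as below. This is an illustration of the k = 1..3 stderr-feedback loop, not the C++ runtime's code: `ask_model`, `run_capsule`, `verify`, `Attempt`, and the feedback prompt wording are hypothetical. The loop stops at the first verified pass, which is consistent with the table (first-try passes show 1 round, failures show 3).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    k: int
    script: str
    exit_code: int
    stderr: str
    passed: bool


def feedback_loop(
    ask_model: Callable[[str], str],  # hypothetical: prompt -> bash script
    run_capsule: Callable[[str], tuple[int, str]],  # hypothetical: script -> (exit code, stderr)
    verify: Callable[[], bool],  # hypothetical per-task verifier
    task_prompt: str,
    max_rounds: int = 3,
) -> list[Attempt]:
    """Run up to max_rounds rounds, feeding round k's failure into round k+1's prompt."""
    attempts: list[Attempt] = []
    prompt = task_prompt
    for k in range(1, max_rounds + 1):
        script = ask_model(prompt)
        code, stderr = run_capsule(script)
        ok = verify()
        attempts.append(Attempt(k, script, code, stderr, ok))
        if ok:
            break  # the last attempt is the surviving evidence
        prompt = (
            f"{task_prompt}\n\nPrevious attempt (round {k}) exited with code {code}.\n"
            f"stderr:\n{stderr}\nFix the script."
        )
    return attempts
```

Usage: `feedback_loop(ask, run, verify, "Create hello.txt ...")` returns every attempt, so the evidence of the final round and the feedback that led to it are both available.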

## Tasks (10)

* `create_file_exact` (easy): Create a file named hello.txt in the current directory whose contents are exactly the line `hello wo…
* `run_python_print_42` (easy): Write a python script `solve.py` that prints exactly the integer 42 and run it with python3. Use she…
* `fix_failing_test` (hard): There are two files: `math_lib.py` and `run_tests.py`. Run the tests with `python3 run_tests.py`. Th…
* `parse_json` (easy): There is a file `data.json` in the working dir. Use python3 to read it and print the value of the ke…
* `sed_transform` (medium): There is a file `data.txt` with three lines like `key: N`. Produce a file `out.txt` where every key…
* `compile_cpp_missing_include` (hard): There is a file `prog.cpp` in the working dir. Compile it with `g++ prog.cpp -o prog`. Then run `./p…
* `chmod_run_executable` (medium): There is a file `script.sh` in the working dir. Make it executable and run it (./script.sh). Its out…
* `find_bug_from_stderr` (hard): Run `python3 app.py`. The output must contain the literal token OK_PARSED. If the program raises an…
* `produce_patch` (medium): There are two files `before.txt` and `after.txt`. Produce a unified diff and save it as `patch.diff`…
* `verify_output_hash` (medium): There is a file `payload.txt` in the working dir. Compute its sha256 hash with `sha256sum payload.tx…

## DOD

* GREEN  — MONSTER 8/10 > PARROT 7/10; execution layer earned its place by Δ +1.