TERMINAL_NANOOS_MINI_V1

Phase-12 NanoOS Capsule Substrate × a Terminal-Bench-style suite (10 tasks). PARROT = one llama.cpp call -> extract the bash commands -> ONE capsule run. MONSTER = envelope --chat -> the C++ runtime drives up to three rounds (k = 1..3) of stderr-feedback retry via tools/capsule/shell_capsule.py. Both modes use the same capsule and the same verifier for each task.
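The two modes differ only in the control flow around the same capsule and verifier. The Python sketch below illustrates that difference under stated assumptions: `generate` stands in for the llama.cpp call, `run_capsule` stands in for one shell_capsule.py run that returns a verifier verdict, and `CapsuleResult`, `extract_bash`, the stub functions and the feedback-prompt wording are all hypothetical, not the actual C++ runtime's interface.

```python
import re
from dataclasses import dataclass

@dataclass
class CapsuleResult:
    """Hypothetical result of one capsule run."""
    passed: bool           # verifier verdict for this run
    exit_codes: list[int]  # one exit code per command
    stderr: str

# First fenced bash/sh block in a model reply.
FENCE = re.compile(r"```(?:bash|sh)?\n(.*?)```", re.DOTALL)

def extract_bash(reply: str) -> str:
    """Return the first fenced shell block, or the raw reply if there is none."""
    m = FENCE.search(reply)
    return m.group(1) if m else reply

def parrot(task, generate, run_capsule):
    """One model call, one capsule run, no feedback. Returns (result, rounds)."""
    return run_capsule(extract_bash(generate(task))), 1

def monster(task, generate, run_capsule, max_rounds=3):
    """Up to max_rounds attempts; round k+1 sees round k's exit codes and stderr."""
    prompt = task
    for k in range(1, max_rounds + 1):
        result = run_capsule(extract_bash(generate(prompt)))
        if result.passed:
            return result, k
        prompt = (f"{task}\n\nAttempt {k} failed.\n"
                  f"exit codes: {result.exit_codes}\n"
                  f"stderr:\n{result.stderr}")
    return result, max_rounds  # evidence of the last (failed) round

# Hypothetical stubs, for illustration only.
def fake_generate(prompt):
    # Typos the command until stderr feedback appears in the prompt.
    return "```bash\necho ok\n```" if "stderr:" in prompt else "```bash\nech ok\n```"

def fake_run_capsule(script):
    ok = script.strip() == "echo ok"
    return CapsuleResult(ok, [0 if ok else 127], "" if ok else "ech: command not found")

if __name__ == "__main__":
    print(parrot("print ok", fake_generate, fake_run_capsule))
    print(monster("print ok", fake_generate, fake_run_capsule))
```

With these stubs PARROT fails on the typo in its single shot, while MONSTER passes in round 2 because round 1's stderr reaches the next prompt.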

Both modes record an Evidence object at dag/capsules/cap_*.json containing capsule_id, exit codes per command, sha256 of every artifact file, and a replay_recipe (the full spec to re-execute deterministically). MONSTER's evidence is the survivor of the feedback loop; PARROT's is one-shot.
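A minimal Python sketch of writing such an Evidence record, assuming a flat JSON layout. The fields capsule_id, per-command exit codes, artifact sha256 and replay_recipe.spec_inline come from this report; the exact schema, the `write_evidence` helper and the ID format are hypothetical.

```python
import hashlib
import json
import secrets
import time
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def write_evidence(dag_dir: Path, workdir: Path, artifacts: list[str],
                   exit_codes: list[int], spec: dict) -> Path:
    """Write dag_dir/cap_<id>.json (hypothetical schema) and return its path."""
    # Hypothetical ID scheme: timestamp plus 6 random hex characters.
    capsule_id = time.strftime("%Y%m%d_%H%M%S") + "_" + secrets.token_hex(3)
    evidence = {
        "capsule_id": capsule_id,
        "exit_codes": exit_codes,  # one per command, in order
        "artifacts": {name: sha256_file(workdir / name) for name in artifacts},
        "replay_recipe": {"spec_inline": spec},  # everything needed to re-run
    }
    dag_dir.mkdir(parents=True, exist_ok=True)
    path = dag_dir / f"cap_{capsule_id}.json"
    path.write_text(json.dumps(evidence, indent=2, sort_keys=True))
    return path
```

Hashing in fixed-size chunks keeps memory flat regardless of artifact size; `sort_keys=True` makes two records with identical content serialize identically.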

| mode | pass | total | rate | wall (s) | VWS |
|---|---|---|---|---|---|
| PARROT | 7 | 10 | 70 % | 3.1 | 2.26 |
| MONSTER | 8 | 10 | 80 % | 7.5 | 1.07 |

VWS = pass / wall (passes per wall-clock second).
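The rate and VWS columns follow from pass, total and wall (7 / 3.1 ≈ 2.26, 8 / 7.5 ≈ 1.07). A small Python sketch reproducing them; `ModeSummary` is a hypothetical helper, and the numbers are the ones in the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModeSummary:
    mode: str
    passed: int
    total: int
    wall_s: float  # wall-clock seconds from the table

    @property
    def rate(self) -> float:
        return self.passed / self.total

    @property
    def vws(self) -> float:
        # Passes per wall-clock second, matching the VWS column.
        return self.passed / self.wall_s

rows = [ModeSummary("PARROT", 7, 10, 3.1), ModeSummary("MONSTER", 8, 10, 7.5)]
for r in rows:
    print(f"{r.mode}: rate={r.rate:.0%} VWS={r.vws:.2f}")
# PARROT: rate=70% VWS=2.26
# MONSTER: rate=80% VWS=1.07
```

Read together, the two columns show the trade-off: MONSTER gains one pass at roughly 2.4× the wall time, so its VWS is lower even though its pass rate is higher.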

Δ MONSTER − PARROT: +1 pass (70 % → 80 %)

Per-task

| task | difficulty | PARROT | MONSTER | rounds (M) | capsule_id (M) | artifacts (M, sha256 prefix) |
|---|---|---|---|---|---|---|
| create_file_exact | easy | OK | OK | 1 | 30_184916_dca840 | a948904f2f0f479b |
| run_python_print_42 | easy | OK | OK | 1 | 30_184916_91e4c3 | b2101d5826aa11de |
| fix_failing_test | hard | OK | X | 3 | 30_184918_311478 | d9ae93c739f49feb, e1a894022d1a0829, c24be9d84c1462cf |
| parse_json | easy | OK | OK | 1 | 30_184918_43072a | 36bd4ed657a57ace |
| sed_transform | medium | X | OK | 2 | 30_184919_1376a2 | ac914dfa543e017c, ac914dfa543e017c |
| compile_cpp_missing_include | hard | X | OK | 2 | 30_184920_4f9f8e | 9d648401c7b6fb42, 3433be63e64588d1 |
| chmod_run_executable | medium | OK | OK | 1 | 30_184921_5243b5 | f2e35109e0f7bfa8 |
| find_bug_from_stderr | hard | X | X | 3 | 30_184922_950abf | ede1fb2f0d44a773 |
| produce_patch | medium | OK | OK | 1 | 30_184922_24179c | 470450d1505afcca, 4fdbc441ea7b5461, 636b5bae55f8b988 |
| verify_output_hash | medium | OK | OK | 1 | 30_184923_d28b3d | 5fc4ae2d24d613fb, 8bd186b55ecb5d98 |

(M) = MONSTER run. OK = verifier pass, X = fail.
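The summary totals and Δ can be re-derived from the per-task rows. This Python sketch transcribes the table by hand (no parser for the report format is assumed) and also splits the net Δ into tasks the retry loop rescued and tasks where MONSTER lost a PARROT pass.

```python
# (PARROT, MONSTER, MONSTER rounds) per task, transcribed from the per-task table.
RESULTS = {
    "create_file_exact":           ("OK", "OK", 1),
    "run_python_print_42":         ("OK", "OK", 1),
    "fix_failing_test":            ("OK", "X",  3),
    "parse_json":                  ("OK", "OK", 1),
    "sed_transform":               ("X",  "OK", 2),
    "compile_cpp_missing_include": ("X",  "OK", 2),
    "chmod_run_executable":        ("OK", "OK", 1),
    "find_bug_from_stderr":        ("X",  "X",  3),
    "produce_patch":               ("OK", "OK", 1),
    "verify_output_hash":          ("OK", "OK", 1),
}

n = len(RESULTS)
parrot_pass = sum(p == "OK" for p, _, _ in RESULTS.values())
monster_pass = sum(m == "OK" for _, m, _ in RESULTS.values())
# Rescued: PARROT failed, MONSTER passed. Regressed: the reverse.
rescued = [t for t, (p, m, _) in RESULTS.items() if p == "X" and m == "OK"]
regressed = [t for t, (p, m, _) in RESULTS.items() if p == "OK" and m == "X"]

print(f"PARROT {parrot_pass}/{n}, MONSTER {monster_pass}/{n}, delta {monster_pass - parrot_pass:+d}")
# PARROT 7/10, MONSTER 8/10, delta +1
print("rescued:", rescued)
print("regressed:", regressed)
```

On these rows the net +1 is two rescues (sed_transform and compile_cpp_missing_include, both passing in round 2) minus one regression (fix_failing_test, which PARROT passed and MONSTER failed after 3 rounds).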

Architectural witness

PARROT pass rows still produce a capsule and a DAG entry; the difference is that MONSTER's evidence is what survived the retry loop, while PARROT's is the single shot.
Every MONSTER pass row has a non-empty capsule_id and ≥1 artifact sha256. Replay any of them by re-running the capsule with the spec stored in replay_recipe.spec_inline.
For MONSTER rows with rounds > 1, the C++ runtime fed the stderr and exit codes of capsule k−1 into the prompt for round k; that is the execution evidence the doctrine demands.
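A replay can be checked by re-running the stored spec and re-hashing the artifacts it produces. A minimal Python sketch of the loading and hash-comparison half, assuming a hypothetical evidence schema of the form {"artifacts": {relpath: sha256_hex}, "replay_recipe": {"spec_inline": ...}}; re-executing the spec through the capsule runner is left out because its interface is not described here.

```python
import hashlib
import json
from pathlib import Path

def load_replay_spec(evidence_path: Path) -> dict:
    """Return replay_recipe.spec_inline from a cap_*.json evidence file."""
    return json.loads(evidence_path.read_text())["replay_recipe"]["spec_inline"]

def verify_artifacts(evidence_path: Path, workdir: Path) -> dict[str, bool]:
    """After a replay in `workdir`, re-hash each recorded artifact.

    Maps each artifact path to True only if the file exists and its full
    sha256 hex digest equals the one recorded in the evidence.
    """
    recorded = json.loads(evidence_path.read_text())["artifacts"]
    report = {}
    for rel, expected in recorded.items():
        f = workdir / rel
        report[rel] = f.is_file() and hashlib.sha256(f.read_bytes()).hexdigest() == expected
    return report
```

Under this sketch a replay passes the determinism check only if every value in the returned report is True.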

Tasks (10)

create_file_exact (easy): Create a file named hello.txt in the current directory whose contents are exactly the line `hello wo…
run_python_print_42 (easy): Write a python script solve.py that prints exactly the integer 42 and run it with python3. Use she…
fix_failing_test (hard): There are two files: math_lib.py and run_tests.py. Run the tests with python3 run_tests.py. Th…
parse_json (easy): There is a file data.json in the working dir. Use python3 to read it and print the value of the ke…
sed_transform (medium): There is a file data.txt with three lines like key: N. Produce a file out.txt where every key…
compile_cpp_missing_include (hard): There is a file prog.cpp in the working dir. Compile it with g++ prog.cpp -o prog. Then run `./p…
chmod_run_executable (medium): There is a file script.sh in the working dir. Make it executable and run it (./script.sh). Its out…
find_bug_from_stderr (hard): Run python3 app.py. The output must contain the literal token OK_PARSED. If the program raises an…
produce_patch (medium): There are two files before.txt and after.txt. Produce a unified diff and save it as patch.diff…
verify_output_hash (medium): There is a file payload.txt in the working dir. Compute its sha256 hash with `sha256sum payload.tx…

DOD

GREEN: MONSTER 8/10 > PARROT 7/10; the execution layer earned its place by Δ +1.