TERMINAL_NANOOS_MINI_V1


Phase-12 NanoOS Capsule Substrate × Terminal-Bench-style suite (10 tasks). PARROT = one llama.cpp call -> extract bash -> ONE capsule run. MONSTER = envelope --chat -> C++ runtime drives k=1..3 stderr-feedback retry via tools/capsule/shell_capsule.py. Same capsule + verifier per task across both modes.
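The difference between the two modes can be sketched as a bounded retry loop. This is a minimal Python illustration, not the actual C++ runtime or `tools/capsule/shell_capsule.py`: the names `RunResult`, `run_bash`, `solve`, `propose`, and `verify` are hypothetical, and the assumption that the feedback passed to the next round is the previous run's stderr is taken only from the "stderr-feedback retry" wording above.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RunResult:
    exit_code: int
    stdout: str
    stderr: str


def run_bash(script: str) -> RunResult:
    """Hypothetical stand-in for a capsule run: execute one bash script."""
    proc = subprocess.run(
        ["bash", "-c", script], capture_output=True, text=True, timeout=60
    )
    return RunResult(proc.returncode, proc.stdout, proc.stderr)


def solve(
    propose: Callable[[str], str],
    verify: Callable[[RunResult], bool],
    max_rounds: int = 3,
    run: Callable[[str], RunResult] = run_bash,
) -> Tuple[bool, int, Optional[RunResult]]:
    """Run up to max_rounds attempts, feeding stderr back after each failure.

    max_rounds=1 corresponds to the one-shot PARROT shape;
    max_rounds=3 corresponds to the k=1..3 MONSTER shape.
    Returns (passed, rounds_used, last_result).
    """
    feedback = ""  # round 1 has no feedback yet
    result: Optional[RunResult] = None
    for k in range(1, max_rounds + 1):
        script = propose(feedback)  # e.g. a model call that returns bash
        result = run(script)
        if verify(result):
            return True, k, result
        feedback = result.stderr  # assumption: only stderr is fed back
    return False, max_rounds, result
```

Because the verifier is passed in rather than baked into the loop, the same `verify` can be reused for both modes, mirroring the "same capsule + verifier per task" constraint.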

Both modes record an Evidence object at dag/capsules/cap_*.json containing capsule_id, exit codes per command, sha256 of every artifact file, and a replay_recipe (the full spec to re-execute deterministically). MONSTER's evidence is the survivor of the feedback loop; PARROT's is one-shot.
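As a rough illustration of what such an Evidence object could hold, the sketch below builds a record with the four fields named above and re-checks the artifact hashes. The JSON layout, the key names `exit_codes` and `artifacts`, and the helpers `sha256_file`, `build_evidence`, and `artifacts_match` are assumptions for illustration; the real schema of `dag/capsules/cap_*.json` is not shown in this report.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large artifacts do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_evidence(
    capsule_id: str,
    commands: List[str],
    exit_codes: List[int],
    artifact_paths: List[Path],
    replay_recipe: Dict,
) -> Dict:
    """Assemble a hypothetical Evidence record (field layout is assumed)."""
    return {
        "capsule_id": capsule_id,
        "exit_codes": [
            {"command": cmd, "exit_code": code}
            for cmd, code in zip(commands, exit_codes)
        ],
        "artifacts": {str(p): sha256_file(p) for p in artifact_paths},
        "replay_recipe": replay_recipe,
    }


def artifacts_match(evidence: Dict) -> bool:
    """Re-hash every recorded artifact and compare with the stored digest."""
    return all(
        Path(name).is_file() and sha256_file(Path(name)) == recorded
        for name, recorded in evidence["artifacts"].items()
    )


def write_evidence(evidence: Dict, out_path: Path) -> None:
    """Serialize with sorted keys so identical evidence yields identical bytes."""
    out_path.write_text(json.dumps(evidence, indent=2, sort_keys=True))
```

A replay would then re-execute the `replay_recipe` and call `artifacts_match` on the result; any drift in an artifact shows up as a digest mismatch.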

| mode | pass | total | rate | wall | VWS |
|---|---|---|---|---|---|
| PARROT | 7 | 10 | 70% | 3.1 s | 2.26 |
| MONSTER | 8 | 10 | 80% | 7.5 s | 1.07 |

Δ MONSTER − PARROT: +1 task (8 vs 7 passes)

Per-task

| task | difficulty | PARROT | MONSTER | rounds (M) | capsule_id (M) | artifacts (M) |
|---|---|---|---|---|---|---|
| create_file_exact | easy | OK | OK | 1 | 30_184916_dca840 | a948904f2f0f479b |
| run_python_print_42 | easy | OK | OK | 1 | 30_184916_91e4c3 | b2101d5826aa11de |
| fix_failing_test | hard | OK | X | 3 | 30_184918_311478 | d9ae93c739f49feb, e1a894022d1a0829, c24be9d84c1462cf |
| parse_json | easy | OK | OK | 1 | 30_184918_43072a | 36bd4ed657a57ace |
| sed_transform | medium | X | OK | 2 | 30_184919_1376a2 | ac914dfa543e017c, ac914dfa543e017c |
| compile_cpp_missing_include | hard | X | OK | 2 | 30_184920_4f9f8e | 9d648401c7b6fb42, 3433be63e64588d1 |
| chmod_run_executable | medium | OK | OK | 1 | 30_184921_5243b5 | f2e35109e0f7bfa8 |
| find_bug_from_stderr | hard | X | X | 3 | 30_184922_950abf | ede1fb2f0d44a773 |
| produce_patch | medium | OK | OK | 1 | 30_184922_24179c | 470450d1505afcca, 4fdbc441ea7b5461, 636b5bae55f8b988 |
| verify_output_hash | medium | OK | OK | 1 | 30_184923_d28b3d | 5fc4ae2d24d613fb, 8bd186b55ecb5d98 |
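The summary counts can be re-derived from the PARROT and MONSTER columns of the per-task table. The short Python check below copies those outcomes verbatim and tallies them; it also separates tasks MONSTER gained from tasks it lost, since a net delta of +1 does not by itself show that the two modes fail on different tasks. The variable names are illustrative only.

```python
# (PARROT, MONSTER) outcome per task, copied from the per-task table.
RESULTS = {
    "create_file_exact": ("OK", "OK"),
    "run_python_print_42": ("OK", "OK"),
    "fix_failing_test": ("OK", "X"),
    "parse_json": ("OK", "OK"),
    "sed_transform": ("X", "OK"),
    "compile_cpp_missing_include": ("X", "OK"),
    "chmod_run_executable": ("OK", "OK"),
    "find_bug_from_stderr": ("X", "X"),
    "produce_patch": ("OK", "OK"),
    "verify_output_hash": ("OK", "OK"),
}

parrot_pass = sum(p == "OK" for p, _ in RESULTS.values())
monster_pass = sum(m == "OK" for _, m in RESULTS.values())

# Tasks where the retry loop changed the outcome, in either direction.
gained = [t for t, (p, m) in RESULTS.items() if p == "X" and m == "OK"]
lost = [t for t, (p, m) in RESULTS.items() if p == "OK" and m == "X"]

print(f"PARROT  {parrot_pass}/{len(RESULTS)}")
print(f"MONSTER {monster_pass}/{len(RESULTS)}")
print(f"delta   {monster_pass - parrot_pass:+d}")
print(f"gained  {gained}")
print(f"lost    {lost}")
```

Run against the table, this gives 7/10 and 8/10 with two tasks gained (`sed_transform`, `compile_cpp_missing_include`) and one lost (`fix_failing_test`), which nets to the reported +1.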

Architectural witness

Tasks (10)

DOD