TERMINAL_NANOOS_NATIVE_V1
Phase-12.TR — same 5 Terminal-Bench-style tasks, same NanoOS shell capsule, but the retry loop now lives in src/main.cpp (run_native_terminal_task). The bench harness sends ONE --chat envelope per task.
| mode | pass | total | rate | wall | VWS |
|---|---|---|---|---|---|
| PARROT_NATIVE | 4 | 5 | 80 % | 3.6s | 1.12 |
| MONSTER_NATIVE | 4 | 5 | 80 % | 8.6s | 0.47 |
Δ MONSTER_NATIVE − PARROT_NATIVE: +0
Per-task
| task | PARROT_N | MONSTER_N | rounds (M) | wall_M (s) | note (M) |
|---|---|---|---|---|---|
| create_file_exact | OK | OK | 1 | 0.542 | content_has:'hello world' |
| run_python_print_42 | OK | OK | 1 | 0.881 | found:'42' |
| fix_broken_python | X | X | 3 | 5.656 | missing:'f() = 3' |
| parse_json | OK | OK | 1 | 0.984 | found:'42' |
| count_grep | OK | OK | 1 | 0.531 | regex_hit:`^\s*4\s*$` |
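As a cross-check, the MONSTER_NATIVE summary row can be recomputed from the per-task rows. The sketch below only re-enters the numbers from the table above; the tuple layout and the `summarize` helper are illustrative and not part of the bench. VWS is left out because this section does not define it.

```python
# Per-task MONSTER_NATIVE rows copied from the table above:
# (task, passed, rounds, wall seconds).
rows = [
    ("create_file_exact", True, 1, 0.542),
    ("run_python_print_42", True, 1, 0.881),
    ("fix_broken_python", False, 3, 5.656),
    ("parse_json", True, 1, 0.984),
    ("count_grep", True, 1, 0.531),
]


def summarize(rows):
    """Return (passes, total, pass rate in percent, total wall seconds)."""
    passes = sum(1 for _, ok, _, _ in rows if ok)
    total = len(rows)
    wall = sum(w for _, _, _, w in rows)
    return passes, total, 100.0 * passes / total, wall


passes, total, rate, wall = summarize(rows)
print(f"{passes}/{total} = {rate:.0f} % in {wall:.1f}s")  # 4/5 = 80 % in 8.6s
```

The per-task wall times sum to 8.594 s, which matches the 8.6s reported for MONSTER_NATIVE; the single failing task accounts for 5.656 s of that total.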
Architectural witness
- Bench Python sends ONE envelope per task.
- `looks_like_terminal_task()` detects `TERMINAL_TASK_V1` magic.
- `parse_terminal_envelope()` extracts instruction, inputs JSON, verifier JSON.
- `run_native_terminal_task()` drives k=1..3 sequentially, popen-ing `shell_capsule.py` each round and feeding stderr+exit codes back into the next prompt.
- Final envelope emits `attempts`, `first_pass_round`, `final_dag` for replay.
- The Python bench is now a thin dispatcher.
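The control flow described above can be sketched in Python for illustration. This is not the C++ code in `src/main.cpp`: the envelope layout (magic line followed by a JSON object), the `found` verifier key, the `propose` callback standing in for the model call, and running the command directly with `subprocess.run` instead of going through `shell_capsule.py` are all assumptions made so the example runs on its own. The structure of `final_dag` is not given in this section, so it is left as `None`.

```python
import json
import subprocess
import sys

MAGIC = "TERMINAL_TASK_V1"  # magic string named in the text
MAX_ROUNDS = 3              # k = 1..3, as described above


def looks_like_terminal_task(text):
    """True if the payload starts with the magic marker."""
    return text.lstrip().startswith(MAGIC)


def parse_terminal_envelope(text):
    """Assumed layout: magic marker first, then one JSON object."""
    env = json.loads(text.lstrip()[len(MAGIC):])
    return env["instruction"], env.get("inputs", {}), env.get("verifier", {})


def run_native_terminal_task(text, propose):
    """Retry up to MAX_ROUNDS, feeding exit code and stderr into the next round."""
    instruction, inputs, verifier = parse_terminal_envelope(text)
    attempts, feedback, first_pass_round = [], "", None
    for k in range(1, MAX_ROUNDS + 1):
        argv = propose(instruction, inputs, feedback)  # hypothetical model call
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
        # Hypothetical verifier: clean exit and expected substring in stdout.
        passed = proc.returncode == 0 and verifier.get("found", "") in proc.stdout
        attempts.append({"round": k, "exit_code": proc.returncode, "passed": passed})
        if passed:
            first_pass_round = k
            break
        feedback = f"exit={proc.returncode}\nstderr={proc.stderr}"
    return {
        "attempts": attempts,
        "first_pass_round": first_pass_round,
        "final_dag": None,  # structure not specified in this section
    }


def demo_propose(instruction, inputs, feedback):
    # Round 1 fails on purpose; once feedback arrives, emit the working command.
    code = "print(42)" if feedback else "import sys; sys.exit('boom')"
    return [sys.executable, "-c", code]


envelope = MAGIC + "\n" + json.dumps(
    {"instruction": "print 42", "verifier": {"found": "42"}}
)
if looks_like_terminal_task(envelope):
    result = run_native_terminal_task(envelope, demo_propose)
    print(result["first_pass_round"])  # 2
```

In the demo, the first round exits non-zero and writes to stderr, the second round sees that feedback and passes, so the loop stops early and records round 2 as the first passing round.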
DOD
- YELLOW: tie at 4/5. The tasks are too thin to differentiate the two modes (same as the Python bench).