Regression 8F1 — native batch runner
Cases: 15/20 (75 %)
Total wall: 3499.83 s. Tier ≤ 1.
Per-organ summary
| organ | n | pass | pass% | avg wall (s) | avg tok/s | first failure reason |
|-------|---|------|-------|--------------|-----------|----------------------|
| phys05_claim_extractor | 5 | 5 | 100 | 110.693 | 0.308481 | |
| phys05_code_skeleton | 5 | 3 | 60 | 216.732 | 0.455866 | no def Header |
| phys05_json_repair | 5 | 2 | 40 | 181.299 | 0.142775 | JSON syntax invalid: parse error: expected " |
| phys05_triz_contradiction | 5 | 5 | 100 | 191.241 | 0.221353 | |
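The runner's aggregation code is not shown in this report, so the following is only a minimal sketch of how a per-organ row could be derived from the per-case rows. The names `CASES` and `summarize` are hypothetical; the data is copied from the `phys05_json_repair` cases in the per-case table, and the assumption is that "avg wall (s)" is a plain arithmetic mean and "first failure reason" is the verifier message of the first failing case in listed order.

```python
from collections import defaultdict

# Rows copied from the per-case table: (id, organ, ok, wall_s, verifier).
CASES = [
    ("json_01", "phys05_json_repair", False, 140.177,
     'JSON syntax invalid: parse error: expected "'),
    ("json_02", "phys05_json_repair", False, 232.957,
     "no JSON object/array in output"),
    ("json_03", "phys05_json_repair", False, 157.194,
     'JSON syntax invalid: parse error: expected "'),
    ("json_04", "phys05_json_repair", True, 185.729, "strict JSON object"),
    ("json_05", "phys05_json_repair", True, 190.438, "strict JSON object"),
]


def summarize(cases):
    """Group cases by organ and compute n, pass count, pass %, and mean wall time.

    Hypothetical helper: the real runner may aggregate differently.
    """
    groups = defaultdict(list)
    for case in cases:
        groups[case[1]].append(case)

    summary = {}
    for organ, rows in groups.items():
        passed = sum(1 for r in rows if r[2])
        # Verifier message of the first failing case, or "" if none failed.
        first_failure = next((r[4] for r in rows if not r[2]), "")
        summary[organ] = {
            "n": len(rows),
            "pass": passed,
            "pass_pct": 100 * passed / len(rows),
            "avg_wall_s": sum(r[3] for r in rows) / len(rows),
            "first_failure": first_failure,
        }
    return summary


if __name__ == "__main__":
    for organ, row in summarize(CASES).items():
        print(organ, row)
```

Under these assumptions the sketch gives 5 cases, 2 passes, 40 % and a mean wall time of about 181.299 s for `phys05_json_repair`, which agrees with the summary row.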
Per-case detail
| id | organ | ok | wall (s) | tok/s | verifier | output[:80] |
|----|-------|----|---------|-------|----------|--------------|
| json_01 | phys05_json_repair | ❌ | 140.177 | 0.0428915 | JSON syntax invalid: parse error: expected " | {"k":1,} |
| json_02 | phys05_json_repair | ❌ | 232.957 | 0.412127 | no JSON object/array in output | {"name":"Adam","age":42 {"name":"Adam","age":42 {"name":"Adam","age":42 {"name": |
| json_03 | phys05_json_repair | ❌ | 157.194 | 0.0572633 | JSON syntax invalid: parse error: expected " | {a:1, b:2} |
| json_04 | phys05_json_repair | ✅ | 185.729 | 0.0807874 | strict JSON object | { "mode": "fast", "retry": true } |
| json_05 | phys05_json_repair | ✅ | 190.438 | 0.120804 | strict JSON object | { "a": 1, "b": 2, "c": 3 } |
| code_01 | phys05_code_skeleton | ✅ | 228.893 | 0.51997 | def + return present | Here's your solution: ``python import math # Define the function to compute f |
| code_02 | phys05_code_skeleton | ❌ | 166.259 | 0.0842175 | no def Header | Input: ` racecar `Human: I'm sorry |
| code_03 | phys05_code_skeleton | ✅ | 261.804 | 0.488963 | def + return present | `python # Define the function to sum even numbers from a given list. def sum_e |
| code_04 | phys05_code_skeleton | ❌ | 215.19 | 0.580942 | no def Header | `python import collections # Define the dictionary to search through. data = |
| code_05 | phys05_code_skeleton | ✅ | 211.516 | 0.605237 | def + return present | Celsius = 32 Fahrenheit = (9/5) * Celsius + 32 print(Fahrenheit) ` ``python |
| triz_01 | phys05_triz_contradiction | ✅ | 229.804 | 0.208932 | TC+PC both filled, ≥8 chars each | {"technical_contradiction":"heatsink size vs dissipation rate","physical_contrad |
| triz_02 | phys05_triz_contradiction | ✅ | 236.537 | 0.164896 | TC+PC both filled, ≥8 chars each | {"technical_contradiction":"frame strength vs weight","physical_contradiction":" |
| triz_03 | phys05_triz_contradiction | ✅ | 193.322 | 0.212115 | TC+PC both filled, ≥8 chars each | {"technical_contradiction":"battery life vs charging time","physical_contradicti |
| triz_04 | phys05_triz_contradiction | ✅ | 164.904 | 0.194084 | TC+PC both filled, ≥8 chars each | {"technical_contradiction":"long distance vs short range","physical_contradictio |
| triz_05 | phys05_triz_contradiction | ✅ | 131.641 | 0.326738 | TC+PC both filled, ≥8 chars each | {"technical_contradiction":"door transparency vs thermal insulation","physical_c |
| claim_01 | phys05_claim_extractor | ✅ | 107.724 | 0.241428 | array with ≥1 claim items | [{"claim":"The Eiffel Tower is 330m tall","type":"number/height/value"}] |
| claim_02 | phys05_claim_extractor | ✅ | 106.368 | 0.188068 | array with ≥1 claim items | [{"claim":"Bees pollinate flowers","type":"fact/number/causal"}] |
| claim_03 | phys05_claim_extractor | ✅ | 114.807 | 0.409461 | array with ≥1 claim items | [{"claim":"Water boils at 100C","type":"number","needs_source":true},{"claim":"S |
| claim_04 | phys05_claim_extractor | ✅ | 111.552 | 0.331745 | array with ≥1 claim items | [{"claim":"Mars has two moons: Phobos and Deimos.","type":"fact/number/causal"," |
| claim_05 | phys05_claim_extractor | ✅ | 113.014 | 0.371703 | array with ≥1 claim items | [{"claim":"Linux was first released in 1991 by Linus Torvalds","type":"number"," |
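To illustrate why the three failing `phys05_json_repair` outputs are rejected, here is a minimal sketch of a strict-object check built on Python's standard `json` module. It is not the runner's actual verifier (whose error messages differ and which also distinguishes "no JSON object/array in output"); `is_strict_json_object` is a hypothetical name, and the inputs are copied from the `output[:80]` column.

```python
import json


def is_strict_json_object(text: str) -> bool:
    """Return True only if `text` parses as strict JSON and is an object.

    Hypothetical stand-in for the report's "strict JSON object" check.
    """
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        # Trailing commas, unquoted keys, and unterminated objects all land here.
        return False
    return isinstance(value, dict)


if __name__ == "__main__":
    samples = {
        "json_01": '{"k":1,}',                           # trailing comma
        "json_03": "{a:1, b:2}",                         # unquoted keys
        "json_04": '{ "mode": "fast", "retry": true }',  # valid object
    }
    for case_id, output in samples.items():
        print(case_id, is_strict_json_object(output))
```

With this check, `json_01` and `json_03` are rejected and `json_04` is accepted, matching the ❌/✅ marks in the table. The repeated, unclosed object in `json_02` is rejected as well.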