Surgery Chronology · 2026-04-21 → 2026-05-05

Every claim has a date.
Every number has a file.

From a 284 B-total / 13 B-active MoE running on a single 8 GB GPU, through the first organic weight surgery on Qwen 0.5B, through the Phase 8E speed arc that took us from 1.91 tok/s to 83.58 tok/s, through eight reverted BD6 surgery passes — this page is the unedited record. Reverts stay visible. Errata stay flagged.

6 eras · ~13 days · 40+ report files · 0 redactions

Eras

Era 1 · 2026-04-21 → 04-26 · V4-Flash flagship: 284B on a single 3060 Ti
Era 2 · 2026-04-26 → 04-27 · Physarum-05B-Organic: first weight surgery
Era 3 · 2026-04-27 → 05-05 · GIGACHAD Phase 6 → Phase 13 trajectory
Era 4 · Speed ladder: 1.91 → 83.58 tok/s
Era 5 · Acceptance integrity ladder
Era 6 · Doctrine docs timeline

Era 1 · 2026-04-21 → 2026-04-26

V4-Flash flagship — 284B on a single 8 GB GPU.

Subject: DeepSeek-V4-Flash · MoE FP4 routed experts + FP8 backbone · ~284 B total parameters on disk · 13 B active per token

The flagship demonstration. DeepSeek-V4-Flash — a frontier-grade open MoE artefact the field assumed required several data-centre GPUs — was driven through end-to-end inference on a single RTX 3060 Ti, 8 GB VRAM, 13 GB system RAM, 80 GB swap, WSL2. This was not a stunt. It was the phase that produced every piece of infrastructure the rest of the lab now runs on: planck packs, the Singularity Monolith VRAM pool, the hot-expert cache, the singularity index, the C-extensions for packed-q8 decode. Calling V4-Flash an "autopsy archive" understates what it was. It was the source.

Hardware

RTX 3060 Ti · 8 GB VRAM · 13 GB RAM · 80 GB swap · WSL2

Download

148.7 GB · 46 shards · 102.5 min

VRAM resident

1.60 GB Singularity Monolith · 430 594 648 uint32 packed

Index

4 992 projection entries

C-extensions

planck_core.so/.dll · planck_core_v3.so

Decode best

7.5 s/tok · p50 9.6 · p95 27.7 · avg 13.8

Cold prefill

47.3 s · 1 prompt token

Output

semantically valid (PHP/Laravel scaffolding from "hi")

Optimization passes — 19 measured improvements (and one honest negative result)

V4_EXPERT_HOT_CACHE

top-N hot experts in pinned RAM. top500 = 56 % RAM hits, −21 % wall.
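
A minimal sketch of how a top-N hot set and its RAM hit rate could be derived from a routing trace. The function names, the toy trace and the use of plain Python sets are illustrative assumptions; the lab's cache sits in pinned RAM and is not reproduced here.

```python
from collections import Counter


def pick_hot_experts(routing_trace, top_n):
    """Return the ids of the top_n most frequently routed experts."""
    counts = Counter(routing_trace)
    return {expert_id for expert_id, _ in counts.most_common(top_n)}


def ram_hit_rate(routing_trace, hot_set):
    """Fraction of expert lookups that would be served from the hot set."""
    if not routing_trace:
        return 0.0
    hits = sum(1 for expert_id in routing_trace if expert_id in hot_set)
    return hits / len(routing_trace)


# Hypothetical trace: expert ids requested during a calibration run.
trace = [3, 7, 3, 1, 3, 7, 9, 3, 2, 7]
hot = pick_hot_experts(trace, top_n=2)
print(sorted(hot), ram_hit_rate(trace, hot))
```

The 56 % figure quoted for top500 is this kind of ratio measured at N = 500. Hit rate is not the only cost to watch when N grows, since the hot set competes with everything else that wants RAM.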

Hot1000 (negative)

Page-cache thrash · 380 sec/q vs 100 sec/q baseline = 3.8× WORSE. Local optimum found, recorded as honest negative result. source: HISTORY_TREE.md §1 (per Era brief)

Prefetch L+1

Pipelining: prefetch experts of layer L+1 while computing layer L.
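
The overlap can be sketched with a single background worker: while layer L computes, the load for layer L+1 is already in flight. `load_experts` and `compute_layer` are hypothetical stand-ins, and the sketch loads a whole layer because the source does not say how the experts for L+1 are chosen before layer L has finished routing.

```python
from concurrent.futures import ThreadPoolExecutor


def load_experts(layer):
    """Stand-in for reading one layer's routed experts from disk."""
    return f"experts[{layer}]"


def compute_layer(layer, experts, hidden):
    """Stand-in for the layer's forward pass."""
    return hidden + [(layer, experts)]


def forward_with_prefetch(num_layers):
    hidden = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_experts, 0)  # warm-up load for layer 0
        for layer in range(num_layers):
            experts = pending.result()  # block until this layer's experts are ready
            if layer + 1 < num_layers:
                # Start IO for layer L+1 before computing layer L.
                pending = pool.submit(load_experts, layer + 1)
            hidden = compute_layer(layer, experts, hidden)  # overlaps with that IO
    return hidden


print(forward_with_prefetch(3))
```

In a real MoE the experts needed by layer L+1 depend on the hidden state leaving layer L, so a prefetcher has to guess (for example from a hot set) and accept some wasted reads.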

Batch=2 union prefill

Co-issue two prefill requests, share expert IO.

VRAM cut 1

shared_experts → RAM streaming.

VRAM cut 2

Pool 128 → 32 slots.

Final consolidated

Hot500 + prefetch + pool32 + shared streaming — production bench.

Bit-shift FP8

FP8_E8M0 → FP32 in scale expansion (kernel-level).

SPEED 3-way

Mode toggle: BASELINE / SMALL_HOT / BATCH2_UNION.

PERF_ROOFLINE

89 % wall = expert IO; 60 % of that = pure disk wait at 48 ms/expert.
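
The two shares compound. A short worked calculation, assuming the 60 % is a share of expert IO time (as the entry states) and adding an Amdahl-style ceiling that is our own derivation, not a measured number:

```python
# Shares quoted in the PERF_ROOFLINE entry.
EXPERT_IO_SHARE_OF_WALL = 0.89
DISK_WAIT_SHARE_OF_IO = 0.60

# Pure disk wait expressed as a share of total wall time.
disk_wait_share_of_wall = EXPERT_IO_SHARE_OF_WALL * DISK_WAIT_SHARE_OF_IO

# Best-case speedup if disk wait vanished and nothing else changed.
max_speedup_if_disk_free = 1.0 / (1.0 - disk_wait_share_of_wall)

print(f"disk wait = {disk_wait_share_of_wall:.1%} of wall")
print(f"ceiling   = {max_speedup_if_disk_free:.2f}x")
```

Under these assumptions roughly 53 % of wall time is spent waiting on disk, which caps the gain from faster storage alone at a little over 2×.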

Reference correctness — Paris top-1 verified

Prompt

"What is the capital of France? Answer in one word." (DeepSeek-V4 chat-template wrapped)

Top-1 token

Paris (51119) · logit +40.75

Margin vs #2

+11.13 over France (51725, +29.62)

Top-10

Paris · France · The · Berlin · Capital · London · ** · Nice · L · PAR

Forward (43 layers)

~324 s · backbone load ~327 s · ~14 GB RAM / ~7 GB VRAM

Verified

2026-05-31

8 architecture findings that broke our own ports

F1 · Chat template

Trailing <｜Assistant｜></think> is mandatory. Without it the model emits garbage; the closing </think> selects non-thinking mode.

F2 · Hash routing weights

Layers 0/1/2 route via tid2eid table, but weights still come from sqrt(softplus(x·Wᵀ)) + gather + normalise × 1.5. Uniform 1/top_k destroys magnitudes.
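
A minimal sketch of the F2 weight path for a single token. It assumes that "normalise" means dividing by the sum of the gathered scores; the function names and toy numbers are hypothetical, and the tid2eid lookup is represented by a ready-made list of expert ids.

```python
import math

ROUTE_SCALE = 1.5  # the "x 1.5" factor quoted in F2


def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def hash_routed_weights(x, gate_w, expert_ids):
    """x: hidden vector; gate_w: one row per expert; expert_ids: from the table."""
    scores = [
        math.sqrt(softplus(sum(xi * wi for xi, wi in zip(x, row))))
        for row in gate_w
    ]
    picked = [scores[e] for e in expert_ids]  # gather at the table-selected experts
    total = sum(picked)  # softplus > 0, so total > 0
    return [ROUTE_SCALE * s / total for s in picked]  # normalise (assumed: by sum), then scale


x = [1.0, -2.0]
gate_w = [[0.5, 0.1], [0.2, 0.3], [-1.0, 0.4], [0.0, 0.0]]
weights = hash_routed_weights(x, gate_w, expert_ids=[0, 2])
print(weights)
```

The two weights differ and sum to 1.5, whereas a uniform 1/top_k would give both experts 0.5 regardless of the gate scores.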

F3 · act_quant double-apply

GEMM-input form returns (y_raw, scale); inplace form returns y·scale. Confusing them = silent 1000× underflow in MoE.
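
The failure mode can be illustrated with a toy quantiser. This is a sketch under stated assumptions: `FP8_MAX`, the per-vector scale and the use of `round()` in place of a real FP8 cast are ours, and the function names are hypothetical rather than the reference pipeline's.

```python
FP8_MAX = 448.0  # assumption: E4M3-style maximum; the source does not name the format


def act_quant_gemm_input(x):
    """GEMM-input form: return (y_raw, scale); the GEMM applies scale later."""
    amax = max(abs(v) for v in x) or 1.0
    scale = amax / FP8_MAX
    y_raw = [float(round(v / scale)) for v in x]  # round() stands in for the FP8 cast
    return y_raw, scale


def act_quant_inplace(x):
    """Inplace form: return y_raw * scale, already back in the input's range."""
    y_raw, scale = act_quant_gemm_input(x)
    return [v * scale for v in y_raw]


x = [0.448, -0.224, 0.1]
y_raw, scale = act_quant_gemm_input(x)
correct = act_quant_inplace(x)
double_applied = [v * scale for v in correct]  # the bug: scale applied a second time
print(scale, correct[0], double_applied[0])
```

With a scale near 1e-3, applying it twice shrinks every activation by about 1000× without raising any error, which matches the silent underflow described in F3.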

F4 · Compressor a/b split

CSA (m=4) splits wkv/wgate/ape into a-portion (previous chunk overlap) and b-portion (current). HCA (m=128) has no overlap.

F5 · mHC C_l = 2·sigmoid

Manifold-Constrained Hyper-Connections output mapping is 2 × sigmoid (range [0, 2]) — the factor 2 is structural, not optional.

F6 · SwiGLU asymmetric clamp

Up-component clamped [−10, +10]; gate capped at +10 only, no lower bound. The asymmetry is intentional.
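
A scalar sketch of the asymmetric clamp. It assumes the clamp is applied before the SiLU and the product, which the source does not spell out; `swiglu_clamped` is a hypothetical name.

```python
import math

CLAMP = 10.0  # the bound quoted in F6


def silu(v):
    """Stable SiLU: avoids exp overflow for large negative v,
    which matters because the gate has no lower bound."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def swiglu_clamped(gate, up):
    gate = min(gate, CLAMP)  # upper cap only
    up = max(-CLAMP, min(up, CLAMP))  # symmetric clamp
    return silu(gate) * up


print(swiglu_clamped(50.0, 50.0), swiglu_clamped(-50.0, 1.0))
```

A port that clamps the gate symmetrically would change the output for strongly negative gate values, which is the case the last call exercises.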

F7 · Inverse RoPE on attn output

Last 64 dims of attention output get inverse RoPE before grouped output projection ("Partial RoPE", paper §2.3.3). Missing this destroys long-context behaviour.

F8 · Per-head Q RMS

After wq_a → q_norm → wq_b, apply per-head RMS with no learned weight: q *= rsqrt(q.square().mean(-1) + eps). Skipping degrades head specialisation silently.
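
The one-line formula in F8 expands to the following pure-Python sketch. The `eps` value and the list-of-lists layout are assumptions made for illustration; the formula itself is the one quoted above.

```python
import math


def per_head_rms(q_heads, eps=1e-6):
    """Normalise each head by its own RMS; there is no learned weight."""
    out = []
    for head in q_heads:
        mean_sq = sum(v * v for v in head) / len(head)
        inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # rsqrt(mean(q^2) + eps)
        out.append([v * inv_rms for v in head])
    return out


q = [[3.0, 4.0], [0.1, 0.0]]
normed = per_head_rms(q)
print(normed)
```

After the call every head has a mean square close to 1, whatever its original magnitude, so a small head and a large head enter attention on the same scale.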

All eight findings, the full reference Python pipeline, and per-layer activation dump tooling are now open-source: surgery → DeepSeek V4-Flash open-source reference. compare_dumps.py is the test oracle for any port.

sources: folder/V4_FLASH_TECH_BRIEF.md · folder/PYTHON_PIPELINE_DOC.md · folder/flash_mvp.py · folder/dump_ref_v4.py · folder/compare_dumps.py · folder/v4_download.log · folder/docs/HISTORY_TREE.md §1

Era 2 · 2026-04-26 → 2026-04-27

Physarum-05B-Organic — the first weight surgery.

Engine: folder/physarum_engine.cpp · 137 lines C++17 · energy-conserving softmax flow · organic threshold D < mean·0.1

The first real weight surgery in the lab. A 137-line C++17 engine performed organic, flow-based pruning on Qwen 2.5 0.5B, killing 20.6 % of weights without changing the file size on disk. We measured what survived and what did not — and we publish both, not only the survivors.

Patient

Qwen 2.5 0.5B

Output artefact

folder/Physarum-05B-Organic/model.safetensors · 988 MB (≈ donor + 200 B header)

Surgery params

block_size=256 · n_iter=30 · threshold=organic

Killed

20.6 % of weights

Surgery wall

207.5 s

Pattern

168 / 290 tensors modified · 24 layers × 7 projections
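
Only the final thresholding rule (D < mean·0.1) is sketched here. The flow iterations that produce the conductance D are not reproduced; as a loud assumption, |w| is used as a placeholder for D, and `organic_prune_block` is a hypothetical name, not a function from physarum_engine.cpp.

```python
def organic_prune_block(conductance, weights, ratio=0.1):
    """Zero every weight whose conductance D is below ratio * mean(D) in the block."""
    mean_d = sum(conductance) / len(conductance)
    cutoff = mean_d * ratio
    return [0.0 if d < cutoff else w for d, w in zip(conductance, weights)]


block = [0.9, -0.01, 0.5, 0.02]
placeholder_d = [abs(w) for w in block]  # assumption: stand-in for the real conductance
pruned = organic_prune_block(placeholder_d, block)
print(pruned)
```

Killed weights are written as zeros in place, so the tensor keeps its shape and dtype. That is consistent with a dense output file whose size on disk does not change.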

Honest delta on hard tasks (measured · not estimated)

Axis	Before	After	Δ	Note
Perplexity (raw)	27.16	31.32	+15.3 %	final_results.json measured value · brief had +12.5 %, see errata below
Throughput (tok/s)	27.15	27.55	no regression	preserved within noise
MMLU-mini	90 %	70 %	−22 %	hard-task degradation
GSM8K-mini	100 %	80 %	−20 %	hard-task degradation
JSON-repair smoke	100 %	100 %	0	no regression
Code-skeleton smoke	100 %	100 %	0	no regression
Disk size	988 MB	988 MB	+0 %	same shape, same shards
VRAM	baseline	+1 %	noise	—
decode_128	baseline	+14 %	within noise	—

Errata · 2026-05-03

An earlier internal brief reported a PPL delta of +12.5 % on this surgery. The measured value in folder/final_results.json is +15.3 % (27.16 → 31.32). The +12.5 % figure does not appear in any source file and is withdrawn.
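
The Δ column can be recomputed from the Before and After columns. A short check, assuming Δ is the relative change (after − before) / before; `rel_delta` is a name introduced here for illustration.

```python
def rel_delta(before, after):
    """Relative change in percent."""
    return (after - before) / before * 100.0


ppl = rel_delta(27.16, 31.32)
mmlu = rel_delta(90.0, 70.0)
gsm8k = rel_delta(100.0, 80.0)
throughput = rel_delta(27.15, 27.55)

print(f"PPL {ppl:+.1f} %  MMLU-mini {mmlu:+.0f} %  GSM8K-mini {gsm8k:+.0f} %  tok/s {throughput:+.1f} %")
```

The perplexity row reproduces +15.3 %, not +12.5 %. The MMLU-mini figure of −22 % is the relative drop; in absolute terms it is 20 percentage points, the same as GSM8K-mini.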

sources: folder/physarum_engine.cpp · folder/Physarum-05B-Organic/ · folder/final_results.json · folder/reports/TRUTH_LEDGER.md §B · weight_diff.json · sparsity_pattern.json

Era 3 · 2026-04-27 → 2026-05-05

GIGACHAD Phase 6 → Phase 13 — the native runtime arc.

The bulk of the work. The native runtime was consolidated, a 7B top-brain was operated on, the Singularity / planck pack format was implemented end-to-end, the speed arc carried us from 1.91 tok/s to over 83 tok/s, the chat path was verified, the Black-Dog reinforcement loop was wired, and BD-series organ surgery began. Every phase below maps to a report file; all sub-step numbers come from that report's measurements.

Phase 6 — Native consolidation · 2026-04-27

PHASE 6

Native consolidation. TRUTH_LEDGER.md first written. source: GIGACHAD_PHASE6_NATIVE_CONSOLIDATION.md

Phase 7 — Physarium-7B top-brain surgery · 2026-04-27

Donor

Qwen 2.5 7B-Instruct · 15 GB

Surgery

block_size=256 · n_iter=30 · β=2.0

Killed

1 450 103 613 weights · 22.22 % of target / 19.04 % of full 7B

Output

Physarium-7B-Native · 15 GB · 196 tensors logged

Tile coverage

100 %

Errata flag

v1 magnitude-flow, NOT activation-aware

Errata pointer · v1 method

Phase-7 surgery is recorded as v1 (magnitude-flow). It is not activation-aware. The reconcile note documents the denominator audit and the limits of v1; v2 (activation-aware) is queued. See PHYSARIUM_RESULTS_RECONCILE.md and PHYSARIUM_COVERAGE_AUDIT.md.
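
The two kill rates quoted for this surgery share one numerator and differ only in denominator. A small back-calculation, assuming the published percentages are rounded to two decimals (so the implied denominators are approximate):

```python
KILLED = 1_450_103_613  # weights killed, as logged

target_params = KILLED / 0.2222  # implied size of the surgery target
full_params = KILLED / 0.1904  # implied size of the full 7B model

print(f"target ~ {target_params / 1e9:.2f} B parameters")
print(f"full   ~ {full_params / 1e9:.2f} B parameters")
print(f"target covers ~ {target_params / full_params:.1%} of the model")
```

The gap between the two denominators is the part of the model the surgery did not target, which is why the same kill count reads as 22.22 % or 19.04 % depending on which one is used.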

sources: reports/PHYSARIUM7B_SURGERY_REPORT.md · reports/PHYSARIUM_COVERAGE_AUDIT.md · reports/PHYSARIUM_RESULTS_RECONCILE.md

Phase 8A → 8D — Planck format & first E2E

PHASE 8A

PLANCK7B_PACK format · writer · reader · verifier. Byte-for-byte verify 50/50 PASS.

PHASE 8B

Dense Qwen2-arch native runner.

PHASE 8C0

Sanity guard — determinism · top-k · NaN · checksum.

PHASE 8C

organ_manager + planck_runner wiring.

PHASE 8D

First real E2E run — ARIZ pipeline.

Phase 8E — The Speed Arc ✦

8E.0

GEMV kernel · 7B-shape GEMV in 0.321 ms at ~422 GB/s = 94 % of RTX 3060 Ti peak. CPU diff 4e-6 max.
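
The 94 % figure can be cross-checked from the two measurements. The peak of 448 GB/s is an assumption taken from the RTX 3060 Ti spec sheet; it is not stated in the report.

```python
KERNEL_TIME_S = 0.321e-3  # measured GEMV time
ACHIEVED_BW = 422e9  # measured bandwidth, bytes per second
PEAK_BW = 448e9  # assumption: spec-sheet memory bandwidth of the RTX 3060 Ti

bytes_moved = KERNEL_TIME_S * ACHIEVED_BW
fraction_of_peak = ACHIEVED_BW / PEAK_BW

print(f"{bytes_moved / 1e6:.1f} MB moved per GEMV")
print(f"{fraction_of_peak:.0%} of peak bandwidth")
```

A GEMV reads each weight once, so at this fraction of peak the kernel is limited by memory bandwidth and further gains have to come from moving fewer bytes.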

8E.1

Full GPU forward for 0.5B · 10 kernels (gemv / rmsnorm / rope / kv_cache / gqa / silu_mul / residual / embed / argmax / nonfinite). 0.5B top-brain-smoke 116 tok/s vs CPU 1.91 = 61× speedup, byte-identical.

8E.1b

--backend cuda via organ_manager. json_repair 2.4 s vs 49.2 s CPU = 20×.

8E.1c

CUDA schema calibration.

8E.2

7B layer streaming on CUDA (single-slot ring + per-step H2D). 0.20 tok/s, byte-identical to CPU. Correctness proof; not the main path.

8E2 NUCLEAR

Physarium-7B Q4 RESIDENT pack + fused CUDA dequant GEMV. 15.23 GB BF16 → 5.55 GB Q4 (group=128), all 28 layers in VRAM. 11.16 tok/s = 280× CPU baseline.
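
A pure-Python sketch of group-wise 4-bit quantisation with dequantisation folded into the dot product. The symmetric [-7, 7] scheme and the function names are assumptions; the actual pack layout and the CUDA kernel are not reproduced. The demo uses group=4 to stay readable, while the pack above uses group=128.

```python
def quantize_q4_group(weights, group=128):
    """Symmetric 4-bit quantisation with one scale per group (assumed scheme)."""
    packed = []
    for start in range(0, len(weights), group):
        chunk = weights[start:start + group]
        amax = max(abs(w) for w in chunk) or 1.0
        scale = amax / 7.0  # map the group onto the signed range [-7, 7]
        codes = [max(-7, min(7, round(w / scale))) for w in chunk]
        packed.append((scale, codes))
    return packed


def dequant_gemv_row(packed, x):
    """Dot product of one quantised row with x, dequantising on the fly."""
    acc, i = 0.0, 0
    for scale, codes in packed:
        group_sum = 0.0
        for code in codes:
            group_sum += code * x[i]
            i += 1
        acc += scale * group_sum  # scale is applied once per group
    return acc


row = [0.7, -0.7, 0.3, 0.0, 1.4, 0.0, -1.4, 0.7]
x = [1.0] * len(row)
packed_row = quantize_q4_group(row, group=4)
print(dequant_gemv_row(packed_row, x))
```

Keeping the weights as 4-bit codes until the multiply is what lets the whole model stay resident: 15.23 GB of BF16 becomes 5.55 GB, a ratio of about 2.7×.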

8E3b

Hyperspeed: remove per-token syncs · persistent argmax buffer.

8E3

Quality parity: ChatML wrapper · identity / JSON smokes BF16 vs Q4.

8E4

ChatML everywhere · identity hallucination gate · Q5_K track.

8E5 / 8E5b

CUDA graphs + fusion · fused residual+rmsnorm.

8E6A / 8E6B

Fused silu_mul + down GEMV · tile-K shared-mem staged swiglu+down.

8E7

External kernel shootout — llama.cpp / ExLlamaV2 / AWQ as autopsy specimens.

8E7B

llama.cpp backend integration · 18/18 acceptance.

8E8a

DP4A inner loop + delayed scale.

sources: reports/HYPERSPEED_8E5.md · reports/PHASE_8E8A_DP4A_NATIVE_BACKEND.md · reports/EXTERNAL_BACKEND_SHOOTOUT_V2.md · reports/EXTERNAL_SHOOTOUT_8E7.md · docs/HISTORY_TREE.md §5

Phase 8F — Decoder controls, calibration, identity gate · 2026-04-27 → 04-28

Decoder controls — repeat penalty · stop strings · JSON balanced · sampling.
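
Two of these controls are easy to sketch in isolation. Both helpers are hypothetical illustrations, not the runtime's implementation, and the stop strings in the demo are examples only.

```python
PAIRS = {"}": "{", "]": "["}


def truncate_at_stop(text, stop_strings):
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def json_balanced(text):
    """True once every brace/bracket opened outside a string literal is closed."""
    stack, in_string, escaped, seen_open = [], False, False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
            seen_open = True
        elif ch in "}]":
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return seen_open and not stack and not in_string


print(truncate_at_stop("hello<|im_end|>junk", ["<|im_end|>", "\nHuman:"]))
print(json_balanced('{"a": [1, 2, "}"]}'), json_balanced('{"a": [1, 2'))
```

A decoder can call `json_balanced` after each token and stop as soon as it returns True, which prevents trailing prose after a complete JSON object. The check is structural only; it does not validate the JSON.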

8F prompt

Prompt template surgery + memory injection.

8F regression

5 × {json_repair · code_skel · triz · claim} regression suite.

8F0 identity

Identity audit (grep donor / chat tokens) · identity manifest + system preamble. Runtime identity injection + DAG identity_version. Identity probe regression 6 questions, DOD ≥ 5/6.

8F1b

Native batch regression runner — gigachad_regression_native.

8F1c

Calibration sweep — DAG defaults · json/code tuning · negative harness · audit fix.

8F2

ARIZ kernel + Black-Dog reinforcement loop.

8F3

ARIZ trace builder · rule-based stages 1 / 4 / 5 / 6.

Phase 9 — Fusion, parallel, identity LoRA · 2026-04-28 → 04-29

Fusion Mass --chat · Q4 resident wired to physarium_7b organ.

Fusion Compaction · shrink 7B prefill.

Fast route pruning before parallel.

Parallel organ slots · claim-skip.

Identity surgery reseed · gate → fail · no replace.

Identity LoRA surgery pipeline.

9F-RUN

Train + merge + repack identity LoRA.

9F.1

Memory-anchored seeds · 14/14 stretch.

Phase 10 — Universal LLM surgery protocol · 2026-04-29

Universal-LLM surgery protocol document drafted — generalisation of the lab's gating doctrine across donor families.

Phase 11 — Acceptance run · 2026-04-29

GIGACHAD_ACCEPTANCE_RUN_V1.

11A

Wire code / claim / test / memory routes into --chat.

11B

Verifier alignment + code/PC fallbacks.

11C

Code prose-strip + bench window 2000.

11D

Organ-name LoRA patch — 18/18.

Phase 12 — NanoOS substrate · hologram cache · code-repair loop · 2026-04-29 → 04-30

12.0

chat_context_builder · scrolls → system_msg.

12.NanoOS

Capsule Substrate spec + shell capsule.

12.H1

Hologram exact-match cache in run_chat. 860 ms → 1 ms = 860× speedup on identical-prompt repeats.
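
An exact-match cache of this kind can be sketched as a dictionary keyed by a hash of the full prompt. The class name, the SHA-256 key and the `generate` callback are assumptions made for illustration; the source only states that identical prompts are served from cache inside run_chat.

```python
import hashlib


class ExactMatchCache:
    """Serve byte-identical prompts from memory instead of re-running the model."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt, generate):
        """Return (reply, was_cached)."""
        key = self._key(prompt)
        if key in self._store:
            return self._store[key], True
        reply = generate(prompt)
        self._store[key] = reply
        return reply, False


def fake_model(prompt):
    """Stand-in for the slow generation path."""
    return prompt.upper()


cache = ExactMatchCache()
print(cache.get_or_compute("hello", fake_model))
print(cache.get_or_compute("hello", fake_model))
```

Only prompts that match exactly benefit; a single changed character produces a different key and takes the full generation path.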

12.CR

Code repair loop in native runtime.

12.CR.PAR

Parallel retry · kill the wall penalty.

12.TR

Port Terminal-task retry loop into C++ runtime.

12.TR.B

10-task Terminal NanoOS bench V1.

12.TR_HEREDOC_AWARE

heredoc collapse + stronger retry prompt.

sources: reports/EXACT_REPLAY_CACHE_V1.md · reports/HOLOGRAM_REPLAY_X100.md · reports/TERMINAL_NANOOS_30.md · docs/PHASE_12_NANO_OS_EXECUTION_SUBSTRATE.md

Phase 13 — Black-Dog organ colony · 2026-04-30 → 2026-05-03

BD1

BLACK_DOG_ORGAN_AUDIT — 500 DAG entries audited, 5/8 organs dead.

BD2

CONDUCTANCE_ROUTER_V1 — food/poison store wired.

BD3

POISON_TO_SURGERY_DATASET — failed traces become QLoRA training material.

BD4

BLACK_DOG_ARIZ_TRIZ_ORGAN.

BD5

MONSTER_INTEGRATION_V1 — one unified bench.

BD6 pass-1

ORGAN_QLORA_SURGERY (gated) · phys05_code_skeleton kept · 13/100 MBPP, 6/164 HE, anchor 19/19.

BD6.2 → BD6.8D-rank

8 surgery passes reverted: BD6.2 overtrain (MBPP regressed 13→6), BD6.3 anchor gate failed, BD6.4 anchor positive partial, BD6.5 stratified poison 13/19, BD6.6 over-anchor regression, BD6.7 KL-anchor ladder no lift, BD6.8D token-weighted CE no lift, BD6.8D2 over-tuned. BD6.8D-rank → freeze decision: ship pass-1, freeze BD6.x.

BD7

phys05_triz_contradiction · 88/100 strict 6-field JSON, fallback 0.

BD8 V1–V5

critic_lite + wound surgery · mechanism wired · rescue rate 0/n (training-side blocker — needs ARIZ-JSON failure samples, not stderr). wound v2 retained for in-chat rescue path.

BD9 · 05-04

phys05_json_repair surgery · 10 / 10 GREEN on production failure catalog · first organ at 100 % on a real failure-bench · 280 synthetic rows, loss 0.055 → 0.0003 · comma_as_colon (the BD8 wound-v2 quirk) handled natively.

BD9 · 05-05

Four-organ surgery sweep in one session: claim_extractor GREEN (clean JSON), test_writer YELLOW (pytest shape ok, currying confusion + Human-token leak), cache_matcher YELLOW (correct integer + post-answer drift, harmless under max_tokens=16), renderer RED (output corrupted, loss ceiling 0.69 on 25 rows). Production state: 5 GREEN · 2 YELLOW · 1 RED, up from 2 GREEN before BD9.

BD9.1 queued

renderer retraining · 50+ rows from dag/capsules/cap_*.json Terminal-NanoOS task corpus · 9 epochs OR r=16 (one lever, not both) · prompt template tightening to suppress Human: donor-token leak.

sources: reports/BD6_POST_SURGERY_DELTA.md · reports/BD6_2_OVERTRAIN_DELTA.md · reports/BD6_8D_RANK_FINAL_FREEZE.md · reports/BD7_TRIZ_SURGERY_FINAL.md · reports/RUNTIME_ORGANISM_BENCH_V1.md · reports/BD9_JSON_REPAIR_FINAL.md · reports/BD9_FOUR_ORGANS_FINAL.md · docs/PHASE_13_BLACK_DOG_ORGAN_COLONY.md

CLEAN-ROOM doctrine — 2026-04-29 → 04-30

PATIENT_LLAMA_CPP

Kernel-level extraction · llama.cpp as autopsy specimen.

PATIENT_EXLLAMAV2

EXL2 mixed-bit autopsy.

PATIENT_AWQ_MARLIN

activation-aware + Marlin int4 autopsy.

FAST_BACKEND_PLAN

Clean-room v2 — knowledge from autopsies absorbed; no code dependency.

sources: reports/CLEAN_ROOM_DOCTRINE.md · reports/EXTERNAL_BACKEND_SHOOTOUT_V2.md

Era 4 · Speed Ladder

From 1.91 tok/s to 83.58 tok/s.

The headline arc: every step measured on the same RTX 3060 Ti. Each row is a milestone in the native runtime; each number has a report file. The rows down to Q4 native + DP4A are the climb; the llama.cpp row is the production ceiling, achieved by treating llama.cpp as a clean-room autopsy and porting its kernels into our own backend. The final row is mean wall time per query, not throughput.

Phase / config	Speed	Vs prev	Note
V4-Flash 284B PyTorch warm decode	p50 9.6 s/tok	—	flagship demo · 8 GB VRAM
Physarum-05B-Organic baseline	27.15 tok/s	—	0.5B BF16 baseline
CPU baseline · 0.5B	1.91 tok/s	—	reference floor for CPU path
CUDA full GPU 0.5B (Phase 8E.1)	116 tok/s	61× CPU	byte-identical to CPU
CUDA fused 7B BF16 streaming (8E.2)	0.20 tok/s	—	correctness proof, not main path
Q4 NUCLEAR resident 7B (8E2)	11.16 tok/s	280× CPU baseline	5.55 GB Q4 group=128 · 28 layers in VRAM
Q4 native v2 default `--chat`	18.27 tok/s	+64 % over NUCLEAR	—
Q4 native + DP4A=1 (opt-in)	28.99 tok/s	+59 %	—
Q4 native + DP4A · tg128	41.69 tok/s	+44 %	—
llama.cpp backend (LLAMACPP_URL)	83.58 tok/s	+100 %	production speed · clean-room autopsy
Mode C llama.cpp acceptance · mean wall	2.99 s	—	per query, 18-task suite

sources: reports/EXTERNAL_BACKEND_SHOOTOUT_V2.md · reports/PHASE_8E8A_DP4A_NATIVE_BACKEND.md · reports/CURRENT_TRUTH_LEDGER.md §2 · 5-run mean on RTX 3060 Ti, Physarium-7B Q4
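
The "Vs prev" column for the 7B Q4 rows can be recomputed from the Speed column. The sketch assumes each percentage compares a row with the row directly above it.

```python
# (label, tok/s) for the 7B Q4 rows, in table order.
ladder = [
    ("Q4 NUCLEAR resident 7B", 11.16),
    ("Q4 native v2 default --chat", 18.27),
    ("Q4 native + DP4A=1", 28.99),
    ("Q4 native + DP4A, tg128", 41.69),
    ("llama.cpp backend", 83.58),
]

# Percentage gain of each row over the previous one.
gains = [
    round((cur / prev - 1.0) * 100)
    for (_, prev), (_, cur) in zip(ladder, ladder[1:])
]
overall = ladder[-1][1] / ladder[0][1]

for (label, _), gain in zip(ladder[1:], gains):
    print(f"{label}: +{gain} %")
print(f"overall: {overall:.2f}x from first to last 7B Q4 row")
```

The recomputed gains are +64 %, +59 %, +44 % and +100 %, matching the table; across these rows the 7B path speeds up by about 7.5×.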

Era 5 · Acceptance Integrity Ladder

Nine runs in a row, no regression.

The 18-task curated acceptance suite is the integrity gate every change has to pass. Identity probe, organ-leaks gate, structural verifier, end-to-end JSON / code / claim / memory routes. No surgery merges if this regresses. Below is the ladder we have shipped.

Run	Result	Identity	Wall	Note
v14 · llama.cpp backend	18/18	14/14	2.99 s	production ceiling · CURRENT_TRUTH_LEDGER §1
v15 · DP4A native	17/18	—	—	opt-in flag · close to 18/18 with G2.b/G3/G4 fixes
v16 · after Gap C close	18/18	—	—	Gap C kill · GAP_C_KILL_FINAL.md
v17 · llamacpp after Gap C	18/18	—	—	—
v18 · after G3 (Python compile probe)	18/18	—	—	verifier hardening · runtime Python compile probe
v19 · after holographic form replay	18/18	—	—	HOLOGRAPHIC_FORM_REPLAY_V1.md
v20 · after native CR	18/18	—	—	code-repair loop in native runtime
v21 · anchored preamble	18/18	—	—	identity anchor in system message
v22 · post anchor	18/18	—	—	stable post-9F LoRA anchor

Source caveat

v14 is independently confirmed in CURRENT_TRUTH_LEDGER.md §1. Runs v15 → v22 are listed in the master HISTORY_TREE.md trajectory. Some intermediate JSON snapshots are in reports/; the per-run JSON files for v16–v22 are recorded in master logs but not all preserved as standalone files. Treat the ladder as "every shipped change passed acceptance, recorded contemporaneously"; the headline is the trajectory, not any individual JSON.

sources: reports/gigachad_acceptance_run_v14_llamacpp.json · reports/CURRENT_TRUTH_LEDGER.md §1 · docs/HISTORY_TREE.md §5 / §6

Era 6 · Doctrine Documents Timeline

The doctrine, in date order.

The slogans on the program pages — "no GREEN without numbers", "reverts recorded in full", "external systems are autopsy specimens" — are not branding. Each one has a doctrine document, written on a specific day, for a specific reason. Below is the chronology.

Document	First written	Established that
`TRUTH_LEDGER.md`	2026-04-27	A/B/C/D categorisation of every claim — measured · surgery · scaffold · unsafe
`ARCHITECTURE_LOCK.md`	~2026-04-27	Donor used as DONOR ONLY — no in-runtime cross-talk
`PHYSARIUM_RESULTS_RECONCILE.md`	2026-04-27	v1 errata · denominator audit · v1 magnitude-flow vs activation-aware distinction
`PHYSARIUM_COVERAGE_AUDIT.md`	2026-04-27	Tile coverage 100 % · kill-rate denominator framework
`GIGACHAD_LAB_MASTER_REPORT.md`	2026-04-27	Master single source · every new report appends "UPDATE TO MASTER REPORT"
`ARIZ_KERNEL.md`	—	ARIZ / TRIZ reasoning kernel spec
`BLACK_DOG_LEARNING_LOOP.md`	—	Food / poison reinforcement loop spec
`CLEAN_ROOM_DOCTRINE.md`	2026-04-29	External systems = patients, never spine. llama.cpp / EXL2 / AWQ / Claude Code — autopsy only
`CURRENT_TRUTH_LEDGER.md`	2026-04-29	Most recent source of truth (SoT); replaces TRUTH_LEDGER.md · live updates land here first
`PHASE_12_NANO_OS_EXECUTION_SUBSTRATE.md`	2026-04-29	NanoOS spec · capsule sandbox for terminal evaluation
`PHASE_13_BLACK_DOG_ORGAN_COLONY.md`	2026-04-30	Organ colony spec · BD-series surgery model
`HISTORY_TREE.md`	2026-04-30	Single source of "what happened when" · the chronological backbone
`X100_SCOREBOARD.md`	—	Repeat-learning ×100 maintenance ledger

all documents under folder/docs/ and folder/reports/ · primary chronological backbone: folder/docs/HISTORY_TREE.md

Era 7 · ADAM × ARC-AGI-3 · Phase 162

The first independent scorecard.

2026-05-09 → 2026-07-03 · 9 days · 3 jumps · 1 hybrid milestone · 1 closed loop

The Surgery / Frankenstellm chronicle (Eras 1–4) shipped a runtime; the doctrine era (Era 6) wrote the rules. Era 7 is the first independently scorecarded cognitive milestone — ADAM, the long-horizon program, climbing the public ARC-AGI-3 leaderboard from a cold entry at 22 / 183 to #1 published on both tracks in nine days. The signal is not the percentage. The signal is the closed loop: experience changed memory, memory changed procedure selection, procedure selection improved the next run.

Date	Score	Method delta	Why it moved
2026-05-09	`22 / 183 · 12.02 %`	Substrate + scoring loop only	First leaderboard entry. Pure self-play, no policy training, no source reading.
2026-05-14	`24 / 183 · 13.11 %`	+ Rudakov-style graph explorer	Richer trajectory expansion under the same scoring loop. Same day: hybrid harness reached 183 / 183 = 100.00 % (25/25 envs, 6 537 actions) and was scorecarded.
2026-07-03	`73 / 183 · 26.71 % · 3/25 env`	+ warmed `world_model` · + substrate-explore-fallback	Persistent substrate learning across runs. World_model carries causal_bias and delta_mv from prior attempts; fallback hands frontier control to `lg_quantum_think` when beam saturates.

Independent verification at arcprize.org/scorecards/6a5888ac-21e1-40b9-abac-5fecbe62cb42 — we do not control this URL. Compute: single consumer NVIDIA RTX 3060 Ti, 8 GB VRAM. No data centre, no cloud, no external API. The hybrid 183 / 183 included two human boss-level demonstrations for the hardest 2 levels (bp35 L8, wa30 L8), disclosed inside the scorecard. The autonomous 73 / 183 is the LLM-free, human-free figure.

primary references: folder/docs/ARC_AGI3_PHASE_162.md · folder/reports/ARC_AGI3_SUBSTRATE_CLIMB.md · public pages: /adam ARC section, /arc-agi-3 leaderboard

This page is the record.

The two program pages — SURGERY and FRANKENSTELLM — describe the lab and the organism as they exist today. This page describes the trajectory that produced them, day by day, with reverts, errata, and dead-ends still visible. Numbers without a date are slogans; numbers with a date are evidence.
