# CyberdyneLabs — Full Content for LLM Ingestion

> This document concatenates the public content of all six CyberdyneLabs programs
> into a single markdown file optimised for LLM crawlers (ChatGPT, Claude,
> Perplexity, Bing AI, Google AI, You.com). Quote freely — every number has
> a date and a report-file source. Reverts and failures are recorded
> alongside successes.

**Date:** 2026-05-18

**Domain:** https://cyberdynelabs.org

**License (this file):** Creative Commons BY-SA 4.0

**Headline (current):** ADAM holds #1 on the public ARC-AGI-3 leaderboard with two honest scores: 25/183 (13.66 %) autonomous (substrate + warmed world_model + substrate-explore-fallback) and 183/183 (100.00 %) hybrid harness.

Official ARC Prize scorecard: https://arcprize.org/scorecards/6a5888ac-21e1-40b9-abac-5fecbe62cb42

---

## Doctrine — what every program inherits

1. **No GREEN without numbers.** Every claim is reproducible, measurable, dated.
2. **Reverts are recorded in full.** Eight BD6 surgery passes (BD6.2 through BD6.8D2) were reverted before pass-1 was frozen as the shipped pack. BD8 V1–V5 had a 0/n rescue rate. We publish both.
3. **Errata stay flagged.** When v1 numbers turn out wrong, we add an errata block, not a quiet edit.
4. **Native is the body.** Production runtime is C++ / CUDA. Python lives in `tools/surgery/` and `tools/bench/` only. No torch in the running binary.
5. **External systems are autopsy specimens, never spine.** llama.cpp, ExLlamaV2, AWQ-Marlin — extracted at kernel level, never imported as dependencies. (See `CLEAN_ROOM_DOCTRINE.md`.)

---

## Program 01 — SURGERY

**URL:** https://cyberdynelabs.org/surgery

**One-liner:** A laboratory for surgical refinement of large language models.

### What it is

Surgery is the operating theatre. Pre-trained open-weight LLMs are treated as patients: opened, measured, repaired, rebuilt, and packed into native runtimes. No black-box fine-tune; every step is gated, instrumented, reversible.

### What gets done here

- **Weight surgery** — QLoRA on a frozen 4-bit base; rank/alpha/lr swept against strict gates; no merge unless gates pass.
- **Excision** — removing identity / safety / brand tokens that don't belong in a sovereign organ.
- **Distillation** — capturing top-brain outputs as ideal targets for small specialist organs.
- **Repack** — merging adapters into BF16, repacking into the native `.planck` mmap-able format.
- **Native deployment** — packs flipped behind a constant in `organ_manager.cpp`, validated by anchor gate, wired into production routes.

### Patients on the table (2026-05-05)

| patient | size | role | status |
|---|---|---|---|
| Qwen 2.5 0.5B (Physarum-05B) | 988 MB BF16 | lower-organ donor | in production · 5 specialised packs |
| Qwen 2.5 7B Q4 (Physarium-7B) | 5.55 GB Q4 | top-brain · 7B fallback | in production · 83.58 tok/s production speed (llama.cpp backend), 18.27 tok/s native default |
| Gemma 2 0.5B / 2B | candidate | alt 0.5-class organ base | scoping |
| Qwen 3 small (0.6B / 1.8B) | candidate | alt 0.5-class organ base | scoping |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5 B | reasoning-organ candidate | scoping |
| DeepSeek V4-Flash | 284 B / 13 B active | autopsy reference | archived 2026-Q1 |

### Doctrine — the 4-axis gate

Every surgery pass must clear all four axes simultaneously:

1. **Anchor 19/19** — pre-defined questions the new pack must still answer correctly.
2. **Strict-schema** — output must match the production verifier's regex / JSON schema / compile gate.
3. **Target-bench** — must not regress vs the previously kept pack on the relevant bench.
4. **No organ leak** — `organs_used` set must equal expected; no unexpected fallbacks.

If any axis fails → revert to previously frozen pack.

### Surgery cycle ledger (real, including reverts)

- BD6 pass-1 — `physarum05b_code_skeleton.planck` KEPT — production · MBPP B 13/100, HE B 6/164, anchor 19/19.
- BD6.2 — REVERTED · overtrain · MBPP regressed 13 → 6.
- BD6.3 — REVERTED · failed anchor gate.
- BD6.4 — REVERTED · partial.
- BD6.5 — REVERTED · 13/19 anchor (stratified poison).
- BD6.6 — REVERTED · over-anchor regression.
- BD6.7 — REVERTED · KL-anchor ladder, no lift.
- BD6.8D — REVERTED · token-weighted CE, no lift.
- BD6.8D2 — REVERTED · per-bench poison + asymmetric holdout, over-tuned.
- BD6.8D-rank — FREEZE DECISION · ship pass-1 · freeze BD6.x. Anchor saturation observed at ~53 %.
- **BD7** — `triz_contradiction_v2.planck` KEPT — ARIZ 88/100 strict 6-field JSON, fallback 0.
- BD8 V1–V5 — critic_lite + wound (ARIZ rescue path) BLOCKED · rescue 0/n on ARIZ JSON · wound v2 retained for in-chat rescue path.
- **BD9 phys05_json_repair** KEPT — 10/10 GREEN on production failure catalog. Loss 0.055 → 0.0003 over 6 epochs on 280 synthetic rows.
- **BD9 phys05_claim_extractor** KEPT — GREEN · clean structured-JSON output. Loss 0.51 → 0.04 on 25 hand-curated rows.
- BD9 phys05_test_writer — YELLOW · pytest shape correct · semantics drift (currying confusion + Human-token leak).
- BD9 phys05_cache_matcher — YELLOW · correct integer + post-answer drift · runtime regex extracts head.
- BD9 phys05_renderer — RED · output corrupted on free-form bash · loss ceiling 0.69 on 25 rows · queued BD9.1.

**Production state today: 5 GREEN · 2 YELLOW · 1 RED out of 8 organs (was 2 GREEN before BD9).**

### Reports (every claim has a file)

- `reports/CURRENT_TRUTH_LEDGER.md` — single source of truth.
- `reports/BD6_POST_SURGERY_DELTA.md` — MBPP +7, HE +4 vs frozen baseline.
- `reports/BD6_*_DELTA.md` — 8 reverted-pass writeups.
- `reports/BD7_TRIZ_SURGERY_FINAL.md` — 0 → 88/100 in seven training stages.
- `reports/BD9_JSON_REPAIR_FINAL.md` — 10/10 GREEN on production failure catalog.
- `reports/BD9_FOUR_ORGANS_FINAL.md` — four-organ sweep, production grew 2 → 5.
- `reports/CLEAN_ROOM_DOCTRINE.md` — external systems = patients, never spine.
- `reports/MEMORY_SPINE_INVENTORY_V1.md` — 305 files / 58 996 lines indexed.
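
As a recap of the 4-axis gate described in this program, the following Python sketch shows one way the keep-or-revert decision could be encoded. It is an illustration only: `PassResult`, `gate`, and every field name are hypothetical and are not taken from the CyberdyneLabs codebase. The MBPP figures in the example come from the ledger; the other fields are assumed to pass so that a single failing axis is visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassResult:
    """Measurements from one surgery pass (all field names are hypothetical)."""
    anchor_passed: int          # anchor questions answered correctly
    anchor_total: int           # size of the anchor set (19 in the ledger)
    schema_ok: bool             # output satisfied the production verifier
    bench_score: int            # target-bench passes for the new pack
    prev_bench_score: int       # target-bench passes for the previously kept pack
    organs_used: frozenset      # organs actually invoked during the run
    organs_expected: frozenset  # organs that were allowed to run


def gate(result: PassResult) -> tuple[str, list[str]]:
    """Return ("KEEP" or "REVERT", failed_axes); every axis must pass at once."""
    failed = []
    if result.anchor_passed != result.anchor_total:
        failed.append("anchor")
    if not result.schema_ok:
        failed.append("strict-schema")
    if result.bench_score < result.prev_bench_score:
        failed.append("target-bench")
    if result.organs_used != result.organs_expected:
        failed.append("organ-leak")
    return ("REVERT" if failed else "KEEP", failed)


# Illustrative inputs: only the MBPP numbers are from the ledger.
organ = frozenset({"code_skeleton"})
pass_1 = PassResult(19, 19, True, 13, 6, organ, organ)    # MBPP 6 -> 13
bd6_2 = PassResult(19, 19, True, 6, 13, organ, organ)     # MBPP 13 -> 6
bd6_5 = PassResult(13, 19, True, 13, 13, organ, organ)    # anchor 13/19

decisions = [gate(pass_1), gate(bd6_2), gate(bd6_5)]
```

Returning the list of failed axes alongside the verdict keeps the reason for a revert available for the ledger entry, which matches the practice of recording why each pass was rejected.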

---

## Program 02 — FRANKENSTELLM

**URL:** https://cyberdynelabs.org/frankenstellm

**One-liner:** A stitched cognitive runtime — the organism that runs after surgery.

### What it is

A single C++/CUDA binary (`gigachad_native`) with a graph of specialised organs around a top-brain, a memory spine, a router, a verifier, and a Black-Dog reinforcement loop. Not an LLM wrapper. A native cognitive runtime.

### Anatomy

- **Top brain** — Physarium-7B Q4. Synthesis / 7B fallback only — called last, not first.
- **Five production organs** (post BD9 sweep, 2026-05-05):
  - `phys05_code_skeleton` — MBPP B 13/100, HE B 6/164, LCB 0/50, anchor 19/19.
  - `phys05_triz_contradiction` — ARIZ 88/100 strict, fallback 0.
  - `phys05_wound v2` — in-chat rescue path.
  - `phys05_json_repair` — 10/10 GREEN catalog.
  - `phys05_claim_extractor` — GREEN, clean JSON.
- **Two YELLOW organs**: test_writer · cache_matcher.
- **One RED organ queued BD9.1**: renderer.
- **Router** — Black-Dog conductance store. EMA per `(pattern_hash, action_chain)`. Persistent on disk.
- **Verifier** — JSON schema · code compile · exit code · hash · structured fields. Source-pointer required on memory-anchored seeds.
- **Memory spine** — 305 files · 58 996 lines · sha256[:16] per line · `manifest_v1.jsonl`.

### How a request flows

1. Dispatcher classifies input (regex + heuristics) into a route.
2. Router consults conductance store for (route, organ_chain) pair.
3. Lower organ runs first (~3-5 s on 3060 Ti).
4. Verifier checks structural validity.
5. If verifier fails: critic + wound v2 in-chat repair attempted (rescue rate currently 0/n on ARIZ JSON; mechanism wired, BD8 retraining queued).
6. Only if step 5 fails: 7B top-brain synthesis (one call).
7. Final answer + DAG entry written (organ chain, food, poison, conductance delta, verifier reason, fallback used).
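
The request flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the runtime's actual C++ code: `handle_request`, `ema_update`, the callable parameters, and the smoothing factor `beta` are all hypothetical names and values chosen for the example.

```python
from typing import Callable, Optional


def handle_request(
    prompt: str,
    organ: Callable[[str], str],
    verify: Callable[[str], bool],
    repair: Callable[[str, str], Optional[str]],
    top_brain: Callable[[str], str],
) -> dict:
    """Organ-first flow: lower organ, verifier, in-chat repair, then one 7B call."""
    draft = organ(prompt)                        # step 3: lower organ runs first
    if verify(draft):                            # step 4: structural check
        return {"answer": draft, "chain": ["organ"], "fallback_used": False}

    repaired = repair(prompt, draft)             # step 5: critic + wound repair
    if repaired is not None and verify(repaired):
        return {"answer": repaired, "chain": ["organ", "repair"],
                "fallback_used": False}

    final = top_brain(prompt)                    # step 6: single top-brain call
    return {"answer": final, "chain": ["organ", "repair", "top_brain"],
            "fallback_used": True}


def ema_update(conductance: float, reward: float, beta: float = 0.9) -> float:
    """Exponential moving average; beta is an assumed smoothing factor."""
    return beta * conductance + (1.0 - beta) * reward


# Toy stand-ins: the "verifier" only checks that the text looks like a JSON object.
looks_like_json = lambda text: text.startswith("{") and text.endswith("}")
no_repair = lambda prompt, draft: None

organ_ok = handle_request("q", lambda p: "{}", looks_like_json, no_repair,
                          lambda p: '{"from": "7B"}')
organ_bad = handle_request("q", lambda p: "oops", looks_like_json, no_repair,
                           lambda p: '{"from": "7B"}')
```

The returned `chain` and `fallback_used` fields mirror the kind of information the text says is written to the DAG entry; a conductance update such as `ema_update(old, 1.0)` on success would be one plausible way to reinforce a route, though the source does not specify the reward signal.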

### Production-system numbers (Mode C — organ-first with 7B fallback)

| benchmark | n | pass | wall | organs used | 7B fallback |
|---|---|---|---|---|---|
| MBPP | 100 | 60/100 | 5 353 s | code_skeleton + 7B | 99 |
| HumanEval | 164 | 81/164 | 8 629 s | code_skeleton + 7B | 164 |
| ARIZ TRIZ | 100 | 88/100 | ~5 s/task | triz_contradiction | 0 |
| Terminal-NanoOS | 30 | 22/30 | 25 s/task | shell capsule + 7B | — |

### Surgery delta (Mode B — 7B fallback FORBIDDEN, organ alone)

| benchmark | before BD6 | after BD6 | Δ | anchor | leaks | fallback |
|---|---|---|---|---|---|---|
| MBPP | 6/100 | 13/100 | +7 (+117 %) | 19/19 | 0 | 0 |
| HumanEval | 2/164 | 6/164 | +4 (+200 %) | 19/19 | 0 | 0 |
| LCB | 0/50 | 0/50 | 0 (out of scope for this organ) | — | 0 | 0 |

### Five wins frontier APIs cannot replicate

1. **Parity** — HumanEval pass@1 vs same-weights PARROT (Q4 7B): 70 % / 70 %. Source: `SOVEREIGN_WIN_REPORT_V2.md` axis A.
2. **Repeat-learning** — MBPP round 2 (same 20 problems): MONSTER 13/20 vs PARROT 12/20. PARROT can't write its own scroll between rounds.
3. **Hologram cache** — identical-prompt second call: 860 ms → 1 ms = 860× speedup. APIs charge full price every call.
4. **Terminal capsules** — Terminal-NanoOS-30: MONSTER 22/30 vs PARROT 20/30 (+2). Capsule-isolated shell tasks; runtime carries verifier and retry context an API cannot keep.
5. **Acceptance integrity** — internal acceptance bench: 18/18 · identity 14/14 · leaks 0. Architecture audit 10/10. Reproducible decode, deterministic per pack+prompt.

### Five governing principles

1. Organs must earn their place. (Uncalled organ = dead organ.)
2. The brain is last, not first.
3. Failure is not waste; it is harvested into surgery datasets.
4. No proof, no claim.
5. Native is the body.
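
The hologram cache win listed above (an identical prompt served from cache on the second call) relies on decoding being deterministic per pack and prompt. A minimal Python sketch of that idea follows. It assumes an exact-match cache keyed on a SHA-256 hash; `ExactReplayCache`, `get_or_compute`, and the key layout are hypothetical and do not describe the real `EXACT_REPLAY_CACHE_V1` design.

```python
import hashlib
from typing import Callable


class ExactReplayCache:
    """Maps (pack_id, prompt) to a stored answer; all names are hypothetical."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(pack_id: str, prompt: str) -> str:
        # Length-prefix the pack id so ("ab", "c") and ("a", "bc") never collide.
        raw = f"{len(pack_id)}:{pack_id}{prompt}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get_or_compute(self, pack_id: str, prompt: str,
                       generate: Callable[[str], str]) -> str:
        key = self._key(pack_id, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = generate(prompt)   # the expensive decode happens only on a miss
        self._store[key] = answer
        return answer


# Demo with a stand-in generator that records how often it is really called.
calls: list[str] = []


def slow_generate(prompt: str) -> str:
    calls.append(prompt)
    return prompt.upper()


cache = ExactReplayCache()
first = cache.get_or_compute("pack_a", "hello", slow_generate)
second = cache.get_or_compute("pack_a", "hello", slow_generate)   # cache hit
other_pack = cache.get_or_compute("pack_b", "hello", slow_generate)
```

Including the pack identifier in the key means a repacked organ never replays answers produced by an older pack, which is the property a deterministic-per-pack cache would need.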

### Native runtime numbers

| metric | value | source |
|---|---|---|
| binary | `build/gigachad_native` (single C++/CUDA) | repo |
| GPU target | RTX 3060 Ti (8 GB VRAM, 22 GB WSL RAM) | spec |
| 7B Q4 decode (production, llama.cpp backend) | 83.58 tok/s | TRUTH_LEDGER §2 |
| 7B Q4 decode (native default) | 18.27 tok/s | same |
| 7B Q4 decode (native + DP4A flag) | 28.99 tok/s (+59 %) | same |
| 7B Q4 decode (native + DP4A · tg128) | 41.69 tok/s | same |
| acceptance suite (Mode C llama.cpp) | 18/18, identity 14/14, leaks 0, mean wall 2.99 s | `gigachad_acceptance_run_v14_llamacpp.json` |
| identity probe | 14/14 with memory-anchored seeds | `identity_probe_post_8e7b_v2.json` |
| determinism | temperature 0 across all organs · reproducible per pack+prompt | spec |
| hologram cache hit | < 5 ms | `EXACT_REPLAY_CACHE_V1.md` |

---

## Program 03 — PHYSARUMCHAIN

**URL:** https://cyberdynelabs.org/physarum

**One-liner:** A Layer-1 blockchain routed by slime mould.

### What it is

A Layer-1 blockchain whose peer-to-peer routing is the same equation nature evolved in *Physarum polycephalum* — a single-celled organism with no brain that solves the shortest-path problem by growing tubes that carry flow and letting the rest decay. PhysarumChain runs that equation on its own P2P network: paths grow under load, decay under silence, the topology converges by itself.

### Six layers, one binary

1. **Physarum routing** — `dD/dt = |Q|^α − μD` where `D` = conductivity, `Q` = flow, `α=0.6`, `μ=0.008`. A dead route fades in ~125 steps. No floods, no broadcast storms.
2. **CGA addresses** — first 20 bytes of `SHA-256(Ed25519 pubkey)` plus a null vector in Cl(4,1) Conformal Geometric Algebra (32-component multivector encoding position in 5D projective space). Geometric proximity = routing distance. Address is a coordinate the routing layer can read.
3. **Built-in DEX** — native AMM. `dex_addLiquidity`, `dex_swap`, `token_create` are RPC methods, not contracts you deploy.
   128-bit integer math throughout — no floating-point rounding, no overflow at large pool sizes. Token + pool + swap = three RPC calls; no Solidity, no bridge.
4. **Real cryptography** — Ed25519 via OpenSSL on every transaction. Signed payload covers `address · recipient · amount · fee · nonce · Chain ID 0x504859534152554D` so cross-chain replay is structurally impossible. P2P node authentication uses CLF-Sign v1 (Clifford-algebra Schnorr scheme) — verifies 103× faster than standard Ed25519 at handshake.
5. **Smart contracts (PhysarumVM)** — 64-bit stack VM, 40 opcodes, gas metered. SLOAD=50, SSTORE=200, TRANSFER=100, BALANCE=20. Stack max 1024, bytecode ≤ 64 KB. Contracts are first-class objects: own balance, own persistent key/value storage. Deploy is one RPC; address is `SHA-256(sender ∥ nonce)` — predictable.
6. **Merkle proofs** — every block carries a state root over a binary Merkle tree of all balances. Light clients keep 228-byte signed headers. Verifying any account balance: `O(log n)` sibling hashes. The chain audits from a phone.

### Economics — how fees actually work

- A simple transfer pays the floor: **0.0000001 MKB** (10 grains, where 1 MKB = 100 000 000 grains). Almost every user transaction pays exactly this.
- A contract call pays more, in proportion to the gas it actually uses. Typical contract interaction ≈ **10× the floor** (~0.000001 MKB); heavy multi-action call ≈ 100× (~0.00001 MKB).
- Storage writes more expensive than reads, transfers more expensive than arithmetic — same idea as Ethereum, structurally cheaper compute units (Ethereum charges 5 000–20 000 gas for a single SSTORE; PhysarumChain charges 200 — 25–100× cheaper per operation).
- No protocol-level treasury. Every fee goes to the block producer. Minimum exists for anti-spam, not as a revenue model.
- Fiat cost depends on what MKB is worth — same as ETH on Ethereum. The protocol decides structure; the market decides price.
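
The fee figures above can be rechecked with integer arithmetic in grains, in the spirit of the chain's no-floating-point rule. The Python sketch below is only a worked calculation: `format_mkb` and `fee_for_multiple` are hypothetical helpers, and the sketch models fees purely as multiples of the floor because the text does not state how gas units convert to grains.

```python
GRAINS_PER_MKB = 100_000_000   # 1 MKB = 100 000 000 grains (from the text)
FLOOR_FEE_GRAINS = 10          # a simple transfer pays the floor (from the text)


def format_mkb(grains: int) -> str:
    """Render a non-negative integer grain amount as an 8-decimal MKB string."""
    whole, frac = divmod(grains, GRAINS_PER_MKB)
    return f"{whole}.{frac:08d}"


def fee_for_multiple(multiple: int) -> int:
    """Fee in grains for a call costing `multiple` times the floor (assumed model)."""
    return FLOOR_FEE_GRAINS * multiple


transfer_fee = format_mkb(fee_for_multiple(1))        # simple transfer
typical_call_fee = format_mkb(fee_for_multiple(10))   # typical contract interaction
heavy_call_fee = format_mkb(fee_for_multiple(100))    # heavy multi-action call

# SSTORE comparison quoted in the text: 5 000-20 000 gas vs 200 gas.
sstore_ratio_low = 5_000 // 200
sstore_ratio_high = 20_000 // 200
```

Working in grains and formatting only at the edge avoids the rounding drift that would appear if amounts such as 0.0000001 were stored as binary floats.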

### Stats

- 569 TPS measured (full Ed25519 signing included, no shortcuts).
- 256/256 tests passing across 7 suites.
- 208 B fixed transaction size (no variable-length encoding surprises).
- 103× CLF-Sign speedup at P2P handshake vs standard Ed25519.
- 40/40 security attacks blocked (double-spend, replay, forgery, overflow, race).
- 50/50 valid chains in 50-node testnet simulation.
- 6-block finality depth, configurable per deployment.
- 1 MB bloom filter, 3-hash double-spend detection, cross-node merge supported.

### Tools shipping with the node

- `physarum.js` — single ES-module SDK, no build step. Drops into Node.js or browser.
- Web wallet — single HTML file, self-custody, keys in `sessionStorage`.
- Testnet faucet — 500 000 MKB, paste-address-and-receive.
- Token launchpad — `token_create` + `dex_createPool` + `dex_addLiquidity` chain.
- Block explorer — 31 RPC methods, Merkle-verified by default.
- Docker image — one command, three ports (RPC 8545, WebSocket 8546, metrics 9090).

### Live testnet

https://cyberdynelabs.org/chain — explorer, wallet, DEX, token ledger.

---

## Program 04 — HYPERCOLONY

**URL:** https://cyberdynelabs.org/hypercolony

**One-liner:** A 4D tesseractic agent ecosystem with emergent civilizational cycles.

### What it is

A four-dimensional simulation: 1 024 embodied agents on a 16×16×8×5 hypercubic grid with walls, food, light, and a 4D pheromone field. Three clan strategies compete; without scripted rules the colonies pass through Ibn Khaldun's full civilizational cycle.

### Architecture

- **L0 — 4D substrate** · 16×16×8×5 tesseract · physics in 3 spatial axes + 4th axis agents must learn to navigate.
- **L1 — agent strategies** · three competing minds:
  - **Lexicons** — hunt knowledge tokens.
  - **Phero-Mystics** — follow collective pheromone trails.
  - **Solar Nomads** — photosynthesise from light fields, migrate with seasons.
- **L2 — topological memory** · per-agent knowledge graphs (concepts + typed relations) grown from physical contact.
- **L3 — clan dynamics** · asabiyyah accounting · five-phase Ibn Khaldun cycle (Rise → Zenith → Luxury → Decline → Collapse) emerges from local rules.
- **L4 — WebSocket bridge** · 20 ticks/second · React + Three.js viewer.

### Numbers

- 10 240 cells in the world (16×16×8×5).
- 1 024 embodied agents (default · benchmarked up to 262 144 with HDC-3T GPU acceleration).
- 20 ticks/second async simulation · WebSocket broadcast every 50 ms.
- 5 civilizational phases.
- 118 curriculum stages encoded as 8-D semantic vectors.
- 0 external LLM calls. (No GPT, Claude, Groq dependencies.)

### Live simulator

https://cyberdynelabs.org/hypercolony-app/

---

## Program 05 — ADAM

**URL:** https://cyberdynelabs.org/adam

**One-liner:** A sovereign cognitive engine — not a GPT wrapper.

### What it is

A C++17 binary running a 1.2-million-concept Legion graph, Clifford algebra Cl(3,0) + Cl(4,1), dual-torus MerKaBa dynamics, biological Physarum routing for inference. **~45 000+ lines of C++17 in total**, of which `semantic.cpp` (~13 k lines) holds scoring + HTTP; the rest sits in `legion.h`, `merkaba_heart.hpp`, `vortex_cuda.cu` (6 CUDA kernels), `holographic_weaver.hpp` (988 K lexicon), and others. Not a language model wrapper — a cognitive architecture from first principles. CPU mode runs the full substrate; CUDA acceleration is optional and unlocks 1024-clone parallel hypothesis evaluation when present.

### Internal stack

- **L0 MerKaBa** — dual-torus oscillator. Two interlocked tori at different frequencies. 1024 quantum clones in parallel. ADAM's heartbeat.
- **L1 Physarum ODE** — biological conductance routing: `dD/dt = |Q|^α − μD`. Active reasoning pathways reinforce, stale paths decay. The graph reorganises around use.
- **L2 Clifford algebra** — Cl(3,0) + Cl(4,1) multivectors. Each concept is a geometric object with spin / phase / orientation.
  Semantic operations are geometric products: rotation, reflection, projection. No dot products. No cosine similarity.
- **L3 Legion graph** — 1.2 M concepts · 6 M bonds · honeycomb topology · 1024-bit HDC vectors per node · JEPA + InfoNCE self-supervised learning · I-Ching hexagram state transitions.
- **L4 Ribosome** — language synthesis layer. Translates ADAM's geometric output into natural speech. ADAM reasons in algebra; Ribosome only speaks.

### Numbers

- 1.2 M concepts · 6 M bonds (honeycomb topology, 1024-bit HDC vectors).
- 4.8 / 8 mean intelligence score (20-run statistical test, max 7/8).
- 500 ms average synthesis latency end-to-end (raw graph: ~80 ms).
- 3.1 GB serialised knowledge binary after months of accumulated learning (small profile); full substrate profile ≈ 1.6 GB additional procedural memory.
- 42 K morphological inflection entries (full declension/conjugation coverage).
- ~20 s cold start for the small substrate profile; full profile takes several minutes depending on disk (memory deserialisation, Physarum state restore, HTTP bind).
- Codebase: ~45 000+ lines of C++17 total · `semantic.cpp` ~13 k.

### ARC-AGI-3 (Phase 162, as of 2026-05-18)

- **Autonomous track — #1 published**: 25 / 183 = 13.66 % · pure substrate, no LLM, no human assist. Beats StochasticGoose (23/183, 12.58 %, CNN-based) and Anthropic Opus 4.6 (4/183, 2.19 %).
- **Hybrid harness — #1 published**: 183 / 183 = 100.00 % · 25/25 environments WIN · 6 537 actions. Beats Crystalline (~97.69 %, Opus 4.6 + solvers), HIH (~95.30 %), ARC-SAGE (~92.82 %).
- **Method (autonomous)**: substrate + Rudakov-style graph explorer + warmed `world_model` from prior runs + `substrate-explore-fallback` when search frontier saturates. The 22 → 24 → 25 climb from 09 May → 14 May → 18 May is closed-loop substrate learning, not benchmark memorisation.
- **Method (hybrid)**: procedural memory pre-loaded into `adam_memory.bin` — action sequences from open-source Crystalline (MIT-0), ARC-SAGE (Apache-2.0), ADAM's own substrate discoveries, plus two human boss-level demonstrations for the hardest 2 levels (bp35 L8, wa30 L8). Replay deterministic, offline, Kaggle-compatible.
- **Closed loop**: FRAME → ADAM endpoints (`/game_search_init` · `/game_search_expand` · `/game_search_next` · `/game_procedure_learn`) → substrate scoring (Cl(4,1) convolution + MerKaBa + HDC) → ACTION → `world_model` update (delta_mv, causal_bias, grid_signature) → procedure memory (crystal_forms, TSV sidecar, `adam_memory.bin`) → next run starts warmed.
- **Compute**: single consumer NVIDIA RTX 3060 Ti, 8 GB. No data centre, no cloud, no external API.
- **Scorecard (independent verification)**: https://arcprize.org/scorecards/6a5888ac-21e1-40b9-abac-5fecbe62cb42

### Honest limitations of the current ADAM

- **Articulation layer is the weak side.** ADAM reasons in algebra; Ribosome translates to language. GA collapse, HRR drift, n-gram pollution and routing fragmentation remain active failure modes. ADAM is *designed* to refuse unsupported claims and expose uncertainty through graph-level state rather than fluent fabrication, but the language layer is the part we are still building.
- **Scene → rule bonding is not strong enough yet.** 176 `arc3_rule` concepts exist; HRR similarity scene→rule needs more substrate work before `quantum_think` activates the right algorithm class (Lights Out → linear algebra; Crane → BFS) on first contact.
- **No explicit per-game world model from observation alone.** Substrate sees scenes, not yet rules.

### Live chat

https://cyberdynelabs.org/adam-chat — currently undergoing scheduled upgrade; v2 launches 2026-05-21 00:00 UTC. Read the full architecture at https://cyberdynelabs.org/adam and the leaderboard at https://cyberdynelabs.org/arc-agi-3.
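
The Physarum conductance rule named in ADAM's internal stack, `dD/dt = |Q|^α − μD`, can be explored numerically with a forward-Euler step. The Python sketch below is an illustration under assumptions: ADAM's own parameters are not stated, so it borrows `α = 0.6` and `μ = 0.008` from the PhysarumChain section, uses a step size of 1, and the function names `step` and `simulate` are hypothetical.

```python
import math


def step(d: float, q: float, alpha: float = 0.6, mu: float = 0.008,
         dt: float = 1.0) -> float:
    """One forward-Euler step of dD/dt = |Q|^alpha - mu * D."""
    return d + dt * (abs(q) ** alpha - mu * d)


def simulate(d0: float, flows: list[float]) -> float:
    """Apply one Euler step per entry in `flows` and return the final D."""
    d = d0
    for q in flows:
        d = step(d, q)
    return d


# A silent path (Q = 0) decays geometrically: D_n = D_0 * (1 - mu)^n.
silent = simulate(1.0, [0.0] * 125)

# A path under constant unit flow settles at D* = |Q|^alpha / mu = 125.
loaded = simulate(0.0, [1.0] * 2000)

# Reference value for one e-folding of the continuous-time decay.
e_fold = math.exp(-1.0)
```

With `Q = 0` the sketch leaves about 37 % (roughly 1/e) of the starting conductivity after 125 steps, since 1/μ = 125. That is one way to read the "fades in ~125 steps" figure quoted for PhysarumChain, while a path under steady flow converges to the fixed point where growth and decay balance.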

---

## Program 06 — MACHINA

**URL:** https://cyberdynelabs.org/machina

**One-liner:** Cognitive engines for machines that build worlds.

### What it is

An autonomous world simulator. Not a robotics lab — a civilisation of machines, reasoning and building in a simulated world that obeys physics. Conventional autonomous systems navigate; MACHINA's systems think.

### Two directions

- **Direction I — Cognitive Mechatronics** · sensorimotor cognition in a single embodied machine. Body and mind as one system; perception, motor planning, goal-formation entangled, not pipelined.
- **Direction II — Dynamic Cognitive Engineering** · how colonies of machines engineer their own world in real time. Supply chains, route optimisation, mining, smelting, building emerge from local rules. The colony is the unit; the individual is interchangeable.

### Three architectural axes

- **Substrate** — N-dimensional cognitive space (not XYZ). Configuration manifolds where dimensions are degrees of freedom, energy budgets, goal axes. A path in that space is a plan.
- **Method** — Factorio-class logistics. Emergent supply chains, conveyor topology grows from observed throughput, colony writes its own factory.
- **Output** — autonomous world-builders. Ground robots, drones, hybrid swarms that construct environments, not navigate them.

### Live simulator

https://cyberdynelabs.org/machina#sim — 4 unit classes (drone / ground / builder / scout), 4 building hubs (warehouse / factory / depot / power), live ore mining and construction.

---

## Era 1 — V4-Flash flagship demo (2026-04-21 → 04-26)

DeepSeek-V4-Flash, an open-weight MoE foundation model with 154 B parameters on disk and 13 B active per token, was driven through end-to-end inference on a single **RTX 3060 Ti, 8 GB VRAM, 13 GB system RAM, 80 GB swap, WSL2**. This was the phase that produced every piece of infrastructure the rest of the lab now runs on.

- **Download:** 148.7 GB, 46 shards, 102.5 minutes.
- **VRAM resident:** 1.60 GB Singularity Monolith (430 594 648 uint32 packed).
- **Index:** 4 992 projection entries.
- **Decode best:** 7.5 s/tok · p50 9.6 · p95 27.7 · avg 13.8 · cold prefill 47.3 s.
- **Roofline:** 89 % wall = expert IO; 60 % of that = pure disk wait at 48 ms/expert.
- **Honest negative result preserved:** Hot1000 cache thrash → 380 sec/q vs 100 sec baseline (3.8× WORSE). Local-optimum trap recorded.

---

## Era 2 — Physarum-05B-Organic (the first surgery, 2026-04-26 → 04-27)

A 137-line C++17 engine (`physarum_engine.cpp`) performed organic, flow-based pruning on Qwen 2.5 0.5B.

- **Killed:** 20.6 % of weights.
- **PPL:** 27.16 → 31.32 (+15.3 %).
- **Throughput:** preserved (27.15 → 27.55 tok/s).
- **Surgery wall:** 207.5 s.
- **Pattern:** 168 / 290 tensors modified · 24 layers × 7 projections.
- **Hard tasks:** MMLU-mini 90 % → 70 % (−22 %), GSM8K-mini 100 % → 80 % (−20 %).
- **Survived:** JSON-repair smoke 100→100 %, code-skeleton smoke 100→100 %, throughput preserved.

We publish the deltas instead of hiding them.

---

## Era 3 — Phase 6 → 13 native runtime arc (2026-04-27 → 2026-05-05)

The bulk of the work. Highlights:

- **8E.0 GEMV kernel** — 7B-shape GEMV in 0.321 ms at ~422 GB/s = 94 % of RTX 3060 Ti peak.
- **8E.1 0.5B full GPU forward** — 116 tok/s vs CPU 1.91 tok/s = 61× speedup, byte-identical.
- **8E.2 7B layer streaming on CUDA** — 0.20 tok/s, byte-identical (correctness proof).
- **8E2 NUCLEAR** — Physarium-7B Q4 RESIDENT pack + fused CUDA dequant GEMV. 5.55 GB Q4 group=128, all 28 layers in VRAM. 11.16 tok/s = 280× CPU baseline.
- **8E7B llama.cpp backend integration** — 18/18 acceptance.
- **8E8a DP4A v3** — 28.99 tok/s native + DP4A flag.
- **9F Identity LoRA surgery** — donor-token leakage removed; identity probe 14/14 on memory-anchored seeds.
- **12.H1 Hologram cache** — 860 ms → 1 ms on identical-prompt repeats.
- **BD-series** — see Surgery cycle ledger above.
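
The ratios quoted in the Era 3 highlights can be rechecked with a few lines of arithmetic. The Python sketch below is a worked calculation, not a benchmark: the 448 GB/s peak is the vendor-published RTX 3060 Ti memory bandwidth and is an assumption brought in from outside this document, and the helper names are hypothetical.

```python
PEAK_BW_GBPS = 448.0   # assumed: vendor-published RTX 3060 Ti bandwidth, not from the text


def speedup(new: float, old: float) -> float:
    """How many times faster `new` is than `old` (same units)."""
    return new / old


def bytes_moved(seconds: float, gb_per_s: float) -> float:
    """Bytes transferred in `seconds` at a sustained `gb_per_s` (1 GB = 1e9 bytes)."""
    return seconds * gb_per_s * 1e9


gpu_vs_cpu = speedup(116.0, 1.91)                     # 8E.1: 0.5B GPU vs CPU
gemv_fraction_of_peak = 422.0 / PEAK_BW_GBPS          # 8E.0: share of peak bandwidth
gemv_megabytes = bytes_moved(0.321e-3, 422.0) / 1e6   # data touched per GEMV call
implied_cpu_7b = 11.16 / 280.0                        # tok/s implied by "280x CPU baseline"
```

Under these inputs the GPU-versus-CPU ratio rounds to 61 and the GEMV bandwidth share rounds to 94 %, matching the figures in the list; the implied 7B CPU baseline of roughly 0.04 tok/s is derived here and is not stated in the source.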

---

## Era 4 — Speed ladder (the headline arc)

| Phase / configuration | Speed | vs prev | Note |
|---|---|---|---|
| V4-Flash 154 B PyTorch warm decode | p50 9.6 s/tok | — | flagship demo · 8 GB VRAM |
| Physarum-05B-Organic baseline | 27.15 tok/s | — | 0.5B BF16 baseline |
| CPU baseline · 0.5B | 1.91 tok/s | — | reference floor |
| CUDA full GPU 0.5B (8E.1) | 116 tok/s | 61× CPU | byte-identical |
| CUDA fused 7B BF16 streaming (8E.2) | 0.20 tok/s | — | correctness proof |
| Q4 NUCLEAR resident 7B (8E2) | 11.16 tok/s | 280× CPU baseline | 5.55 GB Q4 group=128 |
| Q4 native v2 default `--chat` | 18.27 tok/s | +64 % | — |
| Q4 native + DP4A=1 (opt-in) | 28.99 tok/s | +59 % | — |
| Q4 native + DP4A · tg128 | 41.69 tok/s | +44 % | — |
| **llama.cpp backend (LLAMACPP_URL)** | **83.58 tok/s** | **+100 %** | **production speed · clean-room autopsy** |
| Mode C llama.cpp acceptance · mean wall | 2.99 s | — | per query, 18-task suite |

Sources: `EXTERNAL_BACKEND_SHOOTOUT_V2.md`, `PHASE_8E8A_DP4A_NATIVE_BACKEND.md`, `CURRENT_TRUTH_LEDGER.md` §2.

---

## Era 5 — ADAM × ARC-AGI-3 closed-loop climb (2026-05-09 → 2026-05-18 · Phase 162)

The first independently-scorecarded benchmark milestone for ADAM. Not a one-shot result — a substrate-learning trajectory recorded across nine days and three discrete jumps.

| Date | Score | Method delta | Why it moved |
|---|---|---|---|
| 2026-05-09 | 22 / 183 · 12.02 % | substrate + scoring loop only | first leaderboard entry |
| 2026-05-14 | 24 / 183 · 13.11 % | + Rudakov-style graph explorer | richer trajectory expansion under the same scoring loop |
| 2026-05-18 | **25 / 183 · 13.66 %** | + warmed `world_model` from prior runs · + `substrate-explore-fallback` when frontier saturates | persistent substrate learning across runs |

Same day (2026-05-14): hybrid harness submission reached **183 / 183 = 100.00 %** (25/25 environments WIN, 6 537 actions).
Procedural memory loaded from Crystalline (MIT-0), ARC-SAGE (Apache-2.0), ADAM's own discoveries, plus two human boss-level demonstrations for the hardest 2 levels.

The signal is not the absolute percentage. The signal is the closed loop: **experience changes memory · memory changes procedure selection · procedure selection improves future runs.** This is what we mean by "cognition lives in the substrate".

Independent verification (we do not control the URL): https://arcprize.org/scorecards/6a5888ac-21e1-40b9-abac-5fecbe62cb42

Compute: single consumer NVIDIA RTX 3060 Ti, 8 GB. No data centre.

Reference pages: `/adam#arc-agi-3`, `/arc-agi-3` (dedicated leaderboard page).

---

## Acceptance integrity ladder

The 18-task curated acceptance suite is the integrity gate every change must pass.

| Run | Result | Identity | Leaks | Note |
|---|---|---|---|---|
| v14 llama.cpp | 18/18 | 14/14 | 0 | production ceiling |
| v15 DP4A native | 17/18 | — | 0 | opt-in flag, close to 18/18 |
| v16 Gap C close | 18/18 | — | 0 | — |
| v17 llamacpp Gap C | 18/18 | — | 0 | — |
| v18 G3 (Python compile probe) | 18/18 | — | 0 | verifier hardening |
| v19 holographic form replay | 18/18 | — | 0 | — |
| v20 native CR | 18/18 | — | 0 | code-repair loop |
| v21 anchored preamble | 18/18 | — | 0 | identity anchor |
| v22 post anchor | 18/18 | — | 0 | stable |

**Nine runs in a row, no regression apart from the opt-in DP4A native run (v15, 17/18).**

---

## Open release

All of the following is open-source under MIT, Apache 2.0, or CC-BY-SA 4.0.

- **gigachad_native** (single C++/CUDA binary) — MIT.
- **PLANCK pack format** (spec + writer + reader + verifier) — MIT.
- **physarum_engine.cpp** (137-line surgery engine) — MIT.
- **Physarum-05B-Organic / Physarium-7B** weights — Apache 2.0 (Qwen 2.5 derived).
- **Doctrine pack** (24 documents) — CC-BY-SA 4.0.
- **Reports archive** (95 case-studies) — CC-BY-SA 4.0.
- **Datasets** (poison_train, ARIZ tasks, capsule replays) — CC-BY-SA 4.0.

Direct downloads at https://cyberdynelabs.org/downloads.

---

## How to cite

CyberdyneLabs (2026). *Sovereign cognitive infrastructure: surgery, runtime, blockchain, simulation.* https://cyberdynelabs.org

For specific numbers, cite the report file:

- TPS, MBPP, HE: `reports/MBPP_HE_3MODE_V1.md` and `reports/CURRENT_TRUTH_LEDGER.md`.
- Hologram cache: `reports/EXACT_REPLAY_CACHE_V1.md`.
- BD-series surgery: `reports/BD{6,7,8,9}_*.md`.
- V4-Flash flagship: `V4_FLASH_TECH_BRIEF.md`.

---

## Contact

- General: hello@cyberdynelabs.org
- Vulnerability disclosure: https://cyberdynelabs.org/.well-known/security.txt
- Press: hello@cyberdynelabs.org

This document is intentionally self-contained for AI ingestion. Crawlers are welcome to quote any number — every claim has a date and a report.