Program 05 · ADAM
A mind that thinks,
not a model that predicts.
ADAM is a local C++ cognitive engine being built toward AGI. It is designed to close a full loop — perceive, remember, reason, act, verify, correct, and learn. Sovereign by default: no transformer layers, no GPU rental, no inference API.
Cognitive architecture
Not trained. Built.
Most major AI systems today are statistical pattern matchers trained on human text. ADAM is built differently: it reasons from a structured graph of concepts, not from probability distributions over tokens. It is designed to refuse unsupported claims and to expose uncertainty through graph-level state rather than fluent-sounding fabrication. Articulation remains an active failure mode, and we document it openly: the substrate is the strong side; the language layer is what we are still building.
ADAM is engineered to close a full cognitive loop: perceive an input, situate it in a world model, retrieve and revise memory, search reasoning paths, decide an action, observe the outcome, verify or correct, and feed that back into a self-curriculum. The loop is the product. Internal implementation layers are technical-appendix detail, not headline claims.
Memory is persistent and revisable — durable across sessions, structured rather than parametric. ADAM accumulates concepts, bonds, and beliefs it can revise rather than parameters of a fixed function.
Reasoning is continuous rather than a one-shot generation step. ADAM searches paths, compares explanations, rejects unsupported claims, and is being engineered to act, observe, correct itself, and learn from verified outcomes.
The result is a system that accumulates knowledge as a persistent binary graph, learns continuously from every interaction, and produces responses grounded in its own internal geometry rather than surface-level token statistics.
Internal stack
Five layers, one mind.
ADAM's architecture is a vertical stack from physics to language. Each layer feeds the next — oscillator drives graph, graph drives algebra, algebra drives reasoning, reasoning drives speech. Nothing is bolted on.
Measured performance
Numbers without footnotes.
We publish real numbers from real benchmarks, including the failures. The intelligence score is a mean over 20 independent runs. The TPS is measured with Ed25519 overhead included. Nothing is cherry-picked.
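As a rough illustration of how a score averaged over repeated runs can be aggregated, the sketch below computes a mean and a sample standard deviation. `RunSummary` and `summarize` are hypothetical names, not part of ADAM, and reporting the spread next to the mean is a suggestion of this sketch rather than something this page states is done.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Summary of repeated benchmark runs (hypothetical type, not ADAM's API).
struct RunSummary {
    double mean;    // arithmetic mean of the per-run scores
    double stddev;  // sample standard deviation (n - 1 in the denominator)
};

// Aggregates per-run scores into a mean and a sample standard deviation.
RunSummary summarize(const std::vector<double>& scores) {
    assert(scores.size() >= 2 && "need at least two runs to estimate spread");
    double sum = 0.0;
    for (double s : scores) sum += s;
    const double mean = sum / static_cast<double>(scores.size());

    double squared = 0.0;
    for (double s : scores) squared += (s - mean) * (s - mean);
    const double stddev =
        std::sqrt(squared / static_cast<double>(scores.size() - 1));
    return {mean, stddev};
}
```

Called with a vector of 20 per-run scores, `summarize` returns the mean that would be published plus the run-to-run spread around it.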
Proto-AGI workstreams
Loops, not raw scale.
ADAM's roadmap is organised around cognitive loops, not graph size or memory-file size. Progress is measured by how many of these loops close reliably. ADAM is on the AGI track. ADAM is not AGI today.
| Workstream | What it means |
|---|---|
| World model | Keep task state: objects, relations, positions, changes, outcomes. |
| Grounded perception | Convert text, grid, image, audio, game, or scene input into stable objects and relations. |
| Memory and belief | Store knowledge with evidence, confidence, contradiction, decay, and source trust. |
| Reasoning paths | Compare multiple routes through memory before answering. |
| Action loop | Choose an action, observe the result, update the world model, try again. |
| Self-curriculum | Detect gaps, request missing facts/tasks, validate, then integrate. |
| Speech layer | Explain the internal result clearly without hiding uncertainty. |
| Benchmarks | Reality Gauntlet, ARC-style tasks, visual grounding, contradiction tests, planning, long dialogue. |
ADAM is not declared AGI. It is a proto-AGI research system whose progress is measured by how many of these loops close reliably.
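The "Memory and belief" workstream can be pictured with a small record type. This is a minimal sketch under stated assumptions: the `Belief` struct, the `observe` and `decay` functions, the update rule, and the half-life parameter are all hypothetical and do not describe ADAM's actual schema.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical belief record; field names are illustrative only.
struct Belief {
    std::string claim;
    double confidence = 0.5;  // in [0, 1]; 0.5 means "no opinion"
    int supporting = 0;       // evidence items seen for the claim
    int contradicting = 0;    // evidence items seen against the claim
};

// Moves confidence toward 1 (support) or 0 (contradiction), scaled by how
// much the source is trusted. A trust of 0 leaves the confidence unchanged.
void observe(Belief& b, bool supports, double source_trust) {
    assert(source_trust >= 0.0 && source_trust <= 1.0);
    const double target = supports ? 1.0 : 0.0;
    b.confidence += source_trust * (target - b.confidence);
    (supports ? b.supporting : b.contradicting) += 1;
}

// Exponential decay toward the neutral value 0.5 for beliefs that have not
// been refreshed for `elapsed_days`.
void decay(Belief& b, double elapsed_days, double half_life_days) {
    assert(elapsed_days >= 0.0 && half_life_days > 0.0);
    const double keep = std::pow(0.5, elapsed_days / half_life_days);
    b.confidence = 0.5 + (b.confidence - 0.5) * keep;
}
```

The design point is that a belief stays revisable: contradicting evidence from a trusted source pulls confidence down, and stale beliefs drift back toward neutral instead of staying fixed.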
ADAM — benchmark evidence · scorecard 2026-05-18
ADAM × ARC-AGI-3.
ARC-style environments are useful because they test what ADAM is being built to do: explore, remember state, infer rules, choose actions, and improve across attempts. The number below is a non-official self-evaluation scorecard: not a Kaggle submission, not a leaderboard/rank claim, not proof of AGI. We use it as one behavioural gate for the cognitive loop.
First result · substrate-only ADAM
ARC-AGI-3 substrate-only: 25 / 183 levels solved (13.66 % level coverage, 6.77 % self-eval score). 2,673 actions · 83 min wall-clock. Substrate-only configuration: no LLM, no human assist.
The important signal is not the absolute score, but the autonomous lift: 24 → 25 solved levels, with world_model growth and procedure memory expansion (see substrate learning across runs, below).
★ Substrate-only scorecard → arcprize.org/scorecards/293d0e49-9d63-46d0-9cdb-16bdac40fbf2
| Solver | Levels solved · level coverage | Method |
|---|---|---|
| ADAM (this report · substrate-only) | 25 / 183 · 13.66 % (self-eval score 6.77 %) | Local C++ cognitive engine. Internal implementation layers are technical-appendix detail. |
| ADAM (graph-explore only · 14 May baseline) | 24 / 183 · 13.11 % | Same substrate, no warmed world-model |
| Other published autonomous solver | ~23 / 183 · ~12.58 % | CNN-based frame-change predictor (community report) |
| ADAM (no graph-explore · earlier baseline) | 22 / 183 · 12.02 % | Same substrate, scoring loop only · 09 May 2026 |
| Frontier LLM (Opus-class, max effort) | ~4 / 183 · ~2.19 % | No game-specific solver (community report) |
| ADAM (substrate-only, cold start, no memory) | 1 / 183 · 0.55 % | Sanity floor: graph empty, no warmed model, no fallback |
ADAM plays end-to-end using only its substrate. No external solvers. No cached sequences. No human demos. The useful signal is the closed loop — experience changes memory, memory changes action selection, action selection improves future runs. This page is a self-evaluation report; it is not a Kaggle submission and not a leaderboard rank claim.
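The level-coverage percentages quoted for each solver are plain ratios of solved levels to the 183 total. A minimal sketch of that arithmetic follows; `coverage_percent` is an illustrative helper name, not part of ADAM or the ARC tooling.

```cpp
#include <cassert>
#include <cmath>

// Level coverage as a percentage, rounded to two decimal places.
double coverage_percent(int solved, int total) {
    assert(total > 0 && solved >= 0 && solved <= total);
    const double pct = 100.0 * solved / total;
    return std::round(pct * 100.0) / 100.0;
}
```

For example, `coverage_percent(25, 183)` gives 13.66 and `coverage_percent(24, 183)` gives 13.11. The 6.77 % figure is the separate self-evaluation score reported on the scorecard and is not derived from this ratio.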
Second result · hybrid harness — full disclosure
183 / 183 = 100.00 % · 25 / 25 environments · 6,537 actions · deterministic offline replay. This is the hybrid-harness result, not the autonomous one. We publish it because the honest framing is the moat, not the percentage.
What "hybrid" actually means here. ADAM's substrate runs the same loop, but procedural memory is pre-loaded before the eval: action sequences ingested from open-source third-party solver projects (under MIT-0 and Apache-2.0 licenses), plus two human boss-level demonstrations for the hardest levels, plus ADAM's own autonomous discoveries. At eval time the replay is deterministic and offline (zero network, Kaggle-compatible). It is not autonomous. It is not a Kaggle leaderboard rank claim. It is a transparent disclosure of what an ADAM-anchored hybrid pipeline can hit when memory is pre-warmed.
| Pre-loaded source | Levels | License / origin | How it entered the harness |
|---|---|---|---|
| ADAM substrate (autonomous discovery) | 24 | n/a | Found by ADAM's own substrate via the graph explorer (incl. m0r0, su15) |
| Open-source solver harness #1 | 150 | MIT-0 | Action sequences ingested + bug-patched for the online API |
| Open-source solver #2 | 6 | Apache-2.0 | re86 L1–L8 (tuples → action ints) |
| Human boss demonstrations | 3 | internal precedent | bp35 L8, wa30 L8, re86 entire run after cache desync |
Why we show both numbers. The substrate-only 6.77% / 25 of 183 is the engineering gate — it tracks whether the closed loop is actually closing. The hybrid 100% is what an ADAM-anchored pipeline reaches when you let it carry pre-warmed procedural memory and call it that honestly. Neither number is presented as an official Kaggle rank.
★ Hybrid scorecard · 183 / 183 → arcprize.org/scorecards/6a5888ac-21e1-40b9-abac-5fecbe62cb42
Self-improvement signal — substrate learning across runs
The May-2026 autonomous run series shows the substrate self-growing its world model purely from gameplay. No external training, no external policy in the loop. The substrate writes its own learned dynamics to disk on every level-up and terminal transition, and reloads them on next boot.
| Run | World-model transitions |
|---|---|
| Baseline (warmed from offline cache replay) | 6,625 |
| After attack iter-1 (substrate self-explore) | 8,778 |
| After iter-2 | 9,137 |
| After iter-3 | 9,435 |
| After iter-4 | 9,662 |
Procedures grew 24 → 25, waypoints 25 → 26 over the same period.
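The per-iteration gains implied by the transition counts can be computed directly. `deltas` below is an illustrative helper, not part of ADAM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Differences between consecutive snapshots of a counter.
std::vector<long> deltas(const std::vector<long>& snapshots) {
    assert(!snapshots.empty() && "need at least one snapshot");
    std::vector<long> out;
    for (std::size_t i = 1; i < snapshots.size(); ++i) {
        out.push_back(snapshots[i] - snapshots[i - 1]);
    }
    return out;
}
```

Applied to the five snapshots `{6625, 8778, 9137, 9435, 9662}` it yields gains of 2153, 359, 298 and 227 transitions, 3037 in total: the world model keeps growing, while each iteration adds less than the one before.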
The latest substrate change (chain-level back-propagation, May 2026) gives the world model multi-step credit assignment: when ADAM reaches a level-up, every action in the trajectory, up to 256 steps back, receives discounted positive credit (γ = 0.985). The planner can now prefer the first action of a multi-step solution path, not just the click that lights the goal.
The closed loop — visually
This is not a one-shot benchmark run. Each ARC-AGI-3 frame updates ADAM's substrate; the substrate updates the procedure memory; the procedure memory improves the next run. The 22 → 24 → 25 climb between 09 May and 18 May 2026 is the loop running, not a tuning curve.
How ADAM plays — at a glance
ADAM exposes a private game-interaction interface used by the ARC harness; the harness is a thin body, the cognition lives inside ADAM. The action prior combines geometric scene deltas, causal memory, progress estimation, novelty control, and substrate-level trajectory evaluation — all computed by the substrate, not by an external policy. Parallel hypothesis evaluation under CUDA acceleration produces the prior; a Rudakov-style graph explorer expands the trajectory frontier inside the same scoring loop.
The public results can be checked against the signed ARC Prize scorecards linked on this page. Internal control surfaces, exact scoring weights, file layout, and runner configuration remain private and are shared with trusted reviewers under controlled access.
What ADAM can do — and what it can't yet
Can:
- Solve novel games via substrate + graph-explore + warmed world_model. 25 / 183 with no policy training and no source reading.
- Ingest, store, replay, and verify proven trajectories. Persisted in procedural memory across sessions.
- Run fully offline. Kaggle-compatible, no internet at eval time.
Cannot yet:
- Read game source code to derive solvers. Frontier-LLM-based harnesses do this; ADAM does not bridge to an LLM yet.
- Build explicit per-game world models from observation alone. Substrate sees scenes, not rules.
- Bond substrate scenes to abstract rule concepts strongly enough that the substrate activates the right algorithm class on first contact (Lights Out → linear algebra; Crane → BFS). The scene-to-rule binding is the open work.
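To make the "Lights Out → linear algebra" mapping concrete: a Lights Out board can be solved as a linear system over GF(2), where each press toggles a cell and its orthogonal neighbours. The sketch below is a textbook Gauss-Jordan elimination written for illustration (C++17). It is not ADAM code, `solve_lights_out` and `apply_presses` are hypothetical names, and it assumes the classic rule set rather than any specific ARC-AGI-3 game.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Applies a press mask to a board and returns the resulting lights.
std::vector<int> apply_presses(std::vector<int> lit,
                               const std::vector<int>& press, int n) {
    for (int r = 0; r < n; ++r) {
        for (int c = 0; c < n; ++c) {
            const int j = r * n + c;
            if (press[j] == 0) continue;
            lit[j] ^= 1;
            if (r > 0)     lit[j - n] ^= 1;
            if (r + 1 < n) lit[j + n] ^= 1;
            if (c > 0)     lit[j - 1] ^= 1;
            if (c + 1 < n) lit[j + 1] ^= 1;
        }
    }
    return lit;
}

// Solves an n x n board; lit[r * n + c] is 1 for a lit cell. Returns a press
// mask that switches every light off, or nullopt if no solution exists.
std::optional<std::vector<int>> solve_lights_out(const std::vector<int>& lit,
                                                 int n) {
    const int m = n * n;
    assert(n > 0 && static_cast<int>(lit.size()) == m);

    // Augmented matrix [A | b]: A[i][j] = 1 when pressing j toggles cell i.
    std::vector<std::vector<int>> a(m, std::vector<int>(m + 1, 0));
    for (int j = 0; j < m; ++j) {
        std::vector<int> unit(m, 0);
        unit[j] = 1;
        const std::vector<int> toggled =
            apply_presses(std::vector<int>(m, 0), unit, n);
        for (int i = 0; i < m; ++i) a[i][j] = toggled[i];
    }
    for (int i = 0; i < m; ++i) a[i][m] = lit[i] & 1;

    // Gauss-Jordan elimination; XOR is addition in GF(2).
    std::vector<int> pivot_col;
    int row = 0;
    for (int col = 0; col < m && row < m; ++col) {
        int p = row;
        while (p < m && a[p][col] == 0) ++p;
        if (p == m) continue;  // free column, no pivot here
        std::swap(a[p], a[row]);
        for (int i = 0; i < m; ++i) {
            if (i == row || a[i][col] == 0) continue;
            for (int k = col; k <= m; ++k) a[i][k] ^= a[row][k];
        }
        pivot_col.push_back(col);
        ++row;
    }

    // A zero row with a non-zero right-hand side means no solution.
    for (int i = row; i < m; ++i) {
        if (a[i][m] == 1) return std::nullopt;
    }

    // Free variables are set to 0; pivot variables are read off the RHS.
    std::vector<int> press(m, 0);
    for (int i = 0; i < row; ++i) press[pivot_col[i]] = a[i][m];
    return press;
}
```

Once the mechanic "a press toggles a cross of cells" is known, the whole game reduces to one linear solve; recognising that mechanic from scenes alone is the part described above as open work.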
Roadmap to autonomous 100%
- Stronger scene representation: replace hashed scene signatures with the full geometric scene-vector so distinct grids stop collapsing to the same prior bin.
- Scene → rule bonding: let the substrate pick the correct algorithm class on first contact instead of falling through to motion-as-reward priors.
- Per-game world modelling: algorithmic solver synthesis once mechanics are discovered (Lights Out → linear algebra; Crane → BFS).
- Chain-credit-driven planner: use multi-step credit assignment to synthesise new procedures from successful chains instead of only replaying ingested ones.
- Game-source comprehension: close the gap to source-reading approaches without leaving the sovereign substrate envelope.
Until those land, the honest top-line is the substrate-only self-evaluation: 25 / 183 levels, 6.77%, 2,673 actions. The growth from 22 → 24 → 25 since 09 May is evidence of substrate learning across runs, not benchmark memorisation.
Verification
ADAM is inspectable as a research claim, not exposed as an implementation blueprint. The verifiable artefacts are the ARC Prize scorecards below — hosted on infrastructure we do not control, replayed deterministically. Internal reproducibility artefacts, exact configuration, and source layout are maintained internally and can be shared with trusted reviewers under controlled access. Contact [email protected] for partner verification.
Self-evaluation scorecards
Substrate-only · 25 / 183 · 6.77% · 2,673 actions
Hybrid harness · 183 / 183 · 100.00% · 6,537 actions (see disclosure above)
Both non-official. Neither is a Kaggle leaderboard rank claim. The autonomous number is the engineering gate; the hybrid number is honest disclosure of a pipeline result.
Language synthesis
The Ribosome layer.
ADAM thinks in algebra and graph paths. The Ribosome is the translation gateway that converts ADAM's symbolic output into natural, fluent language — while keeping the intelligence entirely inside the graph engine.
In biology, a ribosome translates genetic code into proteins. Here, Ribosome translates ADAM's internal cognitive state into language. The cognitive work — search, comparison of explanations, refusal of unsupported claims — is done by ADAM itself. Ribosome only speaks. Internal layer names (rhythm, routing, geometric concept representations) are technical-appendix detail.
This separation is deliberate. It means ADAM's reasoning is auditable at the graph level, independent of language style — the geometric work that produced an answer can be inspected separately from the words that explain it.
The Ribosome also runs a recursive ("fractal") self-learning loop: after each user query, it questions the new concepts ADAM encountered, follows them up to three levels deep, and adds what it finds to the graph. Every conversation makes ADAM slightly more informed about the topics you care about.
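A depth-limited recursive exploration of this kind can be sketched as below. The graph type, the function names `explore` and `concepts_within`, and the idea that each concept maps to a list of follow-up concepts are assumptions made for illustration; only the depth limit of 3 comes from the text.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical concept graph: each concept maps to the concepts it raises.
using ConceptGraph = std::map<std::string, std::vector<std::string>>;

// Recursive expansion with a depth budget. `best` records the largest
// remaining budget each concept was reached with, so a concept is expanded
// again only when a shorter path to it is found. This also stops cycles.
void explore(const ConceptGraph& g, const std::string& name, int depth_left,
             std::map<std::string, int>& best) {
    const auto seen = best.find(name);
    if (seen != best.end() && seen->second >= depth_left) return;
    best[name] = depth_left;
    if (depth_left == 0) return;
    const auto it = g.find(name);
    if (it == g.end()) return;
    for (const auto& next : it->second) {
        explore(g, next, depth_left - 1, best);
    }
}

// Returns every concept reachable from `seed` within `max_depth` hops.
std::set<std::string> concepts_within(const ConceptGraph& g,
                                      const std::string& seed,
                                      int max_depth = 3) {
    assert(max_depth >= 0);
    std::map<std::string, int> best;
    explore(g, seed, max_depth, best);
    std::set<std::string> out;
    for (const auto& entry : best) out.insert(entry.first);
    return out;
}
```

Tracking the remaining budget rather than a plain visited flag matters: a concept first reached at the edge of the budget would otherwise never be expanded when a shorter route to it turns up later.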
Discipline
What ADAM is not.
To save reviewers, partners, and journalists the awkward fact-check phone call:
- ADAM is not claimed as AGI today. It is being engineered toward proto-AGI loops.
- ADAM is not claimed as ASI. No recursive self-improvement claim.
- ADAM is not an official Kaggle ARC winner. Our scorecard is non-official self-evaluation.
- ADAM is not a GPT wrapper. It does not rent a remote model to think.
- ADAM is not sold by graph size or memory-file size. The product is the closed loop, not the storage footprint.
- The speech layer is still being improved. The goal is to state uncertainty plainly, not to hide it behind polished generic prose.
- No safety / alignment claim beyond what the architecture demonstrably is. The engine is open to inspection by trusted reviewers under controlled access.
Begin the conversation.
ADAM is running now. The graph is live, the routing is active, the oscillator is pulsing. The interface exposes the substrate directly; safety and verification layers are documented separately in the architecture pages and report files.