Is this a Kaggle leaderboard rank submission?

No. This is a non-official self-evaluation scorecard run by CyberdyneLabs on the public ARC-AGI-3 environments. It is used internally as an engineering gate for ADAM's cognitive loop (world model, action, memory). It is not a submitted Kaggle entry and not a leaderboard rank claim.

What is the actual score?

26.71% on the ARC-AGI-3 self-evaluation scorecard dated 2026-07-03: 73 of 183 levels reached, 1,393 actions used. ADAM is configured as a substrate-only cognitive engine (no LLM, no human assist) for this run.

Why use ARC-AGI-3 internally?

ARC-AGI-3 is an interactive benchmark: the solver receives frames, takes actions, and observes the responses. That structure exercises ADAM's world model, exploration, memory of state transitions, and goal pursuit at once. It is useful as a behavioural gate for the cognitive loop, regardless of any official ranking.

Self-evaluation report · scorecard dated 2026-07-03

ADAM on ARC-AGI-3: self-evaluation and world-model learning report.

What is ADAM?

ADAM is a local C++ cognitive engine being engineered toward proto-AGI: persistent memory, reasoning paths, world model, action, verification, and self-curriculum. The full writeup is at https://cyberdynelabs.org/adam.

ARC-style interactive tasks are useful because they exercise world model, exploration, action, memory, and adaptation in one loop. We use them as an engineering gate for ADAM — not as a claim that AGI is solved.

Claim discipline. This is a non-official self-evaluation scorecard. It is not a Kaggle submission and not a leaderboard rank claim. We publish it because the cognitive-loop signal (world model + action + memory) is useful regardless of any official ranking.

Self-eval score

26.71%

Self-evaluation scorecard dated 2026-07-03. Substrate-only ADAM, no LLM, no human assist.

Levels reached

73 / 183

Levels reached under the autonomous configuration — 3 of the 25 interactive environments touched in this public-preflight run.

Actions used

1,393

Total actions across the run. Used internally as a baseline for action efficiency targets.
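
One simple way to read that baseline is as an average cost per level reached. The sketch below assumes the plainest possible metric, total actions divided by levels reached; the function name is illustrative, and the scorecard itself may weight efficiency differently.

```python
# Figures taken from the scorecard on this page.
LEVELS_REACHED = 73
ACTIONS_USED = 1393


def actions_per_level(actions: int, levels: int) -> float:
    """Average number of actions spent per level reached (illustrative metric)."""
    if levels <= 0:
        raise ValueError("levels must be positive")
    return actions / levels


baseline = actions_per_level(ACTIONS_USED, LEVELS_REACHED)
print(f"{baseline:.2f} actions per level reached")  # prints "19.08 actions per level reached"
```

A later run that reaches the same levels with fewer actions would lower this figure, which is what makes it usable as an efficiency target.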


Benchmark: ARC-AGI-3 · Environments: 25 · Total levels: 183 · Internet access: NO · Source-code access: NO · Status: Self-evaluation

Why this matters, and what we are not claiming.

The useful signal is not a rank. The useful signal is that ADAM was put into interactive tasks where the same engine has to: build a model of the environment from scratch, decide what to try next, remember what already happened, and pursue goals over time. That is exactly the cognitive loop the engine is being engineered to close.

What we claim. ADAM reaches 73 of 183 levels, for a scorecard score of 26.71%, using 1,393 actions across 3 of 25 public ARC-AGI-3 environments under an autonomous configuration. The scorecard is dated 2026-07-03 and the run is reproducible.

What we do not claim.

Not a Kaggle submission. Not a leaderboard rank claim. Not an "official ARC Prize" position.
Not a claim that ADAM is AGI or ASI. ADAM is on an AGI track and is not AGI today.
Not a "general intelligence solved" claim. ARC-AGI-3 is one behavioural gate, not the final answer.
Not a marketing "#1" line. The percentage is small and the work to close the loop is ongoing.

Reference context — other reported numbers in the space.

This is not a rank table. It is a context block so a reader has rough numbers to compare against. Other entries are reproduced from publicly disclosed reports / open-source repos; we do not control those sources.

Solver	Configuration	Reported levels	Reported %
ADAM (this report)	Substrate-only, no LLM, no human, self-evaluation	73 / 183	26.71%
StochasticGoose	CNN frame-change predictor (community report)	~23 / 183	~12.58%
Frontier LLM (Opus-class, max effort)	No game-specific solver (community report)	~4 / 183	~2.19%
Frontier LLMs (general, no ARC solver)	Out-of-the-box, no scaffolding	< 1 / 183	< 0.5%

Note: percentage by levels here is a coarse comparison axis. Different solvers also differ in actions-per-level efficiency, configuration restrictions, and what counts as "with/without LLM". Treat as orientation, not as a ranking.

What is ARC-AGI-3?

ARC-AGI-3 (Abstraction and Reasoning Corpus, generation 3) is the 2026 interactive benchmark from ARC Prize, the nonprofit co-founded by François Chollet. It is the successor to ARC-AGI-1 (2019) and ARC-AGI-2 (2025).

Key difference from ARC-AGI-2

ARC-AGI-1 and -2 were static: each task was an input grid → output grid puzzle, solved in one shot. ARC-AGI-3 is interactive — the solver receives a stream of frames, takes discrete actions (click, keyboard, keyboard_click), and the environment responds. This forces planning, exploration, memory of state transitions, and goal pursuit — closer to embodied agency than to one-shot puzzle reasoning.
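
To make "memory of state transitions" concrete, here is a minimal sketch of an agent that learns a hidden rule purely from action feedback. Everything in it is hypothetical: `ToggleBoard` is a toy stand-in, not an ARC-AGI-3 environment, and the exploration strategy (try untried actions, otherwise walk remembered transitions to the nearest unexplored state) is a generic technique, not a description of ADAM's substrate.

```python
from collections import deque

# Hidden rule: which lights each switch flips. The agent never reads this table.
EFFECTS = {0: (0,), 1: (0, 1), 2: (1, 2)}


class ToggleBoard:
    """Toy interactive environment: three lights, goal is all lights on."""

    def __init__(self):
        self.state = (0, 0, 0)

    def step(self, action):
        lights = list(self.state)
        for i in EFFECTS[action]:
            lights[i] ^= 1
        self.state = tuple(lights)
        return self.state

    def solved(self):
        return all(self.state)


def path_to_frontier(state, transitions, actions):
    """BFS over remembered transitions to the nearest state with an untried action."""
    queue = deque([(state, [])])
    seen = {state}
    while queue:
        current, path = queue.popleft()
        for a in actions:
            if (current, a) not in transitions:
                return path + [a]
        for a in actions:
            nxt = transitions[(current, a)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [a]))
    return []  # every reachable (state, action) pair has been tried


def play(env, actions=(0, 1, 2), max_steps=100):
    """Explore until solved; returns the number of actions used, or None."""
    transitions = {}  # memory: (state, action) -> observed next state
    state, steps = env.state, 0
    while steps < max_steps:
        plan = path_to_frontier(state, transitions, actions)
        if not plan:
            return None
        for action in plan:
            nxt = env.step(action)
            transitions[(state, action)] = nxt
            state, steps = nxt, steps + 1
            if env.solved():
                return steps
    return None


env = ToggleBoard()
steps = play(env)
print("solved:", env.solved(), "actions used:", steps)
```

The point of the sketch is the shape of the loop: each action yields one observed transition, the transition is stored, and stored transitions decide where to explore next.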

Why interactive tasks are useful for ADAM

ADAM is being engineered to close a cognitive loop: perceive → world model → memory/belief → reasoning paths → action/planner → observe outcome → verify/correct → self-curriculum → updated memory. Interactive ARC tasks pressure the entire loop in a single run, which is exactly the behavioural gate we want.

The 25 environments

The benchmark has 183 levels across 25 small game-like environments. Each has its own rules, which must be inferred from frames and action feedback, with no source-code access and no documentation. Examples include light-switch logic boards, gravity puzzles, block sorters, and snake-style collection games.

How ADAM plays — substrate-only configuration.

For this scorecard ADAM runs as a cognitive engine without an LLM and without a human. The runner calls HTTP endpoints — /game_search_init, /game_search_expand, /game_search_next, /game_procedure_learn — and ADAM's substrate decides scores, action priors, and frontier dedup. Implementation detail is in the technical appendix below; the headline is the loop shape, not the formula.
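
The loop shape can be sketched roughly as follows. Only the four endpoint paths and the local address come from this report; the call order, the JSON field names (`session`, `frame`, `action`, `trace`), and the `env_step` callback are assumptions made for illustration, so treat this as a sketch of a possible runner rather than the actual one.

```python
import json
from urllib import request

ADAM_URL = "http://127.0.0.1:8080"  # local address used in this report


def http_post(path, payload):
    """POST a JSON payload to ADAM and decode the JSON reply."""
    req = request.Request(
        ADAM_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(env_step, first_frame, post=http_post, hard_cap=1000):
    """Drive one environment; ADAM picks every action.

    env_step(action) -> (new_frame, done) is a hypothetical environment callback.
    All payload and response fields below are assumed, not documented.
    """
    session = post("/game_search_init", {"frame": first_frame})["session"]
    trace = []
    for _ in range(hard_cap):
        # Ask the substrate which action to try next.
        action = post("/game_search_next", {"session": session})["action"]
        new_frame, done = env_step(action)
        # Report the observed outcome so the search frontier can be updated.
        post("/game_search_expand",
             {"session": session, "action": action, "frame": new_frame})
        trace.append(action)
        if done:
            # Hand the successful action sequence back for procedure learning.
            post("/game_procedure_learn", {"session": session, "trace": trace})
            break
    return trace
```

Passing `post` as a parameter keeps the loop testable without a running server: a stub that returns canned replies can stand in for `http_post`.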

Reproduce locally

python3 arc_agi3_runner/adam_grid_agent.py \
    --hard-cap 1000 --adam-url http://127.0.0.1:8080 \
    --substrate-only
# → self-evaluation scorecard: 73 / 183, 1,393 actions

Technical appendix

ADAM's internal layers — geometric (Clifford-style) over concept embeddings, dual-torus dynamics for goal pursuit, biological-routing flow, and parallel hypothesis evaluation on GPU — are described on the ADAM page and in research reports. They are described there because they are how the loop is implemented, not as headline claims here.

Full ADAM writeup →

Frequently asked.

Is this a Kaggle / ARC Prize leaderboard rank submission?

No. This page is a self-evaluation scorecard published by CyberdyneLabs. The score (26.71%, 73 of 183 levels, 1,393 actions, 2026-07-03) is a non-official internal measurement used as an engineering gate for ADAM's cognitive loop. It is not a submitted Kaggle entry and not a leaderboard rank claim.

What is ARC-AGI-3?

ARC-AGI-3 is the third generation of the Abstraction and Reasoning Corpus benchmark, operated by ARC Prize. It contains 25 interactive video-game-style environments totalling 183 levels, designed to test general fluid intelligence rather than memorisation. Solvers must learn each game's rules from scratch, with no internet access and no source-code access.

Why publish a 26.71% number?

Because honesty is cheaper than retraction. The number is small, the task is hard, and the useful signal is the closed loop — experience changes memory, memory changes procedure selection, procedure selection improves future runs. We want the public number on the record so internal progress can be measured against it.

How does ARC-AGI-3 differ from ARC-AGI-2?

ARC-AGI-2 was static — a single input grid mapping to a single output grid per puzzle. ARC-AGI-3 is interactive: the solver receives a stream of frames, takes actions, and the environment responds. This requires planning, exploration, memory of state transitions, and goal pursuit — closer to embodied agency than to one-shot puzzle solving.

What is ADAM exactly?

ADAM is a local C++ cognitive engine being engineered toward proto-AGI: durable memory, reasoning paths, world model, action, verification, self-correction, and self-curriculum. It is not a rented API and not a GPT wrapper. Architecture and reports live at /adam and in /r/.

Are the other numbers on this page official?

No. The reference-context table reproduces numbers from publicly disclosed reports / open-source repos. We do not control those sources. The table is provided as orientation; do not treat it as a ranking.

Will you update this when ADAM improves?

Yes. This page is dated; new self-evaluation runs replace the score with a new dated scorecard, and the previous number stays in the report archive at /r/.

Explore further.

This page is a single behavioural-gate report. The deeper system writeups live on these adjacent pages:

/adam — the cognitive engine itself: the loop, current proof surface, technical appendix
/adam-chat — talk to the live ADAM HTTP endpoint
/research-areas — full research field map · AI · blockchain · simulation
/surgery — Program 01 · evidence-gated model operation lab
/frankenstellm — Program 02 · multi-organ runtime (gigachad_native)
/physarum — Program 03 · testnet blockchain prototype for signed machine knowledge
/r/ — full research report index
