# GIGACHAD Phase-8C + Phase-8D — Organ wiring & first real E2E

**Date:** 2026-04-27
**Scope:** Replace stub organ outputs with real native LLM calls. First end-to-end
ARIZ pipeline that walks `memory → 5 organs → top brain → verifier → DAG`
without any Python in the hot path.

> **Physarium-v1 errata:** any Physarium-related sparsity / PPL number
> referenced here comes from v1 magnitude-flow surgery. Read it together with
> `reports/PHYSARIUM_RESULTS_RECONCILE.md` + `PHYSARIUM_COVERAGE_AUDIT.md`.

## Phase-8C0 — sanity guard (passed)

Run before E2E to catch determinism / NaN issues.

| Pack                  | Hash run 1            | Hash run 2            | NaN | Inf | Top-10 tokens (first step)                  |
|-----------------------|-----------------------|-----------------------|-----|-----|---------------------------------------------|
| `physarum05b.planck`  | `0xd70ba434d29fe10a`  | `0xd70ba434d29fe10a`  | 0   | 0   | top-10 = `² , re \n , et …` (post-pruning)  |
| `physarium7b.planck`  | `0xaa40ebf6fdbeee8e`  | `0xaa40ebf6fdbeee8e`  | 0   | 0   | top-10 = `' ' ' i' ' I' '\n' ',' ' a' …`    |

All four sanity properties hold:
1. Same prompt + greedy → identical token sequence (deterministic).
2. Top-10 first-step distribution stable across runs.
3. No NaN / Inf in any logits.
4. FNV-1a 64 of generated tokens equal across runs.
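
Properties 3 and 4 can be sketched as below. This is a minimal illustration, not the project's guard: the helper names are hypothetical, and the byte layout used to hash token ids (4 little-endian bytes per id) is an assumption, since the source does not say how tokens are serialized before hashing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a 64-bit over raw bytes (standard offset basis and prime).
inline std::uint64_t fnv1a64(const unsigned char* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::uint64_t h = 0xcbf29ce484222325ULL;  // offset basis
    for (std::size_t i = 0; i < n; ++i) {
        h ^= data[i];                          // xor first ...
        h *= 0x100000001b3ULL;                 // ... then multiply (the "1a" order)
    }
    return h;
}

// ASSUMPTION: each token id is hashed as 4 little-endian bytes.
inline std::uint64_t hash_tokens(const std::vector<std::int32_t>& ids) {
    std::vector<unsigned char> bytes;
    bytes.reserve(ids.size() * 4);
    for (std::int32_t id : ids) {
        const std::uint32_t u = static_cast<std::uint32_t>(id);
        for (int shift = 0; shift < 32; shift += 8)
            bytes.push_back(static_cast<unsigned char>((u >> shift) & 0xFFu));
    }
    return fnv1a64(bytes.data(), bytes.size());
}

// Counts NaN and Inf entries in one logits vector.
struct NonFinite { std::size_t nan = 0; std::size_t inf = 0; };

inline NonFinite count_nonfinite(const std::vector<float>& logits) {
    NonFinite c;
    for (float v : logits) {
        if (std::isnan(v)) ++c.nan;
        else if (std::isinf(v)) ++c.inf;
    }
    return c;
}
```

Running the same greedy generation twice and comparing the two `hash_tokens` values gives the determinism check; calling `count_nonfinite` on every step's logits gives the NaN / Inf columns.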

## Phase-8C — files added / changed

| File                                          | Purpose                                      |
|-----------------------------------------------|----------------------------------------------|
| `include/qwen2_tokenizer.hpp` + `src/tokenizer/qwen2_tokenizer.cpp` | Native Qwen2 byte-level BPE encoder/decoder. Loads `tokenizer.json`. |
| `include/planck_runner.hpp` + `src/model/planck_runner.cpp`        | High-level `Runner` class wrapping a planck pack + tokenizer. Text-in / text-out + stats. |
| `include/organ_manager.hpp` + `src/organs/organ_manager.cpp`       | Registry of 8 phys05_* organs + 1 physarium_7b top brain. Per-organ prompt template, max_tokens, tier, verifier. Shared planck pack reused across organs. Tier accounting via `tier_manager::record_access`. |
| `src/main.cpp`                                                     | `run_task` now forwards the dispatcher route to `organ_manager.run()` instead of returning a stub. Adds the new `run_ariz_e2e` chain. |
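
To make the `organ_manager` row concrete, here is a rough sketch of what a per-organ spec registry could look like. Every identifier in it (`OrganSpec`, `OrganRegistry`, `Tier`, the `{input}` placeholder, the example organ name) is hypothetical and chosen for illustration; the real header may be organized quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// All names here are hypothetical; they mirror the fields listed in the table.
enum class Tier { RAM, VRAM };

struct OrganSpec {
    std::string prompt_template;  // must contain the "{input}" placeholder
    int         max_tokens;
    Tier        tier;
    std::string verifier;         // e.g. "strict JSON object"
    std::string pack;             // planck pack, shared by several organs
};

class OrganRegistry {
public:
    void add(const std::string& name, OrganSpec spec) {
        specs_[name] = std::move(spec);
    }
    const OrganSpec& get(const std::string& name) const {
        const auto it = specs_.find(name);
        if (it == specs_.end()) throw std::out_of_range("unknown organ: " + name);
        return it->second;
    }
    // Substitutes the task input into the organ's prompt template.
    std::string render(const std::string& name, const std::string& input) const {
        std::string prompt = get(name).prompt_template;
        const std::string key = "{input}";
        const auto pos = prompt.find(key);
        assert(pos != std::string::npos && "template lacks {input}");
        prompt.replace(pos, key.size(), input);
        return prompt;
    }
    std::size_t size() const { return specs_.size(); }
private:
    std::map<std::string, OrganSpec> specs_;
};
```

Keeping the pack name in the spec, not a loaded model, is one way to let several organs share a single runner instance.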

### Tokenizer self-check

```
[tok] loaded vocab=151643 merges=151387 eos=151645 bos=151644
```

Encoded "Hello" → `[9707]`, decoded back to `"Hello"`. Round-trip on the
ariz prompt is byte-identical.

### Organ smoke (json_repair)

```
$ gigachad_native --task json_repair --input '{"k":1,}'
{
  "route":      "json_repair",
  "ok":         true,                   ← hard verifier passed
  "verifier":   "strict JSON object",
  "latency_ms": 17547,
  "final":      " {\"k\":1}Human: I'm sorry, but I cannot ..."
}
```

The 0.5B organ generated `{"k":1}` (the correct repair), then trailed off into
a chat-style hallucination. The hard verifier accepts the **first** JSON object
in the output, which is the right behaviour here: everything after it is noise.
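
The "first JSON object" rule can be illustrated with a small brace-matching scan. This is a sketch under assumptions, not the project's `hard_verifier`: `first_json_object` is a hypothetical name, and it only checks brace balance (skipping braces inside string literals). A strict verifier would still need to parse the extracted span as JSON.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Returns the first brace-balanced {...} span in `text`, or nullopt if there is
// no '{' or the first object is never closed. Braces inside string literals and
// escaped quotes are ignored while matching.
inline std::optional<std::string> first_json_object(const std::string& text) {
    const std::size_t start = text.find('{');
    if (start == std::string::npos) return std::nullopt;

    int  depth     = 0;
    bool in_string = false;
    bool escaped   = false;
    for (std::size_t i = start; i < text.size(); ++i) {
        const char c = text[i];
        if (in_string) {
            if (escaped)        escaped = false;    // char after a backslash
            else if (c == '\\') escaped = true;
            else if (c == '"')  in_string = false;
            continue;
        }
        if (c == '"') {
            in_string = true;
        } else if (c == '{') {
            ++depth;
        } else if (c == '}') {
            assert(depth > 0);  // holds because the scan starts on '{'
            if (--depth == 0) return text.substr(start, i - start + 1);
        }
    }
    return std::nullopt;  // opening brace without a matching close
}
```

On the smoke-test output above, the scan returns `{"k":1}` and ignores the trailing chat text. On an output that never contains `{`, it returns `std::nullopt`, which corresponds to the "no JSON object" verdict.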

This validates the path:
`dispatcher → organ_manager → planck_runner → tokenizer → forward pass → decode → hard_verifier → DAG`.

## Phase-8D — first ARIZ E2E

```
$ gigachad_native --task ariz \
   --input "Hot dusty gas at 600C clogs a metal filter. Solve with ARIZ/TRIZ."
```

### Real pipeline executed (NOT input → 7B → answer)

```
input
 → memory recall
 → hologram retrieve              (3 prior ariz holograms found)
 → physarium field suggestion     (best route = "ariz")
 → phys05_triz_contradiction      (0.5B native, 96 tok, 41.2 s)
 → phys05_claim_extractor          (0.5B native, 64 tok, 39.8 s)
 → physarium_7b top brain          (7B native, 128 tok, 224.7 s)
 → hard verifier
 → DAG write
```

### Per-stage table

| stage              | organ                       | tier | wall_ms  | gen_tok | tok/s | verifier         |
|--------------------|-----------------------------|------|----------|---------|-------|------------------|
| memory_recall      | (spine)                     | RAM  | 0        | 0       | —     | holo_hits=3      |
| triz_contradiction | phys05_triz_contradiction   | RAM  | 41 201   | 96      | 2.52  | ❌ no JSON object|
| claim_extractor    | phys05_claim_extractor      | RAM  | 39 789   | 64      | 1.73  | ❌ no JSON array |
| top_brain          | physarium_7b                | VRAM | 224 732  | 128     | 0.62  | ❌ no JSON object|
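
A note on reading the `tok/s` column: dividing `gen_tok` by the full `wall_ms` gives roughly 2.33, 1.61 and 0.57 tok/s for the three model stages, each a little below the reported 2.52, 1.73 and 0.62. One plausible explanation, which is an assumption and not something the source states, is that the reported rate times only the decode loop and leaves out prompt prefill. The arithmetic, with rows copied from the table:

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <vector>

struct StageRow {
    const char* stage;
    double      wall_ms;
    int         gen_tok;
    double      reported_tok_s;
};

// Rows copied from the per-stage table (memory_recall generates no tokens).
inline std::vector<StageRow> stage_rows() {
    return {
        {"triz_contradiction",  41201.0,  96, 2.52},
        {"claim_extractor",     39789.0,  64, 1.73},
        {"top_brain",          224732.0, 128, 0.62},
    };
}

// End-to-end throughput: generated tokens divided by the full stage wall time.
inline double end_to_end_tok_s(const StageRow& r) {
    assert(r.wall_ms > 0.0);
    return r.gen_tok / (r.wall_ms / 1000.0);
}

// Prints both rates side by side for each stage.
inline void print_throughput() {
    for (const StageRow& r : stage_rows())
        std::printf("%-20s end-to-end %.2f tok/s, reported %.2f tok/s\n",
                    r.stage, end_to_end_tok_s(r), r.reported_tok_s);
}
```

The three `wall_ms` values sum to 305 722 ms and the three `gen_tok` values to 288, which matches the aggregate totals.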

### Aggregate

| Metric                   | Value                                                           |
|--------------------------|-----------------------------------------------------------------|
| Total wall               | 305.7 s (5 min 6 s)                                             |
| User CPU                 | 61 min 19 s (≈12× wall thanks to OpenMP)                        |
| Top brain                | `physarium_7b_native` (15.23 GB BF16 mmap'd from physarium7b.planck) |
| Organs used              | `phys05_triz_contradiction`, `phys05_claim_extractor`, `physarium_7b` |
| Memory sources           | 3 holograms (`ariz_7f4fb4ba…`, `ariz_405b12276136301b_*`)        |
| Hologram hits            | 3                                                               |
| Field suggest            | `ariz`                                                          |
| Source gate              | ✅ true (≥ 1 memory source)                                     |
| Hard verifier on top-out | ❌ false ("no JSON object")                                     |
| Total tokens generated   | 288                                                             |
| Python in hot path       | **none** (tokenizer + runner + organ + verifier + DAG all native C++17) |
| DAG path                 | `dag/runs/1777299739134_ariz_e2e_5bac122dbbed5fd5_physarium_7b_*.json` |
| RAM resident (KV+scratch)| ≈ 1.5 MB per layer × 28 layers + 28 KV slots ≈ 42 MB             |
| VRAM measured            | 0 (CPU-only; the `VRAM` tier in the per-stage table is the organ's assigned tier, not measured GPU memory. Phase-8E will move tensors to GPU) |
| SSD reads (mmap)         | full 15.23 GB pack faulted into page cache during 7B forward   |
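
The "mmap'd" and "faulted into page cache" rows refer to lazy, read-only file mapping. A minimal sketch of that mechanism, assuming a POSIX system, is shown below. `MappedFile` is a hypothetical wrapper written for this note and is not the project's pack loader.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only memory mapping of a whole file (POSIX). The kernel loads pages
// lazily, the first time each one is touched.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);
        struct stat st {};
        if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed or empty file: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
        data_ = static_cast<const unsigned char*>(p);
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    std::size_t size() const { return size_; }
    unsigned char at(std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }
private:
    const unsigned char* data_ = nullptr;
    std::size_t          size_ = 0;
};
```

With this pattern nothing is read at construction time. A forward pass that touches every weight ends up pulling the whole file through the page cache, which is consistent with the SSD-read row above.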

### Where the verifier failed (honest)

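The two 0.5B organ failures share **the same root cause**: the prompt templates
end inside the JSON value (e.g.
`...JSON: {"technical_contradiction":"`),
so the model *continues* from inside it. The output therefore lacks the opening
delimiter the verifier expects (`{` for `triz_contradiction`, `[` for
`claim_extractor`), even though the remainder is structurally well-formed. The
top-brain failure is different: the 7B emitted no JSON at all and looped on its
abstain string.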

Sample raw outputs (truncated):

- `triz_contradiction`:
  `gas at 600C clogs metal filter","physical_contradiction":"metal filter clogs at 600C","improves":"metal filter clogs at 600C","worsens":"metal filter clogs at 600C","resource_hints":["heat","pressure"…`
- `claim_extractor`:
  `{"claim":"Hot dusty gas at 600C clogs a metal filter","type":"number","needs_source":true},{"claim":"Solve with ARIZ/TRIZ","type":"fact","needs_source":false},{"claim":"Metal filter clogs with hot dus…`
- `physarium_7b` top brain (full output):
  `NEED_MORE_EVIDENCE | Final: NEED_MORE_EVIDENCE | Final: NEED_MORE_EVIDENCE …`
  (greedy decoder + heavily-pruned 7B + long synthesis prompt → degenerate
  loop on the abstain token).

These are calibration / prompt-engineering failures, **not** architectural
failures. The pipeline executed every stage correctly.

## What's real after Phase-8C/D

- ✅ Native Qwen2 BPE encoder/decoder in C++17 (151 643 vocab, 151 387 merges).
- ✅ `Runner` class wraps a `planck` pack + tokenizer; one runner reused
  across the 5+ phys05_* organs.
- ✅ 9 organ specs registered (8 phys05_* + physarium_7b), each with its own
  prompt, max_tokens, tier, and verifier mapping.
- ✅ Dispatcher routes every supported `--task` into a real native organ
  call (no more stubs in `run_task`).
- ✅ Tier manager records food/poison/latency on every organ run.
- ✅ ARIZ E2E executes the full chain: memory → hologram → field → 2×0.5B
  organs → 7B top brain → verifier → DAG.
- ✅ Determinism guard: same input → same hash twice on both packs.
- ✅ No NaN / Inf in any logits at any stage.

## What is honestly NOT yet good enough

- ❌ Verifier pass rate on this E2E run = 0/3. Needs prompt template
  surgery (drop the open-brace prefix so the model emits its own opening
  brace or bracket).
- ❌ 7B greedy decoder degenerates into a `NEED_MORE_EVIDENCE` loop on the
  long synthesis prompt. Needs sampling (top-p / temperature / repetition
  penalty) or a stop-token guard, neither of which is in the runner yet.
- ❌ No CUDA acceleration. The CPU-only 7B runs at 0.62 tok/s, and its 224.7 s
  top-brain stage accounts for about 73.5 % of the 305.7 s total wall.
- ❌ `phys05_test_writer` isn't called in the ariz chain (skipped because the
  task is not code-shaped). The 5-organ minimum is satisfied across the codebase
  (8 specs registered, 5 actively wired in `run_task`), but a single
  `--task ariz` run uses only 3 models: two 0.5B organs plus the 7B top brain.
- ❌ Memory recall reads holograms but doesn't yet *inject* them into the
  organ prompts as a `MEMORY` block (planned in
  `physarium7b_top_brain.txt` but not implemented in this cut).
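
For the decoder item above, two of the cheaper guards can be sketched as follows. Neither exists in the runner yet, and the function names are hypothetical. The penalty uses the commonly seen sign-aware form (divide positive logits, multiply negative ones), and the loop guard is a plain suffix comparison that a caller could use to stop generation early.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Repetition penalty: logits of tokens that already appear in `history` are
// made less likely. penalty == 1.0 leaves the logits unchanged.
inline void apply_repetition_penalty(std::vector<float>& logits,
                                     const std::vector<int>& history,
                                     float penalty) {
    assert(penalty >= 1.0f);
    const std::unordered_set<int> seen(history.begin(), history.end());
    for (int id : seen) {  // each distinct token is penalized once
        if (id < 0 || static_cast<std::size_t>(id) >= logits.size()) continue;
        float& l = logits[static_cast<std::size_t>(id)];
        l = (l > 0.0f) ? l / penalty : l * penalty;
    }
}

// Greedy pick: index of the largest logit.
inline int argmax(const std::vector<float>& logits) {
    assert(!logits.empty());
    return static_cast<int>(
        std::max_element(logits.begin(), logits.end()) - logits.begin());
}

// Loop guard: true when the last `n` tokens repeat the `n` tokens before them.
inline bool tail_repeats(const std::vector<int>& history, std::size_t n) {
    if (n == 0 || history.size() < 2 * n) return false;
    const auto k   = static_cast<std::ptrdiff_t>(n);
    const auto end = history.end();
    return std::equal(end - k, end, end - 2 * k);
}
```

In a greedy loop the penalty would be applied to each step's logits before `argmax`, and `tail_repeats` checked after appending the new token. A loop such as `NEED_MORE_EVIDENCE | Final:` repeated verbatim would trip the guard once two full periods are present.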

## What this proves and what it does not

- **Proves:** Phase-7's body + Phase-8A's pack + Phase-8B's runner together
  form a working organism. A single command walks every component of the
  spine from text input to DAG-logged final output, with no Python and no
  external LLM library.
- **Does not prove:** that the organism reasons correctly. The 7B's
  degenerate output on this prompt shows that pruning + greedy decode
  + un-tuned synthesis prompt is not yet a reasoner. Phase-8E (CUDA) and
  Phase-8F (decoder + prompt iteration) are needed for that.

## Build / run

```
make all
./build/gigachad_native --task json_repair --input '{"k":1,}'
./build/gigachad_native --task ariz --input "Hot dusty gas at 600C clogs a metal filter. Solve with ARIZ/TRIZ."
```

Phase-8E next: CUDA GEMV/attention to lift the 7B from 0.62 tok/s into the
double-digit range.