GIGACHAD Phase-8A + Phase-8B — Native inference backend

Date: 2026-04-27

Scope: Native pack format (Phase-8A) + dense Qwen2-arch CPU forward pass (Phase-8B). First real native generations through both Physarium-7B-Native and Physarum-0.5B-Organic.

Physarium-v1 errata: the pruned model used in this report came from
Physarium-v1 magnitude-flow surgery. Read the sparsity claims alongside
reports/PHYSARIUM_RESULTS_RECONCILE.md and PHYSARIUM_COVERAGE_AUDIT.md.

What is real

| #  | Deliverable | Status | Evidence |
|----|-------------|--------|----------|
| 1  | planck7b pack format (mmap, 4 KB aligned, BF16/FP16) | ✅ | include/planck7b_pack.h |
| 2  | Pack writer (parses safetensors, reads config.json) | ✅ | src/planck7b/planck7b_pack.cpp |
| 3  | Pack reader (mmap) | ✅ | same |
| 4  | Pack verifier (byte round-trip vs source) | ✅ | planck7b_tool verify 339/339 ok, 15 GB checked |
| 5  | Physarium-7B-Native pack on disk | ✅ | physarium7b.planck (15.23 GB, BF16, 28 layers) |
| 6  | Physarum-0.5B-Organic pack on disk | ✅ | physarum05b.planck (0.99 GB, FP16, 24 layers, tied) |
| 7  | C++17 forward pass: embed/RMSNorm/QKV+bias/RoPE/GQA/SwiGLU/lm_head | ✅ | src/physarium7b/physarium7b_runner.cpp |
| 8  | Greedy decode loop | ✅ | same |
| 9  | Integration into gigachad_native --top-brain-smoke | ✅ | src/main.cpp |
| 10 | Real native generation, 7B | ✅ | "Hello" → " 2018! I hope" (8 tokens) |
| 11 | Real native generation, 0.5B | ✅ | "Hello" → "²\nA. 100" (8 tokens) |
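Deliverable 8, the greedy decode loop, reduces to taking the arg-max of the next-token logits at every step. The following is a minimal sketch of that idea, not the code in physarium7b_runner.cpp: `greedy_pick`, `greedy_decode`, and the `forward` callback are hypothetical names, and the callback stands in for one full forward pass.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: index of the largest logit (ties go to the lowest id).
// Assumes `logits` is non-empty.
inline std::int32_t greedy_pick(const std::vector<float>& logits) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > logits[best]) best = i;
    }
    return static_cast<std::int32_t>(best);
}

// Hypothetical loop: `forward` maps the token sequence so far to next-token
// logits. Each chosen token is appended to the context before the next step.
inline std::vector<std::int32_t> greedy_decode(
    std::vector<std::int32_t> tokens,
    int max_new,
    const std::function<std::vector<float>(const std::vector<std::int32_t>&)>& forward) {
    std::vector<std::int32_t> generated;
    for (int step = 0; step < max_new; ++step) {
        const std::int32_t next = greedy_pick(forward(tokens));
        generated.push_back(next);
        tokens.push_back(next);
    }
    return generated;
}
```

With `tokens = {9707}` and `max_new = 8`, a loop of this shape would produce eight token ids, matching the shape of the smoke runs reported later.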

Phase-8A — pack format

[HEADER (4 KB aligned)]
[embed_tokens BF16/FP16, 4 KB aligned]
[final_norm FP32]
[lm_head BF16/FP16  OR  alias of embed when tie_word_embeddings=true]
[layer 0 payload, 4 KB aligned]
  q_w  q_b  k_w  k_b  v_w  v_b  o_w
  gate_w  up_w  down_w
  inp_ln  post_ln
[layer 1 payload]
...
[layer N-1 payload]
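The 4 KB alignment in the layout above comes down to rounding each section's start offset up to the next multiple of 4096. A minimal sketch of that arithmetic follows; `align_up` and `PackCursor` are illustrative names and are not taken from planck7b_pack.h.

```cpp
#include <cstdint>

constexpr std::uint64_t kPackAlign = 4096;  // 4 KB, as stated in the layout

// Round `offset` up to the next multiple of `align`.
// `align` must be a power of two for the mask trick to be valid.
constexpr std::uint64_t align_up(std::uint64_t offset,
                                 std::uint64_t align = kPackAlign) {
    return (offset + align - 1) & ~(align - 1);
}

// Hypothetical write cursor that hands out aligned section offsets.
struct PackCursor {
    std::uint64_t pos = 0;

    // Reserve `bytes` at the next 4 KB boundary; returns the section start.
    std::uint64_t place_aligned(std::uint64_t bytes) {
        pos = align_up(pos);
        const std::uint64_t start = pos;
        pos += bytes;
        return start;
    }
};
```

Page-aligned section starts are what let an mmap reader hand out pointers to whole sections without copying, which is presumably why the format pays for the padding.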

source config.json

Phase-8A — verified results

Physarium-7B (BF16, 28 layers, separate lm_head)

| Metric            | Value                            |
|-------------------|----------------------------------|
| Pack size         | 15,231,977,472 bytes (15.23 GB)  |
| Source size       | ~15.23 GB (same, BF16 raw)       |
| Compression ratio | 1.0× (no quantization yet)       |
| Tensor count      | 339                              |
| Total zeros       | 1,450,103,690                    |
| Build wall        | 199.2 s (3 min 19 s)             |
| Round-trip verify | 339/339 ok, 0 fail               |
| Bytes verified    | 15,231,899,648 (≈15.23 GB)       |
| Verify wall       | 236.5 s                          |
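The round-trip verify row reports a per-tensor ok/fail count and a total of bytes checked. One way such a check could be structured is sketched below, assuming each tensor is available as a raw byte range on both sides; `TensorBytes`, `VerifyReport`, and `verify_round_trip` are hypothetical names, not the planck7b_tool API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical view of one tensor's raw bytes (source file or pack mapping).
struct TensorBytes {
    const std::uint8_t* data;
    std::size_t size;
};

struct VerifyReport {
    std::size_t ok = 0;
    std::size_t fail = 0;
    std::uint64_t bytes_checked = 0;
};

// Compare each source tensor with its packed counterpart, byte for byte.
inline VerifyReport verify_round_trip(const std::vector<TensorBytes>& source,
                                      const std::vector<TensorBytes>& packed) {
    VerifyReport r;
    const std::size_t n =
        source.size() < packed.size() ? source.size() : packed.size();
    for (std::size_t i = 0; i < n; ++i) {
        const bool same =
            source[i].size == packed[i].size &&
            (source[i].size == 0 ||
             std::memcmp(source[i].data, packed[i].data, source[i].size) == 0);
        if (same) {
            ++r.ok;
            r.bytes_checked += source[i].size;
        } else {
            ++r.fail;
        }
    }
    // Tensors present on only one side count as failures.
    r.fail += (source.size() - n) + (packed.size() - n);
    return r;
}
```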

Physarum-0.5B (FP16, 24 layers, tied embed/lm_head)

| Metric                          | Value                       |
|---------------------------------|-----------------------------|
| Pack size                       | 988,241,408 bytes (0.99 GB) |
| Source size                     | 943 MB (same, FP16 raw)     |
| Tensor count (incl. tied alias) | 290                         |
| Total zeros                     | 73,807,859 (after surgery)  |
| Build wall                      | 7.9 s                       |
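Both packs are consumed through a read-only memory mapping rather than being read into heap buffers. The sketch below shows what such a reader might look like, assuming a POSIX system; `ReadOnlyMap` is a hypothetical wrapper, and the actual reader in planck7b_pack.cpp may differ in flags and error handling.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a read-only POSIX mapping of a pack file.
class ReadOnlyMap {
public:
    explicit ReadOnlyMap(const char* path) {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st{};
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size),
                             PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const unsigned char*>(p);
                size_ = static_cast<std::size_t>(st.st_size);
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~ReadOnlyMap() {
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
    }
    ReadOnlyMap(const ReadOnlyMap&) = delete;
    ReadOnlyMap& operator=(const ReadOnlyMap&) = delete;

    bool ok() const { return data_ != nullptr; }
    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Because the kernel pages data in on demand, a mapping like this lets a 15 GB pack be opened without an up-front read; pages are faulted in as the forward pass touches each layer.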

Phase-8B — runner

Modules implemented (all CPU, FP32 accumulation)
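To make "FP32 accumulation" concrete: weights stay in their 16-bit storage format and are widened to float only inside the inner loop, where the running sum is kept in FP32. A minimal sketch for the BF16 case follows. It assumes a row-major weight matrix; `bf16_to_f32` and `gemv_bf16` are illustrative names rather than functions from the runner.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// BF16 is the upper 16 bits of an IEEE-754 float32, so widening is a shift.
inline float bf16_to_f32(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// y = W x, with W stored row-major as BF16 (rows x cols) and sums kept in FP32.
// Assumes x.size() == cols.
inline std::vector<float> gemv_bf16(const std::uint16_t* w,
                                    std::size_t rows, std::size_t cols,
                                    const std::vector<float>& x) {
    std::vector<float> y(rows, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;  // FP32 accumulator
        for (std::size_t c = 0; c < cols; ++c) {
            acc += bf16_to_f32(w[r * cols + c]) * x[c];
        }
        y[r] = acc;
    }
    return y;
}
```

An FP16 pack would need a different widening routine, since FP16 has a 5-bit exponent and cannot be converted with a plain shift.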

Smoke results

$ gigachad_native --top-brain-smoke --pack physarium7b.planck \
                   --prompt-tokens 9707 --max-new 8
{
  "load_mode":         "mmap_ro",
  "mmap_bytes":        15231977472,
  "tokens_generated":  8,
  "generate_ms":       8246.08,
  "tok_per_sec":       0.97,
  "output_token_ids":  [220, 17, 15, 16, 23, 0, 358, 3900],
}

decoded = "Hello 2018! I hope"

$ gigachad_native --top-brain-smoke --pack physarum05b.planck \
                   --prompt-tokens 9707 --max-new 8
{
  "load_mode":         "mmap_ro",
  "mmap_bytes":        988241408,
  "tokens_generated":  8,
  "tok_per_sec":       5.69,
  "output_token_ids":  [110, 198, 32, 13, 220, 16, 15, 15],
}

decoded = "Hello²\nA. 100"

Token 9707 = "Hello" (encoded offline via Qwen2 byte-level BPE).

Performance, what is and is not optimized

partial offload — CUDA path is Phase-8B-next.

C++ FP32 GEMV runs at ~1 GFLOP/s; with OpenMP across cores, measured throughput is ~1 tok/s warm.

What is honest about the output

The 7B output "Hello 2018! I hope" is plausible English. The 0.5B output "Hello²\nA. 100" is degraded — Physarum-0.5B-Organic was already heavily pruned. Neither has been parity-checked against an HF Python reference; this remains in Phase-8B-next. What is verified:

are finite and in expected magnitude band 8–22).
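A finiteness and magnitude check on logits can be as small as the sketch below. The report does not say which statistic the 8–22 band refers to, so this version only reports the largest absolute value and leaves the band comparison to the caller; `LogitStats` and `inspect_logits` are hypothetical names.

```cpp
#include <cmath>
#include <vector>

struct LogitStats {
    bool all_finite = true;  // false if any value is NaN or +/-inf
    float max_abs = 0.0f;    // largest |logit| among the finite values
};

// Scan one logits vector and collect basic sanity statistics.
inline LogitStats inspect_logits(const std::vector<float>& logits) {
    LogitStats s;
    for (float v : logits) {
        if (!std::isfinite(v)) {
            s.all_finite = false;
            continue;
        }
        const float a = std::fabs(v);
        if (a > s.max_abs) s.max_abs = a;
    }
    return s;
}
```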

What Phase-8 still does NOT have

layer-0/all-28 hidden states, comparison to ε ≤ 1e-3).

apply to organ dispatch, not yet to layer-by-layer offload).

emits stubs; wiring phys05_* routes to the runner with per-organ prompts is Phase-8C).

→ top brain → hard verifier → DAG. The pieces all exist; they have not been chained through the new runner.
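For the HF parity item at the top of this list, the comparison itself is simple once reference hidden states have been dumped. The sketch below assumes ε is a bound on the maximum element-wise absolute difference, which the report does not state explicitly; `max_abs_diff` and `within_parity` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Largest element-wise absolute difference between two vectors.
// Returns infinity on a length mismatch and NaN if any difference is NaN.
inline float max_abs_diff(const std::vector<float>& a,
                          const std::vector<float>& b) {
    if (a.size() != b.size()) return std::numeric_limits<float>::infinity();
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = std::fabs(a[i] - b[i]);
        if (std::isnan(d)) return d;  // NaN never passes the <= eps test
        if (d > worst) worst = d;
    }
    return worst;
}

// True when the native output matches the reference within `eps`.
inline bool within_parity(const std::vector<float>& native,
                          const std::vector<float>& reference,
                          float eps = 1e-3f) {
    return max_abs_diff(native, reference) <= eps;
}
```

Running the same check after layer 0 and again after all 28 layers would separate an early numerical bug from slow drift accumulated across the stack.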

Build / commands

make all
./build/planck7b_tool build  --src Physarium-7B-Native      --out physarium7b.planck
./build/planck7b_tool verify --src Physarium-7B-Native      --pack physarium7b.planck
./build/planck7b_tool info   --pack physarium7b.planck

./build/gigachad_native --top-brain-smoke \
    --pack physarium7b.planck --prompt-tokens 9707 --max-new 8

Honest one-liner

Phase-7 produced the Physarium-7B body. Phase-8A wrapped that body in a native streaming pack (verified 100% round-trip). Phase-8B gave it a CPU nervous system that emits its first real tokens — "Hello" continues into " 2018! I hope" through 28 transformer layers running entirely in C++17, no Python in the hot path, no HF transformers, no PyTorch. The same backend also produces tokens for the Physarum-0.5B organ. What's missing is HF parity, CUDA, quantization, native tokenizer, and full pipeline plumbing — all of which are real engineering, not architecture decisions.