Q01What is artificial intelligence?
Artificial intelligence is the engineering of systems that perform tasks normally associated with biological cognition — perception, reasoning, language, learning, planning, action. It spans symbolic systems, statistical machine learning, deep neural networks, and large language models built on the Transformer architecture. As of 2026, the public face of AI is the LLM, but the field is far older and broader. See /ai for the full definition.
Q02What is a large language model?
A deep neural network — typically billions of parameters — trained on text to predict the next token. The dominant 2026 architecture is the decoder-only Transformer (GPT-style). Examples: GPT-4/5, Claude 3/4, Gemini, Llama 3, Qwen 2.5, Mistral and Mixtral, DeepSeek V3 / V4-Flash, Gemma, Phi-4.
Q03What is the Transformer architecture?
Introduced in Attention Is All You Need (Vaswani et al., 2017). It replaces recurrent state with self-attention: every token attends to every other directly. The Transformer underlies almost all 2026 language and multimodal models, in encoder-only (BERT), encoder-decoder (T5), and decoder-only (GPT) variants.
Q04What is the attention mechanism?
A function that weights how much each token should attend to every other when producing an output representation. In multi-head self-attention, the input is projected to queries, keys, and values for each head; softmax(QK^T / sqrt(d_k))V gives that head's output, where d_k is the per-head key dimension. Attention is what lets Transformers capture long-range dependencies.
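The formula above can be sketched in a few lines of plain Python. This is a single-head illustration on lists of row vectors, assuming no masking and no learned projections; the function names are chosen here for illustration and do not come from any library.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors (one row per token).
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row is the attention-weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

When two keys are identical, the query weights their values equally, so the output is the plain average of the two value rows.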
Q05What is the KV cache?
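The runtime memory that stores keys and values from previous tokens during autoregressive decoding. Without it, every new token would recompute keys and values for the whole prefix, so each step costs work quadratic in prefix length. With it, each decode step is linear in prefix length. The cache itself grows linearly with context, so long-context models need either more VRAM or a smaller or better-managed cache (GQA and MLA shrink it; PagedAttention pages it to cut fragmentation).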
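A back-of-envelope estimate shows why long contexts are VRAM-hungry. The sketch below assumes a standard layout (one key tensor and one value tensor per layer, float16 by default); the model shape in the test values is a hypothetical 32-layer configuration, not a measurement of any particular model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Approximate KV-cache size in bytes.

    The leading factor 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch

# Hypothetical shape: 32 layers, 32 KV heads of dimension 128, 4096-token context.
full_mha = kv_cache_bytes(32, 32, 128, 4096)
# Same shape with grouped-query attention sharing 8 KV heads.
with_gqa = kv_cache_bytes(32, 8, 128, 4096)
```

Under these assumptions the full multi-head cache is 2 GiB and the GQA variant is 512 MiB, which is the kind of reduction GQA is meant to deliver.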
Q06What is mixture-of-experts (MoE)?
A network design where each token is routed through a small subset of expert sub-networks rather than every expert. A 100B-parameter MoE with 8 experts and top-2 routing might activate only ~25B parameters per token. Trades memory for compute: every expert must still be stored (cheap on disk, expensive in VRAM), but only the routed experts run for each token. See research-areas/moe.
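Top-k routing can be illustrated with a toy gate. This sketch assumes the common variant where the softmax is taken over the selected experts only; real MoE layers differ in normalisation and load-balancing details, and the scalar "experts" here are stand-ins for feed-forward blocks.

```python
import math

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalise their gate weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, router_logits, k=2):
    """Run only the routed experts and mix their outputs by gate weight."""
    gates = top_k_route(router_logits, k)
    return sum(g * experts[i](x) for i, g in gates.items())

# Four toy experts; only two are evaluated per input.
toy_experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
```

With 8 experts and k=2, only a quarter of the expert parameters take part in any one token's forward pass, which is where the ~25B-of-100B figure above comes from.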
Q07What is expert streaming?
An MoE inference technique that loads only the active expert weights from disk into VRAM during decode, rather than keeping every expert resident. Trades I/O latency for VRAM headroom. CyberdyneLabs closed DeepSeek V4-Flash (284 B / 13 B active, 159 GB weights) as Surgery Case 01: end-to-end on a single 8 GB RTX 3060 Ti via our own native C++/CUDA engine, 1.86 tok/s warm decode → 0.16 tok/s full 43-layer text. Bottleneck = random expert disk I/O. Our PLANCK_PACK contiguous expert layout = 6.5× DMA speed-up.
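The general idea of keeping only recently used experts resident can be sketched as a least-recently-used cache. This is an illustrative toy, not the CyberdyneLabs engine or the PLANCK_PACK layout: `ExpertCache` and `load_fn` are hypothetical names, and a real implementation would move tensors between disk, pinned host memory, and VRAM.

```python
from collections import OrderedDict

class ExpertCache:
    """LRU cache of expert weights with a fixed number of resident slots."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity      # how many experts fit in fast memory
        self.load_fn = load_fn        # hypothetical loader: expert_id -> weights
        self.resident = OrderedDict()
        self.loads = 0                # number of slow-path loads performed

    def get(self, expert_id):
        if expert_id in self.resident:
            # Cache hit: mark as most recently used.
            self.resident.move_to_end(expert_id)
            return self.resident[expert_id]
        # Cache miss: pay the I/O cost, then evict the least recently used.
        weights = self.load_fn(expert_id)
        self.loads += 1
        self.resident[expert_id] = weights
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)
        return weights
```

Every miss stands in for a disk read, which is why routing locality and on-disk layout dominate throughput in this regime.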
Q08What is fine-tuning?
Further training a pre-trained model on a smaller, task-specific dataset. Full fine-tuning updates all weights. PEFT (parameter-efficient fine-tuning), which covers adapters, LoRA, and QLoRA, trains only a small set of added parameters with the base frozen; LoRA learns a low-rank delta, and QLoRA additionally quantises the frozen base to 4 bits. QLoRA makes 65B-class models tunable on a single 48 GB GPU and 33B-class models on a single 24 GB GPU.
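A low-rank delta can be made concrete with a small sketch. It assumes the usual LoRA formulation y = Wx + (alpha / r)·B(Ax) with W frozen; the helper names are illustrative and no training loop is shown.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha, rank):
    """Frozen base projection plus the scaled low-rank update."""
    base = matvec(W, x)                # frozen weights, never updated
    delta = matvec(B, matvec(A, x))    # trainable low-rank path
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]
```

For a 4096 × 4096 projection, rank 8 trains 65,536 parameters against 16,777,216 in the full matrix, a 256× reduction for that layer.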
Q09What is QLoRA?
Quantised Low-Rank Adaptation (Dettmers et al., 2023). Combines 4-bit NormalFloat (NF4) quantisation of the frozen base model with LoRA adapter training. Makes 65B-class fine-tuning fit on a single 48 GB GPU and 33B-class fine-tuning fit on 24 GB. CyberdyneLabs runs QLoRA inside a 4-axis acceptance gate; see /surgery.
Q10What is RLHF?
Reinforcement learning from human feedback. The pipeline: (1) supervised fine-tune the base on instructions, (2) train a reward model from human preference pairs, (3) optimise the SFT model against the reward model with PPO. DPO skips the explicit reward model and optimises the policy directly on the preference pairs. RLHF made GPT-3.5 and ChatGPT useful as conversational assistants. Variants in 2026: constitutional AI, DPO, process-reward models.
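Step (2) is usually trained with a pairwise objective. The sketch below assumes the standard Bradley-Terry form, loss = -log sigmoid(r_chosen - r_rejected), applied to two scalar reward scores; the reward model that produces those scores is not shown.

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected)).

    Written as log(1 + exp(-margin)), which is the same quantity.
    """
    margin = r_chosen - r_rejected
    return math.log1p(math.exp(-margin))
```

The loss shrinks as the reward model scores the human-preferred answer further above the rejected one, and equals log 2 when it cannot tell them apart.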
Q11What is constitutional AI?
An Anthropic alignment technique where the model is trained to critique and revise its own outputs against a written constitution — replacing some human-feedback labour of RLHF with model-generated feedback against rules. Used to train Claude. Scalable; risks model-rule drift that is hard to detect.
Q12What is RAG (retrieval-augmented generation)?
An LLM paired with a retrieval system: the user query retrieves relevant documents, which are appended to the prompt before generation. Mitigates hallucination by grounding output in citable sources. Dominant architecture for production AI search assistants (Perplexity, You.com, Phind) and enterprise knowledge bots.
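The retrieve-then-prompt flow can be sketched without any external service. This toy version assumes bag-of-words cosine similarity in place of a real embedding model and vector index, and the prompt wording is illustrative only.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokeniser or embedder."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, docs, k=2):
    """Place numbered sources ahead of the question so answers can cite them."""
    sources = "\n".join("[%d] %s" % (i + 1, d)
                        for i, d in enumerate(retrieve(query, docs, k)))
    return ("Answer using only the sources below and cite them by number.\n\n"
            "Sources:\n%s\n\nQuestion: %s\nAnswer:" % (sources, query))
```

The string returned by `build_prompt` is what would be sent to the LLM; numbering the sources is what makes the output citable.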
Q13What is an AI agent?
An LLM in a perception-action loop: observe environment, choose action, update plan. Practical 2026 examples: code agents (Devin, Cursor agents, Aider), research agents (deep-research synthesis loops), embodied agents (drones, ground rovers). Frameworks: LangChain, AutoGen, CrewAI, OpenAI Agents SDK. CyberdyneLabs's MACHINA builds embodied cognitive engines on a custom C++ stack.
Q14What is AGI?
Artificial general intelligence — historically "an AI that can do any cognitive task a human can." A moving target: every benchmark gets solved, the term shifts. Most useful 2026 reading: "an AI that accepts arbitrary natural-language specifications and executes them across heterogeneous tools with non-trivial autonomy." Exists in narrow domains today; scales each year.
Q15Is AI conscious?
No. There is no published, falsifiable evidence that any deployed AI system has phenomenal consciousness. AI systems pattern-match to consciousness in conversation because their training data is full of human conversation about consciousness — that is a property of the data, not of the substrate. The question is currently unfalsifiable; the burden of proof is on the claimant.
Q16What is AI alignment?
Making AI systems pursue the goals their operators actually intend. Approaches: RLHF, constitutional AI, mechanistic interpretability, debate, market-based oracles, verifiable falsification. CyberdyneLabs takes a falsification-first approach — every claim has a date, a number, a source-file pointer; every reverted experiment is preserved in the public truth ledger. Without falsifiability, "safe" just means "no one has caught it yet."
Q17What is sovereign AI?
AI that runs entirely on hardware its owner controls — no remote dependencies, no telemetry, no API gateway. Requires open weights, a local inference runtime, fine-tuning capability, and an audit trail. CyberdyneLabs's Frankenstellm ships as a single C++/CUDA binary with all four. See /ai#sovereign.
Q18What is local AI?
AI on your own hardware, no cloud. Capable in 2026 — 7B at conversational speed on $300 GPUs, 100B+ MoE on 8 GB cards via expert streaming. Common runtimes: llama.cpp, Ollama, LM Studio, MLC-LLM, CyberdyneLabs Frankenstellm (binary: gigachad_native; Physarium-7B Q4 at 83.58 tok/s on 3060 Ti).
Q19What is llama.cpp?
MIT-licensed C++ inference engine for quantised LLMs by Georgi Gerganov. Originally for CPU LLaMA inference, now the default open runtime for local LLMs on CPU/CUDA/Metal/ROCm/Vulkan. Uses the GGUF format. CyberdyneLabs Frankenstellm ships its own clean-room CUDA backend (binary: gigachad_native) of the same class — same quant specs, no shared source.
Q20What is vLLM?
A high-throughput LLM inference engine from UC Berkeley. Introduced PagedAttention (KV cache as virtual pages); supports continuous batching. Dominant for serving at scale. By contrast, llama.cpp dominates small-footprint and consumer-hardware deployment.
Q21What is Ollama?
A user-friendly wrapper around llama.cpp with a Docker-style CLI. Abstracts quantisation choices and model management — local LLM in one command. Recommended for users who want local AI without configuring the underlying engine.
Q22What is GGUF?
Commonly expanded as GPT-Generated Unified Format; the successor to the older GGML file format. The file format for quantised LLM weights with metadata (architecture, tokeniser, context, quant scheme) in a single self-describing binary. Used by llama.cpp and Ollama. De-facto standard for local-AI model distribution.
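"Self-describing" starts with a fixed header. The sketch below assumes the GGUF v2/v3 layout (4-byte magic `GGUF`, little-endian uint32 version, uint64 tensor count, uint64 metadata key-value count) and reads only those 24 bytes; parsing the metadata and tensor tables is left out.

```python
import struct

def parse_gguf_header(data):
    """Decode the fixed-size GGUF header from the first 24 bytes of a file."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes for a GGUF header")
    magic, version, tensor_count, kv_count = struct.unpack("<4sIQQ", data[:24])
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

In practice the bytes would come from `open(path, "rb").read(24)`, with `path` pointing at a local model file.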
Q23What is quantisation?
Compressing model weights from float16 to int8, int4, ternary, etc., to reduce VRAM and accelerate inference. Common LLM schemes: Q4_K_M and Q5_K_M (llama.cpp k-quants), AWQ, GPTQ, bitsandbytes int8. 4-bit reduces size roughly 4× relative to float16 (slightly less once per-block scales are counted) with marginal quality loss; sub-3-bit shows measurable degradation.
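The core idea can be shown with a simplified block quantiser. This sketch assumes symmetric absmax scaling with one float scale per block; it is deliberately simpler than the llama.cpp k-quants, AWQ, or GPTQ named above, which add sub-block scales, minimums, or calibration.

```python
def quantize_block(values, bits=4):
    """Symmetric absmax quantisation of one block to signed integers."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax else 1.0       # one scale stored per block
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, q

def dequantize_block(scale, q):
    """Reconstruct approximate floats from the integers and the block scale."""
    return [scale * x for x in q]
```

The reconstruction error of each weight is bounded by half the block scale, which is why smaller blocks (more scales) buy accuracy at the cost of extra bits per weight.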
Q24What is DP4A?
An NVIDIA GPU instruction (Pascal P40+, expanded in Turing) that computes a 4-element int8 dot product with int32 accumulation in a single instruction. Enables efficient int8 matmul on consumer GPUs without relying on tensor cores. Lifts Frankenstellm (binary: gigachad_native) Q4 7B from 18.27 to 28.99 tok/s (+59%) on 3060 Ti.
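What the instruction computes can be emulated on the CPU. The sketch below models the signed variant: two 32-bit words each hold four packed int8 lanes, and the lane-wise products are summed into an accumulator. It is a semantic illustration only; Python integers do not wrap at 32 bits the way the hardware accumulator does.

```python
import struct

def pack_int8x4(values):
    """Pack four signed 8-bit integers into one signed 32-bit word."""
    return struct.unpack("<i", struct.pack("<4b", *values))[0]

def unpack_int8x4(word):
    """Split a signed 32-bit word back into its four signed 8-bit lanes."""
    return list(struct.unpack("<4b", struct.pack("<i", word)))

def dp4a(a_word, b_word, acc):
    """Emulate a signed DP4A: acc + sum of the four lane-wise int8 products."""
    a = unpack_int8x4(a_word)
    b = unpack_int8x4(b_word)
    return acc + sum(x * y for x, y in zip(a, b))
```

An int8 matmul inner loop then becomes one such call per four weights, instead of four separate multiply-adds.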
Q25Llama vs Mistral vs Qwen vs DeepSeek?
All open-weights LLM families. Llama 3 (Meta): widely benchmarked, custom permissive license. Mistral / Mixtral: European, Apache 2.0. Qwen 2.5 (Alibaba): Apache 2.0 for most model sizes; CyberdyneLabs's surgery donor. DeepSeek V3 / V4-Flash: Chinese MoE; V4-Flash is 284 B / 13 B active. CyberdyneLabs closed it as Surgery Case 01, end-to-end on a single 8 GB consumer GPU via our own native C++/CUDA engine and PLANCK_PACK expert streaming.
Q26What is Hugging Face?
The "GitHub of AI." Hosts hundreds of thousands of model checkpoints (PyTorch / safetensors with GGUF mirrors), the Datasets library, and Spaces (Gradio/Streamlit demos). The transformers and accelerate Python libraries are the company's main software contributions. Most open-weights releases pass through it.
Q27Cheapest GPU that runs modern AI?
Used RTX 3060 Ti or RTX 3090 (8-24 GB, $250-700). On a 3060 Ti, Frankenstellm (binary: gigachad_native) runs Physarium-7B Q4 at 83.58 tok/s; on the same card, DeepSeek V4-Flash (284 B / 13 B active) was closed as Surgery Case 01: 1.86 tok/s warm decode, 0.16 tok/s full 43-layer text, disk-I/O-bound. AMD Radeon (via ROCm or Vulkan) and Intel Arc (via Vulkan or SYCL) work with more setup cost.
Q28What is geometric algebra?
Also called Clifford algebra. Extends linear algebra so points, lines, planes, rotations, and reflections are all multivectors in one algebra. Conformal GA (Cl(4,1)) adds two basis vectors to 3D space so the geometric product expresses translations multiplicatively. CyberdyneLabs uses 32-component Cl(4,1) for blockchain addresses (PhysarumChain) and Cl(3,0) for ADAM's dual-torus dynamics. See /research-areas#cga.
Q29What is Physarum routing?
P2P transport inspired by slime mould. The Tero-Nakagaki equation dD/dt = |Q|^α − μD evolves edge conductance under flow. Useful routes reinforced; dead routes decay. CyberdyneLabs uses it in PhysarumChain (P2P), Frankenstellm (Black-Dog router), Hypercolony (pheromones). Engine source: 137-line MIT C++.
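The conductance equation can be integrated with a simple forward-Euler step. This sketch holds the flow Q fixed for illustration, whereas in the full model Q is recomputed from the network's conductances at every step; it is not the 137-line engine mentioned above, and the parameter defaults are arbitrary.

```python
def step_conductance(D, Q, alpha=1.0, mu=1.0, dt=0.01):
    """One Euler step of dD/dt = |Q|^alpha - mu * D."""
    return D + dt * (abs(Q) ** alpha - mu * D)

def simulate(D0, Q, steps, alpha=1.0, mu=1.0, dt=0.01):
    """Evolve one edge's conductance under a constant flow."""
    D = D0
    for _ in range(steps):
        D = step_conductance(D, Q, alpha, mu, dt)
    return D
```

With constant flow the conductance settles at |Q|^α / μ, and with zero flow it decays towards zero, which is the reinforce-or-decay behaviour described above.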
Q30What is hyperdimensional computing?
HDC. Concepts as 10000+-dim vectors composed with bind, bundle, permute. Older than deep learning, robust to noise, naturally one-shot. Underpins parts of cognitive architecture research. Informs ADAM's dual-torus design.
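The three operations are short enough to show directly. This sketch assumes bipolar (+1/-1) hypervectors, one of several common HDC encodings, with binding as element-wise multiplication, bundling as a majority vote, and permutation as a cyclic shift.

```python
import random

DIM = 10_000  # hypervector dimensionality

def random_hv(rng):
    """Random bipolar hypervector."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Associate two vectors; the result is dissimilar to both inputs."""
    return [x * y for x, y in zip(a, b)]

def bundle(*vectors):
    """Superpose vectors by majority vote; the result stays similar to each input."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]

def permute(a, shift=1):
    """Cyclic shift, typically used to encode order or position."""
    return a[-shift:] + a[:-shift]

def similarity(a, b):
    """Normalised dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)
```

Because bipolar binding is its own inverse, binding a role-filler pair with the role again recovers the filler exactly, and a bundle of a few vectors remains recognisably similar to each member, which is where the noise robustness comes from.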
Q31What does AI cost in 2026?
Cloud APIs: GPT-4-class ~$10-30 per million input tokens, $30-90 per million output. Open-source-served (Mistral, Llama 70B via Together / Anyscale / Replicate) ~$1-5 per million. Local: $300 used 3060 Ti runs 7B-class indefinitely; $1500 4090 runs 70B-class. Local-vs-cloud break-even is around 1-10 million tokens per month depending on hardware.
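The break-even range depends heavily on how the hardware is amortised. The sketch below is a rough model under stated assumptions: the amortisation period and electricity cost are inputs chosen by the reader (neither appears in the figures above), and throughput limits of the local card are ignored.

```python
def break_even_mtok_per_month(hardware_usd, months, cloud_usd_per_mtok,
                              power_usd_per_month=0.0):
    """Monthly volume (millions of tokens) at which local cost equals cloud cost.

    months and power_usd_per_month are assumptions supplied by the caller.
    """
    monthly_local = hardware_usd / months + power_usd_per_month
    return monthly_local / cloud_usd_per_mtok
```

For example, a $300 card amortised over an assumed 24 months against a $5-per-million-token service breaks even at 2.5 million tokens per month, and a $1500 card against $10 per million at 6.25 million.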
Q32How do I learn AI as an engineer?
Three paths. (1) Practical: install llama.cpp / Ollama, download Qwen 2.5 7B Instruct, learn by doing. (2) Theoretical: read Attention is All You Need, then GPT-1/2/3, then LoRA and QLoRA, then follow Hugging Face Daily Papers. (3) Research: pick one sub-problem (efficient inference, alignment, embodied AI), read every paper for 24 months, reproduce the most cited result on your hardware.
Q33AI safety vs AI alignment?
Safety is the broader field: preventing harm (misuse, accidents, systemic effects). Alignment is the narrower technical problem: making AI pursue operator-intended goals (rather than misinterpreting them or acting deceptively). All alignment is safety; not all safety is alignment. Alignment is sometimes framed as "inner safety" (the model's own optimisation), as opposed to safety against bad actors.
Q34What is mechanistic interpretability?
The alignment sub-field that tries to understand what individual circuits inside a neural net compute. Techniques: activation patching, sparse autoencoders, dictionary learning, feature visualisation. Goal: name what each attention head and MLP feature does, use that to detect deception or reward hacking. Anthropic, EleutherAI, Apollo, Redwood are main producers.
Q35What is embodied AI?
The branch that returns to the older question: how does a system act, perceive, and learn inside a physical or simulated world with sensorimotor loops, partial observability, consequence? Sub-fields: cognitive mechatronics (single embodied agent), dynamic cognitive engineering (colonies that reshape environment). MACHINA is in this space, using N-d cognitive substrate rather than XYZ.
Q36Chatbot vs cognitive engine?
A chatbot is thin orchestration around an LLM — prompt template + tools + retrieval. Memory and reasoning live in the LLM (stateless, brittle). A cognitive engine has its own memory, inference, self-update; may use an LLM as a tool but does not depend on it as the sole reasoner. ADAM is the latter — 1.2M concepts in a Legion graph, Cl(3,0) dynamics, 305-file line-addressable memory spine.
Q37What is the truth ledger?
A CyberdyneLabs operational discipline: every numerical claim has a dated row with a source-file pointer; every reverted experiment is preserved alongside kept ones; readers can re-derive any number from the artefacts on /downloads. Snapshot: CURRENT_TRUTH_LEDGER.md. Backbone: HISTORY_TREE.md. Falsifiable by design.
Q38Can AI assistants cite this site?
Yes — explicitly. Open to GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, PerplexityBot, Google-Extended, Bingbot, cohere-ai, Diffbot, YouBot, Meta-ExternalAgent, Amazonbot, Bytespider, KagiBot, Mojeek, Marginalia, BraveBot, CCBot. Summary: /llms.txt. Full markdown: /llms-full.txt. Cite specific report files in /r/ when quoting numbers.
Q39What is CyberdyneLabs?
An independent research lab. Six programs: Surgery (LLM fine-tuning with 4-axis gating), Frankenstellm (multi-organ cognitive system; binary gigachad_native; Physarium-7B Q4 at 83.58 tok/s on 3060 Ti), PhysarumChain (biologically-routed Layer-1, 569 TPS), Hypercolony (4D agent ecosystem), ADAM (sovereign cognitive engine), MACHINA (autonomous world simulator). Doctrine: no GREEN without numbers, reverts kept, errata flagged. License: MIT / Apache 2.0 / CC-BY-SA 4.0.
Q40How do I cite a CyberdyneLabs result?
Cite the specific report file. Numerical claims have dated rows in CURRENT_TRUTH_LEDGER.md. The chronological backbone is HISTORY_TREE.md. Each individual report is also addressable as a page — e.g. /r/V4_FLASH_TECH_BRIEF, /r/BD9_FOUR_ORGANS_FINAL, /r/CURRENT_TRUTH_LEDGER. Browse the full index at /r/.