BLACK_DOG_LEARNING_LOOP — reinforcement memory for GIGACHAD_NATIVE
Status: architecture law. Patches MASTER_REPORT §6 (Memory spine) and upgrades physarium_field from passive heat to active reinforcement. Date: 2026-04-27.
The "black dog" automaton learns by repetition: random action → signal
coincidence → reinforcement → memory binding → conditioned response. Our
Physarium field is currently a passive food/poison counter. This doc
upgrades it into an active reinforcement memory that strengthens or
weakens specific (task_pattern → organ_chain) bindings.
The reflex loop (mapped to GIGACHAD)
| Pavlov-style term | GIGACHAD term |
|---|---|
| Signal (S) | task_features extracted at dispatch time |
| Action (A) | chosen_action_chain (organ list + memory recall + 7B) |
| Reinforcement (+) | verifier_pass, source_used, low_latency, replay_success |
| Reinforcement (−) | verifier_fail, missing_source, false_cache_hit, hallucination_risk, high_latency |
| Accumulator (memory) | route_conductance[task_pattern, organ_chain] |
| Conditioned response | future dispatcher prefers high-conductance chain for new task with same task_features |
task_features (S)
Lightweight features, extracted before dispatch without any LLM call:
struct TaskFeatures {
    std::string raw_route;                 // dispatcher Route enum
    std::string lang;                      // "en" / "ru" (heuristic detect)
    bool contains_json;                    // input has '{' or '['
    bool contains_code;                    // input has 'def ', 'function ', 'class '
    bool contains_number;                  // matches /\d/
    int length_bucket;                     // 0: <32 chars, 1: <128, 2: <512, 3: >=512
    std::vector<std::string> domain_tags;  // ["thermal", "filter", …] from key-noun grep
    uint64_t pattern_hash;                 // hash(raw_route, length_bucket, top3_domain_tags)
};
pattern_hash is the task-side half of the KEY into the conductance table; the other half is the hash of the chosen action chain.
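The bucketing and hashing described in the struct comments can be sketched as follows. This is a minimal illustration, not the project's implementation: the spec only says hash(...), so the use of 64-bit FNV-1a is an assumption, as are the helper names length_bucket_of, fnv1a and pattern_hash_of. The sketch also assumes domain_tags arrives already ordered by relevance, so "top3" means the first three entries.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bucket boundaries follow the struct comment: <32, <128, <512, >=512 chars.
inline int length_bucket_of(std::size_t n_chars) {
    if (n_chars < 32)  return 0;
    if (n_chars < 128) return 1;
    if (n_chars < 512) return 2;
    return 3;
}

// 64-bit FNV-1a. Assumption: the spec does not name a hash function.
// The second argument lets several strings be chained into one hash.
inline std::uint64_t fnv1a(const std::string& s,
                           std::uint64_t h = 0xcbf29ce484222325ULL) {
    for (unsigned char c : s) {
        h ^= c;
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Hashes raw_route, length_bucket and at most the first three domain tags.
// A '|' separator keeps adjacent fields from running together.
inline std::uint64_t pattern_hash_of(const std::string& raw_route,
                                     int length_bucket,
                                     const std::vector<std::string>& domain_tags) {
    std::uint64_t h = fnv1a(raw_route);
    h = fnv1a("|" + std::to_string(length_bucket), h);
    for (std::size_t i = 0; i < domain_tags.size() && i < 3; ++i) {
        h = fnv1a("|" + domain_tags[i], h);
    }
    return h;
}
```

Because only the route, the bucket and the first three tags enter the hash, two tasks that differ in wording but share those features land in the same conductance slot, which is what lets a learned reflex generalize across same-shape tasks.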
chosen_action_chain (A)
{
  "memory_recall": ["raw_archive", "hologram"],
  "field_query": "ariz",
  "organs": ["phys05_triz_contradiction", "phys05_claim_extractor"],
  "top_brain": "physarium_7b",
  "verifier": "hard_verifier"
}
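One way to turn a chain like the one above into the hex action_chain_hash used elsewhere in this document is to serialize it in a fixed field order and hash the result. The sketch below is illustrative only: the ActionChain struct, the canonical text form, the choice of 64-bit FNV-1a and the helper names are all assumptions, not part of the spec. std::hash is deliberately avoided because its output is not guaranteed to be stable across builds, which matters for keys persisted to disk.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Mirrors the JSON fields of the example chain; the struct name is hypothetical.
struct ActionChain {
    std::vector<std::string> memory_recall;
    std::string field_query;
    std::vector<std::string> organs;
    std::string top_brain;
    std::string verifier;
};

// Joins list items with ',' and preserves their order (organ order matters).
inline std::string join_list(const std::vector<std::string>& v) {
    std::string out;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) out += ',';
        out += v[i];
    }
    return out;
}

// Fixed field order, so equal chains always serialize identically.
inline std::string canonical_form(const ActionChain& c) {
    return "recall=" + join_list(c.memory_recall) + ";field=" + c.field_query +
           ";organs=" + join_list(c.organs) + ";brain=" + c.top_brain +
           ";verifier=" + c.verifier;
}

// 64-bit FNV-1a over the canonical form (assumed hash function).
inline std::uint64_t action_chain_hash_of(const ActionChain& c) {
    std::uint64_t h = 0xcbf29ce484222325ULL;
    const std::string s = canonical_form(c);
    for (unsigned char ch : s) {
        h ^= ch;
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Fixed-width lowercase hex, suitable for a string field.
inline std::string to_hex(std::uint64_t v) {
    char buf[17];
    std::snprintf(buf, sizeof buf, "%016llx", static_cast<unsigned long long>(v));
    return std::string(buf);
}
```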
Reinforcement scoring
food = 0
food += 1.0 if verifier_pass
food += 0.5 if memory_sources_nonempty AND best_score >= 0.85
food += 0.5 if total_latency_ms < SLA[task_pattern]
food += 0.3 if cache_hit AND verifier_pass # successful replay
poison = 0
poison += 1.0 if verifier_fail
poison += 0.5 if memory_sources_empty AND task_requires_source
poison += 0.5 if cache_hit AND NOT verifier_pass # false cache hit
poison += 0.3 if total_latency_ms > 2 * SLA[task_pattern]
poison += 1.0 if hallucination_flag   # heuristic: claim with no source pointer
Conductance update rule
For each (pattern_hash, action_chain_hash) slot, with s = pattern_hash and a = action_chain_hash:
conductance[s, a] ← (1 − α) · conductance[s, a] + α · (food − poison)
Typical value: α = 0.20.
After each update, clamp conductance to [−5.0, 5.0].
Stored in physarium/route_conductance.json next to tier_state.json.
Dispatcher consumption (closed loop)
When a new task arrives:
- Compute pattern_hash.
- Look up the top-K candidate action_chain entries by conductance (descending).
- If top-1 conductance ≥ τ_act (default 1.0), execute it directly.
- Else fall back to dispatcher's static rule.
- Always write the post-run reinforcement, regardless of which path was taken.
Cold start (no slot yet): the dispatcher uses its static rules, and the very first run seeds the slot.
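The selection step can be sketched as a scan over the slots that share the task's pattern_hash. This is an illustrative sketch only: Choice and choose_chain are hypothetical names, the table is assumed to be an in-memory std::map keyed by (pattern_hash, action_chain_hash), and only the top-1 decision is shown rather than a full top-K ranking. A result with learned == false covers both the cold-start case and the below-threshold case, and means the caller should use the static dispatcher rule.

```cpp
#include <cstdint>
#include <map>
#include <utility>

// (pattern_hash, action_chain_hash)
using SlotKey = std::pair<std::uint64_t, std::uint64_t>;

struct Choice {
    bool learned = false;                 // true if a chain reached tau_act
    std::uint64_t action_chain_hash = 0;  // meaningful only when learned is true
};

inline Choice choose_chain(const std::map<SlotKey, double>& slots,
                           std::uint64_t pattern_hash,
                           double tau_act = 1.0) {
    Choice best;
    double best_conductance = tau_act;
    // std::map orders pair keys lexicographically, so all slots for one
    // pattern_hash are contiguous, starting at (pattern_hash, 0).
    auto it = slots.lower_bound(SlotKey(pattern_hash, std::uint64_t{0}));
    for (; it != slots.end() && it->first.first == pattern_hash; ++it) {
        const double c = it->second;
        if (c >= tau_act && (!best.learned || c > best_conductance)) {
            best.learned = true;
            best.action_chain_hash = it->first.second;
            best_conductance = c;
        }
    }
    return best;  // learned == false: fall back to the static rule
}
```

Whichever path is taken, the caller still scores the run and writes the reinforcement back, so a chain picked by the static rule can accumulate conductance and later be selected directly.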
Memory bindings as holograms
Every successful (pattern_hash, action_chain) binding is also saved as a hologram with category=route_binding, so future cold sessions inherit the learned reflexes from disk.
DAG fields (NEW, additive)
The existing dag::Entry is extended with:
struct Entry {
    // ... existing fields unchanged
    std::string pattern_hash;                // hex
    std::string action_chain_hash;           // hex
    std::vector<std::string> action_chain;   // organ names in order
    double food_score;                       // this run
    double poison_score;                     // this run
    double conductance_before;               // before this run
    double conductance_after;                // after this run
    std::vector<std::string> task_features;  // tag list
    std::string aria_trace_id;               // optional, links to ARIZ trace if any
};
DAG entries become the per-step training signal of the system.
Why this is not "another counter"
The current tier_manager records food/poison per organ name, averaged across all callers. The Black-Dog loop records food/poison per (task pattern, action chain) pair, so it learns which chains are good for which task shapes. These are two different memories:
| Memory | Indexed by | Updates | Purpose |
|---|---|---|---|
| tier_manager | organ name | per call | tier placement (VRAM/RAM/SSD) |
| Black-Dog conductance | (task_pattern, action_chain) | per task | dispatcher prefers learned chains |
Safety / forgetting
- Decay each tick: conductance *= 0.999, so unused chains slowly fade.
- Hard reset on rule change: bump policy_version in route_conductance.json; older slots are purged on policy bump.
- The clamp prevents a runaway "always pick chain X": conductance is capped at 5.0.
Implementation roadmap
| Piece | Status | Phase |
|---|---|---|
| task_features extractor (C++) | ⏳ specced | 8F2 |
| route_conductance.json schema + persist | ⏳ specced | 8F2 |
| dag::Entry field extension | ⏳ specced | 8F2 |
| Dispatcher consults conductance | ⏳ specced | 8F2 |
| Reinforcement scoring on every run | ⏳ specced | 8F2 |
| Hologram-per-binding save | ⏳ specced | 8F2 |
| Decay + policy bump | ⏳ specced | 8F3 |
| Probe regression (10 same-shape tasks) | ⏳ specced | 8F3 |
Master-report patch
MASTER_REPORT.md:
- §1, append: "Physarium field is a Black-Dog reinforcement loop, not a passive counter (docs/BLACK_DOG_LEARNING_LOOP.md)."
- §6 (Memory spine), replace the physarium_field row with:
  physarium_field | reinforcement loop (signal → action → reinforcement → conductance) | physarium/route_conductance.json | active | ✅ | learned (task_pattern → action_chain).
- §8 (Blockers), add: "Black-Dog implementation pending Phase-8F2; dispatcher is currently rule-based only."
Closing line
ARIZ tells the system how to think. Black-Dog tells the system how to remember which way of thinking worked. Together they replace prompt → 7B → answer with task → trace → reinforced action chain.