BLACK_DOG_LEARNING_LOOP — reinforcement memory for GIGACHAD_NATIVE

Status: architecture law. Patches MASTER_REPORT §6 (Memory spine) and upgrades physarium_field from passive heat to active reinforcement. Date: 2026-04-27.

The "black dog" automaton learns by repetition: random action → signal
coincidence → reinforcement → memory binding → conditioned response. Our
Physarium field is currently a passive food/poison counter. This doc
upgrades it into an active reinforcement memory that strengthens or
weakens specific (task_pattern → organ_chain) bindings.

The reflex loop (mapped to GIGACHAD)

| Pavlov-style term | GIGACHAD term |
|---|---|
| Signal (S) | task_features extracted at dispatch time |
| Action (A) | chosen_action_chain (organ list + memory recall + 7B) |
| Reinforcement (+) | verifier_pass, source_used, low_latency, replay_success |
| Reinforcement (−) | verifier_fail, missing_source, false_cache_hit, hallucination_risk, high_latency |
| Accumulator (memory) | route_conductance[task_pattern, organ_chain] |
| Conditioned response | future dispatcher prefers high-conductance chain for new task with same task_features |

task_features (S)

Light, no-LLM features extracted before dispatch:

#include <cstdint>
#include <string>
#include <vector>

struct TaskFeatures {
    std::string raw_route;        // dispatcher Route enum, as a string
    std::string lang;              // "en" / "ru" (heuristic detect)
    bool        contains_json;     // input has '{' or '['
    bool        contains_code;     // input has 'def ', 'function ', 'class '
    bool        contains_number;   // /\d/
    int         length_bucket;     // 0: <32 chars, 1: <128, 2: <512, 3: >=512
    std::vector<std::string> domain_tags;  // ["thermal", "filter", …] from key-noun grep
    uint64_t    pattern_hash;      // hash(raw_route, length_bucket, top3_domain_tags)
};

pattern_hash is the KEY into the conductance table.
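
A minimal sketch of how the cheap features and the key could be computed. The source does not name a hash function, so FNV-1a (64-bit) is an assumption here, chosen because the key is persisted to disk and std::hash is not guaranteed to be stable across builds. The helper names (`length_bucket`, `contains_json`, `contains_number`, `fnv1a64`, `make_pattern_hash`) are hypothetical, and "top3" is taken to mean the first three tags as supplied by the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Length buckets as listed in TaskFeatures: 0: <32, 1: <128, 2: <512, 3: >=512.
inline int length_bucket(std::size_t n_chars) {
    if (n_chars < 32)  return 0;
    if (n_chars < 128) return 1;
    if (n_chars < 512) return 2;
    return 3;
}

// contains_json heuristic from the struct comment: input has '{' or '['.
inline bool contains_json(const std::string& input) {
    return input.find_first_of("{[") != std::string::npos;
}

// contains_number heuristic (/\d/): any ASCII digit.
inline bool contains_number(const std::string& input) {
    return input.find_first_of("0123456789") != std::string::npos;
}

// FNV-1a 64-bit. ASSUMPTION: the hash function is not specified in the source.
// The second argument lets calls be chained over several fields.
inline std::uint64_t fnv1a64(const std::string& s,
                             std::uint64_t h = 0xcbf29ce484222325ULL) {
    for (unsigned char c : s) {
        h ^= c;
        h *= 0x100000001b3ULL;
    }
    return h;
}

// hash(raw_route, length_bucket, top3_domain_tags).
// ASSUMPTION: "top3" = the first three tags in the order given by the caller.
inline std::uint64_t make_pattern_hash(const std::string& raw_route,
                                       int bucket,
                                       const std::vector<std::string>& domain_tags) {
    std::uint64_t h = fnv1a64(raw_route);
    h = fnv1a64("|" + std::to_string(bucket), h);
    const std::size_t n = std::min<std::size_t>(3, domain_tags.size());
    for (std::size_t i = 0; i < n; ++i) {
        h = fnv1a64("|" + domain_tags[i], h);
    }
    return h;
}
```

Because only the route, the bucket and the first three tags enter the hash, two tasks that differ in wording but share those fields land in the same conductance slot.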

chosen_action_chain (A)

{
  "memory_recall":  ["raw_archive", "hologram"],
  "field_query":    "ariz",
  "organs":         ["phys05_triz_contradiction", "phys05_claim_extractor"],
  "top_brain":      "physarium_7b",
  "verifier":       "hard_verifier"
}
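
The conductance table is keyed by a hash of this chain, so the chain needs a stable textual form before hashing. The sketch below mirrors the JSON keys in a struct and builds one such form; the `ActionChain` type, the `join` helper and the `canonical_key` layout are all hypothetical, not a format the source defines.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Mirrors the JSON example above; field names follow the JSON keys.
struct ActionChain {
    std::vector<std::string> memory_recall;
    std::string              field_query;
    std::vector<std::string> organs;
    std::string              top_brain;
    std::string              verifier;
};

// Joins items with ',' and preserves order, since organ order is significant.
inline std::string join(const std::vector<std::string>& items) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i != 0) out += ',';
        out += items[i];
    }
    return out;
}

// HYPOTHETICAL canonical form: any stable hash over this string could serve
// as action_chain_hash. The exact layout is an assumption.
inline std::string canonical_key(const ActionChain& c) {
    return "recall=" + join(c.memory_recall) +
           ";field=" + c.field_query +
           ";organs=" + join(c.organs) +
           ";brain=" + c.top_brain +
           ";verifier=" + c.verifier;
}
```

Keeping the organ list ordered means that two chains with the same organs in a different order get different slots, which matches the "organ names in order" wording used for the DAG fields.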

Reinforcement scoring

food = 0
food += 1.0  if verifier_pass
food += 0.5  if memory_sources_nonempty AND best_score >= 0.85
food += 0.5  if total_latency_ms < SLA[task_pattern]
food += 0.3  if cache_hit AND verifier_pass        # successful replay

poison = 0
poison += 1.0  if verifier_fail
poison += 0.5  if memory_sources_empty AND task_requires_source
poison += 0.5  if cache_hit AND NOT verifier_pass  # false cache hit
poison += 0.3  if total_latency_ms > 2 * SLA[task_pattern]
poison += 1.0  if hallucination_flag               # heuristic: claim with no source pointer
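
The scoring rules above can be sketched in C++ as two pure functions. The `RunOutcome` struct and its field names are hypothetical, and the sketch assumes that verifier_fail is simply the negation of verifier_pass and that memory_sources_empty is the negation of memory_sources_nonempty; a run where the verifier never executed would need its own handling.

```cpp
// HYPOTHETICAL container for the per-run signals named in the scoring rules.
struct RunOutcome {
    bool   verifier_pass           = false;  // ASSUMPTION: verifier_fail == !verifier_pass
    bool   memory_sources_nonempty = false;
    double best_score              = 0.0;
    bool   task_requires_source    = false;
    bool   cache_hit               = false;
    bool   hallucination_flag      = false;
    double total_latency_ms        = 0.0;
    double sla_ms                  = 0.0;    // SLA[task_pattern]
};

inline double food_score(const RunOutcome& r) {
    double food = 0.0;
    if (r.verifier_pass)                                   food += 1.0;
    if (r.memory_sources_nonempty && r.best_score >= 0.85) food += 0.5;
    if (r.total_latency_ms < r.sla_ms)                     food += 0.5;
    if (r.cache_hit && r.verifier_pass)                    food += 0.3;  // successful replay
    return food;
}

inline double poison_score(const RunOutcome& r) {
    double poison = 0.0;
    if (!r.verifier_pass)                                     poison += 1.0;
    if (!r.memory_sources_nonempty && r.task_requires_source) poison += 0.5;
    if (r.cache_hit && !r.verifier_pass)                      poison += 0.5;  // false cache hit
    if (r.total_latency_ms > 2.0 * r.sla_ms)                  poison += 0.3;
    if (r.hallucination_flag)                                 poison += 1.0;
    return poison;
}
```

With these weights a single run scores at most 2.3 food (all four food conditions met) and at most 3.3 poison (all five poison conditions met).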

Conductance update rule

For each (pattern_hash, action_chain_hash) slot:

conductance[s, a] ← (1 − α) · conductance[s, a] + α · (food − poison)

# typical:
α = 0.20
clamp conductance to [−5.0, 5.0]

Stored in physarium/route_conductance.json next to tier_state.json.
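
A minimal sketch of the update rule as a single function, using the typical α and the clamp range given above. The function name `update_conductance` is hypothetical, and std::clamp requires C++17.

```cpp
#include <algorithm>
#include <cassert>

// One exponential-moving-average step for a (pattern_hash, action_chain_hash) slot:
//   c <- (1 - alpha) * c + alpha * (food - poison), then clamp to [-5.0, 5.0].
inline double update_conductance(double current, double food, double poison,
                                 double alpha = 0.20) {
    assert(alpha > 0.0 && alpha <= 1.0);
    const double next = (1.0 - alpha) * current + alpha * (food - poison);
    return std::clamp(next, -5.0, 5.0);
}
```

Starting from 0 with the maximum food of 2.3 and no poison, three runs give 0.46, 0.828 and 1.1224, so the slot crosses the dispatcher's default τ_act = 1.0 on the third run. A chain that only ever earns verifier_pass (net 1.0 per run) approaches 1.0 from below in exact arithmetic, which may be worth keeping in mind when tuning τ_act.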

Dispatcher consumption (closed loop)

When a new task arrives:

  1. Compute pattern_hash.
  2. Look up the top-K candidate action chains for that pattern, ordered by conductance (descending).
  3. If top-1 conductance ≥ τ_act (default 1.0), execute it directly.
  4. Else fall back to the dispatcher's static rules.
  5. Always write the post-run reinforcement, regardless of which path was taken.

Cold start (no slot yet): static dispatcher rules; the very first run seeds the slot.
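
The selection steps above, including the cold-start case, can be sketched as a lookup over an in-memory table. The `ConductanceTable` alias and `pick_chain` are hypothetical names, the std::map stands in for whatever is loaded from route_conductance.json, and the sketch returns only the top-1 chain rather than a full top-K list.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// (pattern_hash, action_chain_hash) -> conductance.
// ASSUMPTION: an in-memory stand-in for the persisted table.
using ConductanceTable =
    std::map<std::pair<std::uint64_t, std::string>, double>;

// Returns the hash of the chain to execute directly, or std::nullopt when the
// dispatcher should fall back to its static rules (no slot yet, or the best
// slot is below tau_act).
inline std::optional<std::string> pick_chain(const ConductanceTable& table,
                                             std::uint64_t pattern_hash,
                                             double tau_act = 1.0) {
    std::optional<std::string> best;
    double best_c = 0.0;
    // Keys sort by pattern_hash first, so all slots for one pattern are
    // contiguous; the empty string is the smallest possible chain hash.
    for (auto it = table.lower_bound({pattern_hash, std::string()});
         it != table.end() && it->first.first == pattern_hash; ++it) {
        if (!best || it->second > best_c) {
            best   = it->first.second;
            best_c = it->second;
        }
    }
    if (best && best_c >= tau_act) return best;
    return std::nullopt;
}
```

The caller still records food and poison after the run in both branches, so a slot that starts below τ_act can climb above it through runs executed by the static rules.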

Memory bindings as holograms

Every successful (pattern_hash, action_chain) binding is also saved as a hologram with category=route_binding, so future cold sessions inherit the learned reflexes from disk.

DAG fields (NEW, additive)

The existing dag::Entry is extended with:

struct Entry {
    ...                                   // existing fields unchanged
    std::string                pattern_hash;        // hex
    std::string                action_chain_hash;   // hex
    std::vector<std::string>   action_chain;        // organ names in order
    double                     food_score;          // this run
    double                     poison_score;        // this run
    double                     conductance_before;  // before this run
    double                     conductance_after;   // after this run
    std::vector<std::string>   task_features;       // tag list
    std::string                aria_trace_id;       // optional, links to ARIZ trace if any
};

DAG entries become the per-step training signal of the system.
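
TaskFeatures holds pattern_hash as a uint64_t while the DAG entry stores it as a hex string, so a conversion is needed when the entry is written. The sketch below shows one way to do it; the 16-digit, zero-padded, lowercase format is an assumption (the source says only "hex"), `ReinforcementFields` is a hypothetical stand-in for the new fields alone, and treating action_chain_hash as a 64-bit value is likewise assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Zero-padded, lowercase, 16-digit hex. ASSUMPTION: width and case are not
// specified in the source.
inline std::string to_hex(std::uint64_t v) {
    static const char digits[] = "0123456789abcdef";
    std::string out(16, '0');
    for (int i = 15; i >= 0; --i) {
        out[static_cast<std::size_t>(i)] = digits[v & 0xF];
        v >>= 4;
    }
    return out;
}

// HYPOTHETICAL stand-in for a subset of the new dag::Entry fields; the
// existing fields are elided in the source and are not reproduced here.
struct ReinforcementFields {
    std::string              pattern_hash;
    std::string              action_chain_hash;
    std::vector<std::string> action_chain;
    double                   food_score         = 0.0;
    double                   poison_score       = 0.0;
    double                   conductance_before = 0.0;
    double                   conductance_after  = 0.0;
};

inline ReinforcementFields make_fields(std::uint64_t pattern_hash,
                                       std::uint64_t action_chain_hash,
                                       std::vector<std::string> organs,
                                       double food, double poison,
                                       double before, double after) {
    ReinforcementFields f;
    f.pattern_hash       = to_hex(pattern_hash);
    f.action_chain_hash  = to_hex(action_chain_hash);
    f.action_chain       = std::move(organs);
    f.food_score         = food;
    f.poison_score       = poison;
    f.conductance_before = before;
    f.conductance_after  = after;
    return f;
}
```

Recording both conductance_before and conductance_after lets a later audit replay the update rule from the log and check it against the stored table.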

Why this is not "another counter"

The current tier_manager records food/poison per organ name, averaged across all callers. The Black-Dog loop records food/poison per (task pattern, action chain) pair, so it learns which chains are good for which task shapes. These are two different memories:

| Memory | Indexed by | Updates | Purpose |
|---|---|---|---|
| tier_manager | organ name | per call | tier placement (VRAM/RAM/SSD) |
| Black-Dog conductance | (task_pattern, action_chain) | per task | dispatcher prefers learned chains |

Safety / forgetting

route_conductance.json; older slots are purged on policy bump.

Implementation roadmap

| Piece | Status | Phase |
|---|---|---|
| task_features extractor (C++) | ⏳ specced | 8F2 |
| route_conductance.json schema + persist | ⏳ specced | 8F2 |
| dag::Entry field extension | ⏳ specced | 8F2 |
| Dispatcher consults conductance | ⏳ specced | 8F2 |
| Reinforcement scoring on every run | ⏳ specced | 8F2 |
| Hologram-per-binding save | ⏳ specced | 8F2 |
| Decay + policy bump | ⏳ specced | 8F3 |
| Probe regression (10 same-shape tasks) | ⏳ specced | 8F3 |

Master-report patch

MASTER_REPORT.md:

passive counter (docs/BLACK_DOG_LEARNING_LOOP.md)."

physarium_field | reinforcement loop (signal → action → reinforcement → conductance) | physarium/route_conductance.json | active | ✅ | learned (task_pattern → action_chain).

Closing line

ARIZ tells the system how to think. Black-Dog tells the system how to remember which way of thinking worked. Together they replace prompt → 7B → answer with task → trace → reinforced action chain.