# BLACK_DOG_LEARNING_LOOP — reinforcement memory for GIGACHAD_NATIVE

**Status:** architecture law. Patches `MASTER_REPORT §6` (Memory spine) and
upgrades `physarium_field` from passive heat to active reinforcement.
**Date:** 2026-04-27.

> The "black dog" automaton learns by repetition: random action → signal
> coincidence → reinforcement → memory binding → conditioned response. Our
> Physarium field is currently a **passive** food/poison counter. This doc
> upgrades it into an **active reinforcement memory** that strengthens or
> weakens specific `(task_pattern → organ_chain)` bindings.

## The reflex loop (mapped to GIGACHAD)

| Pavlov-style term       | GIGACHAD term                                              |
|-------------------------|------------------------------------------------------------|
| Signal (S)              | `task_features` extracted at dispatch time                 |
| Action (A)              | `chosen_action_chain` (organ list + memory recall + 7B)    |
| Reinforcement (+)       | `verifier_pass`, `source_used`, `low_latency`, `replay_success` |
| Reinforcement (−)       | `verifier_fail`, `missing_source`, `false_cache_hit`, `hallucination_risk`, `high_latency` |
| Accumulator (memory)    | `route_conductance[task_pattern, organ_chain]`             |
| Conditioned response    | future dispatcher prefers high-conductance chain for new task with same `task_features` |

## task_features (S)

Lightweight features, computed without an LLM call, extracted before dispatch:

```c++
#include <cstdint>
#include <string>
#include <vector>

struct TaskFeatures {
    std::string raw_route;         // dispatcher Route enum, stored by name
    std::string lang;              // "en" / "ru" (heuristic detect)
    bool        contains_json;     // input has '{' or '['
    bool        contains_code;     // input has 'def ', 'function ', 'class '
    bool        contains_number;   // input matches /\d/
    int         length_bucket;     // 0: <32 chars, 1: <128, 2: <512, 3: >=512
    std::vector<std::string> domain_tags;  // ["thermal", "filter", …] from key-noun grep
    std::uint64_t pattern_hash;    // hash(raw_route, length_bucket, top3_domain_tags)
};
```

`pattern_hash` is the KEY into the conductance table.
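One way to realize the extractor is sketched below. It rests on assumptions the spec leaves open: FNV-1a (64-bit) stands in for "hash", `kDomainVocab` is a hypothetical stand-in for the key-noun grep, "top3" is read as the first three matched tags in sorted order, and language detection only looks for the UTF-8 lead bytes of the basic Russian alphabet. `extract_features` and `bucket_for_length` are illustrative names, not existing GIGACHAD functions.

```c++
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct TaskFeatures {
    std::string raw_route;
    std::string lang;
    bool contains_json = false;
    bool contains_code = false;
    bool contains_number = false;
    int length_bucket = 0;
    std::vector<std::string> domain_tags;
    std::uint64_t pattern_hash = 0;
};

// Hypothetical key-noun vocabulary; kept sorted so "top 3" is deterministic.
const std::vector<std::string> kDomainVocab = {"filter", "pump", "thermal", "valve"};

// FNV-1a 64-bit (an assumption; the spec only says "hash").
std::uint64_t fnv1a(const std::string& s, std::uint64_t h = 1469598103934665603ULL) {
    for (unsigned char c : s) { h ^= c; h *= 1099511628211ULL; }
    return h;
}

int bucket_for_length(std::size_t n) {
    return n < 32 ? 0 : n < 128 ? 1 : n < 512 ? 2 : 3;
}

TaskFeatures extract_features(const std::string& route, const std::string& input) {
    TaskFeatures f;
    f.raw_route = route;
    // UTF-8 lead bytes 0xD0/0xD1 cover the basic Russian alphabet.
    bool cyrillic = std::any_of(input.begin(), input.end(),
                                [](unsigned char c) { return c == 0xD0 || c == 0xD1; });
    f.lang = cyrillic ? "ru" : "en";
    f.contains_json = input.find_first_of("{[") != std::string::npos;
    f.contains_code = input.find("def ") != std::string::npos ||
                      input.find("function ") != std::string::npos ||
                      input.find("class ") != std::string::npos;
    f.contains_number = std::any_of(input.begin(), input.end(),
                                    [](unsigned char c) { return std::isdigit(c) != 0; });
    f.length_bucket = bucket_for_length(input.size());  // byte length, not code points

    std::string lower = input;
    for (char& c : lower) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    for (const auto& tag : kDomainVocab)
        if (lower.find(tag) != std::string::npos) f.domain_tags.push_back(tag);

    // Tags come out in vocabulary order, so the first three are a stable choice.
    assert(std::is_sorted(f.domain_tags.begin(), f.domain_tags.end()));
    std::uint64_t h = fnv1a(f.raw_route);
    h = fnv1a("|" + std::to_string(f.length_bucket), h);
    for (std::size_t i = 0; i < f.domain_tags.size() && i < 3; ++i)
        h = fnv1a("|" + f.domain_tags[i], h);
    f.pattern_hash = h;
    return f;
}
```

Because `lang` and the `contains_*` flags stay out of the hash, "thermal filter check" and "Filter thermal?" on the same route land in the same slot. Whether that coarseness is wanted is a policy choice; adding a flag to the hash would split the slot.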

## chosen_action_chain (A)

```json
{
  "memory_recall":  ["raw_archive", "hologram"],
  "field_query":    "ariz",
  "organs":         ["phys05_triz_contradiction", "phys05_claim_extractor"],
  "top_brain":      "physarium_7b",
  "verifier":       "hard_verifier"
}
```
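The conductance table is also keyed by `action_chain_hash`, and the spec does not say how that hash is computed. A minimal sketch, assuming the hash covers every field of the chain above with list order preserved (organ order matters), serialized with an ASCII unit-separator delimiter and length-prefixed lists so two different chains cannot serialize to the same bytes, then hashed with FNV-1a. `ActionChain`, `canonical`, and `action_chain_hash()` are hypothetical names; the doc stores the result as hex, which is omitted here.

```c++
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct ActionChain {
    std::vector<std::string> memory_recall;
    std::string              field_query;
    std::vector<std::string> organs;      // order matters
    std::string              top_brain;
    std::string              verifier;
};

constexpr char kSep = '\x1f';  // ASCII unit separator as the field delimiter

void append_field(std::string& out, const std::string& value) {
    assert(value.find(kSep) == std::string::npos);  // keeps the encoding unambiguous
    out += value;
    out += kSep;
}

void append_list(std::string& out, const std::vector<std::string>& values) {
    append_field(out, std::to_string(values.size()));  // length prefix
    for (const auto& v : values) append_field(out, v);
}

// Canonical byte string: fields in a fixed order, list order kept.
std::string canonical(const ActionChain& a) {
    std::string out;
    append_list(out, a.memory_recall);
    append_field(out, a.field_query);
    append_list(out, a.organs);
    append_field(out, a.top_brain);
    append_field(out, a.verifier);
    return out;
}

// FNV-1a 64-bit over the canonical form (hash choice is an assumption).
std::uint64_t action_chain_hash(const ActionChain& a) {
    std::uint64_t h = 1469598103934665603ULL;
    for (unsigned char c : canonical(a)) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}
```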

## Reinforcement scoring

```
food = 0
food += 1.0  if verifier_pass
food += 0.5  if memory_sources_nonempty AND best_score >= 0.85
food += 0.5  if total_latency_ms < SLA[task_pattern]
food += 0.3  if cache_hit AND verifier_pass        # successful replay

poison = 0
poison += 1.0  if verifier_fail
poison += 0.5  if memory_sources_empty AND task_requires_source
poison += 0.5  if cache_hit AND NOT verifier_pass  # false cache hit
poison += 0.3  if total_latency_ms > 2 * SLA[task_pattern]
poison += 1.0  if hallucination_flag (heuristic: claim with no source pointer)
```
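The scoring rules translate directly into a pure function of one run's observations. A sketch, assuming `verifier_fail` is simply `!verifier_pass` and that `SLA[task_pattern]` is passed in as `sla_ms`; `RunSignals` and `score_run` are hypothetical names.

```c++
#include <cassert>

// Observations from one run; a hypothetical container, not an existing type.
struct RunSignals {
    bool   verifier_pass;
    bool   memory_sources_nonempty;
    double best_score;            // best memory-recall score
    bool   task_requires_source;
    bool   cache_hit;
    bool   hallucination_flag;    // claim with no source pointer
    double total_latency_ms;
    double sla_ms;                // SLA[task_pattern]
};

struct Reinforcement {
    double food = 0.0;
    double poison = 0.0;
};

Reinforcement score_run(const RunSignals& r) {
    assert(r.sla_ms > 0.0);
    Reinforcement out;
    if (r.verifier_pass) out.food += 1.0;
    if (r.memory_sources_nonempty && r.best_score >= 0.85) out.food += 0.5;
    if (r.total_latency_ms < r.sla_ms) out.food += 0.5;
    if (r.cache_hit && r.verifier_pass) out.food += 0.3;       // successful replay

    if (!r.verifier_pass) out.poison += 1.0;                    // verifier_fail assumed == !pass
    if (!r.memory_sources_nonempty && r.task_requires_source) out.poison += 0.5;
    if (r.cache_hit && !r.verifier_pass) out.poison += 0.5;     // false cache hit
    if (r.total_latency_ms > 2.0 * r.sla_ms) out.poison += 0.3;
    if (r.hallucination_flag) out.poison += 1.0;
    return out;
}
```

With these weights `food` lies in [0, 2.3] and `poison` in [0, 3.3].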

## Conductance update rule

For each `(pattern_hash, action_chain_hash)` slot:

```
conductance[s, a] ← (1 − α) · conductance[s, a] + α · (food − poison)

# typical:
α = 0.20
clamp conductance to [−5.0, 5.0]
```
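The update is an exponential moving average of `food − poison`, followed by the clamp. A sketch, assuming an in-memory map keyed by `(pattern_hash, action_chain_hash)` and that a slot never written reads as 0.0 (the spec does not state an initial value); `ConductanceTable` is a hypothetical name, and JSON persistence is omitted.

```c++
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

using SlotKey = std::pair<std::uint64_t, std::uint64_t>;  // (pattern_hash, action_chain_hash)

struct ConductanceTable {
    double alpha = 0.20;
    double cap   = 5.0;
    std::map<SlotKey, double> slots;

    // A slot that has never been written reads as 0.0 (assumed initial value).
    double get(const SlotKey& key) const {
        auto it = slots.find(key);
        return it == slots.end() ? 0.0 : it->second;
    }

    // conductance <- (1 - alpha) * conductance + alpha * (food - poison), then clamp.
    double update(const SlotKey& key, double food, double poison) {
        assert(alpha > 0.0 && alpha <= 1.0);
        double next = (1.0 - alpha) * get(key) + alpha * (food - poison);
        next = std::max(-cap, std::min(cap, next));
        slots[key] = next;
        return next;
    }
};
```

With α = 0.20 the rule is deliberately slow. From a cold slot, a chain that scores `food − poison = 2.0` on every run goes 0.4, 0.72, 0.976, 1.1808, so it needs four consecutive runs before it clears the default τ_act = 1.0 used by the dispatcher; at the maximum per-run signal of 2.3 it needs three (0.46, 0.828, 1.1224).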

Stored in `physarium/route_conductance.json` next to `tier_state.json`.

## Dispatcher consumption (closed loop)

When a new task arrives:
1. Compute `pattern_hash`.
2. Look up the top-K candidate action chains for this `pattern_hash`, ordered by conductance (descending).
3. If top-1 conductance ≥ τ_act (default 1.0), execute it directly.
4. Else fall back to dispatcher's static rule.
5. **Always** write the post-run reinforcement, whichever path was taken.

Cold start (no slot yet for this `pattern_hash`): use the static dispatcher
rules; the first run seeds the slot.
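Steps 2 through 4 can be sketched as two small functions; step 5 (writing the reinforcement) runs after the task regardless of which branch was taken. This assumes the slots for one `pattern_hash` are already loaded as `(action_chain_hash, conductance)` pairs and targets C++17 for `std::optional`; `Candidate`, `top_k_candidates`, and `pick_learned_chain` are hypothetical names, and `std::nullopt` stands for "use the static rule".

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct Candidate {
    std::uint64_t action_chain_hash;
    double        conductance;
};

// Step 2: the K highest-conductance chains for one pattern_hash, descending.
// Ties are broken arbitrarily in this sketch.
std::vector<Candidate> top_k_candidates(std::vector<Candidate> slots, std::size_t k) {
    assert(k > 0);
    std::size_t keep = std::min(k, slots.size());
    std::partial_sort(slots.begin(), slots.begin() + static_cast<std::ptrdiff_t>(keep),
                      slots.end(), [](const Candidate& a, const Candidate& b) {
                          return a.conductance > b.conductance;
                      });
    slots.resize(keep);
    return slots;
}

// Steps 3-4: run the top-1 chain only if it clears tau_act; otherwise return
// nullopt ("use the static dispatcher rule"). An empty list (cold start) always
// falls back.
std::optional<std::uint64_t> pick_learned_chain(const std::vector<Candidate>& top,
                                                double tau_act = 1.0) {
    if (!top.empty() && top.front().conductance >= tau_act)
        return top.front().action_chain_hash;
    return std::nullopt;
}
```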

## Memory bindings as holograms

Every successful (pattern_hash, action_chain) binding is **also** saved as a
hologram with `category=route_binding`, so future cold sessions inherit the
learned reflexes from disk.

## DAG fields (NEW, additive)

The existing `dag::Entry` is extended with:

```c++
struct Entry {
    ...                                   // existing fields unchanged
    std::string                pattern_hash;        // hex
    std::string                action_chain_hash;   // hex
    std::vector<std::string>   action_chain;        // organ names in order
    double                     food_score;          // this run
    double                     poison_score;        // this run
    double                     conductance_before;  // before this run
    double                     conductance_after;   // after this run
    std::vector<std::string>   task_features;       // tag list
    std::string                aria_trace_id;       // optional, links to ARIZ trace if any
};
```

DAG entries become the per-step training signal of the system.

## Why this is not "another counter"

The current `tier_manager` records food/poison **per organ name** (averaged
across all callers). The Black-Dog loop records food/poison **per (task
pattern, action chain)**, i.e. it learns *which chains are good for which
task shapes*. These are two different memories:

| Memory                | Indexed by                   | Updates  | Purpose                           |
|-----------------------|------------------------------|----------|-----------------------------------|
| `tier_manager`        | organ name                   | per call | tier placement (VRAM/RAM/SSD)     |
| Black-Dog conductance | (task_pattern, action_chain) | per task | dispatcher prefers learned chains |

## Safety / forgetting

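- Decay each tick: `conductance *= 0.999`, so chains that stop receiving
  reinforcement slowly fade (half-life ≈ 693 ticks).
- Hard reset on rule change: bump `policy_version` in
  `route_conductance.json`; slots written under an older version are purged.
- Caps prevent runaway "always pick chain X": conductance is clamped to
  [−5.0, 5.0]. Under the update rule this clamp is a backstop: starting from 0,
  conductance cannot leave the per-run range of `food − poison`, which is
  [−3.3, 2.3] given the scoring weights.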
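A sketch of the two forgetting mechanisms follows. It assumes each stored slot records the `policy_version` it was written under (the spec names the version field on the file, not on slots) and leaves it to the caller to decide what a "tick" is. `ConductanceStore`, `decay_tick`, and `bump_policy` are hypothetical names.

```c++
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

using SlotKey = std::pair<std::uint64_t, std::uint64_t>;  // (pattern_hash, action_chain_hash)

struct Slot {
    double conductance;
    int    policy_version;   // version the slot was written under (assumed field)
};

struct ConductanceStore {
    int policy_version = 1;
    std::map<SlotKey, Slot> slots;

    // Multiply every slot by `factor`; at 0.999 the half-life is about 693 ticks.
    void decay_tick(double factor = 0.999) {
        assert(factor > 0.0 && factor <= 1.0);
        for (auto& entry : slots) entry.second.conductance *= factor;
    }

    // Rule change: bump the version, then drop every slot from an older policy.
    void bump_policy() {
        ++policy_version;
        for (auto it = slots.begin(); it != slots.end();) {
            if (it->second.policy_version < policy_version)
                it = slots.erase(it);
            else
                ++it;
        }
    }
};
```

Decay keeps the sign of a slot and only shrinks its magnitude, so a poisoned chain stays disfavored while it fades toward neutral.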

## Implementation roadmap

| Piece                                        | Status       | Phase |
|----------------------------------------------|--------------|-------|
| `task_features` extractor (C++)               | ⏳ specced   | 8F2   |
| `route_conductance.json` schema + persist     | ⏳ specced   | 8F2   |
| `dag::Entry` field extension                  | ⏳ specced   | 8F2   |
| Dispatcher consults conductance               | ⏳ specced   | 8F2   |
| Reinforcement scoring on every run            | ⏳ specced   | 8F2   |
| Hologram-per-binding save                     | ⏳ specced   | 8F2   |
| Decay + policy bump                           | ⏳ specced   | 8F3   |
| Probe regression (10 same-shape tasks)        | ⏳ specced   | 8F3   |

## Master-report patch

`MASTER_REPORT.md`:
- §1 — append: "Physarium field is a Black-Dog reinforcement loop, not a
  passive counter (`docs/BLACK_DOG_LEARNING_LOOP.md`)."
- §6 (Memory spine) — replace the `physarium_field` row with:
  `physarium_field | reinforcement loop (signal → action → reinforcement → conductance) | physarium/route_conductance.json | active | ✅ | learned (task_pattern → action_chain)`.
- §8 (Blockers) — add: "Black-Dog implementation pending Phase-8F2; dispatcher is currently rule-based only."

## Closing line

ARIZ tells the system *how to think*. Black-Dog tells the system *how to
remember which way of thinking worked*. Together they replace
`prompt → 7B → answer` with `task → trace → reinforced action chain`.
