Dopamine Lab
Prediction error moves from reward to cue
This is still stable-first: deterministic temporal-difference learning, local chart rendering, and no dependency on the Worker boundary. It is the last major learning-curve module before we consider AI-backed pages.
Prediction error across a trial
Snapshot traces through learning
Value function
Final trial expectation
Learning curve
Cue versus reward responses
Summary
Learned prediction
- Final cue response: 0.454
- Final reward response: 0.042
- Omission dip: -0.998
- Cue dominance emerges around trial 11
The goal is not just to make reward responses large. The system learns to predict reward early enough that the cue inherits the strongest positive prediction error.
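That handoff can be located directly in a minimal tabular TD(0) sketch. The step count, learning rate, and reward size below are illustrative assumptions rather than the page's actual parameters, so the crossover trial this sketch finds will not match the page's "trial 11" exactly.

```python
import numpy as np

def crossover_trial(n_trials=60, n_steps=10, alpha=0.3, gamma=1.0):
    """Return the first trial on which the cue-time TD error exceeds the
    reward-time TD error, or None if it never happens."""
    V = np.zeros(n_steps + 1)              # V[n_steps] is terminal, stays 0
    for trial in range(1, n_trials + 1):
        # Cue arrives unpredictably out of a zero-value intertrial state,
        # so the cue response is the TD error on entering step 0.
        cue = gamma * V[0]
        for t in range(n_steps):
            r = 1.0 if t == n_steps - 1 else 0.0
            delta = r + gamma * V[t + 1] - V[t]
            if t == n_steps - 1:
                reward = delta             # TD error at reward time
            V[t] += alpha * delta
        if cue > reward:
            return trial
    return None
```

With these assumed parameters the crossover lands somewhat later than trial 11; the qualitative result (cue error overtakes reward error partway through training) is what carries over.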
Interpretation
A temporal-difference learning model approximating the dopamine reward-prediction-error signals described by Wolfram Schultz and colleagues.
Model notes
- At first, an unexpected reward produces a strong positive prediction error at reward time.
- With learning, the error shifts earlier toward the predictive cue as value propagates backward in time.
- If reward is omitted after expectation has formed, prediction error becomes negative around the expected reward time.
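All three notes above can be reproduced with a minimal tabular TD(0) sketch. Everything here is an illustrative assumption, not the page's actual implementation: ten within-trial time steps, a learning rate of 0.3, a reward of 1.0 on the final step, and a zero-value intertrial state out of which the cue arrives.

```python
import numpy as np

def run_session(n_trials=60, n_steps=10, alpha=0.3, gamma=1.0):
    """Tabular TD(0) over within-trial time steps: cue at step 0,
    reward of 1.0 delivered on the final step of every trial."""
    V = np.zeros(n_steps + 1)          # V[n_steps] is terminal, stays 0
    cue_err, reward_err = [], []
    for _ in range(n_trials):
        # The cue arrives unpredictably out of a zero-value intertrial
        # state, so the cue response is the TD error on entering step 0.
        cue_err.append(gamma * V[0])
        for t in range(n_steps):
            r = 1.0 if t == n_steps - 1 else 0.0
            delta = r + gamma * V[t + 1] - V[t]   # TD error
            if t == n_steps - 1:
                reward_err.append(delta)
            V[t] += alpha * delta
    return V, cue_err, reward_err

def omission_trial(V, gamma=1.0):
    """Probe a trained value function with the reward withheld: the TD
    error dips negative around the expected reward time."""
    return [gamma * V[t + 1] - V[t] for t in range(len(V) - 1)]

V, cue_err, reward_err = run_session()
dip = min(omission_trial(V))   # close to -1 once reward is well predicted
```

On the first trial the reward-time error is maximal and the cue-time error is zero; by the end of the session the pattern has inverted, and withholding the reward produces a negative dip at the expected reward step, mirroring the three bullets above.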