Dopamine Lab

Prediction error moves from reward to cue

This is still stable-first: deterministic temporal-difference learning, local chart rendering, and no dependency on the Worker boundary. It is the last major learning-curve module before we consider AI-backed pages.

Prediction error across a trial

Snapshot traces through learning

[Chart: prediction-error traces with cue and reward times marked; snapshots: Novel reward, Mid learning, Reward omitted, Well learned]

Value function

Final trial expectation

Learning curve

Cue versus reward responses

Summary

Learned prediction

  • Final cue response: 0.454
  • Final reward response: 0.042
  • Omission dip: -0.998
  • Cue dominance emerges around trial 11

The goal is not just to make reward responses large. The system learns to predict reward early enough that the cue inherits the strongest positive prediction error.

Interpretation

This is a temporal-difference learning model approximating the dopamine reward-prediction-error signals described by Wolfram Schultz and colleagues.

Model notes

  • At first, an unexpected reward produces a strong positive prediction error at reward time.
  • With learning, the error shifts earlier toward the predictive cue as value propagates backward in time.
  • If reward is omitted after expectation has formed, prediction error becomes negative around the expected reward time.