A framework for associative learning
How do animals learn to associate environmental cues with delayed outcomes such as rewards? Temporal difference reinforcement learning (TDRL) is the most widely accepted model of reward learning. Jeong et al. now challenge this view by introducing a formal account and model of dopamine’s role in learning. They present an algorithm that supports retrospective causal learning, allowing animals to infer the causes of an event. To directly compare the two competing hypotheses, the authors devised eight experimental tests. Across all eight tests, and in every animal tested, the results challenged the TDRL signaling model, whereas every observation was consistent with their causal inference algorithm. These findings provide a fresh physiological and theoretical framework for associative learning in the brain. —PRS
Structured Abstract
INTRODUCTION
How do animals learn to prospectively predict delayed outcomes such as rewards from environmental cues? Considerable evidence shows that the neuromodulator dopamine is critical for such associative learning. This evidence has supported the hypothesis that midbrain dopamine neurons and nucleus accumbens dopamine release signal a temporal difference (TD) reward prediction error (RPE)—the difference between received and predicted reward. Hence, it is widely believed that animals use the TD algorithm to learn reward predictions. Recent results, however, suggest that dopamine signaling may not be fully consistent with RPEs. Despite this, the TD RPE hypothesis remains the best explanation of associative learning in the brain because no alternative model successfully explains the experimental observations inconsistent with TD RPE while also capturing the phenomena currently best explained by it.
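For readers unfamiliar with the TD RPE account, the following is a minimal TD(0) sketch (a textbook illustration, not the paper's implementation): a cue at the start of a trial is followed by a reward a few time steps later, and the prediction error delta updates the value of each step. With training, value propagates backward from the reward toward the cue.

```python
import numpy as np

def train_td(n_trials=200, n_steps=5, alpha=0.1, gamma=0.95):
    """Illustrative TD(0) learning for a fixed cue-reward delay.

    Time step 0 is the cue; the reward arrives at the final step.
    V[t] is the learned value (reward prediction) of step t.
    """
    V = np.zeros(n_steps + 1)  # extra terminal state with value 0
    for _ in range(n_trials):
        for t in range(n_steps):
            r = 1.0 if t == n_steps - 1 else 0.0   # reward at the last step
            delta = r + gamma * V[t + 1] - V[t]    # TD reward prediction error
            V[t] += alpha * delta                  # update driven by the RPE
    return V

V = train_td()
# After training, the cue-time value V[0] approaches gamma**4,
# i.e., the discounted prediction of the delayed reward.
```

The key feature the paper contests is that this scheme requires the animal to maintain and update prospective predictions at every moment between cue and reward.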
RATIONALE
Here, we propose a new theory of associative learning. Since causes must precede outcomes, we propose that animals learn associations by looking back in memory to infer causes of meaningful outcomes such as rewards. For instance, imagine a child learning that the bell of an ice cream van predicts ice cream. Learning what cues precede ice cream (retrospective) is simpler than learning the future outcome of a near-infinite number of cues, only a few of which may result in ice cream (prospective). Thus, once a meaningful outcome such as ice cream is realized, the child can look back in memory to infer that the bell causes ice cream. Such learning only requires a memory of the previous bell when ice cream is received, and not active maintenance of predictions of ice cream prior to its receipt. We developed a model based on this concept—adjusted net contingency for causal relations (ANCCR, pronounced “anchor”)—and tested whether this model is capable of learning reward predictions and explaining nucleus accumbens dopamine release better than TDRPE. In this model, a current event is considered a “meaningful causal target” (i.e., an event whose cause should be learned) when it is either innately meaningful (e.g., ice cream) or when it has been learned to be a cause of other meaningful causal targets (e.g., bell of ice cream van after it has been learned to be a cause of ice cream). Our central hypothesis is that mesolimbic dopamine conveys the learned meaningfulness of events by signaling ANCCR and guides causal learning of the current event.
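The retrospective idea above can be illustrated with a deliberately simplified sketch (inspired by, but far simpler than, the paper's ANCCR model; all parameter names and values here are illustrative assumptions): whenever a reward occurs, the agent looks back in memory and asks how often the cue preceded it, compared with the cue's overall base rate. A cue that is a likely cause of reward yields a positive contingency.

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

def retrospective_contingency(n_bins=10000, p_cue=0.2, p_reward_given_cue=0.9):
    """Toy retrospective contingency estimate over discrete time bins.

    In each bin a cue may occur; reward follows the cue with high
    probability and occurs only rarely otherwise (rates are assumptions
    for illustration).
    """
    cue_count = reward_count = cue_before_reward = 0
    for _ in range(n_bins):
        cue = random.random() < p_cue
        reward = random.random() < (p_reward_given_cue if cue else 0.01)
        cue_count += cue
        reward_count += reward
        cue_before_reward += cue and reward  # "looking back" from the reward
    p_cue_given_reward = cue_before_reward / reward_count  # retrospective term
    base_rate = cue_count / n_bins
    # Positive value: the cue precedes reward more often than chance,
    # so it is inferred to be a likely cause of the reward.
    return p_cue_given_reward - base_rate

contingency = retrospective_contingency()
```

Note that this computation is only triggered when a meaningful outcome (the reward) occurs, and needs only a memory of recent events, not moment-by-moment maintenance of prospective predictions.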
RESULTS
We first showed using simulations that ANCCR identifies relationships between events spread in time (e.g., cues and rewards) better than TD without loss of generality across timescales. Next, we showed that many classic experimental results supporting TDRPE coding are also consistent with ANCCR signaling. Thus, we sought to test dopamine activity during experiments that distinguish these two hypotheses. To this end, we devised eight experimental tests capable of qualitatively distinguishing between the two hypotheses. We performed these tests by using fiber photometry of an optical dopamine sensor (dLight1.3b) to measure nucleus accumbens dopamine release. We did so across tasks including repeated exposure to random rewards, variations of cue-reward learning, and sequential conditioning with and without optogenetic inhibition of dopamine release. In all cases, we observed that the results were inconsistent with TDRPE but consistent with ANCCR.
CONCLUSION
Collectively, these results demonstrate that dopamine function can be better understood as providing a signal to initiate learning of causes of a meaningful stimulus. These results reshape current understanding of the algorithms and neural mechanisms of associative learning in the brain.
Abstract
Learning to predict rewards based on environmental cues is essential for survival. It is believed that animals learn to predict rewards by updating predictions whenever the outcome deviates from expectations, and that such reward prediction errors (RPEs) are signaled by the mesolimbic dopamine system—a key controller of learning. However, instead of learning prospective predictions from RPEs, animals can infer predictions by learning the retrospective cause of rewards. Hence, whether mesolimbic dopamine instead conveys a causal associative signal that sometimes resembles RPE remains unknown. We developed an algorithm for retrospective causal learning and found that mesolimbic dopamine release conveys causal associations but not RPE, thereby challenging the dominant theory of reward learning. Our results reshape the conceptual and biological framework for associative learning.
Jeong, H., Taylor, A., Floeder, J. R., Lohmann, M., Mihalas, S., Wu, B., ... & Namboodiri, V. M. K. (2022). Mesolimbic dopamine release conveys causal associations. Science. [LINK]
Speaker: Huixin Lin
Time: 1:00 pm, 2022/12/28
Location: CIBR A6 Meeting Room