- submitted: 2025-05-05 CASE Science, Center for the Advancement of Science Education, National Taiwan University
- published: 2025-11-27 [中文]: 多變多巴胺——第十部:遇見AI,是晴還是雨? (I) (The Many Faces of Dopamine, Part 10: Meeting AI, Sunshine or Rain? (I))
- DOI:
- articlePlus: @medium ; @vocus
- continues DOIs: https://doi.org/10.5281/zenodo.17264636
In the late 20th and early 21st centuries, dopamine got itself a reputation. People wrapped it up in “addiction theory,” and neuroscientists tied it to associative learning, especially this neat idea called “reward prediction error”. The story goes like this: dopamine neurons are like little bookkeepers. They keep track of what reward you expect when you’re about to do something. Then, when the actual reward shows up, they compare it to the expectation. If the reward is better than expected, they throw a party: lots of dopamine. If it’s worse, they sulk and shut down. Simple, right? But not the whole story.
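To make the bookkeeper image concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration (a perfectly reliable coffee, a 0.1 learning rate); the only point is how a prediction error is computed and how it fades once the reward is fully expected.

```python
expected = 0.0       # current reward expectation for "go get coffee"
learning_rate = 0.1  # how fast the expectation gets nudged

for trial in range(30):
    reward = 1.0                            # the coffee is reliably good in this toy world
    prediction_error = reward - expected    # the dopamine-like signal: surprise, positive or negative
    expected += learning_rate * prediction_error  # move the expectation toward what actually happened
    if trial % 10 == 0:
        print(f"trial {trial:2d}: prediction error = {prediction_error:.2f}")
# The error starts at 1.0 (big surprise, big burst) and shrinks toward 0 as the coffee becomes expected.
```

Once the reward is fully anticipated, the party stops: no surprise, no extra dopamine.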
Meanwhile, over in computer science, AI folks were cooking up something called “Temporal Difference learning” (TD learning). Predicting the future is hard, but TD found a clever shortcut. Instead of trying to calculate the whole future reward in one go, it just predicts the next step, then uses that prediction to bootstrap the next one, and so on. It’s like learning to play chess not by planning the whole game but by saying, “Let’s see what happens if I move this pawn,” then adjusting after each move. Computers got pretty good at this, getting better and better at long-term prediction. And here’s the kicker: neuroscientists noticed that dopamine neurons fire in patterns that look suspiciously like TD’s prediction error signals. The brain and the computer seemed to be speaking the same language.
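The bootstrapping trick itself fits in a few lines. Below is a toy TD(0) loop over a hand-invented three-step episode (leave the office, arrive at the café, drink the coffee); the states, rewards, and learning rate are all assumptions for the example, not anything measured. Each state’s value is updated from the next state’s prediction, which is exactly the shortcut described above.

```python
# Toy TD(0): state 0 = leave office, state 1 = arrive at café, state 2 = drink coffee (reward arrives here)
values = [0.0, 0.0, 0.0]   # value estimate for each state
alpha, gamma = 0.1, 1.0    # learning rate and discount factor

for episode in range(100):
    for state in range(3):
        reward = 1.0 if state == 2 else 0.0
        next_value = values[state + 1] if state < 2 else 0.0      # bootstrap: borrow the next prediction
        td_error = reward + gamma * next_value - values[state]    # the TD / dopamine-like error
        values[state] += alpha * td_error

print([round(v, 2) for v in values])  # all three climb toward 1.0: the prediction has crept back to the earliest cue
```

Early on only the last step carries any value; over repeated episodes the value propagates backward, which is the same earlier-and-earlier anticipation that shows up in the coffee example below.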
But hold on. There are problems.
- TD learns only the average value of an action. For example, a coffee lover’s dopamine neurons at first light up only when tasting the coffee. After a few trips to the café, they start firing earlier, at the moment of leaving the office, because the brain has learned to anticipate the reward. But an average is a single number; TD doesn’t capture the full range of possible outcomes.
- In real life, not all dopamine neurons sing the same tune. Some fire for the coffee, some for the chance of a free pastry, some grumble when the coffee’s burnt, and some stall when the barista is slow. Plain TD, with its single shared prediction error, doesn’t explain this diversity.
So scientists borrowed another trick from AI: distributional reinforcement learning. Instead of predicting just one average outcome, you predict a whole spread: a distribution of possibilities. It’s like a weather forecast: not just “rain or shine,” but “40% chance of drizzle, 20% chance of thunderstorms, 40% chance of sunshine.” Dopamine, it turns out, isn’t just encoding the “typical coffee experience.” It’s encoding the whole range, from the best-case surprise pastry to the worst-case burnt sludge.
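In AI, one standard way to learn a forecast like this is to keep a small population of predictors, each updating with a different asymmetry: units that weight good surprises heavily drift toward the optimistic end of the distribution, while units that weight disappointments heavily drift toward the pessimistic end. The sketch below uses an invented café (20% burnt sludge, 60% normal coffee, 20% coffee plus a surprise pastry) purely to show the mechanism; it is a cartoon of the asymmetric-update idea, not any published model.

```python
import random

# Invented café: burnt sludge (0), normal coffee (1), coffee plus a surprise pastry (2)
def sample_reward():
    return random.choices([0.0, 1.0, 2.0], weights=[0.2, 0.6, 0.2])[0]

asymmetries = [0.1, 0.3, 0.5, 0.7, 0.9]   # fraction of weight each unit gives to positive surprises
estimates = [1.0] * len(asymmetries)      # every unit starts from the same guess
base_lr = 0.02

for _ in range(20000):
    r = sample_reward()
    for i, tau in enumerate(asymmetries):
        err = r - estimates[i]
        lr = base_lr * (tau if err > 0 else (1 - tau))   # optimists amplify gains, pessimists amplify losses
        estimates[i] += lr * err

print([round(v, 2) for v in estimates])   # a spread from pessimistic to optimistic, not one average
```

The five numbers fan out across the range of outcomes instead of collapsing onto the mean, which is the “whole weather forecast” described above.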
This idea that the brain encodes a distribution came straight from AI. And it makes sense. Life isn’t about averages; it’s about uncertainty. The brain, like a good forecaster, needs to juggle optimism and pessimism, risk and reward.
DeepMind took this further. They trained recurrent neural networks using dopamine-like TD signals and compared them to real neural data. And guess what? They could reconstruct the reward distribution just from dopamine firing rates. The neurons weren’t all singing the same note; they were a choir. Some bass, some soprano, each tuned to optimism or pessimism. Together, they made harmony. And in AI systems, this diversity speeds up learning. Maybe the brain uses it for the same reason: richer signals, faster adaptation.
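The “reconstruct the distribution” step also has a toy version. If you assume you know each unit’s optimism (its balance of positive versus negative learning rates) and the value it has settled on, you can search for the reward distribution that best explains the whole population at once. The sketch below does this by brute force over a grid of candidate distributions for the same three invented café outcomes; it only illustrates the logic, not the actual analysis DeepMind ran on the recordings.

```python
def settled_estimate(outcomes, probs, tau, steps=2000):
    """Where an asymmetric unit with optimism `tau` ends up for a given discrete reward distribution."""
    q = sum(o * p for o, p in zip(outcomes, probs))
    for _ in range(steps):
        up = sum(p * max(o - q, 0.0) for o, p in zip(outcomes, probs))
        down = sum(p * max(q - o, 0.0) for o, p in zip(outcomes, probs))
        q += 0.01 * (tau * up - (1 - tau) * down)
    return q

outcomes = [0.0, 1.0, 2.0]                 # sludge, coffee, coffee plus pastry (as before)
asymmetries = [0.1, 0.3, 0.5, 0.7, 0.9]
true_probs = [0.2, 0.6, 0.2]
recorded = [settled_estimate(outcomes, true_probs, t) for t in asymmetries]  # stand-in for "firing rates"

best_loss, best_probs = None, None
for p0 in range(11):                        # probabilities on a 0.1 grid
    for p1 in range(11 - p0):
        probs = [p0 / 10, p1 / 10, (10 - p0 - p1) / 10]
        predicted = [settled_estimate(outcomes, probs, t) for t in asymmetries]
        loss = sum((a - b) ** 2 for a, b in zip(recorded, predicted))
        if best_loss is None or loss < best_loss:
            best_loss, best_probs = loss, probs

print(best_probs)   # recovers the true [0.2, 0.6, 0.2] from the population of asymmetric estimates
```

Swap in a different true_probs and rerun, and the search recovers that instead; that is the sense in which a diverse population carries the whole forecast rather than just its average.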
Fast-forward to February 2025. Harvard researchers pushed the frontier again. We already knew the mesolimbic dopamine system updates average reward representations in the striatum. But where do neurons encode the higher-order stuff: the variance, the risk, the shape of the whole distribution? Using AI-based distributional reinforcement learning, plus fancy tools like Neuropixels recordings, dopamine lesions, calcium imaging, and optogenetics, they found the answer.
In mice, striatal neurons didn’t just encode the mean reward. They encoded variance, risk, and the full distribution. Neurons expressing D1 receptors leaned toward optimism; those expressing D2 receptors, toward caution. Medium spiny neurons became little statisticians, weighing the odds of different outcomes. Suddenly, dopamine wasn’t just about “good” or “bad.” It was about probability, uncertainty, and the whole spectrum of possible futures.
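To see why pairing a D1-like optimist with a D2-like pessimist buys you risk information, here is a deliberately crude cartoon using the same asymmetric update as the earlier sketch, now with just two units and two invented cafés (nothing here comes from the actual circuit model in the paper): the pair’s midpoint tracks how good a café is on average, while the gap between them tracks how risky it is.

```python
import random

def run_cafe(weights, trials=20000, lr=0.02):
    """Learn one optimistic (D1-like) and one pessimistic (D2-like) estimate of the same café."""
    estimates = {0.9: 1.0, 0.1: 1.0}   # optimism level -> current estimate
    for _ in range(trials):
        r = random.choices([0.0, 1.0, 2.0], weights=weights)[0]
        for tau in estimates:
            err = r - estimates[tau]
            estimates[tau] += lr * (tau if err > 0 else (1 - tau)) * err
    return estimates[0.9], estimates[0.1]

safe = run_cafe([0.05, 0.90, 0.05])    # almost always a normal coffee
risky = run_cafe([0.45, 0.10, 0.45])   # a near coin-flip between sludge and a pastry windfall
for name, (optimist, pessimist) in (("safe café ", safe), ("risky café", risky)):
    mean_like = (optimist + pessimist) / 2     # rough stand-in for "how good on average"
    risk_like = optimist - pessimist           # rough stand-in for "how uncertain"
    print(f"{name}: mean-ish {mean_like:.2f}, risk-ish {risk_like:.2f}")
```

Both cafés look equally good on average, but the risky one opens a much wider gap between the optimist and the pessimist; that gap is the kind of higher-order statistic the mouse experiments were hunting for.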
So here’s the punchline: dopamine and AI are converging. The brain doesn’t just average rewards, it forecasts distributions, like a weather report for life’s surprises. And AI, inspired by this, is learning to do the same. Sunshine or rain, pastry or burnt coffee, optimism or pessimism, the brain encodes it all. And that’s how we learn to navigate uncertainty, one prediction error at a time.
(To be continued…)
REFERENCES
1. Bakermans, J. J., Muller, T. H., & Behrens, T. E. (2020). Reinforcement learning: full glass or empty—depends who you ask. Current Biology, 30(7), R321-R324.
2. Dabney, W., & Kurth-Nelson, Z. (2020, January 15). Dopamine and temporal difference learning: A fruitful relationship between neuroscience and AI. Google DeepMind.
3. Lowet, A. S., Zheng, Q., Meng, M., Matias, S., Drugowitsch, J., & Uchida, N. (2025). An opponent striatal circuit for distributional reinforcement learning. Nature, 1-10.
4. Medium Spiny Neurons (MSNs): MeSH: D000094242