April 22, 2013 / neurograce

Knowledge is Pleasure!: Reliable reward information as a reward itself

Pursuing rewards is a crucial part of survival for any species. The circuitry that tells us to seek out pleasure is what ensures that we find food, drink, and mates. In order to engage in this behavior, we must learn associations between rewards and the stimuli that predict them. That way we can know that our caffeine craving, for example, can be quenched by seeking the siren in a green circle (it’s possible that I do my blog writing at a Starbucks–cuz I’m original like that). Studying this kind of reinforcement learning is big business, and there is still a lot left to find out. But what has been known for some 15 years now is that dopaminergic cells in the midbrain which encode reward value also encode reward expectation. That is, in the ventral tegmental area (VTA), cells increase their firing in response to the delivery of an unexpected reward, such as a sudden drop of juice on a monkey’s tongue. But cells here also fire in response to a reward cue, say a symbol on a screen that the monkey has learned predicts the juice reward. What’s more, after this cue, the arrival of the actual reward causes no change in the firing of these cells unless it is higher or lower than expected. So, these cells are learning to value the promise of a pleasurable stimulus, and signal whether that promise is fulfilled, denied, or exceeded. Suddenly, the sight of the siren is a reward on its own, and getting your coffee is merely neutral.
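
For the code-inclined, here is a minimal sketch of that shift in firing from reward to cue. It is a toy prediction-error model, not the analysis from any paper: the learning rate `alpha`, the reward size of 1.0, and the function names are all assumptions made up for illustration.

```python
def run_trials(n_trials=200, reward=1.0, alpha=0.1):
    """Learn the value of a single cue that is always followed by `reward`."""
    cue_value = 0.0  # learned prediction attached to the cue (hypothetical units)
    for _ in range(n_trials):
        # Nudge the prediction toward what was actually received.
        cue_value += alpha * (reward - cue_value)
    return cue_value


def prediction_errors(cue_value, delivered_reward):
    """Return (error at cue onset, error at reward delivery)."""
    # The cue arrives unannounced, so the prediction just before it is 0.
    error_at_cue = cue_value - 0.0
    # At delivery, the error is what arrived minus what the cue promised.
    error_at_reward = delivered_reward - cue_value
    return error_at_cue, error_at_reward


naive = prediction_errors(0.0, 1.0)            # before any learning
trained_value = run_trials()
expected = prediction_errors(trained_value, 1.0)  # promise fulfilled
omitted = prediction_errors(trained_value, 0.0)   # promise denied
bonus = prediction_errors(trained_value, 2.0)     # promise exceeded

for label, errs in [("naive", naive), ("expected", expected),
                    ("omitted", omitted), ("bonus", bonus)]:
    print(label, errs)
```

Before learning, the only error is at the reward. After learning, the error sits at the cue, and the reward itself produces roughly zero error unless it is smaller or larger than promised.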

But the world is rarely just a series of cues and rewards. It’s complex and dynamic: a symbol may predict something positive in one context and punishment in another; reward contingencies can be uncertain or change over time; and with a constant stream of incoming stimuli how do you even figure out what acts as a reward cue in the first place? Luckily, Ethan Bromberg-Martin and Okihide Hikosaka are interested in explaining just these kinds of challenges, and they’ve made a discovery that offers a nice framework on which to build a deeper understanding. In this Neuron paper, Bromberg-Martin and Hikosaka developed a task to test monkeys’ views on information. To start, the monkey was shown one of two symbols, A or B, to which he had to saccade. After that, one of a set of four different symbols appeared: if A was initially shown then the second symbol would be A1 or A2, and if B was shown it would be B1 or B2. The appearance of A1 always predicted a big water reward, and A2 always predicted a small water reward (which, to greedy monkeys who know a larger reward is possible, is essentially a punishment). But for B1 and B2, the water amount was randomized; these symbols were useless in providing reward information. So, the appearance of A meant that an informative symbol was on its way, whereas B meant something meaningless was coming. Importantly, the amount of reward was equal on average for A and B; only the advance knowledge of the reward differed.
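
To make that last point concrete, here is a small worked sketch of the task structure. The reward sizes (1.0 and 0.0) are placeholders rather than the paper's water volumes, and I am assuming the two second symbols are equally likely and that B1 and B2 are each followed by a big or small reward with 50/50 odds. The names `TASK`, `mean_reward`, and `remaining_uncertainty_bits` are made up for this example.

```python
import math

BIG, SMALL = 1.0, 0.0  # placeholder reward sizes, not the paper's volumes

# first cue -> second cue -> list of (probability, reward) outcomes.
# Assumption: within each path, the two second cues are equally likely.
TASK = {
    "A": {"A1": [(1.0, BIG)], "A2": [(1.0, SMALL)]},
    "B": {"B1": [(0.5, BIG), (0.5, SMALL)],
          "B2": [(0.5, BIG), (0.5, SMALL)]},
}


def mean_reward(first_cue):
    """Average reward over everything that can follow `first_cue`."""
    second = TASK[first_cue]
    p_second = 1.0 / len(second)
    return sum(p_second * p * r
               for outcomes in second.values()
               for p, r in outcomes)


def remaining_uncertainty_bits(first_cue):
    """Average entropy (in bits) of reward size once the second cue is seen."""
    second = TASK[first_cue]
    p_second = 1.0 / len(second)
    total = 0.0
    for outcomes in second.values():
        entropy = -sum(p * math.log2(p) for p, _ in outcomes if p > 0)
        total += p_second * entropy
    return total


for cue in ("A", "B"):
    print(cue, mean_reward(cue), remaining_uncertainty_bits(cue))
```

Under these assumptions both paths pay the same on average, but after the second symbol appears the A path leaves no uncertainty about the reward while the B path leaves a full coin flip's worth.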

Recording from those familiar midbrain dopaminergic cells, the authors saw an increase in activity following the appearance of the information-predicting cue A, and a decrease in response to B. These cells then went on to do their normal duty: showing a large spike in response to A1 (the large reward cue), a decrease to A2, and no change in response when these predicted rewards were actually delivered; or, alternatively, little change in response to B1 and B2, and a spike/dip when an unpredictable large/small reward was delivered. What the initial response to A and B shows is that the VTA is responding to the promise of information about reward in the same way as it responds to the promise of a reward or a reward itself. This is further supported by the fact that when monkeys were presented with both A and B and allowed to choose which to saccade to, they overwhelmingly preferred A, leading them down the path of reward information.

This may seem like a silly preference. Choosing to be informed about the reward size beforehand doesn’t provide a greater reward size or allow the monkey any more control, so why bother valuing the advance information? The authors put forth the notion that uncertainty is in some way uncomfortable, so the earlier it is resolved the better. But I’m more inclined to believe their second assertion: the informative path (A) is preferred because it provides stable cue-reward associations that can be learned. The process of learning what cue predicts what reward assumes that there are cues that actually do predict reward. So if we want to achieve that goal we have to make sure we’re working in a regime where that base assumption is true, which isn’t the case for uninformative path B. Living in a world of meaningless symbols means all your fancy mental equipment for associating cues and rewards is for naught, and it leaves you with little more than luck when it comes to finding what you need. So there is a clear evolutionary advantage in finding reward in (and thus seeking out) stable cue-reward associations.

But like most good discoveries, this one leaves us with a lot of questions, mainly about how the brain comes to find these stable associations rewarding. We know that for a cue to be associated with a reward, it needs to reliably precede that reward. Then through….well, some process that we’re working out the details of….VTA neurons start firing in response to the cue itself. So presumably in order for the brain to associate a certain cue with reward information, the cue has to reliably precede that information. Here’s where we hit a problem. It is easy enough to understand how the brain is aware that the cue was presented (that’s just a visual stimulus, no problem there), and we can equally well conceive of how it acknowledges the existence of a reward (again, just a physical stimulus which ends up making VTA cells spike), but how can the brain know that information is present? The information that a cue contains about an upcoming reward isn’t a physical stimulus out there in the world; it’s something contained in the brain itself. If we are to learn to associate an external cue with an internal entity like information, the brain needs to be able to monitor what’s happening inside itself the same way it monitors the outside world.

Luckily, there are possible mechanisms for this, and they fit well with the existing role of VTA cells. Here is the equation the brain seems to be using to make basic reward associations:

visual stimulus + VTA cell firing due to some delayed reward = VTA cell firing to visual stimulus.

But VTA cell firing is VTA cell firing, so we can substitute the second term with the righthand side of the equation and get:

visual stimulus #2 + VTA cell firing due to visual stimulus = VTA cell firing to visual stimulus #2
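
The substitution above can also be sketched in a few lines of code. This is a toy temporal-difference learner, not a model taken from the paper: the learning rate `alpha`, the discount `gamma`, the reward size, and the function name `learn_chain` are all assumptions chosen for illustration. It shows only the chaining step, where a cue inherits value from the cue that follows it; it says nothing about why an informative cue would be valued over an uninformative one.

```python
def learn_chain(n_cues=2, reward=1.0, alpha=0.1, gamma=0.9, n_trials=500):
    """Learn values for a fixed sequence: cue[0] -> cue[1] -> ... -> reward."""
    values = [0.0] * n_cues
    for _ in range(n_trials):
        for i in range(n_cues):
            is_last_cue = (i == n_cues - 1)
            # The final cue is followed by the reward itself; every earlier
            # cue is followed only by the learned value of the next cue,
            # discounted by gamma (an assumed parameter).
            target = reward if is_last_cue else gamma * values[i + 1]
            values[i] += alpha * (target - values[i])
    return values


values = learn_chain()
print("cue for the cue:", values[0])
print("cue for the reward:", values[1])
```

In this sketch the first cue ends up with a value of its own even though it is never directly followed by reward, and with `gamma` below 1 that value is smaller than the value of the cue sitting right next to the reward.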

If pseudomath isn’t your thing: basically, the fact that the brain can learn to treat reward cues as reward means that it can learn to treat cues for reward cues as reward. And cues for cues for reward cues? Maybe, but I wouldn’t bet on it. While the VTA cells did fire in response to the promise of information signified by cue A, they still had their biggest spike increase in response to A1, the cue that signaled a big reward. It seems there’s a limit on how far removed a cue can be from an actual reward. Interestingly, this ability, or any kind of metacognition, appears restricted to more cognitively complex animals such as primates, and probably contributes to their adaptability. While this kind of study hasn’t been done in rats or mice, my guess is you’d be hard-pressed to find such a preference for information in those lower animals.

Of course these findings leave us with something of a chicken-and-egg problem. Our desire for information is supposed to motivate us to pursue situations with stable cue-reward associations. But we can’t develop that desire until the cue-reward association is already mentally established, so what good is it then? There is also the question of how these results fit into the well-established desire that people (and animals) have for gambling. You’re not going to find a roulette wheel that will tell you where its ball is going to land, or a poker player willing to show you their cards. So what allows us to selectively love risk and uncertainty? Some theories suggest that the possibility for huge payoffs can lead to a miscalculation in expected reward and overpower our better, more rational instincts. But it’s still an area of research in economics as well as neuroscience. Basically, the evidence that reliable information is valued and sought after provides many insights into the process of reinforcement learning, but in order to fully understand its role and consequences, we are going to need more–you guessed it–information.

Bromberg-Martin, E., & Hikosaka, O. (2009). Midbrain Dopamine Neurons Signal Preference for Advance Information about Upcoming Rewards. Neuron, 63(1), 119-126. DOI: 10.1016/j.neuron.2009.06.009
