Abstract-1
The selection and timing of actions are subject to determinate influences such as sensory cues and internal state as well as to effectively stochastic variability. Although stochastic choice mechanisms are assumed by many theoretical models, their origin and mechanisms remain poorly understood. Here we investigated this issue by studying how neural circuits in the frontal cortex determine action timing in rats performing a waiting task. Electrophysiological recordings from two regions necessary for this behavior, medial prefrontal cortex (mPFC) and secondary motor cortex (M2), revealed an unexpected functional dissociation. Both areas encoded deterministic biases in action timing, but only M2 neurons reflected stochastic trial-by-trial fluctuations. This differential coding was reflected in distinct timescales of neural dynamics in the two frontal cortical areas. These results suggest a two-stage model in which stochastic components of action timing decisions are injected by circuits downstream of those carrying deterministic bias signals.
Murakami, M., Shteingart, H., Loewenstein, Y., & Mainen, Z. F. (2017). Distinct sources of deterministic and stochastic components of action timing decisions in rodent frontal cortex. Neuron, 94(4), 908-919.[LINK]
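The two-stage picture in this abstract lends itself to a toy simulation. The sketch below is our own illustration, not the authors' fitted model: an upstream stage sets a slowly varying deterministic bias on waiting time, and a downstream stage injects trial-by-trial noise before the action is released. All parameter names and values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 1000
noise_sd = 0.3        # SD of the downstream stochastic component (s); illustrative
waits = np.empty(n_trials)
bias = 1.0

for t in range(n_trials):
    # Stage 1 ("deterministic bias", carried by both mPFC and M2 in the paper):
    # a slowly changing waiting-time set point, redrawn every 200 trials here
    # to mimic block-wise changes in task context.
    if t % 200 == 0:
        bias = rng.uniform(0.8, 1.5)
    # Stage 2 (stochastic component, reflected only in M2): fast
    # trial-by-trial noise injected downstream of the bias signal.
    waits[t] = max(0.0, bias + rng.normal(0.0, noise_sd))

# Within-block variability comes from the stochastic stage; variability of
# the block means comes from the deterministic stage.
print("within-block SD of waits:", waits[:200].std().round(3))
print("SD of block means       :", waits.reshape(5, 200).mean(axis=1).std().round(3))
```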
Abstract-2
Reinforcement learning has been widely used in explaining animal behavior. In reinforcement learning, the agent learns the value of the states in the task, collectively constituting the task state space, and uses the knowledge to choose actions and acquire desired outcomes. It has been proposed that the orbitofrontal cortex (OFC) encodes the task state space during reinforcement learning. However, it is not well understood how the OFC acquires and stores task state information. Here, we propose a neural network model based on reservoir computing. Reservoir networks exhibit heterogeneous and dynamic activity patterns that are suitable to encode task states. The information can be extracted by a linear readout trained with reinforcement learning. We demonstrate how the network acquires and stores task structures. The network exhibits reinforcement learning behavior, and aspects of its activity resemble experimental findings in the OFC. Our study provides a theoretical explanation of how the OFC may contribute to reinforcement learning and a new approach to understanding the neural mechanism underlying reinforcement learning.
Zhang, Z., Cheng, Z., Lin, Z., Nie, C., & Yang, T. (2018). A neural network model for the orbitofrontal cortex and task space acquisition during reinforcement learning. PLOS Computational Biology, 14(1), e1005925.[LINK]
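As a concrete toy version of the scheme described in this abstract (a sketch under our own assumptions, not the authors' published network): a fixed random recurrent network supplies heterogeneous, input-driven dynamics, and only a linear readout is trained with a simple reward-driven rule on a two-cue, two-action task. Reservoir size, learning rule, and task are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed random reservoir: only the linear readout below is trained.
N = 200                                       # reservoir size (assumed)
W = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max() # spectral radius < 1 for stability
W_in = rng.normal(0.0, 1.0, (N, 2))           # input weights for 2 possible cues
W_out = np.zeros((2, N))                      # linear readout: one row per action

lr, eps = 0.05, 0.1                           # learning rate, exploration rate
x = np.zeros(N)
hits = []

for trial in range(2000):
    cue = int(rng.integers(2))                # toy task: cue says which action pays
    u = np.eye(2)[cue]
    for _ in range(10):                       # heterogeneous reservoir dynamics
        x = np.tanh(W @ x + W_in @ u)
    q = W_out @ x                             # readout = estimated action values
    a = int(rng.integers(2)) if rng.random() < eps else int(q.argmax())
    r = 1.0 if a == cue else 0.0
    W_out[a] += lr * (r - q[a]) * x           # reward-driven delta rule on readout
    hits.append(r)

print("accuracy over last 200 trials:", np.mean(hits[-200:]))
```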
Abstract-3
Intrinsic motivation generates behaviors that do not necessarily lead to immediate reward, but help exploration and learning. Here we show that agents having the sole goal of maximizing occupancy of future actions and states, that is, moving and exploring over the long term, are capable of complex behavior without any reference to external rewards. We find that action-state path entropy is the only measure consistent with additivity and other intuitive properties of expected future action-state path occupancy. We provide analytical expressions that relate the optimal policy to the optimal state-value function, from which we prove uniqueness of the solution of the associated Bellman equation and convergence of our algorithm to the optimal state-value function. Using discrete and continuous state tasks, we show that 'dancing', hide-and-seek and a basic form of altruistic behavior naturally result from entropy seeking without external rewards. Intrinsically motivated agents can objectively determine what states constitute rewards, exploiting them to ultimately maximize action-state path entropy.
Ramírez-Ruiz, J., Grytskyy, D., & Moreno-Bote, R. (2022). Seeking entropy: complex behavior from intrinsic motivation to occupy action-state path space. arXiv, 2205.10316. [LINK]
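To make the objective concrete: in the special case of deterministic transitions, the state-transition entropy term vanishes and the entropy-seeking Bellman backup reduces to a log-sum-exp, V(s) = α log Σ_a exp(Q(s,a)/α) with Q(s,a) = γ V(s'), whose optimal policy is the softmax π(a|s) ∝ exp(Q(s,a)/α). The value-iteration sketch below (our notation and toy environment, assumed from the abstract rather than taken from the paper's code) shows an agent on a short chain learning to steer away from an absorbing "death" state, since absorption destroys all future action-state entropy.

```python
import numpy as np

# Toy chain: states 0..4, actions 0 = left, 1 = right, walls reflect.
# State 0 is absorbing ("death"): no further actions, so zero future entropy.
nS, nA = 5, 2
gamma, alpha = 0.95, 1.0          # discount and action-entropy weight (assumed)

def step(s, a):
    if s == 0:
        return 0                  # absorbing
    return max(s - 1, 0) if a == 0 else min(s + 1, nS - 1)

V = np.zeros(nS)
for _ in range(1000):             # value iteration to a fixed point
    Q = np.array([[gamma * V[step(s, a)] for a in range(nA)] for s in range(nS)])
    V_new = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
    V_new[0] = 0.0                # death yields no future action-state entropy
    if np.abs(V_new - V).max() < 1e-12:
        break
    V = V_new

pi = np.exp(Q / alpha)
pi /= pi.sum(axis=1, keepdims=True)      # optimal policy: softmax over Q
print("V          =", V.round(2))
print("P(right|s) =", pi[:, 1].round(2)) # rises near the death state
```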
Speaker: Huixin Lin
Time: 9:30 am, 2022/09/09
Location: CIBR A6 Meeting Room