Camilo Libedinsky

2 Papers

NE · Jun 25, 2021
A nonlinear hidden layer enables actor-critic agents to learn multiple paired association navigation

M Ganesh Kumar, Cheston Tan, Camilo Libedinsky et al.

Navigation to multiple cued reward locations has been increasingly used to study rodent learning. Though deep reinforcement learning agents have been shown to be able to learn the task, they are not biologically plausible. Biologically plausible classic actor-critic agents have been shown to learn to navigate to single reward locations, but which biologically plausible agents are able to learn multiple cue-reward location tasks has remained unclear. In this computational study, we show versions of classic agents that learn to navigate to a single reward location, and adapt to reward location displacement, but are not able to learn multiple paired association navigation. The limitation is overcome by an agent in which place cell and cue information are first processed by a feedforward nonlinear hidden layer with synapses to the actor and critic subject to temporal difference error-modulated plasticity. Faster learning is obtained when the feedforward layer is replaced by a recurrent reservoir network.
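As a rough illustration of the mechanism the abstract describes, the sketch below passes place-cell and cue inputs through a nonlinear hidden layer and updates critic weights in proportion to the temporal difference error. It is not the authors' implementation: the `tanh` nonlinearity, the fixed hidden weights, the learning rate, the discount factor, and all function and variable names are assumptions chosen for the example.

```python
import math

def td_error(reward, value_now, value_next, gamma=0.9):
    """Temporal difference error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_now

def hidden_activity(inputs, weights):
    """Nonlinear hidden layer: tanh of a weighted sum of the inputs (assumed nonlinearity)."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def update_critic(critic_w, hidden, delta, lr=0.1):
    """TD-error-modulated plasticity on hidden-to-critic synapses: dw_j = lr * delta * h_j."""
    return [w + lr * delta * h for w, h in zip(critic_w, hidden)]

# Toy example with hypothetical values: two place-cell inputs and one cue input.
inputs = [1.0, 0.0, 1.0]
hidden_w = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # kept fixed here for simplicity (assumption)
critic_w = [0.0, 0.0]

h = hidden_activity(inputs, hidden_w)
value_now = sum(w * x for w, x in zip(critic_w, h))  # critic's value estimate before learning
delta = td_error(reward=1.0, value_now=value_now, value_next=0.0)
critic_w = update_critic(critic_w, h, delta)
```

An actor would be updated the same way, with the TD error scaling the change in the hidden-to-actor weights for the action that was taken.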

NE · Jun 7, 2021
One-shot learning of paired association navigation with biologically plausible schemas

M Ganesh Kumar, Cheston Tan, Camilo Libedinsky et al.

Schemas are knowledge structures that can enable rapid learning. Rodent one-shot learning in a multiple paired association navigation task has been postulated to be schema-dependent. We still only poorly understand how schemas, conceptualized at Marr's computational level, are neurally implemented. Moreover, a biologically plausible computational model of rodent learning has not been demonstrated. Accordingly, we here compose an agent from schemas with biologically plausible neural implementations. The agent gradually learns a metric representation of its environment using a path integration temporal difference error, allowing it to localize in any environment. Additionally, the agent contains an associative memory that can stably form numerous one-shot associations between sensory cues and goal coordinates, implemented with a feedforward layer or a reservoir of recurrently connected neurons whose plastic output weights are governed by a 4-factor reward-modulated Exploratory Hebbian (EH) rule. A third network performs vector subtraction between the agent's current and goal location to decide the direction of movement. We further show that schemas supplemented by an actor-critic allow the agent to succeed even if an obstacle prevents direct heading, and that temporal-difference learning of a working memory gating mechanism enables one-shot learning despite distractors. Our agent recapitulates learning behavior observed in experiments and provides testable predictions that can be probed in future experiments.
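The goal-directed step in this abstract (recall goal coordinates for a cue, then subtract the current location to get a heading) can be sketched in a few lines. This is a simplified stand-in, not the paper's model: a plain dictionary replaces the neural associative memory and its EH learning rule, and the names `store_goal`, `heading_to_goal`, and `"cue_A"` are hypothetical.

```python
import math

# Stand-in for the associative memory: maps a sensory cue to goal coordinates.
# A dict write plays the role of a one-shot association (assumption for illustration).
goal_memory = {}

def store_goal(cue, goal_xy):
    """Store a cue-to-goal association after a single exposure."""
    goal_memory[cue] = goal_xy

def heading_to_goal(current_xy, goal_xy):
    """Vector subtraction goal - current, normalized to a unit direction of movement."""
    dx = goal_xy[0] - current_xy[0]
    dy = goal_xy[1] - current_xy[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)  # already at the goal, so no movement direction
    return (dx / dist, dy / dist)

# Toy usage: one exposure to a cue, then navigation from the origin.
store_goal("cue_A", (3.0, 4.0))
direction = heading_to_goal((0.0, 0.0), goal_memory["cue_A"])
```

In the agent described above, the current coordinates would come from the learned path-integration representation rather than being supplied directly.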