Peer-reviewed abstract. Corresponding author: Raymond Chua.
Biological agents such as primates exhibit lifelong learning and adaptation, processes mirrored by continual Reinforcement Learning (RL) frameworks in computational models. In RL, Successor Representations (SRs) enable agents to behave flexibly by capturing the world’s transition dynamics as discounted future state visitations, akin to how animals learn from experience over time. Successor Features (SFs) build on SRs by using artificial neural networks to extract features, overcoming a limitation of the SR’s discrete representations, especially in complex environments. However, learning SFs from scratch risks representation collapse and poor performance when key features are missed. Two strategies have been explored to address this: pre-training, which is unsuitable for continual RL because of out-of-distribution challenges, and auxiliary losses, such as reconstruction constraints on basis features. While the latter may reduce representational mismatch, it introduces inductive biases that hamper learning, as our experiments show. To tackle this, we devised a novel method that streamlines learning SFs from pixels using a simplified modification of the Temporal-Difference (TD) loss, removing the need for pre-training and auxiliary losses. In single-task scenarios within both 2D (Minigrid) and 3D (Miniworld) mazes, our model matches the performance of the standard Deep Q-Network (DQN). More notably, in a continual RL setup involving two tasks, our model exhibits superior transferability when re-encountering tasks, outperforming both DQN and other SF models trained with auxiliary constraints. Additionally, through dimensionality reduction and geospatial color mapping, we visually observed that our SFs effectively capture the statistical structure of the environments. Interestingly, our findings also hint at similarities between our computational model and hippocampal predictive representations in dynamic contexts. This suggests new avenues for exploring diverse neural functions, in particular how reward-driven representations differ from other representations, thereby enriching our understanding of neural adaptability and learning.
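For context, the quantities referred to above can be sketched in their standard, generic form; the equations below give the textbook SR and SF definitions and the usual SF TD objective, not the specific simplified TD modification introduced in this work, and the exact feature definition used here may differ.

\[
M^{\pi}(s, s') = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{1}[s_t = s'] \,\middle|\, s_0 = s\right]
\]

\[
\psi^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t) \,\middle|\, s_0 = s,\ a_0 = a\right], \qquad Q^{\pi}(s, a) = \psi^{\pi}(s, a)^{\top} \mathbf{w}
\]

\[
\mathcal{L}_{\mathrm{TD}} = \mathbb{E}\!\left[\,\bigl\lVert \phi(s_t) + \gamma\, \bar{\psi}\bigl(s_{t+1}, \pi(s_{t+1})\bigr) - \psi(s_t, a_t) \bigr\rVert^{2}\right]
\]

where \(\phi\) denotes the (learned) features, here extracted from pixel observations, \(\mathbf{w}\) the reward weights such that \(r \approx \phi^{\top}\mathbf{w}\), and \(\bar{\psi}\) a target network.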