In Reinforcement Learning, an agent interacts with an environment by receiving observations, executing actions, and collecting rewards. The agent's goal is to maximise a cumulative reward defined by the task at hand. In some scenarios, however, the behaviour of the environment, and with it the distribution of observations, changes over time. Often such a shift is driven by a latent context variable: when the context changes, so does the distribution of the environment's observations. Since these changes may occur many times, the agent must not only adapt to new contexts but also remember the previous ones. This problem is known as Contextual Reinforcement Learning (CRL). This post is dedicated to investigating the problem of CRL, and the unsupervised learning of context variables.
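To make the setting concrete, here is a minimal toy sketch of an environment whose observation distribution depends on a hidden context. The class name, the Gaussian observation model, and the context means are my own illustrative assumptions, not the formal CRL setup from the post:

```python
import numpy as np

class ContextualEnv:
    """Toy sketch: a hidden context variable controls the
    observation distribution (illustrative assumption only)."""

    def __init__(self, context_means=(0.0, 5.0), seed=0):
        self.rng = np.random.default_rng(seed)
        self.context_means = context_means
        self.context = 0  # latent: never revealed to the agent

    def switch_context(self):
        # The environment moves to the next context; the agent must
        # detect this purely from the shift in observations.
        self.context = (self.context + 1) % len(self.context_means)

    def observe(self):
        # Observations are drawn around the current context's mean,
        # so a context switch changes the observation distribution.
        return self.rng.normal(self.context_means[self.context], 1.0)

env = ContextualEnv()
before = np.array([env.observe() for _ in range(1000)])
env.switch_context()
after = np.array([env.observe() for _ in range(1000)])
print(f"mean before switch: {before.mean():.2f}, after: {after.mean():.2f}")
```

The agent only ever sees the samples, so detecting and remembering context switches from the observation stream alone is exactly the difficulty CRL addresses.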
We made Le Cake for my PhD defense party!
Stritzel, with a Persian touch!
In this post, I will show how the decision boundaries of neural networks can be wrong for unseen test data from mismatched distributions, and how to fix this with a simple normalisation!
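A minimal sketch of the effect, with a nearest-centroid classifier standing in for a network's decision boundary; the toy dataset, the affine test-time shift, and the per-dataset standardisation are my own illustrative assumptions, not necessarily the post's exact method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes in "raw" sensor units.
X_train = np.vstack([rng.normal(-1.0, 0.3, (200, 2)),
                     rng.normal(+1.0, 0.3, (200, 2))])
y_train = np.array([0] * 200 + [1] * 200)

# Test data from a mismatched distribution: same classes, but the
# inputs are rescaled and shifted (e.g. a different calibration).
X_test, y_test = X_train * 3.0 + 10.0, y_train

def standardise(X):
    # The "simple normalisation": zero mean, unit variance per
    # feature, computed on the dataset at hand.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def fit_centroids(X, y):
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(X, centroids):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Raw inputs: every shifted test point lands on one side of the
# learned boundary, so accuracy collapses to chance.
acc_raw = (predict(X_test, fit_centroids(X_train, y_train)) == y_test).mean()

# Standardised inputs: the affine mismatch cancels out and the
# learned boundary transfers to the test distribution.
acc_norm = (predict(standardise(X_test),
                    fit_centroids(standardise(X_train), y_train)) == y_test).mean()
print(f"accuracy raw: {acc_raw:.2f}, normalised: {acc_norm:.2f}")
```

Because standardisation is invariant to affine rescaling of the inputs, both datasets map to the same coordinates and the mismatch disappears.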
Homemade sourdough with home-brewed sourdough starter!
We grew tomatoes on our balcony!