In Reinforcement Learning, an agent interacts with an environment by receiving observations, executing actions, and collecting rewards. The agent's goal is to maximise a cumulative reward defined by the task at hand. In some scenarios, however, the behaviour of the environment, and with it the distribution of observations, changes over time. Often such a shift is driven by a latent context variable: when the context changes, so does the distribution of the environment's observations. Since these changes may occur many times, the agent must not only adapt to new contexts but also remember the previous ones. This problem is known as Contextual Reinforcement Learning (CRL). This post is dedicated to investigating the problem of CRL, and the unsupervised learning of context variables.
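To make the setting concrete, here is a minimal toy sketch of an environment whose observation distribution depends on a hidden context. The class name, the Gaussian observation model, and the context means are my own illustrative assumptions, not the formal CRL setup from the post:

```python
import numpy as np

class ContextualEnv:
    """Toy sketch: a hidden context variable controls the
    observation distribution (illustrative assumption only)."""

    def __init__(self, context_means=(0.0, 5.0), seed=0):
        self.rng = np.random.default_rng(seed)
        self.context_means = context_means
        self.context = 0  # latent: never revealed to the agent

    def switch_context(self):
        # The environment moves to the next context; the agent must
        # detect this purely from the shift in observations.
        self.context = (self.context + 1) % len(self.context_means)

    def observe(self):
        # Observations are drawn around the current context's mean,
        # so a context switch changes the observation distribution.
        return self.rng.normal(self.context_means[self.context], 1.0)

env = ContextualEnv()
before = np.array([env.observe() for _ in range(1000)])
env.switch_context()
after = np.array([env.observe() for _ in range(1000)])
print(f"mean before switch: {before.mean():.2f}, after: {after.mean():.2f}")
```

The agent only ever sees the samples, so detecting and remembering context switches from the observation stream alone is exactly the difficulty CRL addresses.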
We made Le Cake for my PhD defense party!
Stritzel, with a Persian touch!
In this post, I will show how the decision boundaries of neural networks can be wrong for unseen test data from mismatched distributions, and how to fix this with a simple normalisation!
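A minimal sketch of the effect, with a nearest-centroid classifier standing in for a network's decision boundary; the toy dataset, the affine test-time shift, and the per-dataset standardisation are my own illustrative assumptions, not necessarily the post's exact method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes in "raw" sensor units.
X_train = np.vstack([rng.normal(-1.0, 0.3, (200, 2)),
                     rng.normal(+1.0, 0.3, (200, 2))])
y_train = np.array([0] * 200 + [1] * 200)

# Test data from a mismatched distribution: same classes, but the
# inputs are rescaled and shifted (e.g. a different calibration).
X_test, y_test = X_train * 3.0 + 10.0, y_train

def standardise(X):
    # The "simple normalisation": zero mean, unit variance per
    # feature, computed on the dataset at hand.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def fit_centroids(X, y):
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(X, centroids):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Raw inputs: every shifted test point lands on one side of the
# learned boundary, so accuracy collapses to chance.
acc_raw = (predict(X_test, fit_centroids(X_train, y_train)) == y_test).mean()

# Standardised inputs: the affine mismatch cancels out and the
# learned boundary transfers to the test distribution.
acc_norm = (predict(standardise(X_test),
                    fit_centroids(standardise(X_train), y_train)) == y_test).mean()
print(f"accuracy raw: {acc_raw:.2f}, normalised: {acc_norm:.2f}")
```

Because standardisation is invariant to affine rescaling of the inputs, both datasets map to the same coordinates and the mismatch disappears.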
Homemade sourdough with home-brewed sourdough starter!
We grew tomatoes on our balcony!