The Silver Challenge - Lecture 1
I have challenged myself to watch the 10 David Silver lectures on reinforcement learning in 10 days. I’m calling it the Silver Challenge! For each day I am going to try to write about one thing that I found useful from the lecture - not that I will only be learning one thing a day, but I just want to capture a quick highlight.
Today in the introductory lecture, I really liked Silver’s explanation of how the value function and Q-value function work, because that is a topic that I have been struggling to understand well. I took a screenshot from the lecture and annotated it to show what I learned that I found useful.
Figure 1 - Source  (plus Emma’s annotations)
The value function tells you the prediction of the expected future reward . It depends on the agent’s behavior, so an agent that has learned to behave well (i.e. it has an optimized policy) will have higher future rewards than an agent that has a poor policy . I liked Silver’s presentation because it shows the value function in action. In the Atari game “Breakout”, the agent is constantly looking into the future to predict how much reward it could potentially earn . The spikes in the value function corresponded to moments when there was a lot of opportunity to earn a high reward, and the sharp decreases highlighted moments where that opportunity had passed . I liked seeing how the value function was constantly changing in time as the game progressed.
Similarly, it was also helpful to see how the Q-values can help the agent choose the best action at every time step. The Q-values were constantly varying, and there was often not much variation between the Q-value for every action . But given small improvements in one action over another, the agent was able to consistently choose the most optimal action at that time step . Again, it was useful to see the role of the Q-values in real time to get a more intuitive understanding of why we need Q-values and how they work.
 Silver, D. “RL Course by David Silver - Lecture 1: Introduction to Reinforcement Learning.” YouTube. 13 May 2015. https://www.youtube.com/watch?v=2pWv7GOvuf0 Visited 06 Aug 2020.