> [!abstract]
> **Q-Learning** is a method of estimating Q-values in reinforcement learning, given by the update equation $Q(s_{t},a_{t})\leftarrow \underbrace{(1-\alpha)\cdot Q(s_{t},a_{t})\vphantom{\operatorname*{max}_{a\in\mathcal{A}}}}_{\text{decayed retention}}+\underbrace{\alpha \cdot r_{t+1}\vphantom{\operatorname*{max}_{a\in\mathcal{A}}}}_{\substack{\text{immediate}\\\text{reward}}}+\underbrace{\alpha\gamma\operatorname*{max}_{a\in\mathcal{A}(s_{t+1})}Q(s_{t+1},a)}_{\substack{\text{discounted future}\\\text{total rewards}}},$ which is tuned with the parameters $\alpha$ (the learning rate) and $\gamma$ (the discount factor for future rewards). It is a tabular, model-free method and is not based on deep learning.
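
As a concrete illustration of the update equation, here is a minimal tabular Q-learning sketch in Python on a hypothetical 5-state chain environment. The environment, the `step` helper, and all hyperparameter values are illustrative assumptions, not taken from the sources below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy environment: a 5-state chain where action 0 moves left,
# action 1 moves right, and reaching the rightmost state gives reward 1
# and ends the episode.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.5  # learning rate, discount factor, exploration rate

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, reward, done

for _ in range(500):          # training episodes
    s = 0
    for _t in range(200):     # cap episode length for safety
        # epsilon-greedy behaviour policy
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # the update from the abstract:
        #   Q(s,a) <- (1-alpha)*Q(s,a) + alpha*r + alpha*gamma*max_a' Q(s',a')
        target = r if done else r + gamma * np.max(Q[s_next])
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
        s = s_next
        if done:
            break

print(np.round(Q, 3))  # Q[s, 1] should approach gamma**(3 - s) for s < 4
```

Note that the TD target uses $\operatorname*{max}_{a}Q(s_{t+1},a)$ regardless of which action the behaviour policy actually takes next; this is what makes Q-learning off-policy, in contrast to SARSA (linked below).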
> [!cite] Resources
> [Wikipedia](https://en.wikipedia.org/wiki/Q-learning)
> An alternative, on-policy learning algorithm is SARSA; see [this article](https://towardsdatascience.com/walking-off-the-cliff-with-off-policy-reinforcement-learning-7fdbcdfe31ff).