When studying reinforcement learning, and specifically model-free RL, there are two methods that are generally used:
- TD learning
- Monte Carlo
When is each one preferred over the other? In other words, how do we figure out which method is best for our problem?
Sections 6.1 and 6.2 of Sutton & Barto give a very nice intuitive understanding of the difference between Monte Carlo and TD learning.
Having said that, there is of course the obvious incompatibility of MC methods with non-episodic (continuing) tasks: an MC update needs the full return, which is only known once an episode terminates. In that case, you will always need some kind of bootstrapping.
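To make the difference concrete, here is a minimal tabular sketch of the two update rules applied to the same recorded episode. The function names, the `(state, reward, next_state)` transition format, and the hand-made three-step episode are all assumptions chosen for illustration, not anything prescribed by the book.

```python
def mc_update(values, episode, alpha, gamma=1.0):
    """Every-visit Monte Carlo: needs the finished episode to compute returns."""
    g = 0.0
    # Walk backwards so g accumulates the return that followed each state.
    for state, reward, _next_state in reversed(episode):
        g = reward + gamma * g
        values[state] += alpha * (g - values[state])


def td0_update(values, episode, alpha, gamma=1.0):
    """TD(0): each step only needs (state, reward, next_state).

    The transitions are replayed in order here for comparison, but the same
    update could be applied online, one step at a time, with no episode end.
    """
    for state, reward, next_state in episode:
        # Bootstrap from the current estimate of the next state;
        # a terminal next state (None) is worth 0 by definition.
        bootstrap = 0.0 if next_state is None else values[next_state]
        values[state] += alpha * (reward + gamma * bootstrap - values[state])


# Hypothetical episode on a 5-state chain: 2 -> 3 -> 4 -> terminal, reward 1 at the end.
episode = [(2, 0.0, 3), (3, 0.0, 4), (4, 1.0, None)]

mc_values = [0.5] * 5
td_values = [0.5] * 5
mc_update(mc_values, episode, alpha=0.1)
td0_update(td_values, episode, alpha=0.1)
```

With all estimates starting at 0.5, the MC update moves every visited state (2, 3 and 4) toward the observed return of 1, while TD(0) only changes state 4, because the bootstrapped targets for states 2 and 3 equal their current estimates. Note that `mc_update` cannot run at all until the episode has ended, whereas `td0_update` never looks past the next state, which is why it still works on continuing tasks.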