Could anybody please clarify how (as I have not found any clear example anywhere) The MCTS algorithm iterates for the second player.
Everything I seem just seems to look like it is playing eg P1 move every time. I understand the steps for one agent but I never find anything showing code where P2 places its counter, which surely must happen when growing the tree.
Essentially I would expect:
for each iter:
select node Player1 expand Player1
select node Player2 expand player 2
rollout backpropogate
next iter
Is this right?? Could anybody please spell out some psuedocode showing that? Either iteratively or recursion i don't mind.
Thanks for any help.
The trick is in backpropagation part, where you update "wins" variable from the point of view of player whose move led into this position.
Code for MCTS
Notice under UCT function, specially the comments:
IF you follow the function call, you would realize
visitvariable is always updated;winshowever, is not.