Monte Carlo Tree Search Alternating

360 Views Asked by At

Could anybody please clarify how (as I have not found any clear example anywhere) The MCTS algorithm iterates for the second player.

Everything I seem just seems to look like it is playing eg P1 move every time. I understand the steps for one agent but I never find anything showing code where P2 places its counter, which surely must happen when growing the tree.

Essentially I would expect:

for each iter:

select node Player1 expand Player1

select node Player2 expand player 2

rollout backpropogate

next iter

Is this right?? Could anybody please spell out some psuedocode showing that? Either iteratively or recursion i don't mind.

Thanks for any help.

1

There are 1 best solutions below

4
Shihab Shahriar Khan On

The trick is in backpropagation part, where you update "wins" variable from the point of view of player whose move led into this position.

Code for MCTS

Notice under UCT function, specially the comments:

 #Backpropagate
    while node != None: # backpropagate from the expanded node and work back to the root node
        node.Update(state.GetResult(node.playerJustMoved)) # state is terminal. Update node with result from POV of node.playerJustMoved
        node = node.parentNode

IF you follow the function call, you would realize visit variable is always updated; wins however, is not.