Quasi Best-First (QBF) for opening book generation with Monte Carlo Tree Search (MCTS)

16 Views Asked by At

I was reading the Chaslot papers[1,2] on opening book generation and am very confused. It seems that it will never explore more than one winning sequence. The Quasi Best-First (QBF) algorithm is described in [2] as:

Algorithm 1 The “Quasi Best-First” (QBF) algorithm. λ is the number of machines available. K is a constant. g is a game, defined as a sequence of game states. The function “MoGoChoice” asks MOGO to choose a move.

QBF(K, λ)
while True do
  for l = 1..λ, do
    s =initial state; g = {s}.
    while s is not a final state do
      bestScore = K
      bestMove = Null
      for m in the set of possible moves in s do
        score = percentage of won games by playing the move m in s
        if score > bestScore then
          bestScore = score
          bestMove = m
        end if
      end for
      if bestMove = Null then
        bestMove = MoGoChoice(s) // lower level MCTS
      end if
      s = playMove(s, bestMove)
      g = concat(g, s)
    end while
    Add g and the result of the game in the book.
  end for
end while

When the book is updated with its first winning sequence, subsequent iterations will always pick that move which wins 100% of games played with that move. I can think of a number of modifications to prevent this, but I was trying to reproduce the original results as a baseline for comparison.

Some assumptions I am applying which may be incorrect:

  • The outer while True indicates the top level - this is not actually being executed as the tree policy (selection strategy) or simulate policy (playout strategy) for some outer MCTS. (I.e., QBF itself is the "higher level MCTS".)
  • The initial state is the start state of the game (i.e., an empty go board)

I'd appreciate any tips!

[1] Audouard, P., Chaslot, G., Hoock, J.-B., Perez, J., Rimmel, A., Teytaud, O., 2009. Grid Coevolution for Adaptive Simulations: Application to the Building of Opening Books in the Game of Go, in: Giacobini, M., Brabazon, A., Cagnoni, S., Di Caro, G.A., Ekárt, A., Esparcia-Alcázar, A.I., Farooq, M., Fink, A., Machado, P. (Eds.), Applications of Evolutionary Computing, Lecture Notes in Computer Science. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 323–332. https://doi.org/10.1007/978-3-642-01129-0_36

[2] Chaslot, G.M.J.-B., Hoock, J.-B., Perez, J., Rimmel, A., Teytaud, O., Winands, M.H.M., 2009. Meta Monte-Carlo Tree Search for Automatic Opening Book Generation, in: Proceedings of the IJCAI’09 Workshop on General Intelligence in Game Playing Agents. Presented at the IJCAI 2009.

0

There are 0 best solutions below