Monte Carlo Tree Search (MCTS) is a conceptually simple algorithm that has enjoyed recent success in computer Go and General Game Playing. Simply stated, it is a look-ahead tree-search algorithm that uses random playout sampling for state evaluation. It is compelling because it can, in effect, play strategically across different domains without human intervention. As a contrasting example, most state-of-the-art chess engines rely heavily on human-supplied heuristics for state evaluation and search-tree pruning. MCTS needs no such embedded domain-specific heuristics because it relies on random playout sampling alone for state evaluation. This makes it widely adaptable, and often surprisingly effective: most current competitive Go programs are based on MCTS.
In this post we will implement the MCTS algorithm in Clojure and apply it to the domain of tic-tac-toe (TTT).
Ultimately our goal is a function that pits an MCTS player against an arbitrary opponent in a game of TTT. By repeatedly invoking this function, we can gather win/loss statistics and infer the superior player.
1. Function: play-game
Let's think about playing a single game of TTT. What do we need to keep track of? Two things: the state of the board and which player is to move. Then we simply ask the current player to update the board with their chosen move, switch the current player (i.e. from X to O), and recur until the board state is terminal. In TTT a board state is terminal when a player has won or when there are no remaining moves, which results in a draw. The following abridged code listing implements this basic logic:
(defn play-game [board-state to-move]
  (if-let [result (ttt-terminal? board-state)]
    result                                              ; terminal state: return the result
    (case to-move
      :X (let [mem  (mcts-analyze-state board-state 30) ;!1 analyze with a computational budget of 30
               move (mcts-select mem board-state)]      ;!2 pick the best-rated move
           (recur move (opp-player to-move)))
      :O (let [move (rand-nth (ttt-children board-state))] ; random opponent
           (recur move (opp-player to-move))))))
The interesting bits happen at !1 and !2, where the MCTS player is asked to analyze the current board state and then select a move based on this analysis.
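For context, the listing also leans on a few TTT primitives: ttt-terminal?, ttt-children, and opp-player. The repo has the real definitions; the following is only a minimal sketch of what they might look like, assuming the board is a vector of nine cells holding :X, :O, or nil (the helper names win-lines and player-to-move are mine, not the repo's):

(def win-lines
  [[0 1 2] [3 4 5] [6 7 8]   ; rows
   [0 3 6] [1 4 7] [2 5 8]   ; columns
   [0 4 8] [2 4 6]])         ; diagonals

(defn ttt-terminal?
  "Returns :X or :O for a win, :draw for a full board, nil otherwise."
  [board]
  (or (some (fn [[a b c]]
              (let [p (board a)]
                (when (and p (= p (board b) (board c))) p)))
            win-lines)
      (when (not-any? nil? board) :draw)))

(defn opp-player [p] (if (= p :X) :O :X))

(defn player-to-move
  "X moves first, so X is to move whenever the piece counts are equal."
  [board]
  (if (= (count (filter #{:X} board))
         (count (filter #{:O} board)))
    :X :O))

(defn ttt-children
  "All board states reachable in one move from the given board."
  [board]
  (let [p (player-to-move board)]
    (for [i (range 9) :when (nil? (board i))]
      (assoc board i p))))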
2. Function: mcts-analyze-state
This function is the implementation of the MCTS algorithm. The basic idea is to build a database of board states, storing visit counts and success rates for each state. The algorithm then uses this data for action selection.
Building this database is quite simple: from a given state, determine whether all possible actions in that state have been explored. If so, pick the highest-valued child state and recur. If not, pick an unexplored action at random and sample its win/loss statistics by random playout. Add the state resulting from that action, along with its statistics, as a child of the parent state in the database. Then backpropagate the statistics up the path from the new child to the root.
This process is repeated until the computational budget has been exhausted, at which point the highest-rated action is chosen.
(defn mcts-analyze [mem state]
  (loop [mem mem, state state, path (list state)]
    (if-let [result (ttt-terminal? state)]
      (mcts-backprop mem path result)        ; terminal: push the result back up the path
      (if (not-empty (mcts-unexplored mem state))
        (mcts-grow mem path)                 ; unexplored actions remain: expand via random playout
        (let [ch (mcts-select mem state)]    ; fully explored: descend into the best-rated child
          (recur mem ch (cons ch path)))))))
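Note that mcts-analyze performs a single MCTS iteration: one descent through the tree followed by an expansion or a backpropagation. The mcts-analyze-state used in play-game is then just a thin wrapper that repeats this until the computational budget is spent. A minimal sketch, assuming the budget is simply an iteration count and that an empty map serves as the initial statistics database (the repo's version may differ):

(defn mcts-analyze-state [state budget]
  (reduce (fn [mem _] (mcts-analyze mem state))
          {}                 ; start from an empty statistics database
          (range budget)))

With a budget of 0 the database stays empty, which is why a zero-budget MCTS player behaves like a random player in the results below.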
3. Results
There are a number of schemes for demonstrating the statistical (in-)effectiveness of a game-playing algorithm. We will keep things simple. The main parameter of MCTS is its computational budget, i.e. the amount of computational effort it spends analyzing a board state. We can vary this parameter and determine whether increasing the budget increases the effectiveness of MCTS. The table below shows the win and draw rates over 50 games of TTT at each computational budget level. For example, the first row shows that the random player won 44% of the games versus MCTS's 36% when MCTS had a budget of 0 (and was thus effectively a random player as well).
budget   result
     0   {:mcts 0.36, :rnd 0.44, :draws 0.20}
     1   {:mcts 0.52, :rnd 0.42, :draws 0.06}
     2   {:mcts 0.54, :rnd 0.40, :draws 0.06}
     3   {:mcts 0.70, :rnd 0.20, :draws 0.10}
     4   {:mcts 0.74, :rnd 0.22, :draws 0.04}
     5   {:mcts 0.68, :rnd 0.30, :draws 0.02}
    10   {:mcts 0.84, :rnd 0.08, :draws 0.08}
   100   {:mcts 0.82, :rnd 0.06, :draws 0.12}
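For reference, a sweep like this could be scripted roughly as follows. This is only a sketch: it assumes a variant of play-game that takes the budget as a third argument (the abridged listing above hard-codes 30) and returns :X for an MCTS win, :O for a random-player win, or :draw. The names win-rates and empty-board are hypothetical; the actual script lives in the repo.

(def empty-board (vec (repeat 9 nil)))

(defn win-rates [n budget]
  ;; play-game is assumed here to take [board to-move budget]
  (let [freqs (frequencies (repeatedly n #(play-game empty-board :X budget)))]
    {:mcts  (double (/ (get freqs :X 0) n))
     :rnd   (double (/ (get freqs :O 0) n))
     :draws (double (/ (get freqs :draw 0) n))}))

(doseq [budget [0 1 2 3 4 5 10 100]]
  (println budget (win-rates 50 budget)))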
MCTS appears to increase the quality of its strategic play as a function of its computational budget. The next obvious step would be to compare MCTS against a perfect player. This might be done in the next post.
4. Code repo
Please note there are nomenclature differences between the repo and this blog post (in the repo I refer to the algorithm as "uct" rather than "mcts"). The repo also contains the unabridged versions of the functions discussed here, as well as the script used to generate the results above.
Comments

Reader: I get the impression the computational quota isn't actually passed anywhere. You pass '30' as 'state' to (mcts-analyze) in your call in (play-game). There's no check for a computational quota. What is the computational quota anyways? Allowed number of recursive calls? ms allowed to spend? Please elaborate.

Author: For the blog post I edited the source to make the logic easier to see. Unfortunately this introduced errors. Please refer to the linked source in the post.

Reader: Cool! Thanks for the post and the code on Github!