Simple Algorithm to Play a Simple Game

Karthik Nadig

Anyone who has worked with Machine Learning would have used the Logistic Regression algorithm, or at least heard about it. Well, here is a simple application that observes you play, and learns, how to play the game of Tic-Tac-Toe.

How does it work?

As you play the game the algorithm observes the game. It isolates the choices that you made which led you to win the game. The draw and losing cases are not used. It then uses Logistic Regression to train the underlying intelligence, let us represent it by “\theta”. As it gains more experience, it’ll be able to make more accurate predictions about the game and hence it learns to play the game.

The math behind it

It uses Gradient Descent along with Logistic Sigmoid function to train and predict. Prediction in this application is done by nine individual Logistic Regression units, and the outcome of each of them corresponds to one of the nine possible spaces  in the Tic-Tac-Toe game.

inputs

outputs

Layout of Inputs

Layout of Outputs

Figure 1

Let k = 0, 1, \ldots, 8 be the indexes of nine spaces in the game. Also let, y_0, y_1, \ldots, y_8 represent the space chosen by you (to mark either X or O). Let, X = \lbrace x_0, x_1, \ldots, x_8 \rbrace represent the input to the algorithm (see Figure 1). The intelligence that maps the input to a predicted output be represented by \theta = \lbrace \theta_0, \theta_1, \ldots, \theta_8 \rbrace. The prediction h_{k}(X) for the k^{th} space is made as per the equations shown below:

\nu_{k}(X) = b_{k} + \sum_{i=0}^{8}(\theta_{ki}.x_i)

h_{k}(X) = \frac{1}{1 + e^{-\nu_k(X)}}

Note: b_{k} is the bias parameter. It has the same effect as applying Affine Transform to the \nu_{k}(X) polynomial.

Here, h_{k}(X) represents the predicted output and y_{k} represents the move made by you. So, the purpose of training process is to adjust the values of \theta until h_{k} \approx y_{k}. This is done using the Gradient Descent algorithm.

\theta_{kj} := \theta_{kj} - \alpha.\sum_{i=0}^{m}( h_k(X^{(i)}) - y_k^{(i)}).x_j^{(i)}
\alpha Learning Rate, a real valued number that indicates how fast the training algorithm converges.
m Number of training samples.
\theta_{kj} Intelligence parameter indicating how the j^{th} input affects the k^{th} output.
y_k^{(i)} Actual output for the k^{th} space given i^{th} sample as input.
h_k(X^{(i)}) Predicted output for the k^{th} space given i^{th} sample as input.
X^{(i)} Represents the i^{th} input sample.
X^{(i)} = \lbrace x_0^{(i)}, x_1^{(i)}, \ldots, x_8^{(i)} \rbrace
x_j^{(i)} Represents the value of the j^{th} space in the i^{th} input sample.

How does all that work?

Below is a Silverlight version of the application, try it out. After you play the first game, you should see some prediction. The darkest circle is where the prediction algorithm thinks that you will make the next move. Here is the link to the source code: TicTacToe.zip. Note, you have to move for both the players. The predictor only predicts the moves. Until you play the first game with a winning outcome the predictor will not be trained.



Leave a Reply