Simple Algorithm to Play a Simple Game

Karthik Nadig

Anyone who has worked with Machine Learning would have used the Logistic Regression algorithm, or at least heard about it. Well, here is a simple application that observes you play, and learns, how to play the game of Tic-Tac-Toe.

How does it work?

As you play the game the algorithm observes the game. It isolates the choices that you made which led you to win the game. The draw and losing cases are not used. It then uses Logistic Regression to train the underlying intelligence, let us represent it by “ $\theta$ ”. As it gains more experience, it’ll be able to make more accurate predictions about the game and hence it learns to play the game.

The math behind it

It uses Gradient Descent along with Logistic Sigmoid function to train and predict. Prediction in this application is done by nine individual Logistic Regression units, and the outcome of each of them corresponds to one of the nine possible spaces in the Tic-Tac-Toe game.

$inputs$	$outputs$
Layout of Inputs	Layout of Outputs
Figure 1

Let $k = 0, 1, \ldots, 8$ be the indexes of nine spaces in the game. Also let, $y_0, y_1, \ldots, y_8$ represent the space chosen by you (to mark either X or O). Let, $X = \lbrace x_0, x_1, \ldots, x_8 \rbrace$ represent the input to the algorithm (see Figure 1). The intelligence that maps the input to a predicted output be represented by $\theta = \lbrace \theta_0, \theta_1, \ldots, \theta_8 \rbrace$ . The prediction $h_{k}(X)$ for the $k^{th}$ space is made as per the equations shown below:

$\nu_{k}(X) = b_{k} + \sum_{i=0}^{8}(\theta_{ki}.x_i)$

$h_{k}(X) = \frac{1}{1 + e^{-\nu_k(X)}}$

Note: $b_{k}$ is the bias parameter. It has the same effect as applying Affine Transform to the $\nu_{k}(X)$ polynomial.

Here, $h_{k}(X)$ represents the predicted output and $y_{k}$ represents the move made by you. So, the purpose of training process is to adjust the values of $\theta$ until $h_{k} \approx y_{k}$ . This is done using the Gradient Descent algorithm.

$\theta_{kj} := \theta_{kj} - \alpha.\sum_{i=0}^{m}( h_k(X^{(i)}) - y_k^{(i)}).x_j^{(i)}$

$\alpha$	Learning Rate, a real valued number that indicates how fast the training algorithm converges.
$m$	Number of training samples.
$\theta_{kj}$	Intelligence parameter indicating how the $j^{th}$ input affects the $k^{th}$ output.
$y_k^{(i)}$	Actual output for the $k^{th}$ space given $i^{th}$ sample as input.
$h_k(X^{(i)})$	Predicted output for the $k^{th}$ space given $i^{th}$ sample as input.
$X^{(i)}$	Represents the $i^{th}$ input sample. $X^{(i)} = \lbrace x_0^{(i)}, x_1^{(i)}, \ldots, x_8^{(i)} \rbrace$
$x_j^{(i)}$	Represents the value of the $j^{th}$ space in the $i^{th}$ input sample.

How does all that work?

Below is a Silverlight version of the application, try it out. After you play the first game, you should see some prediction. The darkest circle is where the prediction algorithm thinks that you will make the next move. Here is the link to the source code: TicTacToe.zip. Note, you have to move for both the players. The predictor only predicts the moves. Until you play the first game with a winning outcome the predictor will not be trained.

Tags: Artificial Intelligence, Gradient Descent, Logistic Regression, Machine Learning, Silverlight, WPF

This entry was posted on Tuesday, February 16th, 2010 at 23:46and is filed under . You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Normal