<template>
  <div class="d-flex justify-content-evenly">
    <p><small>Number of games played:</small> {{ ngames?.format('0,0') }}</p>
    <p><small>AI winrate:</small> {{ ai_winrate?.format('0.00') }}%</p>
  </div>
  <h1>Project AIGammon</h1>
  <p>
    This project is an implementation of the paper <q>Temporal Difference Learning and TD-Gammon</q> by Gerald Tesauro,
    1995.
    The project deviates from the paper in some respects: it uses more nodes, it does not implement any position
    encoding (Tesauro demonstrated in a later paper that such encoding is unnecessary), and it uses a binary win
    condition, i.e. no gammon or backgammon rules. Other than that, it should represent the approach used by
    Tesauro. The AI learns to play backgammon from nothing more than a set of random weights and an environment to
    play in.
    The learning principle is as follows: evaluate the state value of each available action using the neural network.
    Then, having made the move, evaluate the actual value of the resulting state, also using the neural network,
    unless the game has been won by either player, in which case the value is 1.0 if WHITE won or 0.0 if BLACK won.
    Below is the exact formula used to update the weights from one move to the next.
  </p>
  <figure>
    <img alt="TD(lambda) formula" src="/9a05bf7064be0e9babe7707883cbc4db340d0635.svg" title="TD(lambda) formula">
    <figcaption>Weight updates are defined by this TD(&#955;) formula</figcaption>
  </figure>
  <p>
    This setup adjusts the weights of the network to favor moves that lead to winning the game in the long term.
    One way to interpret the state value is as the likelihood that WHITE wins the game. This allows for a simple
    minimax policy where WHITE tries to maximize the state value while BLACK tries to minimize it.
  </p>
  <p>
    Links worth visiting:
    <a href="https://en.wikipedia.org/wiki/TD-Gammon">Wikipedia</a>,
    <a href="https://bkgm.com/articles/tesauro/tdl.html">Original Paper</a>,
    <a href="https://medium.com/clique-org/td-gammon-algorithm-78a600b039bb">This blog post</a>
  </p>

  <h3>Why Backgammon?</h3>
  <p>
    The short answer is that Backgammon just happens to have very beneficial features for reinforcement learning.
    A very important part of reinforcement learning algorithms is the balance between exploration and exploitation.
    In Backgammon this balance is already built into the game: there is enough stochasticity in the dice throws that
    explicit exploration is not needed. Backgammon is a two-player game with a reasonable branching factor. The
    rules are not too complex, and an environment can easily be coded in less than 4 weeks. All these aspects make
    Backgammon a perfect game for demonstrating many important areas of reinforcement learning, with more depth
    than tic-tac-toe.
  </p>
</template>

<script>
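import numeral from "numeral";
import BoardService from "@/services/board-service";

export default {
  name: "AboutPage",
  data() {
    return {
      ngames: null,
      ai_winrate: null,
    }
  },
  async created() {
    try {
      // Fetch the aggregate game statistics.
      const response = await BoardService.stats();

      // Wrap the values in numeral objects so the template can format them.
      this.ngames = numeral(response.ngames);
      // Scale the win rate by 100 so it displays as a percentage.
      this.ai_winrate = numeral(response.aiwinrate * 100);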
    } catch (err) {
      console.error(err);
    }
  }
}
</script>

<style scoped>
figure {
  margin: 1em 0;
  display: flex;
  flex-direction: column;
  align-items: center;
}

figure > figcaption {
  font-size: smaller;
  margin-top: 1em;
}

img {
  width: 50%;
}
</style>
