By Tyler Johnson and Isaac Reibman

In the world of game theory, we refer to games like Catan, Risk, and Civilization 6 as large-scale strategy games. Their defining characteristic is the sheer number of elements in play and the ways those elements interact. These games often give players the option to compete against the computer, and those computer players are called artificial intelligences (AIs). The purpose of such an AI is to give the player a fair, well-matched challenge.

Most AIs follow a fixed algorithm to decide which move to make. However, due to the scale and complexity of these games, writing a perfect algorithm is incredibly difficult, if not impossible. As a result, most current AIs have flaws that players can exploit. While a human can adapt their strategy when they realize it is failing, an algorithmic AI will keep making the same decisions. This removes the challenge and defeats the purpose of the AI.
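To make the problem concrete, here is a toy sketch, entirely our own illustration and not taken from any particular game, of a scripted AI whose rules never change. Every name and threshold in it is hypothetical:

```python
# A toy sketch (not from the paper) of a scripted, rule-based game AI.
# The rules are fixed, so a player who learns them can exploit the AI forever.

def scripted_move(game_state):
    """Pick a move using hand-written rules that never change."""
    if game_state["enemy_in_range"]:
        return "attack"      # always attacks when possible...
    if game_state["health"] < 30:
        return "retreat"     # ...and always retreats at the same threshold
    return "advance"

print(scripted_move({"enemy_in_range": False, "health": 25}))  # -> "retreat"
```

A human who notices the fixed retreat threshold can bait this AI into retreating at exactly the wrong moment, game after game.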

Charles Madeira, Vincent Corruble, and Geber Ramalho, researchers at Pierre and Marie Curie University, decided to approach the AI problem from a different direction. Reinforcement learning (RL) is a form of machine learning based on trial and error. After the AI makes a move, it evaluates whether the move was good or bad, adjusts its strategy accordingly, and continues. This allows it to learn from its mistakes. The team wanted to see whether applying RL to large-scale strategy AI could help computers make better decisions.
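As a minimal illustration of that trial-and-error loop, here is a sketch using tabular Q-learning, a standard RL algorithm. The researchers' actual setup is more elaborate, and all names and constants here are our own:

```python
import random
from collections import defaultdict

# Q[state][action] estimates how good each move is; after every move,
# the reward nudges that estimate up or down.
Q = defaultdict(lambda: defaultdict(float))
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def choose_action(state, actions):
    if random.random() < EPSILON:                    # sometimes explore...
        return random.choice(actions)
    return max(actions, key=lambda a: Q[state][a])   # ...otherwise exploit

def learn(state, action, reward, next_state, next_actions):
    best_next = max((Q[next_state][a] for a in next_actions), default=0.0)
    # Move the estimate toward "reward now + discounted best future value".
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```

Each call to learn() is one round of trial and error: a good move raises the estimate for that state-action pair, a bad one lowers it, and future choices shift accordingly.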

Reinforcement learning works well in games like backgammon, but large-scale strategy games are far more complicated. To demonstrate the difference, the researchers used John Tiller's Battleground™ for their AI, a complex wargame played on a battlefield of hexagons with hundreds of units. Backgammon has about 10²⁰ possible states, that is, distinct board configurations that can occur at any point in a game. By comparison, a single Battleground™ scenario has roughly 10¹⁸⁸⁷ possible states. To cope with the enormous scope of this game, the researchers decided to divide the learning into phases.
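To get a feel for where numbers like that come from, consider a back-of-the-envelope count, using our own hypothetical figures rather than the paper's math: if each unit could occupy any hex independently, positions alone multiply out to an astronomical number of states:

```python
import math

# Hypothetical figures for illustration only.
H = 600   # number of hexes on the map
N = 300   # number of units

# If each of N units can sit on any of H hexes, positions alone give
# roughly H**N states. Compute log10(H**N) without building the number.
digits = N * math.log10(H)
print(f"roughly 10^{digits:.0f} position states")   # roughly 10^833
```

Adding facing, health, ammunition, and order status for every unit multiplies this further, which is how a single scenario reaches the 10¹⁸⁸⁷ scale.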

In Battleground™, each player controls every aspect of their army, from the general's orders down to the actions of each individual soldier. Learning to control the entire army at once is overwhelming, so the researchers let the learning AI control only part of the hierarchy. A prebuilt AI, called the bootstrap AI, controls the rest of the army as well as the opponent. Thus the learning AI only needs to study a small part of the decision-making process at any given time.
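A rough sketch of that division of labor might look like the following, where the class names, hierarchy levels, and orders are all our own placeholders:

```python
class BootstrapAI:
    """Prebuilt, scripted decisions for the levels not being learned."""
    def decide(self, unit, game_state):
        return "hold_position"   # placeholder scripted behavior

class LearningAI:
    """The RL agent; here it only commands corps-level units."""
    def decide(self, unit, game_state):
        return "advance"         # would come from the learned policy

def command_army(units, learner, bootstrap, learned_level="corps"):
    orders = {}
    for unit in units:
        # Only the chosen hierarchy level is handled by the learner;
        # everything else falls back to the bootstrap AI.
        ai = learner if unit["level"] == learned_level else bootstrap
        orders[unit["name"]] = ai.decide(unit, game_state={})
    return orders

units = [{"name": "1st Corps", "level": "corps"},
         {"name": "3rd Battalion", "level": "battalion"}]
print(command_army(units, LearningAI(), BootstrapAI()))
```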

In order to make good decisions, an RL AI needs to understand the current situation in the game, known as the game state. This tells the computer what to consider when making a decision. It also needs to know which moves it can make, known as the action space. Finally, it uses a "reward function" to judge how well it is doing. These three things vary from game to game, and even between AIs for the same game.
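Here is a hedged sketch of those three ingredients for a strategy game; the state fields, action names, and reward formula are illustrative, not the researchers' actual design:

```python
from dataclasses import dataclass

@dataclass
class GameState:
    friendly_strength: int   # what the agent can observe
    enemy_strength: int
    objectives_held: int

# The action space: every legal move the agent may choose from.
ACTION_SPACE = ["advance", "attack", "defend", "retreat"]

def reward(before: GameState, after: GameState) -> float:
    # A simple shaped reward: gaining objectives and weakening the enemy
    # is good; losing your own strength is bad.
    return ((after.objectives_held - before.objectives_held) * 10
            + (before.enemy_strength - after.enemy_strength)
            - (before.friendly_strength - after.friendly_strength))
```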

Defining these three components is the most challenging part of making an RL AI. The AI can't make smart moves if it doesn't properly understand the game state and action space. And if the reward function is flawed, the AI won't be able to tell whether it's doing a good or bad job, so it won't adjust its behavior properly.

The researchers determined that the positions of the units relative to the layout of the terrain were the most important thing for the AI to understand. This lets the AI identify strategic locations, such as places to hide troops and positions with the best lines of sight and fire. They tested the AI on a specific scenario: a battle between the French and Russian armies in 1812.
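One plausible way to turn "position relative to terrain" into something a learner can use is a small feature encoding like the one below; the terrain categories and numbers are our own invention, not the paper's representation:

```python
# Hypothetical cover values per terrain type.
TERRAIN_COVER = {"forest": 0.8, "hill": 0.5, "field": 0.1}

def unit_features(unit, hex_map):
    terrain = hex_map[unit["position"]]          # e.g. "forest"
    return {
        "cover": TERRAIN_COVER[terrain],         # how well the unit is hidden
        "on_high_ground": terrain == "hill",     # better line of sight / fire
        "distance_to_objective": unit["dist_obj"],
    }

hex_map = {(3, 4): "forest"}
print(unit_features({"position": (3, 4), "dist_obj": 2}, hex_map))
```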

They used two different variants of the learning AI. One, called neural network (NN) LAI 1, used a single "brain" for all operations. The other, called NN LAI 45, used a different "brain" for each task. Both learning AIs controlled the French army while the bootstrap AI controlled the Russian army. The researchers trained the AIs 10,000 times and then tested the results against random movement, a professional AI, and human players.
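The structural difference between the two variants can be sketched like this, with a stand-in "Brain" object in place of a real neural network; all class names here are ours:

```python
class Brain:
    """Stand-in for a trainable neural network policy."""
    def decide(self, task, state):
        return f"decision for {task}"

class SingleBrainAI:            # like NN LAI 1: one shared brain
    def __init__(self):
        self.brain = Brain()
    def decide(self, task, state):
        return self.brain.decide(task, state)

class MultiBrainAI:             # like NN LAI 45: one brain per task
    def __init__(self, tasks):
        self.brains = {task: Brain() for task in tasks}
    def decide(self, task, state):
        return self.brains[task].decide(task, state)

ai = MultiBrainAI(tasks=["move", "attack"])
print(ai.decide("move", state=None))
```

The shared brain must generalize across every task it sees; the per-task brains each specialize, which costs more training but can pay off in performance.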

After 500 games, the AIs reached the skill level of an average human player. The single-"brain" AI consistently scored in the 50 to 100 range. The multiple-"brain" AI took much longer to learn but scored higher on average, between 100 and 175. For comparison, an average human player scores around 80, and the professional AI scored around -380.

Both learning AIs were able to adapt to the game and outperform the professional AI. The single-brain AI kept a steady score, much like a human player. The multiple-brain AI took more risks, so its score varied more from game to game, but it ultimately performed better. This suggests that learning AIs could be the next step in large-scale strategy game AI: they can challenge players on a more even footing, creating a more engaging single-player experience.