CS 63: Artificial Intelligence
Lab 4: Genetic Algorithms
by Andy Zuppann
SOLVING THE PRISONER'S DILEMMA
CONTENTS:
PROBLEM DESCRIPTION:
The problem of the Prisoner's dilemma is a typical example of a theoretical game. Imagine two people are captured. They are isolated in jail, and are deciding whether to rat on their partner or not. They know the following conditions:
This is the Prisoner's Dilemma. It can be represented in a 2x2 choice matrix:
Player 2 Player 1
C D C 3,3 0,5 D 5,0 1,1Here, the values represent (5 - years) in jail for Player 1 and Player 2 respectively. The goal for each player is to maximize the value they recieve.
Interestingly, for a single round of the game, both players will always choose to defect. Although this is the dominant strategy, it prevents the best outcome - 3,3. What is more interesting is behavior when the game is iterated - an indefinite number of rounds is played. This allows players to enforce the best possible solution by punishing bad behavior.
A strategy for an Iterated Prisoner's Dilemma is not obvious. The goal of my genetic algorithm tests was to come up with the best possible strategy.
STRING REPRESENTATION
For the representation, I used the model presented in Melanie Mitchell's "Introduction to Genetic Algorithms." It is based on a method developed by Robert Axelrod.
To decide on a current move, a current player will look at the past three rounds before making a decision. If a single round can be represented by 2 bits: CC, CD, DC, DD, a three round memory would require 6 bits. By choosing to represent a cooperation (C) as a 0 and a defection (D) as a 1, a simple history pattern can be recorded. Examples of such a history are:
In order to choose a current move, a Player must have a strategy for ALL possible previous three rounds. In other words, a Player must have 2^6 or 64 different strategies on how to play. Also, a strategy consists of defecting or cooperating, which can be represented by a simple bit - 0 for cooperate, 1 for defect.
These facts allow the creation of our bit string for the GA - a 64-bit string. Each bit represents a strategy for a given past three rounds. For instance, bit_string[0] would give the strategy when the past three rounds were 000000. The index value for the bit string is directly computable from the 6-bit history; it is merely the conversion of the 6-bit history into a number in base-10.
For added robustness, this 64-bit string has an additional 6 bits representing the history that a Player should consider when deciding his/her first move.
So, the representation for the prisoner's dilemma is a 70-bit string. The first 64-bits are to be used to determine a current move based on past moves. The last 6 bits are to be used to get things rolling.
FITNESS FUNCTIONS
The fitness of a given strategy was determined by running the strategy against a barrage of standard tests, all of which represent simple possible behavior strategies. These strategies are:
Some of these strategies were taken from http://www.lifl.fr/IPD/ipd.html.
An initial fitness function was made that only ran tit-for-tat against the generated strategy. Although an effective strategy was created against tit-for-tat (always cooperate), I thought adding the challenge of dealing with an array of strategies would provide a more robust solution.
I then played around with the possibility of pitting strategies against each other. However, I felt that such a tournament structure was unlikely to provide very robust strategies. The solutions would be able to handle battles against the population members, but that did not seem to provide a very solid strategy for challenging other, more accepted strategies.
In the end, I felt a barrage of possible strategies would effectively gauge the skill of a generated bit-string.
PARAMETERS
To test the effect parameters had on the results, I used a baseline for all the parameters based on what seemed reasonable. I then ran several tests, modifying one parameter to another, experimental value. The results of these tests will be discussed later, but, for reference, here are the baseline and modified parameter values
Parameter Baseline Modified To pop_size 10 20 generations 500 1000 crossover rate 0.7 0.1 mutation rate 0.001 0.4
RESULTS
I ran the genetic algorithm to search for a solution to the iterated prisoner's dilemma problem with 5 different parameter settings. Each parameter was run 10 times. The results of those runs are here. One thing to note is that optimal fitness for a strategy is the number of rounds (25), times the best possible outcome for each round (5), giving an optimal fitness of 125. However, the social optimum is a better scale against a good strategy, as we would hope both players recieve 3 on every round. Therefore, comparing a solution's fitness to 75 is the best measure for the tests I ran.
TEST 1: BASELINE
RUN FITNESS GENERATION 1 63.3 449 2 64.9 422 3 67.1 486 4 64 252 5 64 18 6 65.3 267 7 63 1 8 65.6 3 9 64.8 180 10 63.6 4 AVG 64.56 208.2TEST 2: Population size doubled from 10 to 20
RUN FITNESS GENERATION 1 64.8 20 2 65.3 78 3 63.5 436 4 65.1 2 5 64.2 1 6 66.9 67 7 66.5 221 8 64.9 10 9 65 27 10 64.2 5 AVG 65.04 86.7TEST 3: Generations doubled from 500 to 1000
RUN FITNESS GENERATION 1 65.6 911 2 66.3 777 3 66.8 879 4 62.2 1 5 63.1 133 6 66.2 630 7 66.1 107 8 65.3 396 9 66.2 238 10 63.5 868 AVG 65.13 494TEST 4: Crossover rate changed to 10% from 70%
RUN FITNESS GENERATION 1 62.3 0 2 64 225 3 65.3 186 4 66.6 147 5 64.6 369 6 63.8 63 7 61.4 0 8 65.7 0 9 62.9 270 10 61.6 134 AVG 63.82 139.4TEST 5: Mutation rate changed from 0.1% to 40%
RUN FITNESS GENERATION 1 65.8 254 2 66.2 130 3 67 313 4 66.9 111 5 67.7 83 6 66.6 5 7 67.5 384 8 67.2 129 9 67.6 168 10 67.2 431 AVG 66.97 200.8
DISCUSSION
All in all, I was pleased with the results of the genetic algorithm on solving the Prisoner's Dilemma problem. Although no strategy was ever generated that could routinely average above the social optimum fitness of 75, the progress of the strategies as the GA worked was quite satisfactory. Average fitness consistently rose from the low 50s to the mid 60s, indicating better strategies were being generated.
One unfortunate point in this experiment is the capability of strategies to outperform humans. I included functionality for the problem such that after the GA had finished, a human could play the best individual found. Although the computer-generated strategy was good, it was often easy to pick out flaws in its strategy. I easily defeated several opponents, and, in the process scored substantially over 75.
One of the reasons for this behavior is the selection of tests that I ran against members of the population. Although there were 10 tests, they are but a small subset of possible strategies. There is also no indication that they are general enough to simulate omitted strategies. Axelrod in his initial studies used 8 general strategies to simulate the whole set. I was unable to discover the exact strategies he used, and was forced to use several sensible and suggested patterns.
This problem suggests that the solution that the GA provides is extremely good at fighting off the strategies I tested it against. However, it is unlikely to be robust against all possible strategies or a human player adept at finding its weakness and exploiting it. If I were to continue research into this subject, a primary goal would be the expansion of tests to generate a more robust solution.
Another interesting feature of the data is the changed results of the GA when the parameters were modified. The results suggest that more generations leads to better solutions on average, although an increased population needs fewer generations on average to arrive at the best solution. However, both of these improvements are costly in time. A much better strategy is likely found in modifying the mutation rate. A shift from 0.1% to 40% led to a marked improvement in average optima, from 64.56 to 66.97. The potential advantage from a correct mutation rate specification could lead to substantial gains in results with very little cost added. (Note that my average estimates have not gone through rigorous statistic evaluation. A proper study would need to be conducted to determine whether these differences are actually significant. Currently, the data are merely suggestive of the conclusions I describe).