Yoshi's GA pattern finder



Lab Description


Images of every 20th generation and every perfect match. Scaled up by 100.

Program Domain

My GA tries to find a set of predetermined patterns. Those patterns are the digital numbers one, four, seven and eight. The GA first tries to much eight, then as soon as it finds it, tries to find one, then seven, then four, then back to eight, ad infinitum.

Mapping bit string to image

The patterns I used for this lab is sized 5 pixels by 7 pixels, which means that there are 35 pixels in the image. A chromosome of 35 bits represent an image, pretending to be a two dimenstional array of [5][7]. If the bit is 1, then the corresponding pixel is colored. Else it is black. Therefore, if the 8th bit of the chromosome is 1, it means that the pixel at the second row and the third column is colored (8 = 1*column size + 3).

Examples:

00000 00100 00100 00100 00100 00100 00000


00000 01110 01010 01110 01010 01110 00000


Fitness Function

First I made a simple fitness function that returned the number of grids that the chromosome and each of the patterns matched. This turned out to be very boring, as the choromosomes always converged into the pattern eight. As all the other patterns except one is a subset of eight (you can draw two by taking out a grid here and there from eight), the chromosomes can score best by becoming eight.

So I changed the fitness function to be dependent on the number of generation they are in. For example, until the 100th generation, the chromosome can score high by matching eight, then until the 200th generation, it can score high by matching seven, and so on. However, this turned out to contain a similar problem. If the population produced the perfect match at the generation 80, between the 80th and 100th generations the population kept converging on eight. Therefore, once it was now time to find seven, the population was too homogenous to produce divergent successors.

So I finally settled down to the fitness function that changes the generation it found the perfect match. As soon as it finds a perfect match, it changes the target pattern to the next pattern, so that it does not give the GA time to converge into the previous pattern. This turned out to work quite well, and it succeeded to produce many matching patterns.

Parameters

The parameters I used are as follows:
mutation rate = 0.001
crossover = 0
size = 500
generation = 700

The mutation rate turned out to be extremely important in how the population behaves, which I will discuss in the following sections.

Table of results

For each of the parameters, I have recorded the number of patterns the GA was able to find (this is more informative than the record of the best fitness found, since the fitness function is always changing).

size = 500, generation = 700, mutation = 0.001
Number of matchs = 10, 9, 9, 10, 9, 9, 9, 9, 9, 10

size = 500, generaion = 700, mutation = 0
Number of matchs = 1, 1, 2, 2, 2, 2, 1, 1, 2, 2

*Around 280th generation, all the population converges into the same bit string.

size = 500, generation = 700 mutation = 0.01
Number of matchs = 7, 8, 8, 8, 9, 9, 9, 8, 8, 8

size = 300, generation = 700, mutation = 0.001
Number of matchs = 7, 7, 7, 7, 7, 5, 6, 6, 5, 5

size = 100, generation = 700, mutation = 0.001
Number of matchs = 2, 2, 1, 0, 3, 2, 3, 2, 2, 2

Discussion of results

The first of set of results, with s = 500, g = 700, m = 0.001, is the settings that I used to construct the image at top. In order to test how the mutation rate affected the behavior of the population, I changed the mutation rate to 0 and ran the test. It turned out that elliminating mutation has a devastating impact on the performance. Without mutation the population converges too much so that it can not find the next pattern. So I decided to increase the mutation rate to 0.01, which surprisingly performed worse than at the rate of 0.001. When the mutation rate is too high, on contrary to the last setting, the population cannot converge fast enought to find the pattern fast. Therefore, in order to get the best performance, the mutation rate needs to be tuned to somewhere around 0.001 (I was lucky to start working on this with the default setting of 0.001).
I experimented with various size of populations, but I find that I need at least the population of around 300 to get good results. If the size is too small, as you can see from the results with s = 100, the population is not diverse enough to find the next pattern after the previous match.

Conclusion

In general, I would say that my GA was successful given its domain. It can keep finding the next pattern without the population converging into one pattern, even if I set the generation to infinity.
The crucial (yes, this is very important) aspect of my GA is not that it can find a set of predetermined patterns, but that it produces patterns that are in between. The same algorithm can be applied to, for example, matching different human faces or landscapes. Given two different images, the GA would be able to produce a succession of images that lie between them. If we told the GA to first match a human face then to a landscape, we would observe first the decompsition of the human face, then a succession of images that slowly converges into a landscape. That would be something interesting to see, and when I find the time (god knows when) I might make a more efficient GA algorithm (not an array of int to represent a single bit) that can work on a million lenghthed bit string. Even if I don't go that far, I could probably be able to make images of reasonable size (100 by 100 or so) that are colored, which could be used for screen savers or something rather.