CS 63: Artificial Intelligence
Lab 9: Neural Networks
by Andy Zuppann and Jonah Volk
DIGIT RECOGNITION WITH NEURAL NETWORKS
PROBLEM DESCRIPTION:
The problem we attempted to solve was digit recognition. Given a data set of possible digit representations, we trained various neural network architectures to recognize which digit each representation depicts.
The starting point for evaluating digits is to graphically represent a digit as a 6x6 matrix, with 1s and 0s placed to look like a digit. For example, the number one could be represented as:
0 0 0 0 1 0
0 0 0 1 1 0
0 0 1 0 1 0
0 0 0 0 1 0
0 0 0 0 1 0
0 1 1 1 1 1
There are, however, several different possibilities for depicting the number one in a 6x6 matrix. Given these different representations, the goal of our neural network was to identify all these possibilities as the number one. Our data set, then, was comprised of 24 different sequences of 6x6 matrix representations of the numbers 0 through 9, giving a total of 240 different matrices as inputs.
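The 6x6 matrix maps onto the network's 36 input nodes, one per matrix position. A minimal sketch of that encoding follows, assuming row-major ordering; the ordering and the names `ONE` and `flatten` are illustrative and do not come from the original lab code.

```python
# The example "one" from the text, written as six rows of six cells.
ONE = [
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1],
]


def flatten(grid):
    """Turn a 6x6 grid into a flat list of 36 input values.

    Row-major order is an assumption; any fixed ordering works as long
    as every pattern in the data set uses the same one.
    """
    return [cell for row in grid for cell in row]


inputs = flatten(ONE)
print(len(inputs))  # number of input nodes, one per matrix position

# Data set size stated in the text: 24 representations of each of 10 digits.
REPRESENTATIONS_PER_DIGIT = 24
DIGITS = 10
print(REPRESENTATIONS_PER_DIGIT * DIGITS)  # total number of patterns
```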
The goal of the project was to design various neural network architectures and test their effectiveness at learning to recognize a digit given a representation. We started with a simple architecture where every input node (matrix position) was connected to ten hidden nodes. We then tried a variety of different configurations, including ones focusing on rows, columns and quadrants. For example, our first alternative configuration connected 4 hidden nodes to all the inputs and had the other 6 hidden nodes take input from the 6 rows.
RESULTS
In this section, the various architectures we tested are presented and analyzed. All 8 configurations we tested are described and then the average success results for the networks are presented.
For all our tests, we ran the network 5 times for 15000 rounds, with the threshold for correct recognition set at 85%. We would like to thank Nori Heikkinen, Oliver Hsu, and Andrew Stout for providing a Perl script that was used to analyze the effectiveness of a network's learning.
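The text does not spell out how the 85% threshold was applied. One plausible reading, sketched below, is that a pattern counts as recognized when the output for the correct digit is both the largest output and at least 0.85. This criterion, the ten-output layout, and the function names are all assumptions for illustration; the Perl script's actual rule may differ.

```python
THRESHOLD = 0.85  # recognition threshold stated in the text


def is_recognized(outputs, target_digit, threshold=THRESHOLD):
    """Assumed criterion: the target's output is the maximum and >= threshold."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return best == target_digit and outputs[target_digit] >= threshold


def count_recognized(all_outputs, targets, threshold=THRESHOLD):
    """Count how many patterns meet the assumed recognition criterion."""
    return sum(
        is_recognized(outputs, target, threshold)
        for outputs, target in zip(all_outputs, targets)
    )


# Made-up output vectors (one value per digit 0-9), purely for illustration.
example_outputs = [
    [0.02, 0.91, 0.01, 0.00, 0.03, 0.00, 0.01, 0.05, 0.00, 0.02],  # confident 1
    [0.10, 0.05, 0.02, 0.60, 0.01, 0.02, 0.01, 0.03, 0.05, 0.40],  # weak 3
]
example_targets = [1, 3]
print(count_recognized(example_outputs, example_targets))
```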
Network Configurations:
- Fully Connected: All 36 inputs were connected to all 10 hidden nodes. Image of network.
- Row Connected: 4 global hidden nodes and 6 nodes, one assigned to each row.
- Column Connected: 4 global hidden nodes and 6 nodes, one assigned to each column.
- Quadrants: 6 global hidden nodes and 4 nodes, one assigned to each 3x3 quadrant of the matrix.
- Quadrants and Rows: 6 nodes, one assigned to each row, and 4 nodes, one assigned to each 3x3 quadrant of the matrix, with no global nodes.
- High Localization: 9 nodes, one assigned to each of the nine 2x2 blocks of the matrix, and 1 global node.
- Hemispheres: 6 globals and 4 nodes assigned to the hemispheres - top, bottom, left, and right halves. Image of network.
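The configurations above can be written down as connectivity patterns: for each hidden node, the set of input positions it is wired to. The sketch below is one reading of the list, assuming row-major input indexing, one local node per region, and 10 hidden nodes per network; the names (`region`, `ARCHITECTURES`, `connection_count`) are hypothetical and not taken from the lab code.

```python
SIZE = 6
ALL = frozenset(range(SIZE * SIZE))  # a "global" node sees all 36 inputs


def region(rows, cols):
    """Input indices (row-major, an assumption) covered by the given rows and columns."""
    return frozenset(r * SIZE + c for r in rows for c in cols)


ROWS = [region([r], range(SIZE)) for r in range(SIZE)]
COLS = [region(range(SIZE), [c]) for c in range(SIZE)]
QUADS = [region(range(r, r + 3), range(c, c + 3)) for r in (0, 3) for c in (0, 3)]
BLOCKS = [region(range(r, r + 2), range(c, c + 2)) for r in (0, 2, 4) for c in (0, 2, 4)]
HALVES = [
    region(range(0, 3), range(SIZE)),   # top
    region(range(3, 6), range(SIZE)),   # bottom
    region(range(SIZE), range(0, 3)),   # left
    region(range(SIZE), range(3, 6)),   # right
]

# One entry per hidden node: the set of inputs that node receives.
ARCHITECTURES = {
    "Fully Connected": [ALL] * 10,
    "Row Connected": [ALL] * 4 + ROWS,
    "Column Connected": [ALL] * 4 + COLS,
    "Quadrants": [ALL] * 6 + QUADS,
    "Quadrants and Rows": ROWS + QUADS,
    "High Localization": BLOCKS + [ALL],
    "Hemispheres": [ALL] * 6 + HALVES,
}


def connection_count(name):
    """Total number of input-to-hidden connections in a configuration."""
    return sum(len(inputs) for inputs in ARCHITECTURES[name])


for name in ARCHITECTURES:
    print(f"{name:20s} {len(ARCHITECTURES[name]):2d} hidden nodes, "
          f"{connection_count(name):3d} connections")
```

Under these assumptions every configuration has 10 hidden nodes, and the number of input-to-hidden connections ranges from 360 for 'Fully Connected' down to 72 for both 'Quadrants and Rows' and 'High Localization'.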
Average Results:
Configuration         Average % Success   Average # of Digits Found
Fully Connected       72.8%               174.6
Row Connected         68.9%               165.6
Column Connected      64.7%               155.4
Quadrants             70.7%               169.8
Quadrants and Rows    50.6%               121.4
High Localization     56.8%               135.8
Hemispheres           74.5%               178.8
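The two result columns are linked by the data set size: the percentage should be roughly the average number of digits found divided by 240. The short check below recomputes the percentage from the count for each configuration. The reported and derived values agree to within a few tenths of a point; that the remaining gaps come from rounding is an assumption, since the text does not say how the averages were computed.

```python
DATA_SET_SIZE = 240  # 24 representations of each of the 10 digits

# (average % success, average number of digits found), copied from the results.
RESULTS = {
    "Fully Connected": (72.8, 174.6),
    "Row Connected": (68.9, 165.6),
    "Column Connected": (64.7, 155.4),
    "Quadrants": (70.7, 169.8),
    "Quadrants and Rows": (50.6, 121.4),
    "High Localization": (56.8, 135.8),
    "Hemispheres": (74.5, 178.8),
}


def percent_from_count(found, total=DATA_SET_SIZE):
    """Success percentage implied by an average count of digits found."""
    return 100.0 * found / total


for name, (reported, found) in RESULTS.items():
    derived = percent_from_count(found)
    print(f"{name:20s} reported {reported:5.1f}%  derived {derived:6.2f}%")
```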
DISCUSSION
As the results table shows, only one of our architectures, 'Hemispheres', performed better than the initial fully connected network, and several performed substantially worse. The architectures that performed comparably to the fully connected network were also, for the most part, globally connected. Only when high degrees of localization and specialization came into play did the network's learning ability drop. A likely reason is that localization, by definition, cuts a hidden node off from what the rest of the network observes. A node with very few connections is therefore detached from what the network as a whole is working towards, which limits its usefulness. By contrast, a fully connected node can still learn to rely on only one or two of its inputs (through its connection weights), but it has a wider set of options when learning.
This explanation fits the data we collected. The 'Hemispheres' architecture was, for the most part, globally connected, and its local nodes were each still connected to half of the matrix. Not surprisingly, then, this almost fully connected network did at least as well as the fully connected network. Our tests with the 'Quadrants and Rows' configuration, in turn, illustrate how leaving out global nodes harms performance. In that configuration, no node was connected to the entire input. The greatest degree of connection was the 4 nodes that each looked at a 3x3 portion of the matrix, while the other 6 nodes looked at the rows of the matrix. This limit on learning possibilities showed in the network's performance, with an average digit recognition rate of 50.6%.
CLUSTER ANALYSIS
We ran a cluster analysis on the results produced by both the original hidden node configuration (suggested by Lisa Meeden) and our most successful hidden node configuration ('Hemispheres'). Both cluster analyses show a large degree of clustering of like inputs.
The original configuration was able to group most inputs together, although it had some problems with 4's, 6's, and 9's. It placed several non-4 inputs into the main 4 section, including 0's, 3's, 9's, and 7's. For both the 6's and the 9's, it created sub-groups that were separated from the main group, although this sub-group was much larger for the 6's.
Our version also performed well overall, but it tended to have problems with 3's, 9's, and, to a lesser extent, 6's. The net was not able to distinguish between 3's and 9's at all, clustering them all together. Most of the 6's were clustered together, but there were some outliers. Overall, there was a bit more scattering than in the original configuration. Our theory is that, because our node configuration looks at 4 halves of the board (dividing the board both horizontally and vertically), it has trouble distinguishing between digits that look alike in one or more halves. For example, the right halves and/or bottom halves of many 3's and 9's are quite similar, which would lead to their being grouped together.
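The half-similarity argument can be made concrete by comparing two digits half by half. The sketch below uses hypothetical 6x6 drawings of a 3 and a 9 (they are not taken from the lab's data set) and reports the fraction of matching cells in each of the four halves. It only illustrates the reasoning and does not reproduce the cluster analysis.

```python
SIZE = 6

# Hypothetical 6x6 renderings of a 3 and a 9; NOT from the lab's data set.
THREE = [
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
]
NINE = [
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
]


def half(grid, name):
    """Return the cells of one half of the grid as a flat list."""
    mid = SIZE // 2
    if name == "top":
        rows, cols = range(0, mid), range(SIZE)
    elif name == "bottom":
        rows, cols = range(mid, SIZE), range(SIZE)
    elif name == "left":
        rows, cols = range(SIZE), range(0, mid)
    elif name == "right":
        rows, cols = range(SIZE), range(mid, SIZE)
    else:
        raise ValueError(f"unknown half: {name}")
    return [grid[r][c] for r in rows for c in cols]


def agreement(a, b, name):
    """Fraction of cells in the named half where the two grids match."""
    half_a, half_b = half(a, name), half(b, name)
    return sum(x == y for x, y in zip(half_a, half_b)) / len(half_a)


for name in ("top", "bottom", "left", "right"):
    print(name, round(agreement(THREE, NINE, name), 2))
```

For these two drawings the right and bottom halves match exactly, so a hidden node that sees only one of those halves receives identical input for both digits.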
The cluster analysis print-outs are linked here: