This section discusses whether our programs worked and how we tested them.
Once the programs were written, testing was fairly simple: we compiled and ran each program and checked whether the data we observed matched the expected outcomes.
To determine endianness, we checked whether our program printed a '0' or a '3', which told us the order in which bytes are stored internally. Both Pentium processors we tested were little-endian, while the UltraSparc 2 was big-endian.
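A check of this kind can be sketched as follows. This is not the original program, which is not reproduced here. The function name `lowest_address_byte` and the test value `0x00010203` are our own assumptions, chosen so that the result is either 0 or 3.

```c
#include <stdint.h>
#include <string.h>

/* Returns the byte stored at the lowest address of a 32-bit value.
   Assumption: the test value 0x00010203 is chosen for illustration
   and is not taken from the original program. */
unsigned lowest_address_byte(void)
{
    uint32_t value = 0x00010203u;
    unsigned char bytes[sizeof value];

    /* Copy the in-memory representation so it can be inspected
       byte by byte. */
    memcpy(bytes, &value, sizeof value);

    /* With this test value: 3 on a little-endian machine (least
       significant byte first), 0 on a big-endian machine. */
    return bytes[0];
}
```

Printing the return value with `printf("%u\n", lowest_address_byte());` gives a one-character answer. Which digit corresponds to which byte order depends on the test value, so the mapping here need not match the original program's.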
The next program we wrote tested the cache structure. After writing the code, we tuned the running times and the assumed cache size until the programs showed a clear difference between access times for data held in cache and for data in main memory. For the programs where we used blocks of memory much larger than the cache, access speed dropped, as expected. Our 2-way nice program also behaved correctly when run on different machines. It took the same amount of time as the mean program on the UltraSparc, which uses direct mapping. On the Pentium family processors, however, it ran extremely fast, which is consistent with their caches being set-associative.
While testing the cache structure, we also encountered some problems regarding L2 cache which are detailed on our Problems page.
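The kind of measurement described above can be sketched as a strided read over a buffer, timed with the standard `clock()` function. This is a minimal sketch, not our original code. The name `time_walk` and its parameters are hypothetical, and the buffer sizes, stride and pass counts need tuning for each machine.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Times `passes` strided read sweeps over a buffer of `bytes` bytes.
   Returns elapsed CPU seconds, or -1.0 if the allocation fails.
   Hypothetical helper for illustration only. */
double time_walk(size_t bytes, size_t stride, int passes)
{
    assert(stride > 0 && bytes >= stride);

    /* volatile keeps the compiler from removing the reads. */
    volatile unsigned char *buf = malloc(bytes);
    if (buf == NULL)
        return -1.0;

    /* Touch every byte once so the memory is actually mapped. */
    for (size_t i = 0; i < bytes; i++)
        buf[i] = (unsigned char)i;

    unsigned long sum = 0;
    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < bytes; i += stride)
            sum += buf[i];
    clock_t end = clock();
    (void)sum;

    free((void *)buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing the time per access for a buffer that fits in cache against one many times larger than the cache should show the slowdown. Divide each result by the number of reads performed so the two runs are comparable.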
We then wrote a program to test the speed of various number representations. We expected speed to depend on the size of each data type, with 8-bit integers being the quickest and 8-byte doubles taking the most time. We also suspected that both forms of integer representation would be faster than the two real number representations. Neither of these hypotheses was correct. By far the fastest performance occurred with the float calculations, and even doubles performed respectably. The biggest surprise was that the 8-bit integer representation performed the slowest on both the P3s and the UltraSparcs. We suspect this is because the 8-bit integer is much smaller than the word size on these machines, so padding may occur to fill out a full word.
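A comparison like this can be sketched with one timing function per data type, generated from a macro. This is an illustrative sketch under our own assumptions, not the original test. The macro `DEFINE_BENCH`, the `bench_*` names and the choice of operation are all hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Defines a function that repeats one arithmetic operation on `type`
   and returns the elapsed CPU seconds. The accumulator alternates
   between 1 and 2, so no type overflows. It is volatile so the loop
   is not optimized away, which means each iteration also pays for a
   load and a store. */
#define DEFINE_BENCH(name, type)                               \
    double name(long iterations)                               \
    {                                                          \
        volatile type acc = 1;                                 \
        clock_t start = clock();                               \
        for (long i = 0; i < iterations; i++)                  \
            acc = (type)((type)3 - acc);                       \
        return (double)(clock() - start) / CLOCKS_PER_SEC;     \
    }

/* In C, int8_t operands are promoted to int before the subtraction
   and narrowed again on the store. */
DEFINE_BENCH(bench_int8, int8_t)
DEFINE_BENCH(bench_int, int)
DEFINE_BENCH(bench_float, float)
DEFINE_BENCH(bench_double, double)
```

Calling each function with the same large iteration count, for example `bench_float(100000000L)`, gives times that can be compared across types. The results depend heavily on the compiler and its optimization flags, so they may not reproduce the ordering we observed.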
Our final program tested the superscalar architecture of the processors. We ran four tests, each with a different level of dependency in the code. Each ran as expected, with the function without dependencies taking the least time and the function where every line depended on the previous line taking the most. This made sense: as more dependencies occur, processors have less opportunity to take advantage of multiple ALUs or other superscalar features.
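The two extremes of this test can be sketched as a pair of loops that perform the same operations, once on four independent accumulators and once as a single chain. This is a sketch under our own assumptions, not the original four tests. The names `independent`, `dependent` and `seconds_for` and the constants used are hypothetical.

```c
#include <time.h>

/* Four updates per iteration, each on its own accumulator, so no
   update has to wait for another one in the same iteration. */
unsigned long independent(unsigned long n)
{
    unsigned long a = 0, b = 0, c = 0, d = 0;
    for (unsigned long i = 0; i < n; i++) {
        a = a * 3 + i;
        b = b * 5 + i;
        c = c * 7 + i;
        d = d * 9 + i;
    }
    return a + b + c + d;
}

/* The same four updates, but each one needs the result of the
   previous line. Unsigned arithmetic wraps, so overflow is
   well defined. */
unsigned long dependent(unsigned long n)
{
    unsigned long a = 0;
    for (unsigned long i = 0; i < n; i++) {
        a = a * 3 + i;
        a = a * 5 + i;
        a = a * 7 + i;
        a = a * 9 + i;
    }
    return a;
}

/* Returns the CPU seconds taken by fn(n). The result is stored in a
   volatile variable so the call cannot be discarded. */
double seconds_for(unsigned long (*fn)(unsigned long), unsigned long n)
{
    clock_t start = clock();
    volatile unsigned long sink = fn(n);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing `seconds_for(independent, n)` with `seconds_for(dependent, n)` for a large `n` shows whether the processor overlaps the independent updates. An optimizing compiler may restructure either loop, so it can help to inspect the generated assembly.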