Adventures in Vowel Harmony 2003

Research Diary, Summer 2003

Emily Thomforde '04
Swarthmore College
ethomfo1@swarthmore.edu

"'The language' is a statistical abstraction." (Steels 1999)

Summer 2003: Modelling the emergence and subsequent dispersion of vowel harmony in artificial agent populations, with an impetus from homophony. See Previous Work

11 August 2003

Before the summer is over, I need to finish the parameter adjustments and settle on an optimal range of values, figure out the Hungarian prefixing problem, document the current version of the VHFC, and write up my changes to the simulation. I also need to get all my files updated and archived. My piece of that for today involves Hungarian vowels and a zeroed probMisspeak. Wish me luck.

With regard to Hungarian vowels, I have posted a revised list of disharmonic words here. The list is in alphabetical order, so it is easy to spot duplicates, and also convenient for detecting special prefixes. I do not know what these prefixes are, but if I did, I could compare their relative presence in harmonic and disharmonic contexts.

Alas, simulation woes again. None of the runs I've done recently behave the way I'd like. Number 84 is the one I like best, because harmony does not return to initial levels. It dissipates only so far as is required to reduce homophony below the sensitivity threshold. I think this is theoretically sound and consistent with our homophony model.

Daily total hours: 5.5

_______________________________________________________

12 August 2003

I've spent the day organizing my simulation data into a usable format. Now you can click on any of the parameter headings and look specifically at the impact of changing that parameter. There are still a few holes in the data, but they're filling up quickly. The run of the day is now 110. I like it because the downward curve is the closest to sigmoid I've seen yet. The first part, that is. There's a bump later, but running the simulation longer will get more of those bumps, as harmony can now propagate without the threat of homophony. You can see the effect over 2500 years here. I was also impressed by how low the harmony values got, though I shouldn't be, because probMisspeak was relatively high. The homophony graph is surprisingly symmetrical, and sticks to zero instead of squiggling around like other runs. The PP graph, too, is regular and sigmoid. The only feature 110 leaves to be desired is a smooth up-curve. But it gets there eventually.

Inspiration struck me today as I contemplated peaks in the harmony graphs. How high should I expect them to go? Is height a property of the parameter settings, or of the original lexicon, or of the particular initial distribution of words in the lexicon, or of the average of all agents' lexica at any particular time step, or of the union of all agents' lexica at any particular time step? And could I devise an interface to the VHFC from Objective C code? Ideally, I would like a second line on the harmony graph keeping track of the expected harmony levels for the current lexicon.

Daily total hours: 6.5

__________________________________________________________

13 August 2003

Victory at last! I spent the whole day trying to track a harmony threshold in the simulation. Results can be seen at run 117. Now I'm trying to put together a defense of why probMisspeak should be kept at zero, based on the relationship between observed and expected harmony. I think that a theoretically sound model would never cause observed harmony to dip significantly below expected harmony. We have Uzbek to thank for this intuition.

Daily total hours: 5.5

__________________________________________________________

14 August 2003

Despite being locked out of the lab for most of the morning, I've come to some conclusions about the harmony threshold.

First, a description of my changes to the simulation:

All actions take place as usual. No agents do anything different, and consequently the results at the standard randomness (no -s) remain the same. The only difference is that an additional measurement is being taken. I added methods to Bug.m, Word.m, and ModelSwarm.m to calculate vowel distributions and the expected harmony levels for individual agent lexica. I use a modified formula from the VHFC to calculate the harmony threshold: P(harm) = P(f&1)*P(f&2) + P(b&1)*P(b&2). This is a lot easier because the lexicon contains only disyllabic words and no neutral vowels. To find the probabilities of vowel classes in syllables, I use a Maximum Likelihood Estimate and assume P(f&1) = (number of front vowels in the first syllable) / (number of words). The other values are estimated accordingly. I added an extra probe to the harmony graph, so it now shows observed and expected harmony as two separate lines in the same window.
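As a rough illustration of the threshold calculation described above, here is a minimal Python sketch. It is not the Objective-C code from Bug.m; the function names and the representation of a lexicon as a list of (first, second) vowel-class pairs are assumptions made only for this example, with "f" and "b" standing for the two classes in the formula.

```python
from collections import Counter

def expected_harmony(lexicon):
    """Expected harmony P(f&1)*P(f&2) + P(b&1)*P(b&2), using MLE counts.

    lexicon: list of (first, second) vowel-class pairs, each "f" or "b".
    This representation is hypothetical, chosen for the example.
    """
    n = len(lexicon)
    if n == 0:
        raise ValueError("lexicon is empty")
    first = Counter(word[0] for word in lexicon)
    second = Counter(word[1] for word in lexicon)
    # Maximum likelihood estimates: class count in a syllable / number of words
    p_f1, p_b1 = first["f"] / n, first["b"] / n
    p_f2, p_b2 = second["f"] / n, second["b"] / n
    return p_f1 * p_f2 + p_b1 * p_b2

def observed_harmony(lexicon):
    """Fraction of words whose two vowels belong to the same class."""
    return sum(1 for a, b in lexicon if a == b) / len(lexicon)

# Three of four words have "f" in each syllable, but only two words agree.
lexicon = [("f", "f"), ("f", "f"), ("f", "b"), ("b", "f")]
print(expected_harmony(lexicon))   # 0.625
print(observed_harmony(lexicon))   # 0.5

# First syllable evenly split between the classes.
even_first = [("f", "f"), ("f", "f"), ("b", "f"), ("b", "f")]
print(expected_harmony(even_first))  # 0.5
```

The second example shows an arithmetic property of the formula: because there are no neutral vowels, P(b&k) = 1 - P(f&k), so whenever either syllable is split exactly evenly between the two classes the expected value is 0.5, whatever the distribution in the other syllable.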

In doing this, my intention was to give some meaning to the changes in harmony. I observed that certain runs would peak at 0.7, and some as high as 0.9, but I had no way of testing if that was a property of the lexicon, or of some other factors. Now I have a concrete measure of the average lexicon to compare to the average harmony. With this, I hope to be able to classify the differences in scale I see so often.

In addition, access to probabilistic information will provide supporting evidence for the effects of changing probMisspeak. I feel instinctively that probMisspeak exists only to bring down harmony levels to an artificially low point. I would like to see if different values of probMisspeak cause harmony to dip below the expected line. If they do, the model may violate the pattern established by the data from the VHFC, indicating a deficiency in theory.

Daily total hours: 6.5

____________________________________________

15 August 2003

Today I'm tying up loose ends: finishing the documentation on my changes, updating the Swarm primer, filling gaps in the simulation tests, and trying to get any value but 0.5 for a harmony threshold.

The methods I added to the simulation are as follows:

In Word.m:

findFirst: harmonyClasses - determines the class of the first vowel in a word; returns 1 if in class 1, 2 if in class 2.

findSecond: harmonyClasses - determines the class of the second (last) vowel in a word; returns 1 if in class 1, 2 if in class 2.

These two methods were added to enable bugs to calculate vowel class frequencies in their lexica. It was easier to write two separate methods than one with two return values.
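The lookup these two methods perform can be sketched in Python as follows. This is only an illustration, not the Word.m code: the vowel inventory in CLASS_1 and CLASS_2, the sample word, and the function names are hypothetical, and returning None for a word with no classified vowel is an assumption the original description does not cover.

```python
# Hypothetical harmony classes, for illustration only.
CLASS_1 = set("ie")
CLASS_2 = set("aou")

def _vowel_class(letters, class_1, class_2):
    """Return 1 or 2 for the first classified vowel in letters, else None."""
    for ch in letters:
        if ch in class_1:
            return 1
        if ch in class_2:
            return 2
    return None

def find_first(word, class_1, class_2):
    """Class of the first vowel in the word, scanning left to right."""
    return _vowel_class(word, class_1, class_2)

def find_second(word, class_1, class_2):
    """Class of the last vowel in the word, scanning right to left."""
    return _vowel_class(reversed(word), class_1, class_2)

word = "tika"  # made-up disyllabic word
print(find_first(word, CLASS_1, CLASS_2))   # 1
print(find_second(word, CLASS_1, CLASS_2))  # 2
```

Keeping two separate functions mirrors the two-method layout described above; a single function returning a pair would work equally well in a language where that is convenient.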

In Bug.m:

evalThresholdOfLexicon - calculates the distribution of vowel classes in syllables and uses this data to determine the expected harmony levels of a bug's individual lexicon with the formula P(harm) = P(f&1)*P(f&2)+P(b&1)*P(b&2). Returns the harmony threshold as a float value.

In ModelSwarm.m:

evalThresholdOfPopn - calculates the simple average of all the individual harmony thresholds of the agents in the population.
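A minimal Python sketch of the population-level measure, again under assumed names and data structures rather than the actual ModelSwarm.m code: each agent's lexicon is a list of (first, second) vowel-class pairs, and the population value is the plain mean of the per-agent thresholds.

```python
def lexicon_threshold(lexicon):
    """P(f&1)*P(f&2) + P(b&1)*P(b&2) for one agent's lexicon.

    With no neutral vowels, P(b&k) = 1 - P(f&k).
    """
    n = len(lexicon)
    p_f1 = sum(1 for first, _ in lexicon if first == "f") / n
    p_f2 = sum(1 for _, second in lexicon if second == "f") / n
    return p_f1 * p_f2 + (1 - p_f1) * (1 - p_f2)

def population_threshold(lexica):
    """Simple average of the individual agents' thresholds."""
    thresholds = [lexicon_threshold(lex) for lex in lexica]
    return sum(thresholds) / len(thresholds)

# Two hypothetical agents, each fully harmonic but in opposite classes.
agent_a = [("f", "f"), ("f", "f")]
agent_b = [("b", "b"), ("b", "b")]

average = population_threshold([agent_a, agent_b])
pooled = lexicon_threshold(agent_a + agent_b)
print(average)  # 1.0
print(pooled)   # 0.5
```

In this deliberately extreme example the average of the per-agent thresholds is 1.0, while the threshold of the pooled lexicon is 0.5, so the two ways of summarising a population need not agree. Which one is the better baseline for the harmony graph is a modelling choice this sketch does not settle.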

Daily total hours: 4

Weekly total hours: 28