Research Diary, Summer 2003

Emily Thomforde '04
Swarthmore College
ethomfo1@swarthmore.edu

"'The language' is a statistical abstraction." (Steels 1999)



Summer 2003: Modelling the emergence and subsequent dispersion of vowel harmony in artificial agent populations, with an impetus from homophony. See Previous Work

18 August 2003

I will spend all of today preparing to report everything I've done to David tomorrow. This includes conclusions from the parameter adjustments to the simulation, the addition of expected harmony to the results, the theory behind a mutable homophony tolerance, and progress on Hungarian vowels.

Here is my theory on expected harmony. We always work with the same base lexicon of thirty words, all words are two syllables long, and there is no syllable bias in vowel-class occurrence. As a result, the expected harmony under any combination of parameters will always hover around 0.5, usually within the range 0.48-0.52. Because we simulate the same 'language' in every run, the expected harmony of that language stays the same. We know from testing actual languages that, in languages that are historically harmonic, observed harmony is no lower than expected harmony. Therefore any parameter settings that leave observed harmony considerably below 0.5 for this language (where 'considerably' is a subjective judgement) are unsuccessful.
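The argument can be checked with a small calculation. The diary does not give a formula for expected harmony, so this sketch assumes a common definition: the probability that two vowels drawn independently from the lexicon's vowel-class distribution share a class, i.e. the sum of the squared class proportions. Observed harmony is taken to be the fraction of two-syllable words whose two vowels share a class. The toy lexicon and the `F`/`B` labels are hypothetical, not the simulation's actual data structures.

```python
from collections import Counter

def expected_harmony(lexicon):
    """Probability that two independently drawn vowels share a class.

    Assumed definition: sum over classes of p(class)^2, with p estimated
    from all vowel tokens in the lexicon.
    """
    counts = Counter(v for word in lexicon for v in word)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

def observed_harmony(lexicon):
    """Fraction of two-syllable words whose two vowels share a class."""
    return sum(1 for a, b in lexicon if a == b) / len(lexicon)

# Hypothetical 30-word lexicon: each word is a pair of vowel classes,
# F (front) or B (back), with no syllable bias.
lexicon = [("F", "F"), ("F", "B"), ("B", "F"), ("B", "B")] * 7 + [("F", "B"), ("B", "F")]

print(round(expected_harmony(lexicon), 3))
print(round(observed_harmony(lexicon), 3))
```

With this toy lexicon the front and back classes are perfectly balanced, so the sketch prints 0.5 for expected harmony and 0.467 for observed harmony; by the criterion above, a run ending in this state would count as unsuccessful.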

On the subject of Hungarian vowels, I have extracted all the words that are disharmonic in the first two syllables and ordered them alphabetically. From this list we should be able to detect Anna's prefixes, if they are present. There are 85 forms disharmonic in the first two syllables, and 13 in the last two. I am not qualified to reduce this list to lemmatized forms, but if I were to try, I would get 31 and 9. This data is consistent with the hypothesis that fewer words will be disharmonic in the last syllables than in the first. I cannot draw any further conclusions without knowledge of Hungarian prefixes.
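A minimal sketch of the extraction step, assuming a plain word list and a simple front/back split of Hungarian vowels. Treating i, í, e, é as front vowels ignores their neutral behaviour in Hungarian, so this is a simplification; counting vowel letters also stands in for counting syllables, and no lemmatization is attempted. The function names and sample words are hypothetical.

```python
import unicodedata

# Simplified Hungarian vowel classes (an assumption): i, í, e, é are
# counted as front here, although they behave as neutral in Hungarian.
FRONT = set("eéiíöőüű")
BACK = set("aáoóuú")

def vowel_classes(word):
    """Return the word's vowels as a list of 'F' (front) / 'B' (back) labels."""
    word = unicodedata.normalize("NFC", word.lower())
    classes = []
    for ch in word:
        if ch in FRONT:
            classes.append("F")
        elif ch in BACK:
            classes.append("B")
    return classes

def disharmonic_first_two(word):
    """True if the first two vowels belong to different classes."""
    c = vowel_classes(word)
    return len(c) >= 2 and c[0] != c[1]

def disharmonic_last_two(word):
    """True if the last two vowels belong to different classes."""
    c = vowel_classes(word)
    return len(c) >= 2 and c[-2] != c[-1]

# Illustrative word list; a real run would read the Hungarian corpus.
words = ["telefon", "sofőr", "ablak", "kert", "papír"]
first = sorted(w for w in words if disharmonic_first_two(w))
last = sorted(w for w in words if disharmonic_last_two(w))
print(first)
print(last)
```

Sorting the first list alphabetically groups words that share an initial string, which is what makes candidate prefixes easier to spot by eye.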

I think my many simulation runs show that the parameters are best as they stand, at least from the perspective of the upward curve. On the downward side, the parameters with the most dramatic effect are the proscriptively downward ones: probMisspeak and homophTolerance. I prefer these at 0 and 0.25, respectively. The other parameters behave as expected and show successful results within a reasonable range, but for the sake of simplicity their current defaults are most desirable.

Daily total hours: 5

_________________________________________________________

19 August 2003

I've summarized my findings from the simulation runs, and posted them very briefly on the primer page, along with links to the results, sorted by parameter.

For tomorrow: print exemplars from the simulation page, reformatted to print nicely; fix the vowel inventory in Spanish; collect text from all written sources; decide whether monosyllables are counted in the frequency data; find parameters that make observed harmony, O(harm), hover at 0.5; get corpus sizes onto the VHFC chart.

Daily total hours: 5

________________________________________________________

20 August 2003

Only half the simulation pages are reformatted; beware when printing. Otherwise, everything on yesterday's list is done, and yes, monosyllables are included in the frequency data.

Daily total hours: 5.5

________________________________________________________

21 August 2003

New Uzbek corpus, new Uzbek numbers (on the VHFC page). Working on Finnish; need tomorrow's news today. Meeting with David on Tuesday. Make a spreadsheet of front/back vowel distributions in all corpora.
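A sketch of how such a spreadsheet could be produced, assuming each corpus is plain text and each language supplies its own front/back/neutral vowel sets. Only an illustrative Finnish split is filled in (ä, ö, y front; a, o, u back; e, i neutral); the other corpora would need their own inventories. The function names, column names and sample text are hypothetical, and the output is CSV so it can be opened in any spreadsheet program.

```python
import csv
import io
from collections import Counter

# Assumed Finnish classes; e and i are treated as neutral and tallied separately.
FINNISH = {"front": set("äöy"), "back": set("aou"), "neutral": set("ei")}
COLUMNS = ["front", "back", "neutral"]

def vowel_distribution(text, classes):
    """Count vowel tokens per class (front/back/neutral) in one corpus."""
    counts = Counter()
    for ch in text.lower():
        for name, members in classes.items():
            if ch in members:
                counts[name] += 1
    return counts

def distribution_csv(corpora):
    """Return CSV text with one row per (name, text, classes) corpus.

    Shares are proportions of all classified vowel tokens in that corpus.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["corpus", *COLUMNS, "front_share", "back_share"])
    for name, text, classes in corpora:
        counts = vowel_distribution(text, classes)
        total = sum(counts.values()) or 1  # guard against an empty corpus
        writer.writerow([name, *(counts[c] for c in COLUMNS),
                         f"{counts['front'] / total:.3f}",
                         f"{counts['back'] / total:.3f}"])
    return out.getvalue()

corpora = [("Finnish sample", "talo kylä metsä", FINNISH)]
print(distribution_csv(corpora))
```

Keeping neutral vowels in their own column avoids silently folding them into either class, which matters for languages like Finnish where they are transparent to harmony.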

Daily total hours: 4

_________________________________________________________

22 August 2003

I finished the Finnish corpus and ran the numbers. Results are on the VHFC page. The treatment of diphthongs is still unresolved, though. You can see the frequency of all the diphthongs here. The next task is to compile a comprehensive spreadsheet of vowel distributions.
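One rough way to tabulate diphthong frequencies is to count every adjacent pair of distinct vowels inside a word. This is a sketch under stated assumptions: the Finnish vowel letters are taken to be a, e, i, o, u, y, ä, ö; syllable boundaries are not checked, so sequences split across syllables are counted too; and identical pairs (long vowels such as "aa") are skipped. Each counted pair is therefore only a candidate diphthong. The function name and sample text are hypothetical.

```python
from collections import Counter

FINNISH_VOWELS = set("aeiouyäö")

def vowel_pair_counts(text):
    """Count adjacent pairs of distinct vowels within words.

    Each pair is only a candidate diphthong: syllable boundaries are not
    checked, so sequences split across syllables are also counted.
    Identical pairs (long vowels such as 'aa') are skipped.
    """
    counts = Counter()
    for word in text.lower().split():
        for a, b in zip(word, word[1:]):
            if a in FINNISH_VOWELS and b in FINNISH_VOWELS and a != b:
                counts[a + b] += 1
    return counts

sample = "taivas ja kaunis koira"
for pair, n in vowel_pair_counts(sample).most_common():
    print(pair, n)
```

A stricter version would filter the pairs against an explicit diphthong list or a syllabifier, but the raw counts are enough to see which sequences dominate the corpus.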

Daily total hours: 3

Weekly total hours: 22.5

Hours left: 9.5