8 August, 2012

Hundreds of forms, but nowhere to check them

A previous post showed just how many inflectional forms there are for a single verb in Maltese. But while writing the algorithms that produce such tables, I keep finding that many of these forms cannot really be checked for correctness, because no other resource lists them.

There is the Korpus Malti, but despite its nearly 100 million tokens, numerous grammatically correct verb forms do not occur anywhere in it. No traditional dictionary would list every inflected form of every verb, simply for reasons of size, so in many cases I must resort to “best guesses” and intuition. There are also so-called verb models used in Maltese conjugation, e.g. the verb lagħab (he played) should be conjugated like seraq (he stole), but these only cover the placement of the radicals, not vowel changes. For example, which is correct: naqtgħak or naqtgħek? The former does not appear in the corpus at all, and the latter appears just once, in a public blog entry. Not exactly hard evidence, is it?
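The corpus check described above boils down to counting how often each candidate form occurs. Below is a minimal sketch, assuming the corpus is available as a plain sequence of tokens; the function name `count_candidates` and the toy token list are hypothetical illustrations, not the Korpus Malti's actual format or interface. Text is NFC-normalised so that letters such as ċ, ġ and ż match whether they are stored precomposed or decomposed.

```python
import unicodedata
from collections import Counter
from typing import Iterable


def normalise(token: str) -> str:
    """NFC-normalise and lowercase a token so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", token).lower()


def count_candidates(tokens: Iterable[str], candidates: list[str]) -> dict[str, int]:
    """Hypothetical helper: count occurrences of each candidate form in a token stream."""
    wanted = {normalise(c) for c in candidates}
    # Only count tokens that match one of the candidate forms
    counts = Counter(n for n in (normalise(t) for t in tokens) if n in wanted)
    # Report every candidate, including those with zero occurrences
    return {c: counts[normalise(c)] for c in candidates}


# Made-up toy tokens standing in for a real corpus
sample = ["jien", "naqtgħek", "illum", "u", "naqtgħu"]
print(count_candidates(sample, ["naqtgħak", "naqtgħek"]))
```

Note that a count of zero says nothing about whether a form is grammatical, and a count of one is barely more informative; the counts can only flag which generated forms need a human judgement.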

