“the j stands for Jonah”

15 February, 2013

Determinate and indeterminate noun plural forms

Maltese nouns have two potential plural forms, determinate and indeterminate. The distinction is exhibited in examples such as:

English   Singulative   Determinate plural   Indeterminate plural
road      triq          triqat               toroq
tooth     sinna         sinniet              snien

However, in practice very few nouns actually seem to have both forms. An analysis of the 184 nouns in the GF Resource Grammar Library mini-lexicon shows that:

  • 14 (~8%) have both forms, though I would argue that many of these sound kind of arcane, e.g. ġbiel (ġebliet), xgħur (xagħariet), għejun (għajnejn).
  • 158 (~86%) have just a determinate plural
  • 3 (~2%) have just an indeterminate plural
  • 9 (~5%) have neither plural form. This is usually compensated by a collective form (e.g. baqar), a dual (e.g. riġlejn) or simply a singulative (e.g. plastik).
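The percentages in the list can be recomputed from the raw counts. This small Python check rounds each category's share of the 184 nouns to the nearest whole percent (the category labels are just descriptive names):

```python
# Noun counts from the 184-noun mini-lexicon, as listed above.
counts = {
    "both plurals": 14,
    "determinate only": 158,
    "indeterminate only": 3,
    "neither": 9,
}

total = sum(counts.values())
assert total == 184  # the four categories cover the whole lexicon

# Share of each category, rounded to the nearest whole percent.
percentages = {name: round(100 * n / total) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} (~{pct}%)")
```

Because of rounding, the four percentages need not add up to exactly 100.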

While this distinction may have some linguistic importance, for the purposes of the GF implementation it will be simplified slightly by storing only one plural form. The change is made internally in the noun representation, so the paradigm constructors are not affected and still accept both forms; the information therefore remains available, even though it is ignored for our purposes.
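A minimal sketch of that internal change, written in Python rather than GF and using hypothetical names (Noun, mk_noun): the constructor keeps its two-plural signature, but the record has only one plural slot. Which form is kept is an assumption here: the determinate plural when present, otherwise the indeterminate one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Noun:
    """Hypothetical internal noun record with a single plural slot."""
    singulative: str
    plural: Optional[str]  # None when the noun has no plural form


def mk_noun(singulative: str,
            determinate_pl: Optional[str] = None,
            indeterminate_pl: Optional[str] = None) -> Noun:
    """Paradigm constructor: still takes both plurals, so callers are unaffected.

    Assumption: keep the determinate plural if given, else the indeterminate
    one; the other form is accepted but not stored.
    """
    plural = determinate_pl if determinate_pl is not None else indeterminate_pl
    return Noun(singulative, plural)


print(mk_noun("triq", "triqat", "toroq"))
print(mk_noun("plastik"))
```

Since 158 of the 184 nouns only have a determinate plural anyway, preferring that form loses information for just the 14 nouns that have both.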

Another solution is to store indeterminate plural forms simply as variants of the determinate plural. I think that in most cases one could get away with this, though for now I am steering clear of all variants just to keep testing simple.
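The variants alternative could look roughly like this. It is again a hypothetical Python sketch, loosely mirroring what GF's variants construct expresses: all plurals are kept together, with the determinate form first and any indeterminate form treated as an optional variant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NounWithVariants:
    """Hypothetical noun record holding every acceptable plural."""
    singulative: str
    plurals: tuple[str, ...] = ()  # first entry is the primary form

    def primary_plural(self) -> Optional[str]:
        return self.plurals[0] if self.plurals else None


def mk_noun_variants(singulative: str,
                     determinate_pl: Optional[str] = None,
                     indeterminate_pl: Optional[str] = None) -> NounWithVariants:
    # Determinate plural first, indeterminate plural (if any) as a variant.
    forms = tuple(f for f in (determinate_pl, indeterminate_pl) if f is not None)
    return NounWithVariants(singulative, forms)


tooth = mk_noun_variants("sinna", "sinniet", "snien")
print(tooth.plurals, tooth.primary_plural())
```

The cost of this approach is visible in testing: every plural position can now yield more than one string, which is exactly what makes variants awkward to test.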

20 January, 2013

Removing inferred roots from verb smart paradigms

In the Maltese resource grammar implementation I had some code which tried to extract the radicals from a so-called mamma verb form. So for example:

classifyVerb "ħareġ"

would give (amongst other information) the radicals Ħ-R-Ġ in record form. This works well most of the time, except for weak-root verbs, where it can be completely impossible to guess the missing radicals. For example, dar is actually the mamma of two distinct verbs, one with root D-W-R and another with D-J-R.
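To illustrate why the guess fails, here is a deliberately simplified Python sketch of radical extraction (not the actual classifyVerb code). It assumes that a mamma form with only two consonants is a hollow verb whose middle radical is weak, which ignores other weak-root patterns such as verbs with a weak final radical:

```python
VOWELS = {"a", "e", "i", "o", "u", "ie"}  # "ie" is a vowel digraph in Maltese
DIGRAPHS = ("għ", "ie")                   # treated as single letters


def letters(word: str) -> list[str]:
    """Split a word into letters, keeping the digraphs għ and ie together."""
    out, i, w = [], 0, word.lower()
    while i < len(w):
        if w[i:i + 2] in DIGRAPHS:
            out.append(w[i:i + 2])
            i += 2
        else:
            out.append(w[i])
            i += 1
    return out


def guess_roots(mamma: str) -> list[tuple[str, ...]]:
    """Return every root compatible with a mamma form (simplified)."""
    consonants = [c.upper() for c in letters(mamma) if c not in VOWELS]
    if len(consonants) == 3:
        return [tuple(consonants)]
    if len(consonants) == 2:
        # Assumed hollow verb: the missing middle radical is W or J,
        # and nothing in the surface form tells us which.
        c1, c3 = consonants
        return [(c1, "W", c3), (c1, "J", c3)]
    return []


print(guess_roots("ħareġ"))
print(guess_roots("dar"))
```

For ħareġ there is a single candidate, but for dar both D-W-R and D-J-R come back, and no amount of cleverness in the paradigm can choose between them.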

The usual way of dealing with this is to have a less-smart fallback in your paradigm, which takes an explicit root in such ambiguous cases. But in this case we don’t even need the smarter version of the paradigm. The set of root-and-pattern verbs in Maltese is a closed set: no new such verbs are being added to the language (all new verbs are today added as loan verbs). Furthermore, this list has already been compiled by Michael Spagnol in his PhD thesis, and we now even have it in database form here. I am using this to directly build a monolingual Maltese verb database in GF, and since I already have the radicals for all these verbs, there is really no need to try to determine them automatically in a smart paradigm. As my professor Aarne Ranta likes to say, “don’t guess what you know.”
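Following that maxim, the paradigm can simply be handed the root from the database. The entries below are a hypothetical excerpt using only the roots mentioned above, and the field names are assumptions rather than the actual database schema:

```python
# Hypothetical excerpt of a verb database: each entry pairs a mamma form
# with its known root, so the two verbs spelled "dar" stay distinct.
VERB_DB = [
    {"mamma": "ħareġ", "root": ("Ħ", "R", "Ġ")},
    {"mamma": "dar", "root": ("D", "W", "R")},
    {"mamma": "dar", "root": ("D", "J", "R")},
]


def mk_verb(mamma: str, root: tuple[str, ...]) -> dict:
    """Less-smart paradigm: the root is always supplied, never inferred."""
    return {"mamma": mamma, "root": "-".join(root)}


lexicon = [mk_verb(entry["mamma"], entry["root"]) for entry in VERB_DB]
for verb in lexicon:
    print(verb["mamma"], verb["root"])
```

Because the verb class is closed, the explicit-root paradigm covers every case, and the ambiguity of dar simply never arises.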

Changing the verb implementation

Perhaps I saw the signs earlier than I would like to admit, but it is now clear that my current implementation of Maltese verb morphology in GF has taken the wrong direction and needs to be significantly rewritten. An inflection table with close to 1000 forms is not just an implementation headache; it is arguably not linguistically accurate either.

So the new plan, following what is done in the implementations for Italian and Finnish, is to remove pronominal suffixes from the verb’s inflection table and instead use binding at the syntax level to produce these forms. Reducing the inflection table is the easy part; getting the rest to produce correct results might be tricky, since the stem sometimes changes depending on the pronoun being suffixed.
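A rough Python model of syntax-level binding: GF has a special BIND token (printed as &+ in unlexed output) that glues neighbouring tokens together. The reduced verb table below stores a separate stem for use before pronominal suffixes; the kiteb/kitib alternation is an assumed example for illustration, and the table keys are hypothetical:

```python
BIND = "&+"  # marker token, mirroring how GF prints its BIND token


def unlex(tokens: list[str]) -> str:
    """Join tokens with spaces, gluing the tokens on either side of BIND."""
    out: list[str] = []
    glue = False
    for tok in tokens:
        if tok == BIND:
            glue = True
            continue
        if glue and out:
            out[-1] += tok
        else:
            out.append(tok)
        glue = False
    return " ".join(out)


# Hypothetical reduced verb table: no suffixed forms, only an extra stem.
verb = {
    "perf_3sg_m": "kiteb",         # stand-alone form
    "perf_3sg_m_suffix": "kitib",  # stem used before a suffix (assumed)
}


def with_object_suffix(table: dict, form: str, suffix: str) -> list[str]:
    """Build tokens for a verb form plus a bound pronominal suffix."""
    return [table[form + "_suffix"], BIND, suffix]


print(unlex([verb["perf_3sg_m"]]))
print(unlex(with_object_suffix(verb, "perf_3sg_m", "ha")))
```

The table only grows by one stem per affected form rather than by one entry per pronoun combination, which is where the reduction from close to 1000 forms comes from; the tricky part mentioned above is getting those suffix stems right.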

So anyway I have created a new branch to work on this, so that at any point I can switch back to the original implementation if I want to compare something or if I end up wanting to use that approach again.

27 July, 2011

Gedit syntax highlighting for Grammatical Framework source code

There is now an official page on the Grammatical Framework website about GF Editor Modes.

To get correct syntax highlighting for Grammatical Framework source code in gedit (Ubuntu’s default text editor), put the code below into the file ~/.local/share/gtksourceview-2.0/language-specs/gf.lang.

Some helpful notes/links:

  • The code is based heavily on the haskell.lang file which I found in /usr/share/gtksourceview-2.0/language-specs/haskell.lang.
  • Ruslan Osmanov recommends registering your file extension as its own MIME type (see also here), however on my system the .gf extension was already registered as a generic font (application/x-tex-gf) and I didn’t want to risk messing any of that up.
  • This is a quick 5-minute job and might require some tweaking. The GtkSourceView language definition tutorial is the place to start looking.
  • Contributions are welcome!
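As a rough starting point, here is a minimal sketch (not the original gf.lang) that generates a GtkSourceView 2.0 language definition from Python. It registers the .gf glob directly in the file, which sidesteps the MIME-type clash mentioned above. The keyword list is an assumed subset of GF keywords, and the comment and string rules are simplified (no nested comments, no escaped quotes):

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# An assumed subset of GF keywords; extend as needed.
KEYWORDS = [
    "abstract", "concrete", "resource", "interface", "instance", "incomplete",
    "of", "open", "in", "with", "flags", "cat", "fun", "def", "data",
    "lincat", "lin", "lindef", "oper", "param", "printname",
    "case", "let", "table", "pre", "variants",
]

# Minimal language definition; @KEYWORDS@ is filled in by render_lang().
TEMPLATE = r"""<?xml version="1.0" encoding="UTF-8"?>
<language id="gf" _name="GF" version="2.0" _section="Sources">
  <metadata>
    <property name="globs">*.gf</property>
    <property name="line-comment-start">--</property>
    <property name="block-comment-start">{-</property>
    <property name="block-comment-end">-}</property>
  </metadata>
  <styles>
    <style id="comment" _name="Comment" map-to="def:comment"/>
    <style id="string" _name="String" map-to="def:string"/>
    <style id="keyword" _name="Keyword" map-to="def:keyword"/>
  </styles>
  <definitions>
    <context id="line-comment" style-ref="comment" end-at-line-end="true">
      <start>--</start>
    </context>
    <context id="block-comment" style-ref="comment">
      <start>\{-</start>
      <end>-\}</end>
    </context>
    <context id="string" style-ref="string" end-at-line-end="true">
      <start>"</start>
      <end>"</end>
    </context>
    <context id="keywords" style-ref="keyword">
@KEYWORDS@
    </context>
    <context id="gf">
      <include>
        <context ref="line-comment"/>
        <context ref="block-comment"/>
        <context ref="string"/>
        <context ref="keywords"/>
      </include>
    </context>
  </definitions>
</language>
"""

DEFAULT_PATH = (Path.home() / ".local/share/gtksourceview-2.0"
                / "language-specs/gf.lang")


def render_lang() -> str:
    """Return the gf.lang XML with the keyword list filled in."""
    lines = "\n".join(f"      <keyword>{k}</keyword>" for k in KEYWORDS)
    return TEMPLATE.replace("@KEYWORDS@", lines)


def language_id(lang_xml: str) -> str:
    """Parse the XML (raises ParseError if malformed) and return its id."""
    return ET.fromstring(lang_xml.encode("utf-8")).get("id")


def install(path: Path = DEFAULT_PATH) -> Path:
    """Write gf.lang into the per-user language-specs directory."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_lang(), encoding="utf-8")
    return path
```

Calling install() writes the file; gedit should pick it up after a restart.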

14 July, 2011

Geany syntax highlighting for Grammatical Framework source code


I wrote a custom filetype config file for the Geany text editor, providing correct syntax highlighting for anyone editing Grammatical Framework source code. Put the code below into the file /usr/share/geany/filetypes.GF.conf (under Ubuntu). You will need to manually create the file:

Light Version

Dark Version

You will also need to edit the filetype_extensions.conf file and add the following line to its [Extensions] section:

GF=*.gf;
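If you would rather script that edit, here is a small Python sketch that inserts the entry directly under the [Extensions] header, and only if no GF entry exists yet. It assumes the section is named [Extensions]; pass register_in_file the path to your copy of filetype_extensions.conf:

```python
from pathlib import Path

ENTRY = "GF=*.gf;"


def add_gf_extension(conf_text: str) -> str:
    """Return conf_text with the GF entry under [Extensions], added at most once."""
    lines = conf_text.splitlines()
    if any(line.strip().startswith("GF=") for line in lines):
        return conf_text  # already registered: leave the text untouched
    headers = [i for i, line in enumerate(lines) if line.strip() == "[Extensions]"]
    if headers:
        idx = headers[0]
    else:
        lines.append("[Extensions]")  # no section yet: create it at the end
        idx = len(lines) - 1
    lines.insert(idx + 1, ENTRY)
    return "\n".join(lines) + "\n"


def register_in_file(path: Path) -> None:
    """Apply add_gf_extension to a filetype_extensions.conf file in place."""
    text = path.read_text(encoding="utf-8")
    path.write_text(add_gf_extension(text), encoding="utf-8")
```

Working on the raw text (rather than a config parser) keeps any existing comments in the file intact, and running it twice is harmless.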

Hope that helps someone who doesn’t feel like reading through the Geany documentation! (which btw can be found here).