“the j stands for Justify”

6 July, 2012

Partial tables

Research / 1:48 pm

Though I am still in the early stages of working on the Maltese grammar, my feeling is that it is a language characterised by many partially-filled inflection tables. For example, my understanding so far is that there are 5 distinct number forms for nouns:

  1. Singulative (1 or more than 10)
  2. Collective (non quantifiable)
  3. Dual (exactly 2)
  4. Indeterminate plural (between 2 and 10)
  5. Determinate plural (non quantifiable)

My observations so far seem to indicate that a noun can have almost any combination of the above forms. In other words, when it comes to quantifying a noun to form a noun phrase, for example, one basically has to check which forms are available and then proceed accordingly. The following examples illustrate, for me, this apparent lack of regularity in which forms can co-exist for any given noun:

English  Singulative  Collective  Dual       Determinate plural  Indeterminate plural
leg      riġel        –           riġlejn    –                   –
knee     rkoppa       –           rkopptejn  rkoppiet            –
tooth    sinna        –           –          sinniet             snien
tree     siġra        siġar       –          siġriet             –
stone    ġebla        ġebel       –          ġebliet             ġbiel
leaf     werqa        weraq       werqtejn   werqiet             –

and so on.
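
The idea of "check which forms are available and then proceed accordingly" can be sketched as a lookup over a partially-filled table. This is only a rough illustration in Python, not part of the actual grammar: the names `NOUNS`, `available_forms` and `get_form` are made up for this example, and the assignment of each word to a number form is my reading of the examples above, so it may well be wrong.

```python
from typing import Dict, List, Optional

# The five number forms, in the order used in the examples above.
FORMS = [
    "singulative",
    "collective",
    "dual",
    "determinate_plural",
    "indeterminate_plural",
]

# Hypothetical lexicon: a missing key means the noun lacks that form.
# The form assignments are an assumption based on the examples above.
NOUNS: Dict[str, Dict[str, str]] = {
    "leg": {"singulative": "riġel", "dual": "riġlejn"},
    "knee": {"singulative": "rkoppa", "dual": "rkopptejn",
             "determinate_plural": "rkoppiet"},
    "tooth": {"singulative": "sinna", "determinate_plural": "sinniet",
              "indeterminate_plural": "snien"},
    "tree": {"singulative": "siġra", "collective": "siġar",
             "determinate_plural": "siġriet"},
    "stone": {"singulative": "ġebla", "collective": "ġebel",
              "determinate_plural": "ġebliet",
              "indeterminate_plural": "ġbiel"},
    "leaf": {"singulative": "werqa", "collective": "weraq",
             "dual": "werqtejn", "determinate_plural": "werqiet"},
}


def available_forms(noun: str) -> List[str]:
    """List the number forms that exist for a noun, in table order."""
    table = NOUNS[noun]
    return [form for form in FORMS if form in table]


def get_form(noun: str, form: str) -> Optional[str]:
    """Return the requested form, or None if the table has a gap there."""
    return NOUNS[noun].get(form)
```

Any code that builds a noun phrase would then have to handle a `None` result explicitly, which is exactly the irregularity described above.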

This seems to also be the case when it comes to pronominal suffixes. Most nouns referring to body parts or “innate” things take such suffixes, but in many cases these tables seem incomplete, or only valid for certain number forms:

English  Singular  Singular + P3 Sg Masc  Plural  Plural + P3 Pl
wife     mara      martu                  nisa    – (in-nisa tagħhom)
tooth    sinna     – (is-sinna tiegħu)    snien   snienhom
face     wiċċ      wiċċu                  uċuħ    uçuħhom
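
The gaps in this table suggest a simple fallback: use the suffixed form where it exists, otherwise the separate possessive construction shown in brackets. A small Python sketch of that logic follows. The structure `POSSESSED` and the function `possessed_form` are invented for illustration, and the forms are copied from the examples above rather than generated by any rule.

```python
from typing import Dict, Optional, Tuple

# Per noun and number: (suffixed form, bracketed alternative).
# None marks a gap. Singular rows use the P3 Sg Masc suffix and
# plural rows the P3 Pl suffix, as in the examples above.
POSSESSED: Dict[str, Dict[str, Tuple[Optional[str], Optional[str]]]] = {
    "wife": {"singular": ("martu", None),
             "plural": (None, "in-nisa tagħhom")},
    "tooth": {"singular": (None, "is-sinna tiegħu"),
              "plural": ("snienhom", None)},
    "face": {"singular": ("wiċċu", None),
             "plural": ("uċuħhom", None)},
}


def possessed_form(noun: str, number: str) -> str:
    """Prefer the suffixed form; fall back to the bracketed alternative."""
    suffixed, alternative = POSSESSED[noun][number]
    if suffixed is not None:
        return suffixed
    if alternative is not None:
        return alternative
    raise LookupError(f"no possessed form recorded for {noun!r} ({number})")
```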

28 June, 2012

First [research blog] post!

Research / 9:18 am

As I am about to begin my M.Sc. studies and once again take up the work of building a computational grammar for Maltese, I think it’s a good time to set up this so-called “research blog”. This is really nothing more than a new category in my old blog site, but the idea is that I will post here all observations, difficulties, milestones and open questions which I am bound to come across during my work. This in other words shall become my research log book, which presumably will be of some use some day in the future when I am writing my thesis and need to look back to try and remember what it is I actually did during all those months. This whole exercise is essentially for my own personal purposes, but I’m not against helpful comments if you happen to have any 🙂

Warning: Since this will very much be a work-in-progress type of blog, I expect my initial observations will often be unresearched and potentially incorrect, and as such nothing written in this blog should be taken as established fact.

30 March, 2012

Emacs

Ever since switching to Ubuntu from Windows, I’d never really found a text editor I was truly satisfied with. I spent most of my time using either Geany or GEdit, and while both are quite fine, somehow neither ever felt complete.

So a few weeks ago, after meeting probably the most hardcore power user I’ve ever known (he types in Dvorak with a blank keyboard), I decided that I would embark on the long voyage towards becoming proficient with the legendary text editor, Emacs.

At first it felt incredibly masochistic, like the computer science equivalent of cutting your wrists just to feel alive. Yet after two weeks, the benefits are slowly beginning to become apparent. I’m still not sure if I have become a convert yet. The pain is still there of course, but somehow that almost serves to convince me that I am in fact doing the right thing.

8 February, 2012

The sorry state of LaTeX editors: a manifesto

Firstly, I love LaTeX, I love coding, and I generally love feeling like a geek. Being able to produce beautifully typeset documents is hugely satisfying to me. But if you use LaTeX, you probably spend a good deal of your time actually writing markup rather than writing, well, words. Perhaps this is why LaTeX tends to be most popular in the computer science and engineering disciplines; you more or less need to have the skills and will of a coder to produce anything useful with it. But just because you can code, it doesn’t mean that you should always code.

Putting effort into your typesetting is a good thing. Producing elegant documents requires discipline and attention to detail, and LaTeX certainly gives you the right tools for exercising this discipline as far as you care to. But when you’ve set up your document layout just the way you want, defined your custom commands and prepared your references and labels, it’s time to actually write.

Writing code and writing prose are two entirely different things. Even though LaTeX is a markup language, putting more emphasis on your actual text content than other paradigms, the truth is that its commands can quickly become unwieldy and dominate the rest of your copy. And not being able to focus on what you are actually writing because of all the markup in the way is definitely a bad thing.

Writing with LaTeX requires two separate skillsets, markup-language programming and prose writing, plus one meta-skill: the ability to repeatedly switch back and forth between these two modes. Almost every tool or resource you’ll find in the LaTeX universe is focused only on the first of these skills, and never on the others. Nobody ever seems to bother promoting a writer-friendly environment for using LaTeX.

Every LaTeX editor I’ve tried has the same programmer-centric approach: split screen between editor/preview, syntax highlighting (which always breaks), brain-dead auto-completions, and the worst possible environment for writing you could ever imagine. LaTeX is capable of producing such beautiful output, so why should writing LaTeX be so painful and counter-aesthetic?

My first ever TeX IDE was TeXnicCenter, which I first used in 2009. Back then, it looked like Microsoft Office 1997. It doesn’t seem to have changed since. I floated between various basic text editors, TeXworks, Texmaker and Kile, each one pretty much as disappointing as the others. The first thing that annoys me is that in every single IDE, the auto-suggest is entirely dumb. Since they don’t do the dirty work of looking into your imported packages and determining what commands and environments they provide, your auto-suggestions are really only a list of default LaTeX commands. I understand that scoping into LaTeX packages is probably an absolute nightmare, but not doing so makes the auto-completions entirely useless. They also do some kind of reference processing, so that when you type \cite{ you get a list of the things defined in your BibTeX file. Stop the presses. Quick-building and previewing is obviously nice, and some editors even do their best to jump to the newest addition in your freshly-compiled document. It’s never perfect, but it’s barely useful enough to prevent you giving up on life altogether.

There are now also a bunch of web-based TeX editors floating around, the most notable being ScribTeX. The advantages of not having to install LaTeX and deal with all those ridiculous intermediate files are clear. But guess what? They’re just clones of the desktop versions. Syntax highlighting, compilation/preview, and auto-completions that are only useful if you are not importing any other LaTeX package (read: useless). Their implementations may be elegant from a technical point of view, but they are not innovative in any way that I can see. But the biggest disappointment of all these LaTeX editors is that every last one of them completely disregards the notion that productive writing thrives in an environment that specifically promotes it: free from distractions and free from complicated markup.

What I’m basically talking about is the need for a distraction-free “Zen” editor. This is not a new idea, and many apps already exist for this, both OS-native (FocusWriter, WriteMonkey, Q10, WriteRoom, Ommwriter, iA Writer) and web-based (QuietWrite, Hallo, DarkCopy, PenZen, Koi). They all pretty much do the same thing, some with support for Markdown or RTF, but none of them seem to do anything about LaTeX. Full LaTeX support is understandably daunting – not even the so-called LaTeX editors support it fully. But just knowing when to show you code and when to show you content, just being able to successfully handle the constant context-switching which LaTeX writers know all too well, is paramount for me. There is of course LyX, which does try to give you a WYSIWYG-like editing environment on top of LaTeX, although LyX itself is not simply an editor but a full-blown preprocessor. Truthfully, I have never tried it, although there are plenty of reasons why I don’t want to go down that road, primarily the problems with collaborative writing and varying dependencies.

So, I am basically determined to build such an editor. Web-based by far makes the most sense, for compatibility. No cloud services or any other junk though; with HTML5 I’m hoping it shouldn’t be too hard to edit files right on your computer. Just a nice clean distraction-free TeX editor which knows when to show your syntax-highlighted TeX commands, and when to switch into “Zen” writing mode. Once I thrash a prototype together I will launch a proper project somewhere and invite people to contribute. But you can consider this my manifesto.

Disclaimer

Firstly, if I am wrong about any of the above, please call me on it, particularly if some existing IDE is actually better than I have given it credit for, or if I failed to mention something significant altogether. Secondly, I want to know if this rings true with anyone else, or if I’m the only one who is deeply disappointed by the LaTeX editors out there. It would be nice if this actually led to something half-decent one day. Finally, this post will change over time as things get pointed out to me and the ideas mature a bit.

29 July, 2011

jQuery Clear-on-Focus Plugin

A small jQuery plugin I wrote which I thought I would make public. Instead of using <label> elements for each of your form’s fields, use the input’s value attribute to set the label. Clear-on-focus will then take care of clearing the value when a user clicks or focuses the field. If they leave the field blank, the original message is restored. Also works correctly for password fields. Example usage:

<input type="text" name="username" value="Enter username" class="clear-on-focus" />
<input type="text" name="password" value="Enter password" class="clear-on-focus password" />

Link: github repository

27 July, 2011

Gedit syntax highlighting for Grammatical Framework source code

There is now an official page on the Grammatical Framework website about GF Editor Modes.

For correct syntax highlighting of Grammatical Framework source code in gedit (Ubuntu’s default text editor), put the code below into the file ~/.local/share/gtksourceview-2.0/language-specs/gf.lang.

Some helpful notes/links:

  • The code is based heavily on the haskell.lang file which I found in /usr/share/gtksourceview-2.0/language-specs/haskell.lang.
  • Ruslan Osmanov recommends registering your file extension as its own MIME type (see also here), however on my system the .gf extension was already registered as a generic font (application/x-tex-gf) and I didn’t want to risk messing any of that up.
  • This is a quick 5-minute job and might require some tweaking. The GtkSourceView language definition tutorial is the place to start looking.
  • Contributions are welcome!

25 July, 2011

How I revived my disappearing Seagate GoFlex Home DLNA/UPnP server

My family bought a Seagate GoFlex Home 2TB network-attached drive for streaming videos directly to our Samsung TV via DLNA/UPnP. Everything worked fine for a while, until suddenly one day it just refused to show up on the TV anymore. After eliminating all the home networking factors (cables, IPs, DHCP etc.), I noticed that when restarting the device it actually showed up briefly on the TV, but with no files on it – only to disappear after a few seconds.
Note that the device still behaved normally as a network device, i.e. when browsing using Windows Networking/SAMBA we could still access all our files normally. It was only the DLNA service that did not seem to be functioning.

Combing through the Seagate forums (and the web in general) I found that others have had similar issues but no real solutions seem to have emerged. So a little more investigation had to be done. The GoFlex Home web interface did not report anything untoward, and fiddling with all the settings in the preferences did not seem to have any effect.

So, like all good hackers I got my hands dirty and gained SSH access to the device.

Getting SSH access to the device

As described here and here, to gain access to your GoFlex Home via SSH you will need:

  1. An SSH client (obviously)
  2. The IP address of your GoFlex Home
  3. The administrator username & password
  4. Your device’s product key, which you can find by clicking About GoFlex Home in the bottom left of the web interface
  5. Confidence using the Linux command line

So, you open an SSH connection to your device with this specially-formed username USERNAME_hipserv2_seagateplug_XXXX-XXXX-XXXX-XXXX, where USERNAME is your username and XXXX-XXXX-XXXX-XXXX is your product key. On a Linux/Mac terminal this could look something like this (note the username, product key and IP will be different):

ssh john_hipserv2_seagateplug_FKSU-FJDU-DOWU-OSHD@192.168.1.100
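
If you find yourself doing this for more than one user or device, the specially-formed username is easy to assemble programmatically. Here is a tiny Python sketch which only builds the strings described above; the function names are my own invention and nothing here talks to the device.

```python
def goflex_ssh_username(username: str, product_key: str) -> str:
    """Join the username and product key in the format described above."""
    return f"{username}_hipserv2_seagateplug_{product_key}"


def goflex_ssh_command(username: str, product_key: str, ip: str) -> str:
    """Build the full ssh command line for a given device IP address."""
    return f"ssh {goflex_ssh_username(username, product_key)}@{ip}"


# Same placeholder values as the example above; yours will differ.
print(goflex_ssh_command("john", "FKSU-FJDU-DOWU-OSHD", "192.168.1.100"))
```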

Once you’re in, go straight into root — as you will need to do so anyway before long — with:

sudo -s

Of course poking around as root is dangerous and could irreversibly mess up your device, but if you’re still reading then you probably already knew that.

Restarting the DLNA service

At this point I tried poking around, trying to find some logs or anything which could give me an idea what the problem was. The name of the service which actually provides the DLNA server is minidlna, for which a totally invaluable reference can be found here. I tried to access the MiniDLNA log with

tail /tmp/minidlna/minidlna.log

but was told that the log was unavailable. Curiously, running ls -l in the directory reported that the file had no size, permissions, or modification date; so that wasn’t much help. I then tried to find the status of the MiniDLNA service with

/etc/init.d/minidlna.init status

which told me that the PID in /mnt/tmpfs/var/run/minidlna.pid did not match that of any running process, implying that the daemon had crashed. Made sense so far, but when I tried to restart the service with

/etc/init.d/minidlna.init restart

the service would attempt to restart, but instantly crash again as described above. Same thing with manually stopping and starting the service. Some trial and error later, I discovered that MiniDLNA’s temporary folder needs to be forcibly unmounted, like so:

umount /tmp/minidlna

Restarting the service again after doing this finally did the trick. Checking the service’s status again as above now reports that MiniDLNA is running, and everything shows up normally on my TV.

Rebuilding the database

If the steps above still don’t fix your problem, you likely need to get MiniDLNA to rebuild its media database, with the command

/usr/sbin/minidlna -f /etc/miniupnpd/minidlna.conf -R -d

This time, I was given an error message about being unable to open sqlite database file, /tmp/minidlna/files.db. Attempts to force-delete the file manually also failed, and I finally had to resort to manually unmounting the minidlna directory with

umount /tmp/minidlna

This happily worked, and I was then finally allowed to rebuild the database with the command given above. This will scan your media folders and rebuild the database for you in MiniDLNA’s debug mode, which means it will spit out lots and lots of output messages to the console. After maybe 5 minutes or so, it finally told me the media library scanning was complete, and lo and behold I could once again access my GoFlex Home via my DLNA-enabled TV.

Now, I am still unclear as to what happens when I restart my device. The first time I tried this (after rebuilding the media database) my device ran into the exact same problem as before! This time I logged in via SSH again, and ran the MiniDLNA daemon in debug mode but without rebuilding the library:

/usr/sbin/minidlna -f /etc/miniupnpd/minidlna.conf -d

This successfully started the service again, allowed me to cleanly exit my SSH connection and access the DLNA server via my TV again. However it did take many minutes until my files all showed up, so I am assuming that MiniDLNA was in fact rebuilding the media database itself.

Conclusion

Despite finally managing to get things working as described above, it turns out that every time my GoFlex Home is restarted the MiniDLNA daemon crashes in the same way, and I am forced to fire up an SSH connection to sort things out. I have as yet found no way of permanently fixing the issue, but since our device is basically online 24/7 it’s not too much of an issue.

Of course I realise the steps here are not for the faint-hearted, but after exhausting all the “user-friendly” ways of restoring the device, this is the only sure-fire way I have found.

14 July, 2011

Geany syntax highlighting for Grammatical Framework source code

There is now an official page on the Grammatical Framework website about GF Editor Modes.

I wrote a custom filetype config file for the Geany text editor, providing correct syntax highlighting for anyone editing Grammatical Framework source code. Put the code below into the file /usr/share/geany/filetypes.GF.conf (under Ubuntu). You will need to manually create the file:

Light Version

Dark Version

You will also need to edit the filetype_extensions.conf file and add the following line somewhere:

GF=*.gf;

Hope that helps someone who doesn’t feel like reading through the Geany documentation! (which btw can be found here).

16 December, 2010

jQuery Tiler Plugin

A new plugin I’m in the process of creating, for neatly tiling boxes of different dimensions into a grid.

If you have ever created a tiled layout by floating a bunch of boxes together, you may have noticed that when these boxes have different dimensions (heights/widths) you end up with a lot of empty space and an altogether un-neat layout (Figure A):

Figure A: Floating of different-sized blocks leaves a lot of white space.

Figure B: By using the jquery-tiler plugin our boxes are now neatly stacked

The tiler plugin fixes this by greedily inserting each box into the column which is “shortest”, resulting in something like that in Figure B.

Note how the order of the boxes has changed. To me this is acceptable, but more importantly it is somewhat unavoidable (as someone wise used to tell me, “you have to break eggs if you want to make an omelette”).

Note that in this scenario all boxes are forced into fixed column widths. There is also some support for varying widths (i.e. preserving the width of each box) however it is far from complete. Will implement this when I have some more time (and a few more brain waves).
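
To make the greedy idea concrete, here is a rough sketch in Python rather than JavaScript. It is not the plugin’s actual code: it assumes fixed column widths, box heights known in advance, and ties going to the leftmost column, and the function name `tile` is made up for this example.

```python
from typing import List


def tile(heights: List[int], num_columns: int) -> List[List[int]]:
    """Greedily assign boxes to columns.

    Each box, in its original order, goes into whichever column is
    currently shortest. Returns, for each column, the indices of the
    boxes placed in it from top to bottom.
    """
    columns: List[List[int]] = [[] for _ in range(num_columns)]
    column_heights = [0] * num_columns

    for index, height in enumerate(heights):
        # min() returns the first minimum, so ties go to the leftmost column.
        target = min(range(num_columns), key=lambda c: column_heights[c])
        columns[target].append(index)
        column_heights[target] += height

    return columns


# Five boxes of differing heights tiled into two columns.
print(tile([3, 1, 2, 2, 1], 2))
```

Reading the result column by column shows why the visual order of the boxes changes: a box lands wherever there is room, not necessarily next to the box that preceded it.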

Anyway if you want to try it out, the git repository is here: http://github.com/johnjcamilleri/jquery-tiler

6 November, 2010

Fuzzy string matching in MySQL using Levenshtein Distance stored function

Searching for fuzzy string matching methods will return various algorithms and various implementations of them.

I found this MySQL implementation of Levenshtein Distance to be adequate for my needs, and using this handy MySQL stored function makes it super easy to use in queries without having to create temporary search-optimised tables or perform post-processing in another language (e.g. PHP).

Just install the functions in your database, and use like so:

SELECT * FROM users ORDER BY levenshtein_distance(users.name, 'john')
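
For anyone wondering what the distance actually measures, here is a short Python sketch of the standard dynamic-programming Levenshtein algorithm, with a sort that mirrors the ORDER BY above. It is an illustration only, not the MySQL stored function itself, and the sample names are made up.

```python
from typing import List


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # Keep the inner loop over the shorter string to save memory.
    if len(a) < len(b):
        a, b = b, a

    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical data: rank names by closeness to 'john'.
names: List[str] = ["jon", "joan", "mary", "john"]
ranked = sorted(names, key=lambda name: levenshtein_distance(name, "john"))
print(ranked)
```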
