Sunday, 1 August 2010

Experimenting NL-style...

For the past few days I have been (besides watching GUADEC talks) experimenting a little with Zeitgeist and natural language processing... or sort of, anyway. Having no real prior knowledge of the NLP field and not using any of the existing NLP libraries (I couldn't find anything in C) definitely made it interesting, but it also made me realize that NLP is really hard (even though I only wanted to get a very specific app to work), and that taking this path most likely isn't the way to get anywhere.

But anyway, the original idea was to make an algorithm which would take a natural language query and "compile" a Zeitgeist event template from it. This would make it possible to basically ask questions about stuff you did on your computer (not necessarily in question form) and get results back from Zeitgeist. The algorithm turned out to be very easy to plug into any ZG application, so of course I tried it with our lovely Sezen search applet, and in the following screenshots you can see it in action:

If for some reason you don't see the images, they show queries like "music played today", "web pages accessed on wednesday" and "files modified 1 week ago". Those are basically the types of strings that my simple engine is able to process right now (besides simple queries like "movies", "vector images", etc.).
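
To give a rough idea of what "compiling" such a query could look like, here is a minimal Python sketch. It is not the actual engine (which is not shown in this post), and it does not use the real Zeitgeist API: the `Template` class, the vocabulary tables and the labels such as "Audio" or "access" are all hypothetical stand-ins for whatever a real event template would hold. It also assumes that "1 week ago" means the single day exactly seven days back, which is just one possible reading.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Hypothetical vocabularies: the labels are illustrative only and are
# NOT Zeitgeist's real ontology identifiers.
SUBJECT_TYPES = {
    "music": "Audio",
    "movies": "Video",
    "web pages": "Website",
    "vector images": "VectorImage",
    "files": "File",
}
ACTIONS = {"played": "access", "accessed": "access", "modified": "modify"}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
UNIT_DAYS = {"day": 1, "week": 7}


@dataclass
class Template:
    """Hypothetical stand-in for an event template."""
    subject_type: Optional[str]
    action: Optional[str]
    time_range: Optional[Tuple[datetime, datetime]]


def parse_time(text: str, now: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Turn 'today', 'on <weekday>' or '<n> day(s)/week(s) ago' into a range."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if text == "today":
        return (midnight, now)
    m = re.fullmatch(r"on (\w+)", text)
    if m and m.group(1) in WEEKDAYS:
        # Most recent occurrence of that weekday (today counts).
        back = (now.weekday() - WEEKDAYS.index(m.group(1))) % 7
        start = midnight - timedelta(days=back)
        return (start, min(start + timedelta(days=1), now))
    m = re.fullmatch(r"(\d+) (day|week)s? ago", text)
    if m:
        # Assumption: the whole day that lies exactly n units back.
        start = midnight - timedelta(days=int(m.group(1)) * UNIT_DAYS[m.group(2)])
        return (start, start + timedelta(days=1))
    return None


def compile_query(query: str, now: datetime) -> Template:
    """Read the query left to right: subject phrase, action word, time phrase."""
    text = " ".join(query.lower().split())
    subject = None
    # Try longer phrases first so multi-word subjects win.
    for phrase in sorted(SUBJECT_TYPES, key=len, reverse=True):
        if text == phrase or text.startswith(phrase + " "):
            subject = SUBJECT_TYPES[phrase]
            text = text[len(phrase):].strip()
            break
    action = None
    head, _, rest = text.partition(" ")
    if head in ACTIONS:
        action = ACTIONS[head]
        text = rest
    time_range = parse_time(text, now) if text else None
    return Template(subject, action, time_range)
```

Calling `compile_query("files modified 1 week ago", datetime.now())` would give a `Template` whose `subject_type` is "File", whose `action` is "modify" and whose `time_range` covers the day seven days ago. A fixed left-to-right grammar like this is easy to write, but it also shows why the approach does not scale: every new phrasing needs another table entry or pattern.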

I won't deny that all this work was inspired by seeing screenshots of the "Storage" project, which Siegfried dug out from somewhere. And even though it seems to be long abandoned and dead, I'd still love to see its sources, but unfortunately I couldn't find them anywhere... But if you know of a URL where it still lives, please give me a shout. ;)

Also, if nothing else, this work led to a patch for Vala which fixes up the bindings for N-ary trees, so soon one will finally be able to use the N-ary tree data types present in GLib without having to reimplement them in Vala.