salmoni is currently certified at Master level.

Name: Alan James Salmoni
Member since: 2004-12-14 09:38:36
Last Login: 2013-02-21 21:38:59


Homepage: http://roistr.com

Notes:

I am a freelance user experience designer working in the United Kingdom and New Zealand. I like open source and free software so much that I released my own stuff. My programs are SalStat, a Python and wxPython based application for statistical analysis, and TrackBrowser, a web browser designed to record user behaviour. I built the latter for my professional work.

My PhD is in human-computer interaction from Cardiff University (accepted 2004) and I've had a few years' professional experience in the field of interaction design and user experience since.

LinkedIn page for salmoni

Currently working for my own company, Thought Into Design Ltd, and Roistr, the online semantic relevance engine.

Projects

Articles Posted by salmoni

Recent blog entries by salmoni


Mozilla are looking for a quantitative user researcher, which sounds cool. The emphasis on user research sounds right up my street, particularly the need for mastery of experimental design and statistical analysis. It kind of takes me back to my PhD and work on SalStat (still going strong).

The problem is my covering letter. Can anyone here tell me what style of covering letter is preferred? Long and detailed, explaining why I meet each of the requirements? The standard 3 paragraph ["intro", "I'm cool", "thanks"]? Or some combination in between?

In the meantime, I've released Roistr, which does some basic semantic analysis / text analytics stuff. I put up some demos but it's hard to really show how useful this thing is. It's based on the open source Gensim toolkit along with numpy and scipy.

Scipy sounds like it's going places. Travis Oliphant recently announced an initiative to bring it to big data properly. I have an idea of what he means and it would be very cool.

Does anyone have any Google Plus invites that they could send (one) to me?

In other news, wife, daughter and I are off to the Philippines for 5 weeks and hoping to get some start-up work moving over there. UX is in demand at the moment so it's a good time to be around.

I've also been looking up versions of principal components analysis in Python and found these:
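For context, here is a minimal sketch of what a principal components analysis does, written with numpy's SVD. It is not one of the versions referred to above; the function name `pca` and its return values are made up for illustration.

```python
import numpy as np


def pca(data, n_components):
    """Project the rows of `data` onto its top principal components.

    Hypothetical helper for illustration; needs at least two rows.
    """
    data = np.asarray(data, dtype=float)
    centred = data - data.mean(axis=0)   # subtract each column's mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    # Variance along each axis (sample variance, hence n - 1).
    explained_variance = (singular_values ** 2) / (len(data) - 1)
    scores = centred @ components.T      # coordinates in the new basis
    return scores, components, explained_variance[:n_components]
```

Calling `pca(points, 1)` on points that lie along a straight line should return a single component pointing along that line (up to sign), with all of the variance on it.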



All the linguistic stuff I've been doing lately is making my head spin but it's coming together.

Lots happening: I've been building a semantic relevance engine - something that can accurately determine the semantic similarity of 2 text documents and it's working reasonably well. Working completely untrained, I'm getting accuracies of well above 0.8 and often above 0.9. Obviously 1.0 is the ideal but even human judgements rarely get above 0.9 with the corpora I've been using for this.
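To make "semantic similarity of two text documents" concrete, here is a minimal sketch of the simplest vector space version of the idea: cosine similarity over raw word counts. This is an assumption-laden stand-in, not the engine described above, which relies on trained models and a prepared corpus; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two word-count vectors (0.0 to 1.0)."""
    a, b = bag_of_words(text_a), bag_of_words(text_b)
    # Counter returns 0 for words missing from b, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0                      # an empty document matches nothing
    return dot / (norm_a * norm_b)
```

Raw counts only reward shared vocabulary, which is why models such as latent semantic analysis are layered on top: they let two documents score as similar even when they use different words for the same topic.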

The good thing is that I appear to be discovering new stuff almost every day about how documents are understood. There are some approaches I've used that I've not read about in the literature so there might be some useful stuff for the world here.

However, my aim is to make a web service around this. And it's all based on open source software (Python, numpy, Scipy, Gensim etc.), which is perfect. There is proprietary knowledge involved, however: the corpora, how they're prepared and the architecture of the engine; but that will all come out publicly soon enough.

Log Entropy models

I had problems when I last upgraded to 0.7.8 of Gensim. The main issue was that the package I imported wasn't necessarily the one used: quite often, it seemed as though the top level would be from one install whereas another import would be from somewhere else. The net result was that parts of my software were looking for an id2word method in a dictionary where there was none before.

However, I still want to try 0.7.8 if I can and I found a way. I downloaded and untarred it, and renamed it 'gensim078'. Then, I went and changed each 'from gensim import *' statement to 'from gensim078 import *' which seems to be doing the trick. I'm sure there are better ways to do it but this is working for me so I'm happy.
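When two installs of a package are suspected of getting mixed up, one way to check is to ask Python where each module was actually loaded from. The sketch below is a generic diagnostic using only the standard library; `module_path` and `same_install` are hypothetical helper names, and `json` is used purely as a stand-in package that is always available.

```python
import importlib
import os


def module_path(name):
    """Return the file a module is loaded from (None for built-in modules)."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


def same_install(package, submodule):
    """True if `submodule` was loaded from inside `package`'s own directory."""
    package_dir = os.path.dirname(os.path.abspath(module_path(package)))
    sub_path = os.path.abspath(module_path(submodule))
    return sub_path.startswith(package_dir + os.sep)


if __name__ == "__main__":
    # Stand-in example; substitute the package and submodule under suspicion.
    print(module_path("json"))
    print(same_install("json", "json.decoder"))
```

If the top-level package and one of its submodules report different directories, the import path is picking up more than one copy, which would match the symptoms described above.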

The advantages are that a) it's faster, particularly for similarity calculations, and b) I now have access to the Log Entropy model, which I'm building for G1750.
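For readers unfamiliar with log-entropy weighting, here is a minimal pure-Python sketch of one common formulation: the local weight is log(tf + 1) and the global weight is 1 plus the term's normalised entropy across documents. The normalisation by log(number of documents + 1) is an assumption, and this is not Gensim's own code; the function name is hypothetical.

```python
import math
from collections import defaultdict


def log_entropy_weights(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    counts = []                       # per-document term frequencies
    global_freq = defaultdict(int)    # total occurrences of each term in the corpus
    for tokens in documents:
        tf = defaultdict(int)
        for token in tokens:
            tf[token] += 1
            global_freq[token] += 1
        counts.append(tf)

    # Sum over documents of p * log(p), where p = tf / corpus frequency.
    entropy_sum = defaultdict(float)
    for tf in counts:
        for term, freq in tf.items():
            p = freq / global_freq[term]
            entropy_sum[term] += p * math.log(p)

    # Terms spread evenly over many documents get a global weight near 0;
    # terms concentrated in one document get a global weight of 1.
    global_weight = {
        term: 1.0 + total / math.log(n_docs + 1)
        for term, total in entropy_sum.items()
    }
    return [
        {term: math.log(freq + 1) * global_weight[term] for term, freq in tf.items()}
        for tf in counts
    ]
```

The effect is similar in spirit to TF-IDF: words that turn up everywhere are damped, while words that cluster in a few documents keep most of their weight.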

Later tonight, I'll adjust the dictionary and begin pruning words that appear across lots of documents to see if that improves the focus. The program does seem a little 'fuzzy' as it is but that is quite a human characteristic so I'm not too worried. However, it will help me explore vector models and understand them better myself.
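Pruning words that appear across lots of documents amounts to a document-frequency filter. A minimal sketch, assuming documents are already tokenised; the function name and the 0.5 default threshold are made up for illustration and would need tuning against a real corpus.

```python
def prune_common_words(documents, max_doc_fraction=0.5):
    """Drop tokens that occur in more than `max_doc_fraction` of the documents."""
    doc_freq = {}
    for tokens in documents:
        for token in set(tokens):          # count each word once per document
            doc_freq[token] = doc_freq.get(token, 0) + 1
    limit = max_doc_fraction * len(documents)
    keep = {token for token, df in doc_freq.items() if df <= limit}
    return [[t for t in tokens if t in keep] for tokens in documents]
```

For example, with four short documents that all contain "the", a threshold of 0.5 removes "the" everywhere but keeps a word that appears in only two of them.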

Although the results of the word-pair semantic association task were poor, I'm not dismayed (too much!) because my whole construction is not perfect and there is lots of room for improvement. The task is also useful as it gives me an indication of accuracy by a different means from the 20NG categorisation task. When I create a new corpus, I should ideally subject it to a battery of tests, each designed to test a different thing. With the results of these, I can work out whether the corpus is heading in the right direction or not. It's good to have these tools even if (initially) the results are not what I wanted.
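Word-pair association tasks are often scored by rank-correlating the model's similarity scores with human ratings for the same pairs. The sketch below assumes that scoring scheme (the entry above does not say how the task was scored) and implements Spearman's rank correlation in plain Python; all names are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(model_scores, human_ratings):
    """Spearman's rho: Pearson correlation of the two rank lists."""
    return pearson(ranks(model_scores), ranks(human_ratings))
```

A value near 1 means the model orders the word pairs much as people do; a value near 0 means the ordering is unrelated.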

I'm turning into a perfectionist. I really need to release something useful before I refine... Release early, release often...

I've been having lots of fun lately with Gensim, a Python framework for vector space modelling. It includes fun stuff like latent semantic analysis, latent Dirichlet allocation and other goodies. Allied with NLTK, this makes a very formidable Python-based NLP framework.

My task is sorting newsgroup posts into the correct groups and I've achieved a reasonable level of accuracy (0.92), which isn't bad given that it's entirely dependent upon content. However, most analyses are showing lower accuracies (0.70+), which isn't bad but is not far enough from chance performance to be taken seriously. There are a few ways to improve this, though, and I'm conducting an enormous number of experiments to get an effective mental model of how vector space models work.
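Whether an accuracy figure is far enough from chance depends on how many groups there are and how evenly the posts are spread between them. A minimal sketch of the comparison, with hypothetical function names and toy labels rather than real newsgroup data:

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of posts assigned to their correct group."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def chance_accuracy(actual):
    """Expected accuracy when guessing uniformly among the groups present."""
    return 1.0 / len(set(actual))


def majority_baseline(actual):
    """Accuracy obtained by always predicting the most common group."""
    return Counter(actual).most_common(1)[0][1] / len(actual)
```

With only two groups, uniform guessing already scores 0.5, so 0.70 is a modest gain; with twenty evenly sized groups, guessing scores 0.05 and the same 0.70 would be a large one.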

This is all the beginning of constructing a relevance engine which I'm sure will be useful to some people.

Great fun!


salmoni certified others as follows:

  • salmoni certified returnoftheredi as Journeyer
  • salmoni certified skx as Journeyer
  • salmoni certified hereticmessiah as Journeyer
  • salmoni certified dorward as Journeyer
  • salmoni certified Burgundavia as Journeyer
  • salmoni certified pgavin as Apprentice
  • salmoni certified bi as Journeyer
  • salmoni certified Chicago as Journeyer
  • salmoni certified drobilla as Apprentice
  • salmoni certified gears as Journeyer
  • salmoni certified coywolf as Journeyer
  • salmoni certified samfw as Apprentice
  • salmoni certified zanee as Journeyer
  • salmoni certified pesco as Journeyer
  • salmoni certified marnanel as Journeyer
  • salmoni certified fzort as Journeyer
  • salmoni certified StevenRainwater as Master
  • salmoni certified masood as Journeyer
  • salmoni certified joshuat as Journeyer
  • salmoni certified chipx86 as Journeyer
  • salmoni certified pjcabrera as Apprentice
  • salmoni certified danguer as Apprentice
  • salmoni certified robocoder as Apprentice
  • salmoni certified TypeRite as Journeyer
  • salmoni certified mbrubeck as Journeyer
  • salmoni certified spikboll as Apprentice
  • salmoni certified DV as Master
  • salmoni certified cTaylor as Apprentice
  • salmoni certified pfh as Journeyer
  • salmoni certified gobry as Journeyer
  • salmoni certified fallenlord as Journeyer
  • salmoni certified nixnut as Journeyer
  • salmoni certified garym as Journeyer
  • salmoni certified mwh as Master
  • salmoni certified hub as Master
  • salmoni certified lgs as Journeyer
  • salmoni certified TheCorruptor as Journeyer
  • salmoni certified bdodson as Journeyer
  • salmoni certified yoper as Journeyer
  • salmoni certified cm as Apprentice
  • salmoni certified fscked as Journeyer
  • salmoni certified statbanana as Apprentice
  • salmoni certified ShredWheat as Master
  • salmoni certified osfameron as Apprentice
  • salmoni certified chakie as Master
  • salmoni certified yosh as Master
  • salmoni certified chalst as Journeyer
  • salmoni certified allanf as Apprentice
  • salmoni certified lev as Journeyer
  • salmoni certified julian as Master
  • salmoni certified negative as Journeyer
  • salmoni certified MisterP as Apprentice
  • salmoni certified josef as Journeyer
  • salmoni certified dsnopek as Journeyer
  • salmoni certified kwoo as Apprentice
  • salmoni certified mikehearn as Journeyer
  • salmoni certified follower as Apprentice
  • salmoni certified wspace as Journeyer
  • salmoni certified arrowood as Apprentice
  • salmoni certified mslicker as Journeyer
  • salmoni certified esden as Journeyer
  • salmoni certified johnb as Apprentice
  • salmoni certified blindcoder as Journeyer

Others have certified salmoni as follows:

  • fxn certified salmoni as Journeyer
  • spikboll certified salmoni as Journeyer
  • nikole certified salmoni as Master
  • e8johan certified salmoni as Journeyer
  • pvanhoof certified salmoni as Journeyer
  • skx certified salmoni as Apprentice
  • returnoftheredi certified salmoni as Apprentice
  • hereticmessiah certified salmoni as Journeyer
  • orique certified salmoni as Journeyer
  • groom certified salmoni as Journeyer
  • bi certified salmoni as Journeyer
  • mirwin certified salmoni as Master
  • pesco certified salmoni as Journeyer
  • fzort certified salmoni as Journeyer
  • zanee certified salmoni as Journeyer
  • mentifex certified salmoni as Master
  • cdfrey certified salmoni as Journeyer
  • MikeCamel certified salmoni as Journeyer
  • chipx86 certified salmoni as Apprentice
  • slef certified salmoni as Apprentice
  • sand certified salmoni as Journeyer
  • Chicago certified salmoni as Journeyer
  • cTaylor certified salmoni as Journeyer
  • nixnut certified salmoni as Journeyer
  • DarthEvangelusII certified salmoni as Apprentice
  • Axolotl certified salmoni as Journeyer
  • sprite certified salmoni as Apprentice
  • xf certified salmoni as Journeyer
  • mdupont certified salmoni as Journeyer
  • elanthis certified salmoni as Apprentice
  • TheCorruptor certified salmoni as Journeyer
  • gene99 certified salmoni as Master
  • yoper certified salmoni as Master
  • RhysJones certified salmoni as Journeyer
  • cm certified salmoni as Journeyer
  • sdodji certified salmoni as Journeyer
  • pasky certified salmoni as Journeyer
  • strider certified salmoni as Journeyer
  • mterry certified salmoni as Journeyer
  • allanf certified salmoni as Apprentice
  • lev certified salmoni as Apprentice
  • negative certified salmoni as Journeyer
  • mascot certified salmoni as Journeyer
  • lerdsuwa certified salmoni as Journeyer
  • wspace certified salmoni as Journeyer
  • arrowood certified salmoni as Journeyer
  • kilmo certified salmoni as Journeyer
  • esden certified salmoni as Journeyer
  • sculptor certified salmoni as Apprentice
  • welisc certified salmoni as Apprentice
  • mdekkers certified salmoni as Apprentice
  • vivekv certified salmoni as Master
  • boog certified salmoni as Journeyer
