Older blog entries for salmoni (starting at number 575)

I spent the weekend wrestling with factorial ANOVA code, which was nice and fun. All seems to work alright, but there is still some finishing and of course testing to do before it's anything like releasable. Plus I need to work out how to handle things like post-hoc tests and simple effects for when a significant effect is observed. Lots of fun!

I've been having lots of fun working through factorial statistics code. Actually, I'm not being sarcastic: I've spent so much time preparing the data for analysis (that's the part that takes the most work) that the statistics code itself is a nice, easy stroll. And curiously, it's fun. The preparation stage doesn't provide much in the way of motivation because it doesn't really do anything from an end user's perspective. But the stats code can run a factorial analysis of variance for arbitrary factors, and that is a rather nice thing indeed. It actually does something!
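For anyone wondering what the core of a factorial ANOVA looks like, here is a minimal sketch in plain Python. It is not the program's actual code: it assumes only two factors, a balanced design (the same number of observations in every cell) and some within-cell variance, and the function name and the shape of the `cells` dictionary are made up for illustration.

```python
from itertools import product


def mean(values):
    """Arithmetic mean of a non-empty iterable."""
    values = list(values)
    return sum(values) / len(values)


def two_way_anova(cells):
    """Sums of squares, degrees of freedom and F ratios for a balanced
    two-factor design.

    `cells` maps (level_of_a, level_of_b) -> list of observations.
    Hypothetical helper: assumes at least two levels per factor, the same
    number (n >= 2) of observations in every cell, and non-zero error.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    if len(a_levels) < 2 or len(b_levels) < 2:
        raise ValueError("each factor needs at least two levels")
    n = len(next(iter(cells.values())))
    if n < 2 or any(len(cells.get(k, ())) != n
                    for k in product(a_levels, b_levels)):
        raise ValueError("design must be balanced with n >= 2 per cell")

    grand = mean(x for obs in cells.values() for x in obs)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)])
              for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)])
              for b in b_levels}
    cell_mean = {k: mean(obs) for k, obs in cells.items()}

    ss = {
        # Main effects: spread of the marginal means around the grand mean.
        "A": n * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values()),
        "B": n * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values()),
        # Interaction: what the cell means do beyond the two main effects.
        "AxB": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a, b in product(a_levels, b_levels)
        ),
        # Error: spread of observations around their own cell mean.
        "within": sum((x - cell_mean[k]) ** 2
                      for k, obs in cells.items() for x in obs),
    }
    df = {
        "A": len(a_levels) - 1,
        "B": len(b_levels) - 1,
        "AxB": (len(a_levels) - 1) * (len(b_levels) - 1),
        "within": len(a_levels) * len(b_levels) * (n - 1),
    }
    ms_within = ss["within"] / df["within"]
    if ms_within == 0:
        raise ValueError("no within-cell variance; F ratios are undefined")
    f = {name: (ss[name] / df[name]) / ms_within for name in ("A", "B", "AxB")}
    return {"ss": ss, "df": df, "f": f}


# Toy 2x2 design with two observations per cell.
example = {
    (0, 0): [1, 3], (0, 1): [3, 5],
    (1, 0): [5, 7], (1, 1): [7, 9],
}
result = two_way_anova(example)
```

Turning the F ratios into p-values needs the F distribution, and the post-hoc tests and simple effects are a separate job, so both are left out here.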

In other news, the naming of the business (branding etc) is coming to a head and hopefully we should have formed our company soon and bought all the URLs etc. We had a blitz last weekend and managed to get some ideas that I thought were rather good. I won't mention them here because of squatters, but when we're ready, I will be able to announce them.

And once I have announced them, I can make a public release of the software! Yay!

The above factorial code won't be in it though, as it's nowhere near tested (though I should just add it anyway for users to look at and shake their heads at). The problem is that I like to release things that actually work properly. That goes against the "release early, release often" mantra, so I should learn to lose control and just get code out there.

Thanks to everyone from here who completed the questionnaire I linked to in my last diary entry. The information has been tremendously useful! And as I promised, the code will be open source, probably under the AGPL (whichever one we choose - apparently there are two, both of which are very similar).

Statistics software questionnaire

If anyone uses statistics software of any sort (whether Excel, SPSS, R, SAS or anything), I would be grateful if you could help by completing a survey we have put up at SurveyMonkey. It shouldn't take longer than a few minutes to complete and there are only ten questions. Feel free to expand upon your answers if possible.

Thank you very much in advance to those who complete it.

btw, it's all for the open source software that we're producing. We're stuck for a name now.

The market research has been going well and in our favour. We used a survey and interviews (blind for the first half to get opinions about the field, and open for the second half to get opinions about our product). We certainly have a strong market here.

And the development is going well though I have been stuck a lot on importing data. However, the tool is extremely flexible and useful - and it's great for merging data from different sources into one unified dataset which is something I think advanced users will appreciate.

I have also been trying to work on the interactive results without too much luck and have instead asked the opinions of the very knowledgeable people on the wxPython mailing list. They seem to come up with extremely helpful answers, but why not ask here?

My situation is this: I have a wxHTML frame displaying HTML results. These need to be dynamic - users will be able to select options that will mean the HTML needs to be changed and then redisplayed. The best way I can think of dealing with this is just to get the HTML (stored in a temporary memory file system) and remove the old code and insert the new code in its place and then re-display it. Does this seem like too much of a bad hack?
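A minimal sketch of the splice step described above, assuming each replaceable part of the results page is wrapped in a pair of HTML comment markers. The marker format and the function name are invented for illustration; nothing here comes from wxPython itself.

```python
def replace_section(html, section_id, new_fragment):
    """Return `html` with the content between the two markers for
    `section_id` swapped for `new_fragment`.

    Markers are assumed to look like <!-- begin:ID --> and <!-- end:ID -->
    (a made-up convention). The markers are kept, so the same section can
    be replaced again later.
    """
    begin = "<!-- begin:%s -->" % section_id
    end = "<!-- end:%s -->" % section_id
    start = html.find(begin)
    if start == -1:
        raise KeyError("no section named %r" % section_id)
    stop = html.find(end, start + len(begin))
    if stop == -1:
        raise ValueError("section %r has no end marker" % section_id)
    return html[:start + len(begin)] + new_fragment + html[stop:]


page = (
    "<html><body><h1>Results</h1>"
    "<!-- begin:table --><p>old table</p><!-- end:table -->"
    "</body></html>"
)
updated = replace_section(page, "table", "<p>new table</p>")
```

The updated string would then be handed back to the HTML window (for a `wx.html.HtmlWindow`, something like `SetPage(updated)`) to redisplay it. Whether that counts as a bad hack probably depends on how large the pages get.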

wxPython Sizers

I just wasted most of a day trying to sort out the data import GUI and problems with sizers. It was quite frustrating, but I managed to get most of the problems sorted out finally. It is now connecting to various databases and showing a sample of data which users can browse and select what they want to import from.

Oh, and it imports the variables too which is good. It is so nice when problems eventually finish. I have lots more work to do tomorrow (csv importing - I wrote my own csv module to deal with little problems like missing data in the middle of a row) but I am also going to my wife's family's village for a fiesta. It's been raining all day, so here's hoping the weather improves. Here's a picture of the village in sunnier times.
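To show the kind of missing-data handling meant here, this is a small sketch built on the standard library's csv module rather than a home-grown one. The function name and the choice of None as the missing-value marker are assumptions made for the example.

```python
import csv
import io


def read_csv_with_gaps(text, missing=None):
    """Parse CSV text into (header, rows), marking gaps as `missing`.

    Empty fields in the middle of a row and fields absent from the end
    of a short row are both replaced by `missing`. Values are left as
    strings; type conversion would be a separate step.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for raw in reader:
        if not raw:
            continue  # skip completely blank lines
        # Pad short rows so every row is at least as wide as the header.
        padded = raw + [""] * (len(header) - len(raw))
        rows.append([missing if field.strip() == "" else field
                     for field in padded])
    return header, rows


sample = "id,score,group\n1,4.5,a\n2,,b\n3,6.0\n"
header, rows = read_csv_with_gaps(sample)
```

Row 2 has a gap in the middle and row 3 is one field short; both end up with None in the right column instead of shifting the remaining values along.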

Either way, the work is coming along really nicely now. The project is not yet 50% finished (my estimate), but it already imports data from databases, allows a range of operations on them, and can produce even complex descriptive analysis. It's looking good so far.

I've been busy playing with Python 3K at home. It seems to be nice though I haven't dug deep enough / far enough to notice real changes outside of the 'print' statement changes.

In other work, I'm managing to tame wxPython again and am producing a consistent and simple interface for importing data from different sources (databases, spreadsheets, text files). It could form the basis of a data manager, but it's all for the statistics program which is itself coming along.

The program has an interactive interpreter which is fun: it's all based on Python's 'code' module and I've organised it so that users can import data with awkward field names (like: 'Variable (1) & Variable (2) mixed'), and they can still be used on the command line, thus:

Variable (1) & Variable (2) mixed.mean

Not a big change, but it's one less thing to explain to demanding users. The work on the main GUI is still ongoing (choosing a test is the hardest thing) but we're getting there.
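One way the awkward-name trick could work is to rewrite each input line before it reaches the interpreter. The sketch below is only a guess at the approach, not the program's real code: the `Column` class, the `_data` table and `make_interpreter` are all hypothetical, and only `code.InteractiveInterpreter` and its `runsource` method are from the standard library.

```python
import code
import re


class Column:
    """Stand-in for a data column; only `mean` is implemented."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def mean(self):
        return sum(self.values) / len(self.values)


def make_interpreter(columns):
    """Build a runner that accepts awkward column names in source lines.

    `columns` maps original field names to Column objects. Names that are
    valid Python identifiers go straight into the namespace; the others
    are rewritten to a lookup in a hidden `_data` dictionary.
    """
    namespace = {"_data": columns}
    namespace.update({k: v for k, v in columns.items() if k.isidentifier()})
    # Longest names first, so a long name wins over a shorter one it contains.
    awkward = sorted((k for k in columns if not k.isidentifier()),
                     key=len, reverse=True)
    pattern = None
    if awkward:
        pattern = re.compile("|".join(re.escape(k) for k in awkward))
    interpreter = code.InteractiveInterpreter(namespace)

    def run(source):
        if pattern is not None:
            # Single pass, so text inserted by the rewrite is never re-matched.
            source = pattern.sub(lambda m: "_data[%r]" % m.group(0), source)
        return interpreter.runsource(source)

    return run, namespace


columns = {
    "Variable (1) & Variable (2) mixed": Column([1, 2, 3, 4]),
    "age": Column([10, 20]),
}
run, namespace = make_interpreter(columns)
run("m = Variable (1) & Variable (2) mixed.mean")
run("a = age.mean")
```

A plain text substitution like this would also rewrite a field name that appears inside a string literal, so a real version would need to be more careful than this.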

The thing will be released under the Affero GPL license so it's even relevant to this site.

In administration things, I managed to get some more marketing research done (all promising but lots of things to think about), and the company is getting closer to being officially founded. It's all very exciting stuff.

I have a questionnaire here if anyone feels like completing it: it should take about 5-10 minutes and concerns people who use computers to perform statistical analysis. I cannot offer any money in return (we have zero investment - any offers will be carefully considered!), but it would be extremely helpful in getting open source to the top.

The questionnaire is at Survey Monkey. TIA to anyone who completes it.

Phew, am tired. It's hard working in these temperatures, but nicer than freezing my bits off. We will be getting air-con in the room soon in preparation for the baby's arrival but right now it's hot!

The stats program is coming on very well thank you! It now handles different types of data with ease and total transparency for the user. The output is nicely formatted and looks good. The architecture seems to be about right and we're looking to release the first version (obviously a beta) maybe next week. It will be under the AGPL because we intend to put in networking capabilities. Complex data analysis through a web-browser? Heh, why not?

Uraeus - I've tried noise-cancelling headphones on long flights and found them to be quite good. If they're the "tighter" ones (i.e., the ones that fit tight around the ears), they can also help prevent ears from popping.

So I guess this is the closest thing to an official announcement that I can make.

My business partner and I are going to form a company which will concentrate upon statistics software. Our product will be called Ecstatistics which is a seriously good update of my old project SalStat. It differs in that:

a) It will be able to read data from CSV files, databases (a whole range), and spreadsheets. We plan to import SPSS and SAS files too as well as any other format we can code for;

b) It will output to a range of formats (PDF, OOo, databases, MS Office, HTML). The HTML is interesting because it will allow online analysis;

c) It will have a nice range of tests;

d) It will have a great graphing / charting capability;

e) It will be modular and easy to upgrade;

f) It will be far more usable than existing programs;

g) And of course, it will be open source;

Our plan is to get the product working (the database browser already does quite nicely) and produce a version for the OLPC project. Some people have already asked for a stats program that works there, and it makes sense to equip students with (possibly) the most useful tool in scientific research: statistical analysis. So far, it can import from a range of databases and analyse the data descriptively. Output is only text for now and interaction is via a custom interactive interpreter, but it's early days yet. From what we've read, the important thing is to get something released and we hope to do that very soon.

Ecstatistics is coded in Python with NumPy, SQLAlchemy, SQLite and lots of other stuff. Because of this, we can code the OLPC version down to about 200k which competes extremely well with the opposition like R, SPSS and SAS. The interface is designed not just to be useful but also to instill good statistical practice, so it's educational too.

The interface will be designed with non-expert users in mind, particularly students. We aren't aiming at calloused statisticians; they have their favourite tools (and often write them for themselves anyway). We are aiming at all those people who have to do stats but don't like it.

In other news, I saw a couple of laptops here in the Phils in a major chain of electrical stores. They came with Linux preinstalled which was nice to see.

Finally, but most importantly, my wife had her scan earlier this week - we're expecting a little baby girl! The due date is the end of July and we're both very excited.

It's fun. I've been coding a database exploration tool - it's a simple thing designed to explore databases in order to retrieve data for analysis. Python, wxPython & SQLAlchemy are the main bits used. It's fun to see the GUI fire up and pause as it downloads all the information from my remote DB. It does not have a clean GUI yet so there is a lot of work left in making it professional (or should that be presentable? I've seen an awful lot of poor professional interfaces in my time) but that is minor stuff now.
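The exploring itself boils down to three questions: which tables exist, what columns they have, and what a few rows look like. Here is a rough sketch of that, using the standard library's sqlite3 module in place of SQLAlchemy so that it runs on its own. The function names are invented for the example and the real tool is not limited to SQLite.

```python
import sqlite3


def quote_identifier(name):
    """Double-quote a table name so odd characters don't break the SQL."""
    return '"' + name.replace('"', '""') + '"'


def list_tables(conn):
    """Names of all tables in an SQLite database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def describe_table(conn, table):
    """List of (column_name, declared_type) pairs for one table."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute("PRAGMA table_info(%s)" % quote_identifier(table))
    return [(row[1], row[2]) for row in info]


def sample_rows(conn, table, limit=5):
    """Up to `limit` rows from the table, for a browsable preview."""
    query = "SELECT * FROM %s LIMIT ?" % quote_identifier(table)
    return conn.execute(query, (limit,)).fetchall()


# Small in-memory database standing in for the remote one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, score REAL)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [(1, 4.5), (2, 6.0), (3, 5.25)])
tables = list_tables(conn)
columns = describe_table(conn, "scores")
preview = sample_rows(conn, "scores", limit=2)
```

Against a remote database each of these calls is a network round trip, which would explain the pause while the GUI fills in.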

Right now, it's still in the mid-30s as far as temperature is concerned. It's mid-afternoon and a little too hot to work any more. I'll have a nap underneath a large fan and try to get back to work later. Did I mention that I have no air-con in my room? =:-O

I've been busy unit-testing the stats algorithms lately and using R to get "known" values. All seems to be going quite well, though there are small discrepancies because my results carry greater precision. The testing showed that an error exists in one of the quantile calculations and the trimmed mean is borked (I mean really borked), but I had guessed that anyway. Still, it's good to have the accuracy of the tests documented.
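As an illustration of this kind of check, here is a sketch of a trimmed mean together with a comparison against reference values. It is not the program's actual routine. It assumes the convention, which as far as I understand is what R's mean(x, trim = ...) uses, of dropping floor(n * trim) values from each end of the sorted data. The reference values below were worked out by hand for the example; in real testing they would be pasted in from R.

```python
import math


def trimmed_mean(values, trim=0.1):
    """Mean after dropping floor(n * trim) values from each end.

    `trim` is the fraction removed from EACH tail, so it must be at
    least 0 and below 0.5.
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must satisfy 0 <= trim < 0.5")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    k = math.floor(n * trim)   # number of values cut from each tail
    kept = ordered[k:n - k]    # never empty, because trim < 0.5
    return sum(kept) / len(kept)


# (data, trim, expected) triples; expected values are hand-computed here.
REFERENCE = [
    (list(range(1, 11)), 0.1, 5.5),   # drops 1 and 10, mean of 2..9
    ([1, 2, 3, 4, 100], 0.2, 3.0),    # drops 1 and 100, mean of 2, 3, 4
    ([4, 8, 15], 0.0, 9.0),           # no trimming: the ordinary mean
]


def check_against_reference(tolerance=1e-12):
    """Return a list of (data, trim, expected, got) for every mismatch."""
    failures = []
    for data, trim, expected in REFERENCE:
        got = trimmed_mean(data, trim)
        if not math.isclose(got, expected, rel_tol=0.0, abs_tol=tolerance):
            failures.append((data, trim, expected, got))
    return failures


failures = check_against_reference()
```

Keeping the tolerance explicit makes it easy to record how closely each algorithm agrees with the reference, which is really the point of documenting the accuracy.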

It's extremely hot here in Laguna. The temperature is 33 degrees but more than that, it's extremely humid and still (little wind). It's hard to be bent over a laptop.

I've been here for 5 weeks now and am enjoying my time.
