Older blog entries for salmoni (starting at number 577)

Perhaps this has been done, but I have been thinking about my ideal IDE for Python.

I like editors and have tried many. I also like interactive interpreters and have tried many. But my issue is that I often have to have both running at the same time (and yes, I know there are editors with interpreters running in them at the same time, but that's not what I'm thinking of).

But what about a dynamic editor/interpreter?

It sounds fanciful, and I'm only just beginning to think about the architecture, but here is how it would work with Python.

You type some code in. It works interactively, so code only executes when a block is entered (or not at all, if you prefer). Each code block has a flag next to it that, when activated, marks the code as executable. When you execute, only the flagged code is run.

Ok, still fairly basic.

But what if the user could also run code interactively, separately from the stored blocks? So if I type in a large program, I can still type 'print "hi"' in the middle, click it, and it (and only it) will run.

Better still: what if I could execute the code block by block, or even line by line?

Again, this is not totally revolutionary. But what if you could change existing code and cause the program (assuming it's still running and waiting for the user to enter the next code) to step back to a previous state, and then run through to the end with the new code?
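Here is a minimal sketch of the step-back idea. It assumes that each block's state can be captured by deep-copying the interpreter namespace before the block runs. The `ReplayNotebook` class and its method names are hypothetical and do not come from an existing tool. A real program holding open files, sockets or GUI objects would need something smarter, such as a fresh process per replay.

```python
import copy
import types


class ReplayNotebook:
    """Code blocks plus a snapshot of the namespace taken before each block ran.

    Editing block i restores the state from before block i, then re-runs
    blocks i..end with the new code. Objects that cannot be deep-copied
    (open files, sockets, GUI widgets) are not handled by this sketch.
    """

    def __init__(self):
        self.blocks = []
        self.snapshots = []  # snapshots[i] is the namespace before block i ran
        self.namespace = {}

    def append(self, source):
        self.blocks.append(source)
        self._run(len(self.blocks) - 1)

    def edit(self, index, source):
        self.blocks[index] = source
        # Step back: restore the state saved before the edited block ran
        self.namespace = self.snapshots[index]
        del self.snapshots[index:]
        # Replay the edited block and everything after it
        for i in range(index, len(self.blocks)):
            self._run(i)

    def _run(self, index):
        self.snapshots.append(self._copy_state(self.namespace))
        exec(self.blocks[index], self.namespace)

    @staticmethod
    def _copy_state(state):
        # Modules cannot be deep-copied; seeding deepcopy's memo with them
        # makes the copy share the module objects instead.
        memo = {id(v): v for v in state.values() if isinstance(v, types.ModuleType)}
        # exec() re-adds __builtins__ on every run, so leave it out of the snapshot
        plain = {k: v for k, v in state.items() if k != "__builtins__"}
        return copy.deepcopy(plain, memo)
```

Editing block i discards the later snapshots and replays forward, so the cost of an edit grows with the number of blocks after it.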

And then when saving, the user can save a working version (with the interactive bits in place) and a "parade" version with all the interactive bits taken out.
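Saving both versions could be as simple as tagging each block. In this sketch, the `interactive` flag on a hypothetical `Block` record marks scratch lines such as `print "hi"`. The parade version filters those blocks out.

```python
from dataclasses import dataclass


@dataclass
class Block:
    source: str
    interactive: bool = False  # True for scratch lines typed in between, e.g. print "hi"


def save_versions(blocks):
    """Return (working, parade) source text for a list of blocks."""
    working = "\n".join(b.source for b in blocks)
    # The "parade" version drops every block flagged as interactive scratch work
    parade = "\n".join(b.source for b in blocks if not b.interactive)
    return working, parade
```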

I'm not sure this has been done (though if anything has done it, it's probably Eclipse or Emacs). I have probably described this idea poorly, but I think it could be a good thing that unifies the best of editor/IDE operation with the best of interpreter operation.

I'll have to work on a prototype and test it myself to see if it works.

29 May 2008 (updated 29 May 2008 at 09:13 UTC)
Python Consulting

This is an announcement that I will be doing Python consulting from now on. My expertise covers Python, wxPython, NumPy and SQLAlchemy, and the primary area of my work is numeric analysis / statistics, though of course you get a PhD in human-computer interaction thrown in if you want interfaces made.

If anyone has any Python work they would like help with, I can offer a discount on open source code. I can work internationally as long as requirements can be sent electronically. The best way to contact me is salmoni - at - gmail.com

Apart from that, all is well here in the Philippines! The coding on the new project is going well and I'm considering farming off the database viewer/importer tool as a separate component for database management. I'm not exactly sure what functionality would be necessary for this, but suffice to say that the basics should be easy to implement (and the middling / advanced stuff a nightmare!).

Factorial ANOVA of large sets

I've also solved all the problems concerning factorial analysis of variance for extremely large datasets (ie, those too large to fit into memory). I will crack on with this code now to get it done and make an industrial-quality, heavyweight data analysis tool. This will be open sourced in time, after testing anyway. The real problems I have are a) getting hold of an environment (ie, a machine with a massive database on it), and b) getting comparison results, though SAS should be able to deliver on this. I understand that SPSS will face problems if the data are too big for memory, but SAS can work around this just like my code can.
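Here is one way to do this out of core. It is a sketch and not necessarily the method used here. Rows are streamed in chunks, for example batches from a database cursor, and only per-cell counts, sums and sums of squares are kept. The sums of squares then follow from the usual computational formulas. The sketch assumes a two-way design with equal cell sizes. Unbalanced designs need a regression-style approach instead, because the interaction term computed below is only valid when cells are balanced.

```python
from collections import defaultdict


def accumulate(chunks):
    """Fold an iterable of row chunks into per-cell sufficient statistics.

    Each chunk is an iterable of (a_level, b_level, y) tuples, so only one
    chunk needs to be in memory at a time.
    """
    cells = defaultdict(lambda: [0, 0.0, 0.0])  # (a, b) -> [n, sum, sum of squares]
    for chunk in chunks:
        for a, b, y in chunk:
            cell = cells[(a, b)]
            cell[0] += 1
            cell[1] += y
            cell[2] += y * y
    return cells


def two_way_ss(cells):
    """Sums of squares for a balanced two-way design from cell statistics."""
    n_total = sum(c[0] for c in cells.values())
    grand = sum(c[1] for c in cells.values())
    sumsq = sum(c[2] for c in cells.values())
    correction = grand * grand / n_total  # T^2 / N

    # Marginal totals and counts for each factor
    a_tot, a_n = defaultdict(float), defaultdict(int)
    b_tot, b_n = defaultdict(float), defaultdict(int)
    for (a, b), (n, s, _) in cells.items():
        a_tot[a] += s
        a_n[a] += n
        b_tot[b] += s
        b_n[b] += n

    ss_total = sumsq - correction
    ss_a = sum(a_tot[k] ** 2 / a_n[k] for k in a_tot) - correction
    ss_b = sum(b_tot[k] ** 2 / b_n[k] for k in b_tot) - correction
    ss_cells = sum(s * s / n for n, s, _ in cells.values()) - correction
    return {
        "A": ss_a,
        "B": ss_b,
        "AxB": ss_cells - ss_a - ss_b,  # valid only for a balanced design
        "within": ss_total - ss_cells,
        "total": ss_total,
    }
```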

Moore's Law makes this of decreasing utility, but it's nice to have software that you know can handle any task.


I've also enquired about submitting an article to a Python journal about how to use the code module to implement an interactive interpreter and embed it within a Python program. This comes from work on the statistics program where I wrote one for quick debugging and found it so good that I extended it a little to be used as a permanent tool.

One problem we found is that when declaring and using a variable, a user would have to write:

x = newvar()

It would make more sense (to novices) to write:

newvar(x)

It does this now. What I did was override the code.InteractiveInterpreter.showtraceback method to catch NameErrors (which are raised when x is passed to newvar, because x doesn't exist). The code then works out the command and sends it to the newvar method again, but with the x in quotes. It's minor stuff, but less annoying for users.
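Here is a minimal sketch of that override. It uses a hypothetical `newvar()` that simply records the name it is given. Remembering the last command in `runsource()` and recognising it with a regular expression are assumptions made for this example. They are not necessarily how the statistics program does it.

```python
import code
import re
import sys

# A bare identifier passed to newvar(), e.g. "newvar(x)"
NEWVAR_CALL = re.compile(r"^\s*newvar\(\s*([A-Za-z_]\w*)\s*\)\s*$")

created = []


def newvar(name):
    """Hypothetical stand-in for the application's variable constructor."""
    created.append(name)


class ForgivingInterpreter(code.InteractiveInterpreter):
    _last_source = ""

    def runsource(self, source, filename="<input>", symbol="single"):
        self._last_source = source  # remember the command that is about to run
        return super().runsource(source, filename, symbol)

    def showtraceback(self):
        exc_type = sys.exc_info()[0]
        match = NEWVAR_CALL.match(self._last_source)
        if exc_type is NameError and match:
            # Clear the stored command so a failing retry cannot loop forever
            self._last_source = ""
            # Re-issue the command with the identifier quoted: newvar('x')
            retry = "newvar(%r)" % match.group(1)
            self.runcode(compile(retry, "<input>", "single"))
            return
        super().showtraceback()


interp = ForgivingInterpreter(locals={"newvar": newvar})
interp.runsource("newvar(x)")  # the NameError on x is caught and retried as newvar('x')
```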

And if a database has awkward field names that are not valid Python identifiers, they cannot normally be used. So I added a catcher to showtraceback that catches AttributeErrors and checks whether one of the program's methods has been called on a string:

"Variable 1 (2000)".variance()

Normally this would never work in Python without overriding the string class (which is another possibility). However, the catcher intercepts the resulting AttributeError and redirects the 'variance()' call to the proper variable.
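The same trick works for the string case. In this sketch, `Variable`, its `variance()` method and the `variables` mapping are hypothetical stand-ins for the program's own objects. The override only fires when the quoted text matches a known field name. Otherwise the normal traceback is shown.

```python
import code
import re
import sys

# A quoted name followed by a method call, e.g. "Variable 1 (2000)".variance()
QUOTED_METHOD = re.compile(r"""^\s*(["'])(.+?)\1\.(\w+)\(\)\s*$""")


class Variable:
    """Hypothetical stand-in for an imported data column."""

    def __init__(self, values):
        self.values = list(values)

    def variance(self):
        n = len(self.values)
        mean = sum(self.values) / n
        return sum((v - mean) ** 2 for v in self.values) / (n - 1)


class QuotedNameInterpreter(code.InteractiveInterpreter):
    _last_source = ""
    last_result = None

    def __init__(self, variables, locals=None):
        super().__init__(locals)
        self.variables = variables  # maps awkward field names to Variable objects

    def runsource(self, source, filename="<input>", symbol="single"):
        self._last_source = source
        return super().runsource(source, filename, symbol)

    def showtraceback(self):
        exc_type = sys.exc_info()[0]
        match = QUOTED_METHOD.match(self._last_source)
        if exc_type is AttributeError and match and match.group(2) in self.variables:
            method = getattr(self.variables[match.group(2)], match.group(3), None)
            if callable(method):
                # Redirect the call from the string to the matching variable
                try:
                    self.last_result = method()
                except Exception:
                    super().showtraceback()  # report errors raised by the method itself
                else:
                    print(repr(self.last_result))
                return
        super().showtraceback()
```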

All this just means that the application is beginning to work around its users instead of demanding that they work around it.

I also added lots of alternative names for descriptive tests so:


all call the same function. This helps because whenever I use a new statistics program, I have to look up the exact names of its functions. This way, I don't have to remember which one: I just pick a common one, and away I go! :-)
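The aliasing itself needs nothing clever, because several names in the interpreter's namespace can point at one function. The names below are illustrative examples and not the program's actual list. The sketch also borrows the standard library's `statistics` functions, and `register_aliases` is a hypothetical helper.

```python
import statistics


def register_aliases(namespace, func, *names):
    """Bind every alias in names to the same function object."""
    for name in names:
        namespace[name] = func


# This dict could serve as the locals of an embedded interpreter
namespace = {}
register_aliases(namespace, statistics.mean, "mean", "average", "avg")
register_aliases(namespace, statistics.stdev, "stdev", "sd", "std")
```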

I spent the weekend wrestling with factorial ANOVA code, which was nice and fun. All seems to work alright, but there is still some finishing and, of course, testing to do before it's anything like releasable. Plus I need to work out how to handle things like post-hoc tests and simple effects for when a significant effect is observed. Lots of fun!

I've been having lots of fun working through factorial statistics code. And I'm not being sarcastic: I've spent so much time preparing the data for analysis (that's the part that takes the most work) that the statistics code itself is a nice easy stroll. And curiously, it's fun. The preparation stage doesn't provide much in the way of motivation, because it doesn't really do anything from an end user's perspective. But the stats code can run a factorial analysis of variance with any number of factors, and that is a rather nice thing indeed. It actually does something!

In other news, the naming of the business (branding etc) is coming to a head and hopefully we should have formed our company soon and bought all the URLs etc. We had a blitz last weekend and managed to get some ideas that I thought were rather good. I won't mention them here because of squatters, but when we're ready, I will be able to announce them.

And once I have announced them, I can make a public release of the software! Yay!

The above factorial code won't be in it, though, as it's nowhere near tested (though I should just add it anyway for users to look at and shake their heads at). The problem is that I like to release things that actually work properly. That goes against the "release early, release often" mantra, so I should learn to let go and just get code out there.

Thanks to everyone from here who completed the questionnaire I linked to in my last diary entry. The information has been tremendously useful! And as I promised, the code will be open source, probably under the AGPL (whichever one we choose; apparently there are two, both of which are very similar).

Statistics software questionnaire

If anyone uses statistics software of any sort (whether Excel, SPSS, R, SAS or anything), I would be grateful if you could help by completing a survey we have put up at SurveyMonkey. It shouldn't take longer than a few minutes to complete and there are only ten questions. Feel free to expand upon your answers if possible.

Thank you very much in advance to those who complete it.

btw, it's all for the open source software that we're producing. We're stuck for a name now.

The market research has been going well and in our favour. We used a survey and interviews (blind for the first half to get opinions about the field and open the second half to get opinions about our product). We certainly have a strong market here.

And the development is going well though I have been stuck a lot on importing data. However, the tool is extremely flexible and useful - and it's great for merging data from different sources into one unified dataset which is something I think advanced users will appreciate.

I have also been trying to work on the interactive results without too much luck and have instead asked the opinions of the very knowledgeable people on the wxPython mailing list. They seem to come up with extremely helpful answers, but why not ask here?

My situation is this: I have a wxHTML frame displaying HTML results. These need to be dynamic, because users will be able to select options that mean the HTML needs to be changed and then redisplayed. The best way I can think of to deal with this is to get the HTML (stored in a temporary memory file system), remove the old code, insert the new code in its place and then redisplay it. Does this seem like too bad a hack?
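For what it's worth, the replace-and-redisplay approach can be kept tidy by wrapping each replaceable chunk in marker comments. This sketch assumes hypothetical `<!-- BEGIN id -->` / `<!-- END id -->` markers. The resulting string could then be handed to `wx.html.HtmlWindow.SetPage()` instead of rewriting a file in the memory file system.

```python
import re


def replace_section(html, section_id, new_fragment):
    """Swap the markup between a pair of marker comments for new_fragment."""
    sid = re.escape(section_id)
    pattern = re.compile(
        r"(<!-- BEGIN %s -->).*?(<!-- END %s -->)" % (sid, sid),
        re.DOTALL,  # sections usually span several lines
    )
    # A function replacement keeps backslashes in new_fragment from being read as escapes
    new_html, count = pattern.subn(
        lambda m: m.group(1) + new_fragment + m.group(2), html, count=1
    )
    if count == 0:
        raise KeyError("no section named %r" % section_id)
    return new_html
```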

wxPython Sizers

I just wasted most of a day trying to sort out the data import GUI and problems with sizers. It was quite frustrating, but I finally managed to get most of the problems sorted out. It now connects to various databases and shows a sample of the data, which users can browse to select what they want to import.

Oh, and it imports the variables too, which is good. It is so nice when problems finally get solved. I have lots more work to do tomorrow (CSV importing: I wrote my own CSV module to deal with little problems like missing data in the middle of a row), but I am also going to my wife's family's village for a fiesta. It's been raining all day, so here's hoping the weather improves. Here's a picture of the village in sunnier times.
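The missing-data problem can be illustrated with the standard library's `csv` module. This is not the module mentioned above. The sketch assumes that empty cells, `NA` and `.` all mean missing, and it pads rows that stop short of the header.

```python
import csv
import io


def read_with_missing(text, missing=("", "NA", ".")):
    """Parse CSV text, turning missing-value markers into None."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for raw in reader:
        # Pad short rows so a truncated line doesn't drop its trailing columns
        raw = raw + [""] * (len(header) - len(raw))
        rows.append([None if cell.strip() in missing else cell for cell in raw])
    return header, rows
```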

Either way, the work is coming along really nicely now. The project is not yet 50% finished (my estimate), but it already imports data from databases, allows a range of operations on them, and can produce even complex descriptive analysis. It's looking good so far.

I've been busy playing with Python 3K at home. It seems nice, though I haven't dug deep enough to notice real changes beyond print becoming a function instead of a statement.

In other work, I'm managing to tame wxPython again and am producing a consistent and simple interface for importing data from different sources (databases, spreadsheets, text files). It could form the basis of a data manager, but it's all for the statistics program which is itself coming along.

The program has an interactive interpreter which is fun: it's all based on Python's 'code' module and I've organised it so that users can import data with awkward field names (like: 'Variable (1) & Variable (2) mixed'), and they can still be used on the command line, thus:

Variable (1) & Variable (2) mixed.mean
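That line is not valid Python, so one plausible approach is to split off the attribute after the last dot and look up the rest as a field name. This is an assumption, and the lookup could for example be called from an override of `showsyntaxerror`. `Variable` and the `fields` mapping are hypothetical.

```python
class Variable:
    """Hypothetical stand-in for an imported data column."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def mean(self):
        return sum(self.values) / len(self.values)


def evaluate_awkward(line, fields):
    """Resolve 'Some awkward name.attribute' against known field names."""
    # Split on the LAST dot: attribute names never contain dots, field names might
    name, sep, attr = line.strip().rpartition(".")
    if not sep or name not in fields:
        return None  # not an awkward-name expression; let Python report the error
    return getattr(fields[name], attr)
```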

Not a big change, but it's one less thing to explain to demanding users. The work on the main GUI is still ongoing (choosing a test is the hardest thing) but we're getting there.

The thing will be released under the Affero GPL license so it's even relevant to this site.

On the administrative side, I managed to get some more marketing research done (all promising, but lots of things to think about), and the company is getting closer to being officially founded. It's all very exciting stuff.

I have a questionnaire here if anyone feels like completing it: it should take about 5-10 minutes and concerns people who use computers to perform statistical analysis. I cannot offer any money in return (we have zero investment - any offers will be carefully considered!), but it would be extremely helpful in getting open source to the top.

The questionnaire is at Survey Monkey. TIA to anyone who completes it.

Phew, am tired. It's hard working in these temperatures, but nicer than freezing my bits off. We will be getting air-con in the room soon in preparation for the baby's arrival but right now it's hot!

The stats program is coming on very well, thank you! It now handles different types of data with ease and total transparency for the user. The output is nicely formatted and looks good. The architecture seems to be about right, and we're looking to release the first version (obviously a beta) maybe next week. It will be under the AGPL because we intend to put in networking capabilities. Complex data analysis through a web browser? Heh, why not?

Uraeus - I've tried noise-cancelling headphones on long flights and found them to be quite good. If they're the "tighter" ones (ie, the ones that fit tight around the ears), they can also help prevent your ears from popping.

So I guess this is the closest thing to an official announcement that I can make.

My business partner and I are going to form a company that will concentrate on statistics software. Our product will be called Ecstatistics, which is a seriously good update of my old project SalStat. It differs in that:

a) It will be able to read data from CSV files, databases (a whole range), and spreadsheets. We also plan to import SPSS and SAS files, as well as any other format we can code for;

b) It will output to a range of formats (PDF, OOo, databases, MS Office, HTML). The HTML is interesting because it will allow online analysis;

c) It will have a nice range of tests;

d) It will have a great graphing / charting capability;

e) It will be modular and easy to upgrade;

f) It will be far more usable than existing programs;

g) And, of course, it will be open source.

Our plan is to get the product working (the database browser already works quite nicely) and produce a version for the OLPC project. Some people have already asked for a stats program that works there, and it makes sense to equip students with (possibly) the most useful tool in scientific research: statistical analysis. So far, it can import from a range of databases and analyse the data descriptively. Output is only text for now and interaction is via a custom interactive interpreter, but it's early days yet. From what we've read, the important thing is to get something released, and we hope to do that very soon.

Ecstatistics is coded in Python with NumPy, SQLAlchemy, SQLite and lots of other stuff. Because of this, we can code the OLPC version down to about 200k which competes extremely well with the opposition like R, SPSS and SAS. The interface is designed not just to be useful but also to instill good statistical practice, so it's educational too.

The interface will be designed with non-expert users in mind, particularly students. We aren't aiming at calloused statisticians; they have their favourite tools (and often write them for themselves anyway). We are aiming at all those people who have to do stats but don't like it.

In other news, I saw a couple of laptops here in the Phils in a major chain of electrical stores. They came with Linux preinstalled which was nice to see.

Finally, but most importantly, my wife had her scan earlier this week - we're expecting a little baby girl! The due date is the end of July and we're both very excited.
