Older blog entries for sjanes71 (starting at number 85)

pydbbench     Putzed a bit last night getting pydbbench to generate gnuplot graphs of the execution data. It made pretty pictures but didn't really advance the framework toward having a Python script automatically run and summarize benchmarks. Tonight I had a small fight with the DB-API 2.0 until I figured out that pyPgSQL has a more advanced "connect_string" feature (send a string or a dict of parameters) than the MySQLdb library has (only a string is allowed).

The only test written so far is a connect-disconnect test, and the harness runs roughly like this:

each benchmark to run 120 seconds
for each driver: (PostgreSQL, MySQL)
  for each benchmark: (dbconnectdisconnect)
    run the setup
    for each concurrency: (1,25,100,250,500)
      run that benchmark
      summarize that benchmark
    run the teardown
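
A rough sketch of that loop in Python might look like the following. The ConnectDisconnect class, its setup/run/teardown methods, and the run_all function are hypothetical names made up for illustration, not the actual pydbbench code, and the benchmark body is a placeholder that never touches a database.

```python
# Hypothetical stand-ins: a real harness would wrap pyPgSQL and MySQLdb here.
DRIVERS = ["PostgreSQL", "MySQL"]
CONCURRENCIES = [1, 25, 100, 250, 500]


class ConnectDisconnect:
    """Placeholder benchmark; a real one would open and close a DB connection."""
    name = "dbconnectdisconnect"

    def setup(self, driver):
        pass

    def run(self, driver, sessions, timelimit):
        # Placeholder result: pretend each session completed one iteration.
        return {"driver": driver, "sessions": sessions, "iterations": sessions}

    def teardown(self, driver):
        pass


def run_all(drivers, benchmarks, concurrencies, timelimit=120):
    """Walk the driver/benchmark/concurrency loops in the order listed above."""
    results = []
    for driver in drivers:
        for bench in benchmarks:
            bench.setup(driver)
            try:
                for sessions in concurrencies:
                    results.append(bench.run(driver, sessions, timelimit))
            finally:
                bench.teardown(driver)  # teardown runs even if a benchmark fails
    return results


if __name__ == "__main__":
    for row in run_all(DRIVERS, [ConnectDisconnect()], CONCURRENCIES):
        print(row)
```

Putting the teardown in a finally block is a design choice of the sketch, so that a failed run at 500 sessions does not leave test tables behind for the next driver.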

When it starts hammering 500 sessions I think either the 2.4.20 Linux scheduler or Python threading starts to choke. I don't have enough hardware to run the benchmarker against a different machine where I would expect some different numbers. Now I'm getting to a point where it's time to write some more interesting tests. The framework also tracks successful iterations which I just tested by setting the max-connections for MySQL down fairly low. I'm going to stop hacking on it tonight and go back and do some reading.

1 Apr 2003 (updated 1 Apr 2003 at 15:26 UTC) »
Gentoo Linux Switching To RPM Format     Fuck, shit, cunt, sonovabitch, damn... what a fucking great April Fool's headline. Now I'm going to be shell-shocked for everything else that'll happen tomorrow.

BT & Red Hat 9     I'm not seeing as large a speedup for this download as I saw for Mandrake; it could be a function of the mix of users attempting to download RH9.

Work     Again, I'm taking on risk to be happy.

For the Advogato Commons: Draft/Code in Progress     Working on a small database benchmarking harness for the purposes of comparing recent versions of MySQL and PostgreSQL. People too often say that "This or the other is better/faster" but it is still quite useful to be able to run some kind of benchmark to get some numbers. Many other database benchmarks out there are rightly concerned with "end-to-end" testing (browser-webserver-app-db as in TPC-W), but end-to-end testing takes too long to set up and administer, and I think that there are some legal issues with "publishing" TPC-W numbers.

The Open Source Database Benchmark seems to be dead. It was written in C, which I don't think is absolutely needed to get a general idea of how a database will operate. MySQL's sql-bench is written in Perl but doesn't work with newer versions of PostgreSQL. I'm not sure that I want to use a benchmark written by a database vendor themselves when they may have intimate knowledge of the inner workings of their own system-- and sql-bench also does not seem to have a "concurrent" method of testing where multiple sessions are hammering on the database simultaneously.

I really do love Python, so I'm going to go at it with Python. Perhaps the work will create some more pressure to help bring Python's database support to the levels of Perl DBI.

What? No, this isn't an April Fools thing.

The Straw that is ATA/IDE     Waiting on an updatedb to complete so I can use the locate command without a warning. This is driving me nuts-- I wish there were a way to make these drives faster.

pydbbench     I hope I didn't accidentally use someone else's name; I didn't do a search. I haven't finished the code to summarize the total "measurement" for all sessions, but each session knows how to count the number of iterations, rate of execution, and min, max, avg times of execution. This is connect/disconnect for PostgreSQL 7.3.2. Eventually I'll figure out how to chuck this into SVG for more oohs-and-aahs. I'm pretty happy with Python's threading so far.

1 session
ssn 0 name do_dbconnectdisconnect timelimit 10 iterations 385 rate 38.000/s min 0.018036 max 0.136911 avg 0.025974
5 sessions
ssn 3 name do_dbconnectdisconnect timelimit 10 iterations 66 rate 6.000/s min 0.019086 max 0.429477 avg 0.151648
ssn 1 name do_dbconnectdisconnect timelimit 10 iterations 67 rate 6.000/s min 0.022207 max 0.451928 avg 0.151311
ssn 2 name do_dbconnectdisconnect timelimit 10 iterations 83 rate 8.000/s min 0.021862 max 0.324185 avg 0.123009
ssn 0 name do_dbconnectdisconnect timelimit 10 iterations 86 rate 8.000/s min 0.019195 max 0.412027 avg 0.119611
ssn 4 name do_dbconnectdisconnect timelimit 10 iterations 97 rate 9.000/s min 0.018624 max 0.382124 avg 0.105128
25 sessions
ssn 4 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.188134 max 0.778501 avg 0.588586
ssn 3 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.182435 max 0.774702 avg 0.591407
ssn 0 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.194805 max 0.826876 avg 0.595367
ssn 5 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.221709 max 0.859946 avg 0.599044
ssn 22 name do_dbconnectdisconnect timelimit 10 iterations 16 rate 1.000/s min 0.539721 max 0.879632 avg 0.628659
ssn 21 name do_dbconnectdisconnect timelimit 10 iterations 16 rate 1.000/s min 0.482847 max 0.946763 avg 0.632173
ssn 24 name do_dbconnectdisconnect timelimit 10 iterations 16 rate 1.000/s min 0.554055 max 0.947282 avg 0.631847
ssn 6 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.204366 max 1.102799 avg 0.608936
ssn 2 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.189930 max 0.938403 avg 0.609606
ssn 11 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.190076 max 0.919035 avg 0.602645
ssn 9 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.312997 max 0.918923 avg 0.615667
ssn 7 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.276532 max 0.878170 avg 0.618143
ssn 10 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.243147 max 0.922962 avg 0.619472
ssn 17 name do_dbconnectdisconnect timelimit 10 iterations 16 rate 1.000/s min 0.540462 max 0.894974 avg 0.648541
ssn 8 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.268107 max 0.932455 avg 0.624780
ssn 12 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.229943 max 0.903213 avg 0.618912
ssn 13 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.216434 max 0.949649 avg 0.619694
ssn 1 name do_dbconnectdisconnect timelimit 10 iterations 18 rate 1.000/s min 0.033609 max 0.949737 avg 0.595317
ssn 15 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.317664 max 0.940005 avg 0.622457
ssn 19 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.298987 max 0.946451 avg 0.624361
ssn 23 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.490933 max 0.944254 avg 0.628118
ssn 18 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.285008 max 0.995577 avg 0.630302
ssn 16 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.343448 max 1.094034 avg 0.635534
ssn 14 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.332970 max 1.043219 avg 0.638293
ssn 20 name do_dbconnectdisconnect timelimit 10 iterations 17 rate 1.000/s min 0.380707 max 1.072901 avg 0.638602
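
The per-session bookkeeping described above (iterations, rate, min, max, avg) could be sketched along these lines. All function names here are assumptions for illustration, and the work callable stands in for the real connect/disconnect. One hedged observation: the rates in the listing look truncated to whole numbers (385 iterations in 10 seconds is shown as 38.000/s), which would be consistent with integer division, so the sketch divides by a float. The total function is one possible shape for the still-unwritten all-sessions summary.

```python
import threading
import time


def summarize(times, timelimit):
    """Reduce a list of per-iteration durations to the fields printed above."""
    n = len(times)
    return {
        "iterations": n,
        "rate": n / float(timelimit),  # float division: 385 in 10 s gives 38.5
        "min": min(times) if times else 0.0,
        "max": max(times) if times else 0.0,
        "avg": sum(times) / n if times else 0.0,
    }


def run_session(ssn, work, timelimit, results):
    """Repeat work() until timelimit seconds pass, timing every call."""
    times = []
    deadline = time.time() + timelimit
    while time.time() < deadline:
        start = time.time()
        work()
        times.append(time.time() - start)
    results[ssn] = summarize(times, timelimit)


def run_concurrent(work, sessions, timelimit):
    """Start one thread per session and wait for all of them to finish."""
    results = {}
    threads = [
        threading.Thread(target=run_session, args=(i, work, timelimit, results))
        for i in range(sessions)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def total(results, timelimit):
    """Combine all sessions into one overall measurement."""
    iterations = sum(r["iterations"] for r in results.values())
    return {
        "sessions": len(results),
        "iterations": iterations,
        "rate": iterations / float(timelimit),
    }


if __name__ == "__main__":
    # Stand-in workload: sleep briefly instead of connecting to a database.
    per_session = run_concurrent(lambda: time.sleep(0.01), 5, 1)
    print(total(per_session, 1))
```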

26 Mar 2003 (updated 26 Mar 2003 at 18:13 UTC) »
Mandrake 9.1 and BitTorrent     I'm running a download not because I care much about Mandrake but because I care more about BitTorrent, and want to see BT "succeed" as an alternative to other P2P systems. BT follows the UNIX philosophy of building "small tools to achieve one purpose" a little more closely than any other P2P application out there; the others seem to kitchen-sink IRC, P2P, and media player all together.

Unfortunately the GUI version of BT segfaults on Gentoo (it was a masked ebuild so I'm not disappointed) but the headless downloader still works:

saving:         mandrake9.1 (1950.3 MB)
percent done:   53.5
time left:      5 hour 19 min 18 sec
download to:    /home/sjanes/Shared/mandrake9.1
download rate:  92.5 kB/s
upload rate:    91.0 kB/s

I started this download from my W2K laptop, let it run all night, and it choked down about 49% of the ISOs. This morning I moved it to my Linux machine and was happy to see that the interim formats were platform independent, and resumed on my faster T1 connection at work.

The only annoying part is the slow startup of BT when resuming, because it needs to scan the entire download to know how much is downloaded.
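
The scan is presumably a re-hash of the data on disk: a BitTorrent metainfo file carries one SHA-1 hash per fixed-size piece, so a client can only know which pieces are good by reading and hashing them. A minimal sketch of that check, where completed_pieces is a made-up function name and the piece length and hash list would really come from the .torrent file:

```python
import hashlib


def completed_pieces(path, piece_length, expected_hashes):
    """Return the indexes of pieces on disk whose SHA-1 matches the expected hash."""
    done = []
    with open(path, "rb") as f:
        for index, expected in enumerate(expected_hashes):
            piece = f.read(piece_length)
            if not piece:
                break  # file is shorter than expected; the remaining pieces are missing
            if hashlib.sha1(piece).digest() == expected:
                done.append(index)
    return done
```

Every byte has to be read once, which is why the startup cost grows with the size of the download rather than with the amount still missing.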

Some time later... we have completion:

saving:         mandrake9.1 (1950.3 MB)
percent done:   100
time left:      Download Succeeded!
download to:    /home/sjanes/Shared/mandrake9.1
download rate:
upload rate:    142.2 kB/s

BT is truly some amazing work that needs to be integrated into Squid somehow. :)

Distributed DNS Policies     First come, first served would indeed cause a land rush by all the squatters-- I think that probably the best policy is to develop a system where the Internet itself decides who is the best manager for a given domain space, e.g. everyone agrees that IBM's collection of servers over there should manage the ibm.com namespace. Instead of using a fixed set of root servers (if we keep the existing mechanisms of DNS), servers would apply a trust metric or subscribe to a publisher of DNS delegations. However, this can break the "universality" of the namespace, because one can no longer assume that e-mail will go to the same place when your DNS trusts/subscriptions may be different from someone else's. Could the Internet withstand that kind of chaos in the interest of lancing the boil of ICANN off of itself? Internet, heal thyself? Maybe the best thing to do is for one of the distributed DNS providers to trial such a democratic/trust/rating/subscription style system.

Hacking     I never get too much done over the weekend; having a life disrupts hacking. :)

HTTP: The Definitive Guide     O'Reilly raises the bar of quality again for their own books. I've only read two chapters so far and they've filled in some gaps in my knowledge. The biggest eye opener was learning that there are also parameters to the left side of the ?, e.g. http://blah-blah/pagename;param1=value1;param2=value2?cgiparam1=value1&cgiparam1=value2 I've never seen it used before because I think it has mostly been used for FTP parameters (send it in binary/ascii, etc.). FTP et TELNET delendae sunt. Passwords sent in the clear just aren't good. I don't care about wrapping them in SSL; in my mind it's the equivalent of wrapping the tourniquet on the detached leg and not the stub attached to the body. However, basic authentication in HTTP isn't much better.
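
For what it's worth, the two kinds of parameters can be pulled apart with the standard library. This sketch uses Python 3's urllib.parse on the example URL from the entry; the variable names are just for illustration.

```python
from urllib.parse import urlparse, parse_qs

url = ("http://blah-blah/pagename;param1=value1;param2=value2"
       "?cgiparam1=value1&cgiparam1=value2")

parts = urlparse(url)

# Parameters to the left of the "?" arrive as one ";"-separated string.
path_params = dict(p.split("=", 1) for p in parts.params.split(";") if p)

# Parameters to the right of the "?" are the familiar query string;
# parse_qs keeps repeated keys as a list of values.
query_params = parse_qs(parts.query)

print(parts.path)     # /pagename
print(path_params)    # {'param1': 'value1', 'param2': 'value2'}
print(query_params)   # {'cgiparam1': ['value1', 'value2']}
```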

Grid Computing     I know that personally I'm probably a big electricity waster considering that I've got more computers than most ordinary people. Two of them have been basically on-demand file servers with Samba. I have one machine now that I cart back and forth from work and on which I do all of my primary development, and a laptop that hosts Windows 2000 because our world is a very cruel one. I'm wondering how grid computing would be useful within a family-- closer to home, rather than to some pharmaceutical company folding proteins or nationally supported Al-Qaeda distributed cracking system.

The Robotic News Desk Editor     Google News points out as a minor "Top Story":

Jackson Wears Fake Nose, Hates Being Black, According To Article Launch Yahoo - and 89 related »

Hopefully this doesn't become a trend-- I use Google News daily because generally this crap only appears in the Entertainment section, not the "Top News" section.

User Friendly open     I didn't get a chance to do diddlysquat on this, might get some time tonight.

auspex's open     I wouldn't think of it as a pitiful "file manager"-- this is a good idea, and it would be better still with a "user friendly" version for "ordinary people" who are averse to getopt-style command lines:

view x [with tool]
edit x [with tool]
compress x [with tool]
encrypt x [with tool]
decrypt x [with tool]
translate x to language

We already typically use MIME types to decide what program to "view" something with, why not also create mappings for edit, compress, and translate? I have heard of many users of computers who might have used some program every day for the last 3 years on their computer and didn't know what it was called. ("Do you use Word to edit your manuscripts?" "What's Word?" {{Clippy bangs on the CRT and mouths "I'm Word! Look at me! Help meeeee!!"}}) Too often icons on desktops represent file-types as application tools (e.g. there's a Winamp icon for a sound file, but no distinct WAV, MP3, or OGG icon) instead of file types. This idea has given me a small kick in the pants and I actually started prototyping in Bash-- but will switch back to Python because of better array support. More verbs than just those will be available and the issue of "x" where "x" is some filespec (hard to remember/type by users) will be looked at.
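
A first cut of the verb-to-tool mapping could be a small table keyed on (verb, MIME type), as in this sketch. The TOOLS table, the tool names in it, and the resolve function are all placeholders invented for illustration; only mimetypes.guess_type is a real standard-library call. Nothing is executed here, the function just returns the command it would run.

```python
import mimetypes

# Hypothetical verb/MIME-type table; the tool names are placeholders.
TOOLS = {
    ("view", "text/plain"): "less",
    ("edit", "text/plain"): "vi",
    ("view", "image/*"): "display",
    ("compress", "*/*"): "gzip",
}


def resolve(verb, filename, tool=None):
    """Return the command list for 'verb filename [with tool]'."""
    if tool is not None:
        return [tool, filename]  # an explicit "with tool" always wins
    mime, _encoding = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    major = mime.split("/", 1)[0]
    # Try the exact type first, then the major type, then the catch-all.
    for key in (mime, major + "/*", "*/*"):
        if (verb, key) in TOOLS:
            return [TOOLS[(verb, key)], filename]
    raise LookupError("no tool known to %s %s files" % (verb, mime))


if __name__ == "__main__":
    print(resolve("view", "notes.txt"))
    print(resolve("edit", "notes.txt", tool="emacs"))
```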

4 Mar 2003 (updated 4 Mar 2003 at 13:23 UTC) »
RMS (Richard Stallman or Microsoft Rights Management Services?)     Interesting how Microsoft is picking some TLAs that are commonly associated with people. Somewhere someone in Redmond, WA is having a good snicker. I wonder what Microsoft ESR or Microsoft JWZ might become.

Sendmail     Saw another news note about another buffer overrun hole recently published about Sendmail. I decided years ago to stop using Sendmail because of things like this, so I've been using either qmail, Exim, or Postfix, depending on whatever was easiest to install. I typically nuke Sendmail if I find it on a machine I become responsible for.

Leonardo's Laptop     This book should perhaps become required reading for GNOME and KDE developers. I'm very worried about the insistence on "cloning" Microsoft user interfaces to increase acceptance of Linux. Evolution may well be a very faithful clone of Outlook, but is Outlook the best user interface for a personal information manager? One thing the book points out in "The Quest for Universal Usability" is that:

A fundamental interface improvement would be support for evolutionary learning and a level-structured approach to design (Baecker et al. 2000). Why can't you begin with an interface that contains only basic features (say 5 percent of the full system) and become expert at this level within a few minutes? Game designers have created clever introductions that gracefully present new features as users acquire skill at the first level of complexity. [...] A good level-structured design in the interface must be accompanied by levels in the tutorials, online help, and the error messages. (pg 47)

I would add that getting the computer to puzzle out your intent AND be smart about it is one of the key issues in making computers easier to use. Microsoft's Clippy was widely hated because it interrupted the user with guesses at the user's intent-- "I see you're about to write a suicide note, would you like to see a list of the most successful ways to off yourself?" What's worse is all the work involved in making this interruption animated (eating that little bit of processor time that could be better used by everyone's instance of the Distributed Net OGR cruncher!) OpenOffice makes this intent-help less intrusive with the little transient light-bulb icon at the lower-right corner.

Advogato Needs Book Lists     If we have ranking for diary entries, I think it would be a simple matter of programming to also rate lists of books in various categories using the same system. More interestingly, it should perhaps give you the same kind of evolutionary learning scale or level structure, and could even take the existing "categories" of Observer, Apprentice, et al. for pigeon-holing books. Observer-level books get you things like "In the Beginning was the Command Line" by Neal Stephenson, Apprentice "Learning Perl", Journeyer "Programming Perl, Mastering Algorithms in Perl, Perl Cookbook" and Master "Perl in a Nutshell". Maybe it's too much work, but the idea of seeing "What's on everyone else's bookshelf?" vs. "What's on everyone else's desk?" is appealing.

The Sapir-Whorf Hypothesis and Programmers     Tim Sweeney's "radical claim" that
our thought processes as programmers are deeply influenced by the language we programmed in.
is not a new claim, as it was theorized early last century that (natural) language affects how we think and that related languages affect how we think similarly. I don't think anyone could dispute that a programming language can be considered a "human" language, considering the amount of Perl poetry that exists. :)

Gentoo and KDE     Got bitten by the KDE 3.1 bug in Gentoo where everything just collapsed and fucked up after I logged out. Solution is to edit /etc/env.d/49kdelibs-3.1 and add KDEDIRS=/usr/kde/3.1 to that list of variables. You'll probably have to reboot. Without this, look for basically zilch in your KDE menus and no ability to even launch an xterm from inside KDE. TWM makes me shudder.

Orange Julius Equivalent     Never had one from the mall, but this is pretty good.

  • 1 cup orange juice
  • 1 cup water
  • 1 heaping cup ice
  • 3/4 tsp. vanilla extract
  • 1/8th cup powdered milk

Chuck in a blender, blend until frothy and smooth. Makes two drinks. The "Top Secret" recipe says to use 2 egg whites, but with a 2/60,000 chance of salmonella, I think I'll stick to powdered milk.

Nullsoft Superpimp Installation System     v2.0b1 is verrrrry nice. My favorite "windows installer" software for sure.

Regular Expressions     Didn't get much time to play more with these jewels. The first one I'm experimenting with is ([A-Z][a-z]+) ([A-Z][a-z]+), a simple one intended to find "Firstname Lastname". It doesn't handle the multiline Firstname
Lastname case yet, but it certainly does a fairly good job on my e-mail. I can already think of a whole bunch of other cases where it will not match more complicated names like First Middle Last. With some simple scoring, I can filter out most of the spurious matches which is all I really want-- it doesn't have to be perfect. I wonder if there is a catalog of useful regular expressions-- I think I remember one in the Mastering Algorithms in Perl book.
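
As a quick check of those cases, here is the same expression run through Python's re module. The sample text and names are invented, and the \s+ variant is only one assumed way to pick up a name split across a line break, not something from the original experiment.

```python
import re

# The pattern from the entry: two capitalized words separated by one space.
NAME = re.compile(r"([A-Z][a-z]+) ([A-Z][a-z]+)")

# Assumed variant: \s+ also accepts a line break between the two words.
NAME_MULTILINE = re.compile(r"([A-Z][a-z]+)\s+([A-Z][a-z]+)")

text = "lunch with Alice Smith and later Bob\nJones."

single = NAME.findall(text)            # misses the name split across lines
multi = NAME_MULTILINE.findall(text)   # picks up both names

# A three-part name only yields its first two words.
three_part = NAME.findall("Mary Ann Evans")

print(single)
print(multi)
print(three_part)
```

Any two adjacent capitalized words will match, a sentence-initial word followed by a proper noun for example, which is where the scoring step would come in.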

Real Time with Bill Maher     So far, I think it's a better show than his previous show, Politically Incorrect, which was run off the air after 9/11 when he voiced an "unpopular opinion." Perhaps being on HBO rather than network television is what makes the difference.

Where there is Traffic, There is Innovation     Noticed on the BBC London site they have something called "jamcams" and "sequences"... jamcams are obviously the equivalent of American traffic cameras, but the sequences were something new to me. A sequence is a useful ordering of existing jamcams along a commonly traveled path. I think most traffic sites I've seen in the US only give you a map to pointyclicky.

Life     Up and down. Chaotic. Big ups. So-so downs. I guess I'm averaging up on the "ok" side.
