Older blog entries for slamb (starting at number 41)

3 Jan 2005 (updated 16 Oct 2006 at 23:53 UTC) »

Life

...goes well. I graduated from the University of Iowa last month with a B.S. in computer science and minor in physics. I want to stay in Iowa City through the summer, so I'm getting a local job. I'll likely just step up to a professional position at the hospital, since the job listings I've found sound horribly boring.

Microsoft

...sucks. Windows ate my Linux partition! I just found this KnowledgeBase article. It says that if you have more than one "primary" partition[*] and select any but the first to install to, Windows XP must also reformat all partitions before it. That's bad enough, but what the article doesn't say is that there are no prompts or warnings about this. It gave me the usual reformatting warning, but everything lead me to believe it was talking about its little area. When I booted up the Fedora rescue disk to put grub back (as Windows has always silently overwritten the MBR), I was shocked to discover that my Linux partition's type had changed to 0x07 (HPFS/NTFS) and e2fsck found nothing there. No valid magic number at the superblock or any of the backups. It wasn't just unbootable, it was gone.

Just to add insult to injury, it didn't install Windows XP properly, either. It saw the existing Windows 2000 partition on the primary slave (older) hard disk and used that as the "boot" volume (the one containing the NTLDR), again without asking me. In addition to making my drive letters quite weird (C: refers to the second drive and H: refers to the first drive), this means it doesn't boot when I remove the second drive. I need to do some trick in the recovery console to make it work.

If a Linux distribution had this bug, people would raise hell. How does Microsoft get away with this shit?

[*] - The partition table supports up to four partitions, which can be either "primary" or "extended". "Primary" partitions are the simple kind; "extended" ones are useless by themselves but can contain "logical" partitions. I think you should only have one extended partition. Only one partition should be "active" (the one the simple Windows MBR will try to boot) and it must be primary. Though it's actually stored as a boolean in each entry, IIRC, so you could have zero or more active. Who knows what Windows would do then. What a stupid scheme this is.

The actual partition layout:

Disk /dev/hda: 160.0 GB, 160041885696 bytes 255 heads, 63 sectors/track, 19457 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System /dev/hda1 1 9729 78148161 5 Extended /dev/hda2 * 9730 19457 78140160 7 HPFS/NTFS /dev/hda5 1 125 1003999+ 82 Linux swap / Solaris /dev/hda6 126 9729 77144098+ 83 Linux

18 Dec 2004 (updated 20 Dec 2004 at 02:41 UTC) »

.NET seminar

I attended a .NET seminar for U of I developers today. Jeff Brand was the speaker.

First half: Worthless. Marketing. He acted apologetic, but he still spoke the jargon like a native marketroid. He even brought up the pet store example. While he did admit the line counts were questionable, he really drove in the number of hours spent on tuning the stupid, convoluted EJB version. He didn't mention the good Java implementation. And he made the strange assertion that Microsoft's VM is better because it uses JIT. He hedged a little, saying there are implementation differences on the Java side, but he implied that JVMs do not use JIT. (He might have simply deceived by saying the Microsoft JIT-compiles everything for near-native speed. Sun is smarter than that; their hotspot JVM doesn't bother JITing infrequently used code.) I wanted to ask him to name a single major JVM that does not JIT...but I remembered the cardinal rule of seminars: never heckle before they feed you.

Lunch: A disappointment. Microsoft didn't understand how much food we eat. And it was almost all older, full-time developers - I wonder what would have happened if more student/developers showed. We would have eaten his laptop. They ordered more pizza, but it did me little good - had to jet right afterward. Maybe I should have heckled after all.

Second half: Very interesting. He focused on web development with real demos. The controls they have save a lot of work (even the simple ones - a database-driven table that sorts based on clicking the column names and pages, in a themeable way), and the development tools are unbelievable. Eclipse is nice, but after what I saw today, I don't think it's at the same level. I'm sure something similar to the controls they showed exists in the Java world; I will have to find it and design AXP taglibs to interface with it. Maybe JavaServer Faces; I've put off investigating them for too long.

I wouldn't want to use the .NET functionality as-is, though. You'd lose all the advantages if you go a little off their path. For example, the post-back functionality he described as coming in .NET 2.0 is IE-only. No good technical reason why; Google can pull off similar things on any modern browser. This is where open source shines; an OSS framework could do the same partial implementation and I wouldn't care because it'd be easy to hack the source myself.

Update: Jeff Brand just emailed me. They do support the post-back functionality in other browsers. He was thinking of early alpha releases when he said it was IE-only.

4 Dec 2004 »

SQL query tools

Oracle's horrible SQL*Plus Worksheet made me think about what makes a good SQL query tool. (And by this I mean the kind that just executes SQL statements you give it (like Oracle SQL*Plus), not the kind that comes up with SQL statements based on your selecting tables from a list, drawing joins between them, and selecting columns to return (like Oracle Query Builder).)

I made up a list of things the perfect SQL query tool should do. I'm thinking about writing my own, but first assessing what's already out there. Here are a few of the better tools I've found:

SQuirreL SQL allows you to have multiple tabs open, displays the results in a decent graphical table, and - very importantly - does not block the GUI while a query is running. In fact, it even has a "cancel" button. (It even works! I'm used to Oracle Query Builder's cancel button, which just exists to toy with you.) So if you realize as soon as you start the query that you forgot a join condition, you can immediately cancel and restart. But I hate many other things about its UI, like that it uses windowed MDI at the top level. It's got a lot of UI/core separation, so I should look again at making an alternate UI for it.
JFaceDbc, which apparently uses some of SQuirreL's back-end code, is way out there. It's an Eclipse plugin. Like Eclipse, it does some amazing things that I didn't realize I wanted until I saw them. I guess it's not really primarily a query tool, since their screenshots don't actually show the results of running a query. It's more of an IDE for SQL. It has a SQL editor that goes way beyond syntax highlighting. In true Eclipse fashion, it has the hovers that show you table definitions, auto-completion of table names, etc. I need to play with this one some.
DBTree. Its workbook output format (the right pane of this screenshot is the closest thing to what I'm envisioning. Looks beautiful. Seems to be closed-source shareware, though.

I'm not sure yet if I'm going to do any work on this. And if I do, I'm not sure if it will be extending one of those tools or starting from scratch. I might at least make some mock-ups of what my perfect tool would look like. I will soon be entering the Real World, so it would be nice to have this in my portfolio to show I can do UI design. Maybe I'll actually write the thing after that, or who knows...maybe someone else will be interested.

29 Nov 2004 »

Axamol

...is the new name for my xmldb, framework, and mb projects. It's pronouncable, and it has the letters 'xml' in it - all three of these subprojects involve XML.

My choice now is if I should release xmldb/framework together or separately. I might release them as "Axamol SQL Library" and "Axamol SAX Pipeline". Or together as "Axamol". I wrote up pros and cons. If anyone feels like reading them, I'd like opinions.

Axamol SQL Library

Recently got support for some more dynamic SQL. Jeff started using it for his SQL Logger and ran into a problem. He had some conditionally-included where clauses in his Java-generated SQL. He tried switching it to something like this:

<s:query name="foo">
  <s:param name="mindate" type="date"/>
  <s:param name="maxdate" type="date"/>
  <s:sql databases="pgsql">
    select    *
    from      table
    where     (<s:param name="mindate"/> is null or date >= <s:param name="mindate"/>)
      and     (<s:param name="maxdate"/> is null or date &lt;= <s:param name="maxdate"/>)
  </s:sql>
</s:query>

...and found that it took 4 seconds where it used to take 15 ms. PostgreSQL was coming up with a query plan designed to work in either case, so it wasn't using his indexes. (Makes me wonder if our Oracle Reports at work should be using lexical bind variables for this same performance reason.)

So I introduced another form of dynamic query:

<s:query name="foo">
  <s:param name="mindate" type="date"/>
  <s:param name="maxdate" type="date"/>
  <s:sql databases="pgsql">
    select    *
    from      table
    where     true
    <s:ifNotNull param="mindate">
      and     date >= <s:param name="mindate"/>
    </s:ifNotNull>
    <s:ifNotNull param="maxdate">
      and     date &lt;= <s:param name="maxdate"/>
    </s:ifNotNull>
  </s:sql>
</s:query>

Which just conditionally includes a SQL fragment based on whether a parameter is null. Simple but effective.

I also introduced dynamic order by clauses, also for his code. Nothing too exciting there.

He also had a list of regexps to require, but my existing <s:bindlist> dynamic SQL was general enough to handle that. I think he ended up with <s:bindlist join="" each=" and field ~* ?" param="regexps"/> or similar.

I'm not sure how much more dynamic SQL people will need. I could introduce a way to dynamically insert SQL identifiers, but hopefully that's rare to do in software. I've sometimes written queries like this:

declare
    cursor grants is
    select    grantee, granted_role
    from      dba_role_privs
    where     granted_role in ('FOO', 'BAR');
begin
    for grant in grants loop
        execute immediate 'revoke ' || quote_identifier(grant.granted_role)
                          || ' from ' || quote_identifier(grantee);
    end loop;
end;
/
show errors

...but always as ad-hoc queries. I can't think of a reason for software to do that.

Nevertheless, I'm sure as soon as I release this code someone will ask for some form of dynamic query I haven't anticipated. People make SQL for some quite exotic stuff.

Axamol SAX Pipeline

Work on my JSP-like format for building SAX streams is going well. (Used to be .xfp. Called .axp for now, though that has an unfortunate similarity to .asp when spoken.)

I finally ditched "logicsheets", which was a stupid idea I'd gotten from Apache Cocoon. (Preprocessing .axps with XSLT to provide reusable tags.) Instead, I managed to adapt JSP's taglib idea to SAX streams. It worked out well - I even managed to derive a AxpPageContext from javax.servlet.jsp.taglib.PageContext. It has most of the same methods but makes fun of you if you try to get a JspWriter from it. It should make porting JSP taglibs easy. In particular, Struts 1.2 seems to want a PageContext to do anything, and now I have one to give it.

Multi-language support is coming. For now it supports java and java-el (Java + JSP EL expression languages). The second one only required 150 lines of code based on Apache Commons EL. (Inheritance is wonderful.) Now I'm working on PythonAxpWriter, using Jython.

I also cleaned up a lot of code and exception handling. Plus making some things just more pleasant, like the way the Ant task to compile .axps now considers them out of date if the compiler itself has changed.

AxpCompiler.Handler, the language-independent bulk of the .axp compiler, continues to defy my efforts to clean it up. It's a huge mess, and surely buggy.

14 Nov 2004 (updated 14 Nov 2004 at 06:35 UTC) »

jpick:

The problem you've mentioned, difficulty getting both parallelized builds and proper dependencies, is because you have "all the Makefiles". It's a recursive make problem; if you had a single instance of make able to see the entire dependency graph, a correct parallelized build would be trivial. Have you read Recursive Make Considered Harmful?

The catch is that it's hard to set up a single Makefile for an entire build; it's different than how anyone does it, so there's no good documentation. Tools like automake only sort of support it. You can include dir.mk files in each directory to have somewhat similar organization, but it gets...weird.

What's more realistic is switching to a make/autoconf/automake alternative like SCons. Everyone who writes an SConstruct (the equivalent of Makefile) uses it to make a single dependency tree of the entire project. It supports this model well by allowing inclusion of SConscript files in subdirectories. These are somewhat like having Makefiles at each level; it's more than just preprocessor-style inclusion. (They have separate scope, though you can import/export variables.) But unlike many Makefiles, it yields a single dependency graph.

Plus, I like SCons better anyway. Clean, uniform syntax (Python!), rather than m4 over make over sh. Built-in help checking for dependencies in C files. The ability to use a real scripting language to self-generate without introducing another new syntax. Less platform variation, since Python provides a more uniform API than the shell utilities. (No need to use portability m4 macros.)

8 Sep 2004 »

richdawe:

No POSIX system supports non-blocking disk I/O through the O_NONBLOCK interface. Some support it through a separate asynchronous I/O interface but due to its complexity and non-portability, few programs actually use it. Also, it doesn't support an async open(2).

djb argues, like you do, that they should support this through a more normal mechanism.

Some software uses elaborate workarounds - pools of threads that exist just to open files, mmap(2) them (for zero-copy), use mincore(2) to determine if they are in memory, and do reads on them if not. All that work so the main thread doesn't have to block. If the system you/djb propose existed, it would be unnecessary. (There would also have to be a decent sendfile(2) to use zero-copy, though.)

framework

...is moving along. This is my embedded-code-in-XML files thing. It's like JSP except that it's tweaked for XML - programs pump out SAX streams rather than text. More correct XML (checks validity at compile-time, avoids escaping issues like XSS attacks), better performance (can save unnecessary serialization/parsing steps), etc.

My big news is that I'm doing this as an Independent Programming Project now for 3 semester-hours. So rather than interfere with my coursework, it is my coursework. So I'm able to devote more time to it.

I've already factored out a lot of the Java-producing code into a separate class based on an abstract one. The idea is that I'll soon also have implementations for Python (jython) and BeanShell.

The two things holding me back are:

Whitespace in <xfp:logic> blocks. It's horribly messed up, so Python will choke on it. This is the easy one; it's only because of laziness I haven't fixed it yet.
The "logicsheets". I stole this idea from Apache Cocoon and now I want my money back; it sucks. You can create libraries for XSPs (their format) or XFPs (my format) through XSLT files called "logicsheets" that are applied to the X[SF]P before compilation. It's a really easy extension mechanism to implement, which is why I stole it. (And presumably why they came up with it.) They can insert logic blocks or whatever. But...
- They mess with whitespace (Python problems again)
- They're language-specific (need separate versions for Python, Java, BeanShell, whatever)
- They're horribly complicated to make (bizarre <xslt:template> rules dealing with literal and non-literal versions of the same thing)
- Pumping the page first through XSLT means you get no org.xml.sax.Locator. So you can't find the position in the original file in case you have an error to report. Makes debugging an X[SF]P not fun.
- They just plain suck
So I'm going back to something more similar to JSP's tag libraries. I half-designed something that lets them have their own stack (to communicate with something like TagSupport.findAncestorWithClass) and a stack to modify the ContentHandler for snaring body content as arguments. But I realized this gives no way to pass a non-string argument to them; that's no good. Need to tweak the design.

I'm making baby steps toward having a JUnit test suite. But nothing that tests the meat of the code yet.

Modifying the syntax. For now, I support {}-bracketed Java expressions in attributes like XSLT. This saves from doing the whole <xfp:attribute> thing when you just want <a href="thread.do?id={threadId}">. I may go all-out and support the terse JSP 2.0 Expression Language syntax in the Java version; there's an easy API for it. In the Python and BeanShell versions, I think I'll stick instead with one language.

I'm also going to increase the performance. For now, it creates (and GCs) a lot of objects on each hit. org.xml.sax.AttributeImpls, StringBuffers, Strings, char[] arrays. There's no reason except laziness. I can make the char[] arrays once and reference them instead; I just need to maintain an additional mapping during the compile. And I can reuse StringBuffers and org.xml.sax.AttributeImpls. Maybe these are what are causing the horrible performance problems I've noticed. framework really should be fast; it's a good design, I swear.

Eventually I'll write documentation. Answering such important questions as :

Why would I use framework?
How would I use framework?
Why was framework designed as it is?

but for now, any spectators will have to be content with my short webpage and notes in diary entries like this.

7 Sep 2004 »

richdawe

Interesting. djb describes that problem:

The problem with O_NONBLOCK is that it's too heavy: it hits an entire ofile, not just an fd. For example, suppose I set O_NONBLOCK on a pty on stdout, and another program has the same stdout. That other program then has O_NONBLOCK set too. Many programs can't handle O_NONBLOCK and will break horribly in this situation.

This is the first time I've actually seen it come up.

5 Aug 2004 »

Open-source cell phone UI

In my last entry, I mentioned the idea of making a J2ME-compliant cell phone "game" that is actually a better UI to the phone. The idea is dead, I'm afraid. It seems phones do not really provide an API to the phone bits. Which makes sense, I suppose - a malicious program could run up a huge bill. So I've just looked around a bit more and discovered that the LG phones have relatively good UIs. Not as good as I'd like but far better than the rest. I've bought a LG VX3200, with Verizon service. It works fairly well.

OpenSSL gotchas

A while ago, I complained a lot about OpenSSL. I've softened on it, but anyone uses it should know about some counterintuitive things it does by default and how to turn them off. I do this at SSL_CTX initialization:

    /*
     * Enable partial writes.
     * Otherwise, SSL_write may return error with part of the data written; it
     * will then expect the next call to have identical arguments (it will
     * resume the write). That's completely against our calling conventions
     * for Stream::write.
     */
    SSL_CTX_set_options(me, SSL_MODE_ENABLE_PARTIAL_WRITE);

     /*
     * Enable auto-retry.
     * Otherwise, SSL_read and SSL_write may return WANT_xxx in blocking mode.
     * We don't throw IOBlockErrors when blocking; that's just weird.
     */
    SSL_CTX_set_options(me, SSL_MODE_AUTO_RETRY);

     /*
     * Tell SSL to read always read as much into its buffer as possible to
     * avoid unnecessary system calls.
     */
    SSL_CTX_set_read_ahead(me, 1);

Also, very important: if you use it in multiple threads, you're expected to supply locking callbacks. Otherwise, it is not threadsafe! You need to use CRYPTO_set_locking_callback and CRYPTO_set_id_callback. You should also check how this interacts with whatever SSL-using libraries you may be using: libpq, neon, etc. And each thread should perform ERR_remove_state(0) at thread exit time.

Lastly, if you're doing non-blocking IO, you have to realize that SSL_write can return with an SSL_ERROR_WANT_READ and SSL_read can return with an SSL_ERROR_WANT_WRITE. That's because it can renegotiate at any time, so a logical read operation can involve an actual write operation, or vice versa. You have to take care to add the SSL descriptor to the correct FD_SET in select (or whatever mechanism you use).

Programming

...goes slowly. Life interferes, and I haven't been feeling motivated. Maybe things will pick up soon.

15 Jul 2004 (updated 15 Jul 2004 at 20:46 UTC) »

Open-source cell phone UI

Today I got an Audiovox CDM8410, after my Nokia 3360 had problems with the LCD. I'm disappointed in its user interface:

It's hard to read the important information off the display. They've got a small clock on a multicolored background, a tiny signal strength indicator, and a tiny battery life indicator. My old Nokia 3360 has these in huge black letters on a white background. Consequently, the new one is unreadable without the backlight on. The old one's backlight is for dark rooms only.
You can put five phone numbers under one name...but they have to be labelled "General", "Mobile", "Office", "Pager", and "Fax". I want custom names like "Mom's House", "Apartment", "Hospital". I have friends with three jobs. Which one gets the "office" label? So I'm stuck with listing out all of the phone numbers individually.
The phone book's find feature sucks. I have 80+ numbers in my phone. Being able to get to a number quickly is the phone's most important UI task, and they screwed it up! (And this phone is the best of the three I examined, by far!) I picked it because you could get to Find with one button press. But I missed that it's a substring match, not a prefix one. (If I type in "C", I want all the names that start with "C". Not everyone who has a "C" anywhere in their name; combined with the above, this means a search for "C" shows everyone with a cell phone. Not useful.) And it has a delay before it actually starts searching. And if you backspace or hit an invalid character by mistake, it gets out of "Find" completely. Frustrating.

The old phone's interface was actually a little better. Cell phones are getting worse...the designers are too busy focusing on digital cameras, color displays, Bluetooth, games, and other crap to design phones. My new phone might not be the best available, but I think they all have these essential problems.

I might be able to sneak a good UI onto a cell phone. They sell them with J2ME and such for games now. If I take this back and exchange it for such a phone, I might be able to pull this off. Here goes:

Write a J2ME "game" called GoodUI. It would be an open-source interface to the cell phone. A main screen that shows the clock, antenna status, and battery status in huge letters. A phone book that doesn't suck. It just depends on having a decent API to the phone. In addition to needing enough storage space for its own address book, my game would need access to the real phone bits. It'd make calls, receive calls, get the antenna status, and get the battery status. Store a call log, etc. You'd have to play the game every time you start your cell phone, but after that you'd have a real UI.

21 May 2004 (updated 24 May 2004 at 09:12 UTC) »

Scott's Laws of Email Filtering. Amendments, anyone?

Don't ever silently send email to /dev/null.

(For bram, among others.) If you mess up your rules, you could drop important emails. You'll be unhappy, the sender will be unhappy, everyone will be unhappy. If you drop an email, let the sender know.
Never send a bounce or reply to a suspected worm/virus email.

The reason is simple: the address is usually forged. So if it is indeed a worm or virus sending it out, your bounce will go to some poor person whose mailbox is probably overflowing with these messages.

Together with #1, the only option this leaves is to send a 5XX (permanent) rejection during the SMTP transaction. If it's a worm, it'll be connected directly to your SMTP server, so it will swallow the rejection and the address's real owner is never bothered. If it is a real person, their end of the SMTP connection will pop up a dialog box (if its a mail client connected directly) or generate a bounce (if it's an intermediate SMTP server). They know and can resend.

The ones that are especially annoying to me are the challenge/response replies with two choices: "I sent this message and it's not spam" or "I never sent this message". I consider this quite rude. They're expecting me to filter their mail for them every time a mail is sent, instead of spending a little bit of their time to set up rules that would save me the trouble most of the time. (Of course, I always hit the button that will send the junk mail on to them; I'm not a turn-the-other-cheek sort of guy.)
If you send automated email replies or forwards, filter likely worm/virus emails first.

...and then follow #1 and #2 with those messages, of course. Same reason as before.
Keep your rejection messages polite.

If an actual person is reading the rejection, you probably wanted to get the email. Be polite and tell the person how to get around the block.

If you follow these rules, you can be more aggressive in your filtering without too many consequences. For example:

I reject all messages with a subject line of "Thank you!". I'd like to think people might legitimately want to thank me from time to time. Perhaps they'd say so in the subject line. But the worst thing that happens is they get a polite reply asking them to change the subject line and resend. With a copy of the message included, should they not have saved it. It wastes a little bit of their time, but this is rare enough that I don't feel so bad about it.
I reject almost any MIME attachment that could be Windows malware. Even .zip files now. I politely ask them to arrange with me another way to send it. (Sending binaries through email isn't such a good idea anyway; it's indirect, and base64 bloats files by a third.)

richdawe:

mailman is the wrong layer to solve that. SMTP servers (Postfix, sendmail, qmail) can do some filtering based on the From: address. Mine is set up so that it rejects anything claiming to be from slamb.org unless sent from localhost or with SASL authentication. If gnu.org did the same, it would know the email from libtool@gnu.org is bogus.

And SPF, which you mentioned, could do the same across mailservers. I don't see a need for a mailing list-specific variant.

32 older entries...