Recent blog entries for jfrisby

I haven't posted much about it recently, but here is the status of EasyORM: 0.5.0-alpha9 has proven quite robust in our reasonably stressful production environment. The Java version is progressing very, very slowly, having been scrapped and restarted.

Basically, I'm building a set of components that will make implementation of a tool such as EasyORM a lot easier:

  • Templates - A generalized nested (hierarchical) template system intended for use by code generators. This is 100% done and I really should establish a Freshmeat project for it.
  • BeanBuilder - Using Templates, this tool takes a description of a JavaBean and produces source for that bean. Allows terse description of robust beans, including type-safe collections as attributes and so forth. This is about 98% done -- using it to build SQL (see below) has proven quite instructive and so I'm making it a bit more flexible before releasing it.
  • SQL - A toolkit for modeling the structure of relational schema, including a MySQL DDL parser. The parser is written using ANTLR and is about 98% completed and quite robust in its support of MySQL's DDL (as it's IMPLEMENTED, not how it's DESCRIBED in the docs -- the docs have a number of mistakes). The model is written using BeanBuilder and is about 85% complete.
There will doubtless be several more such packages/tools built before or during the construction of EasyORM. One such toolkit is likely to be an analogue of the SQL package but for object models, rather than relational schema.
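To give a feel for what a BeanBuilder-style generator does, here is a minimal sketch that turns a terse description (a class name plus an ordered map of field names to types) into JavaBean source. It is not BeanBuilder's actual input format or output, which aren't described here; the `generateBean` method and the map-based description are hypothetical, and real output would also cover type-safe collections, validation and so on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BeanSketch {
    // Hypothetical generator: field description is an ordered map of name -> Java type.
    static String generateBean(String className, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(className).append(" implements java.io.Serializable {\n");
        // Private backing fields, in declaration order.
        for (Map.Entry<String, String> f : fields.entrySet()) {
            sb.append("    private ").append(f.getValue()).append(' ').append(f.getKey()).append(";\n");
        }
        // One getter/setter pair per field, following JavaBean naming conventions.
        for (Map.Entry<String, String> f : fields.entrySet()) {
            String name = f.getKey();
            String type = f.getValue();
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            String getterPrefix = type.equals("boolean") ? "is" : "get";
            sb.append("\n    public ").append(type).append(' ').append(getterPrefix).append(cap)
              .append("() { return ").append(name).append("; }\n");
            sb.append("    public void set").append(cap).append('(').append(type).append(' ').append(name)
              .append(") { this.").append(name).append(" = ").append(name).append("; }\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "String");
        fields.put("active", "boolean");
        fields.put("tags", "java.util.List<String>"); // a type-safe collection attribute
        System.out.print(generateBean("Customer", fields));
    }
}
```

A real generator would drive this through the Templates package rather than hand-built strings; the string building here just keeps the sketch self-contained.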

Ultimately I want each of these tools to do as little as possible, because I know that each piece will probably undergo MASSIVE revision as I learn more about it. It may not seem like it on the surface, but O/R mapping contains a LOT of non-trivial problems. Besides the obvious one (mapping objects to relational schema), things like robust generation of efficient and reliable code aren't as trivial as they might at first seem. There are also usability issues with the mapping definition language, GUIs, etc.

Setting aside EasyORM for a moment, I made a simple but handy little tool called DataDiff, which compares the contents of identically-named tables in two databases on a MySQL server and returns the rows that differ between the two. It's sort of like a NATURAL FULL OUTER JOIN, except that MySQL doesn't do FULL joins, and NATURAL joins don't give the desired results if columns contain NULLs. The tool can also optionally ignore the values in DATE/TIME/DATETIME/TIMESTAMP columns. It was built as part of an initiative to have a push-button release process that included extensive regression testing. It isn't enough to ensure that the front-end produced the "right" answer; one must also ensure that it did the right thing in the back-end. DataDiff allows us to do this.
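The core comparison can be sketched in memory, assuming both tables have already been fetched as lists of rows in the same column order. This is not DataDiff's actual code: the `diff` and `key` methods are hypothetical, and "ignore date columns" is modeled as a set of column indexes to skip. The trick is that `List.equals` compares elements with `Objects.equals`, so two NULLs match (unlike SQL's `=`), and counting rows as a multiset emulates the missing FULL OUTER JOIN in both directions.

```java
import java.util.*;

public class DataDiffSketch {
    // Comparison key for a row, with ignored columns (e.g. DATE/TIME ones) dropped.
    // List equality/hashing treat null == null, unlike SQL "=".
    static List<Object> key(List<Object> row, Set<Integer> ignored) {
        List<Object> k = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            if (!ignored.contains(i)) k.add(row.get(i));
        }
        return k;
    }

    // Multiset difference in both directions: rows only in "left" and rows only in "right".
    static Map<String, List<List<Object>>> diff(List<List<Object>> left,
                                                List<List<Object>> right,
                                                Set<Integer> ignored) {
        Map<List<Object>, Integer> counts = new HashMap<>();
        for (List<Object> r : left) counts.merge(key(r, ignored), 1, Integer::sum);
        for (List<Object> r : right) counts.merge(key(r, ignored), -1, Integer::sum);
        List<List<Object>> onlyLeft = new ArrayList<>();
        List<List<Object>> onlyRight = new ArrayList<>();
        for (Map.Entry<List<Object>, Integer> e : counts.entrySet()) {
            int n = e.getValue();
            for (int i = 0; i < n; i++) onlyLeft.add(e.getKey());   // surplus on the left
            for (int i = 0; i < -n; i++) onlyRight.add(e.getKey()); // surplus on the right
        }
        Map<String, List<List<Object>>> result = new HashMap<>();
        result.put("onlyLeft", onlyLeft);
        result.put("onlyRight", onlyRight);
        return result;
    }

    public static void main(String[] args) {
        // Columns: id, name, note (nullable), updated (a date column to ignore).
        List<List<Object>> expected = List.of(
            Arrays.asList(1, "alice", null, "2001-01-01"),
            Arrays.asList(2, "bob", "x", "2001-01-02"));
        List<List<Object>> actual = List.of(
            Arrays.asList(1, "alice", null, "2001-06-30"),
            Arrays.asList(2, "bob", "y", "2001-01-02"));
        Map<String, List<List<Object>>> d = diff(expected, actual, Set.of(3));
        System.out.println("only in expected: " + d.get("onlyLeft"));
        System.out.println("only in actual:   " + d.get("onlyRight"));
    }
}
```

Here row 1 counts as identical (NULLs match, the timestamp is ignored), and only bob's row is reported on each side. Counting instead of using sets keeps duplicate rows from hiding each other.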

DataDiff is one of a handful of small tools developed internally for our Java-based development efforts, and as time permits I'll release the others. All of these tools are focused on quality management: automating testing and, to a smaller extent, enforcing "best practices" where applicable.

Bleh. Well, I finally have the Perl version of EasyORM polished and stable. The documentation matches the code, there are no significant bugs in either EasyORM or the generated PHP code, and it Just Works. Whew. EasyORM 0.5.0-alpha8 for the curious.

The Java version of EasyORM however is being delayed somewhat:

  • There were some ugly bugs in EasyORM itself -- nothing that couldn't be worked around, but I'm not comfortable releasing it with them.
  • The generated Java code is being totally changed. It will be a feature-for-feature match with the generated PHP code (caching, debug levels, etc.), and business logic integration should be as seamless as with the PHP version.
  • The PHP generator is still not ready. When I get done with it, the generated code should be byte-for-byte identical to EasyORM 0.5.0-alpha8. Hopefully this will smooth migration.

I should also note that we're using EasyORM-generated code on a daily basis for all O/R mapping in our production system. We handle around 100,000 database-driven page views per day, with a few special cases that tend to stretch EasyORM in uncommon ways, so I can say with some confidence that the generated code from 0.5.0 is quite stable and usable. The generated code for 0.4.1, however, has some serious holes and shouldn't be trusted without *very* detailed testing.

Well, a couple weeks ago I finished the Java-based, Java-targeted rewrite of EasyORM. It's my first Java project in a while, and I must say I like the tools available to Java developers (JUnit and Ant in this case).

The only task remaining for EasyORM 0.5.0 is to build a PHP-targeted code generator.

Since 0.5.0 was built with the pressing need of moving to Java, that's what I focused on. But by the time I had even begun the Java rewrite, I had a great new model for the generated PHP code finished and more or less tested.

O/R mapping in general isn't a trivial task. In particular, caching semantics and object (entity) life cycle management turn out to have a fair number of subtle details that aren't necessarily easy to address. That said, I think EasyORM 0.5.0 (both PHP-targeted and Java-targeted) rectifies most of those issues...

I'll launch 0.5.0 as soon as I have half a day to build the code-generator for PHP.

Well, it's been a while... Having been laid off from Everyone.net, and then promptly laid off from digiGroups when they got bought out 3 months after I started with them, I'm now working at YourFreeDVDs.com. I've published my first Open Source project: EasyORM, an Object/Relational mapper for PHP. It's at my web site. Yeah, the code is really rough, it's buggy, it doesn't produce wonderful code and has 0 documentation, but I only had 2 days to put it together... :)

Well... That was... Humiliating.

Well, it looks like RMI is the way to go after all. I really wish Perl had something comparable, and better garbage collection -- circular data structures are very convenient. Thankfully the problem domain is fairly simple, and reimplementing what I have already won't be too hard or time-consuming.

Anyway, that may or may not be a priority in the Very Near Future. I've got higher priority stuff to focus on at work.

I am, however, considering defining my own XML language for vocabulary translations, i.e. to help me memorize Japanese vocabulary... *grin* Am I duplicating anyone else's effort? I hope not... I'll post the DTD or specification on my site, along with reference code, when I get the chance...

I haven't tried messing with global-IMEs and Unicode display in Java or Perl... Any advice?

Nihongo wa utsukushii to muzukashii desu. (Japanese is beautiful and difficult.)

Anyway, I miss my sweetie... I didn't see her today. :( Apparently she and her mom nearly got killed on the way to dinner: some moron ran a light that had long since turned red, at around 50MPH. Her mom stopped just in time. He just waved as he blew by... Some people really need to be removed from the gene pool for the sake of the rest of humanity.

-JF

Hmmm... Is the Perl interface to PVM *really* that out of date? The most current one according to the PVM web site and CPAN is from '96. Bleh!

Well, I'm not even sure PVM is the right tool for this job. Essentially I have a tree structure, where each node is a work unit and is dependent upon the results of rendering its children.

Rendering a node involves:
1) Gathering inherited information from the child nodes.
2) Gathering node-local information from the database.
3) Winnowing down the data to the relevant subset.
4) Producing output files (2 per node).
5) Returning the subset data to the parent node.
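Since each node depends only on its children's results, the five steps above map naturally onto a post-order traversal where sibling subtrees can run in parallel. The sketch below uses Java's fork/join framework on one machine rather than PVM, MPI or RMI, so it only illustrates the dependency structure, not distribution. Everything specific is hypothetical: `Node` holds its "database" data in memory, the "relevant subset" is taken to be the 3 largest values, and the two output files per node are just recorded as names in a queue.

```java
import java.util.*;
import java.util.concurrent.*;

public class TreeRenderSketch {
    // Hypothetical node: localData stands in for the per-node database query.
    static final class Node {
        final String name;
        final List<Integer> localData;
        final List<Node> children;
        Node(String name, List<Integer> localData, Node... children) {
            this.name = name;
            this.localData = localData;
            this.children = List.of(children);
        }
    }

    static final int KEEP = 3; // hypothetical size of the "relevant subset"

    static final class RenderTask extends RecursiveTask<List<Integer>> {
        private final Node node;
        private final Queue<String> outputs;
        RenderTask(Node node, Queue<String> outputs) { this.node = node; this.outputs = outputs; }

        @Override
        protected List<Integer> compute() {
            // 1) Render all children (in parallel) and gather what they pass up.
            List<RenderTask> tasks = new ArrayList<>();
            for (Node c : node.children) tasks.add(new RenderTask(c, outputs));
            invokeAll(tasks);
            List<Integer> data = new ArrayList<>();
            for (RenderTask t : tasks) data.addAll(t.join());
            // 2) Add node-local information.
            data.addAll(node.localData);
            // 3) Winnow down to the relevant subset (here: the KEEP largest values).
            data.sort(Comparator.reverseOrder());
            List<Integer> subset = new ArrayList<>(data.subList(0, Math.min(KEEP, data.size())));
            // 4) Produce two outputs per node (recorded, not written, in this sketch).
            outputs.add(node.name + ".html");
            outputs.add(node.name + ".idx");
            // 5) Return the subset to the parent.
            return subset;
        }
    }

    public static void main(String[] args) {
        Node tree = new Node("root", List.of(1),
            new Node("a", List.of(9, 4)),
            new Node("b", List.of(7), new Node("b1", List.of(8, 2))));
        Queue<String> outputs = new ConcurrentLinkedQueue<>();
        List<Integer> top = ForkJoinPool.commonPool().invoke(new RenderTask(tree, outputs));
        System.out.println(top + " " + outputs.size()); // prints [9, 8, 7] 8
    }
}
```

Note that only the winnowed subset travels up the tree, which is what keeps the per-parent data volume small even with many children; the side effects in step 4 happen locally at each node.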

I've had PVM and MPI suggested to me as starting points, although it seems that moving to Java and RMI might be better for the situation...

Gerry says PVM is probably inappropriate since my code has side effects (step #4), but Wayne (who suggested PVM/MPI initially) says that RMI may be inappropriate because of the volume of data being passed around: individually, the data set returned to a parent is small, but there can be lots of children.

I still need to look at the PVM vs. MPI comparison on the PVM site; being on Windows, viewing PostScript files is kind of a chore.

Any input or advice?

On a (*grin*) personal note: My girlfriend is awesome. It's really cool to have someone who will give you an 8:00AM wake-up call, even if that means calling repeatedly for 10 minutes straight until you wake up. :)

Wow... Advogato is cool. :)

Raph pointed me to the TIGER/Line database, and Benjy dropped me an e-mail pointing me to Bruce Perens' site, where a Free copy can be found. Cool.

I'm on a deadline, so I can't review the data just yet, but I imagine it has everything I'm looking for. I can't wait to open-source this job database...

I should be up-front in pointing out that the system is back-end only... All the extant front-end code sucks and uses unusual tools (the best front-end code -- from the now defunct BrainPower -- uses ePerl... MrJoy.com uses HTML::Mason but the front-end is extremely crude and tightly integrated with the rest of MrJoy.com). Basically it's a high-level Perl API and database schema for posting/modifying jobs, etc. It's carved out of a larger system so some things will seem a little incomplete...

I may include my e-mail processor that lets you handle e-mailed job submissions (using a special format), but it's kind of clunky. I'd much rather rewrite it to use an XML language to define the jobs first.

The e-mail handler was made to work with QMail, but I've made a POP3 wrapper which can be run as a cron job... Currently the API only supports MySQL, but I'd like to port it to Postgres as well, and add support for BDB transactions in MySQL 3.23...

The job engine really hits MySQL in its weak spots. Performance can really grind because of uber-complex queries. Feh. MySQL really needs sub-selects.

BTW, anyone looking for actual open source stuff I've written can check out MrJoy (click on "Software") or check out MasonHQ under the contributions section. It's all really trivial stuff, but it works. :)

Given the response to my last question, I'll pose another one: What ever happened to the plans to have proper garbage collection in Perl? I drove myself batty recently trying to eliminate an unintentionally circular data structure... *grumble*

On a personal note, my sweetie is great. :) She can be a tad emotional at times, but I have some of the same issues she does so I understand how she feels when her mood takes a nosedive...

I need to buy her some flowers. I haven't done that yet. I should also take her for dinner at AP Stumps or some such... The nicest place I've taken her so far is Macaroni Grill.

So where does one go to find Free data? I have a really cool piece of code I'd like to release, but for part of its search capabilities it needs a database of Zip/Area/MSA Codes.

The USPS apparently charges an annual subscription fee for the raw data, and numerous other companies integrate it with other sources (zip and area code data, for example, are normally entirely separate) and then resell it.

My only options are to remove most of the location-search functionality from my code (you can search for data that is "near" where you specify, you can enter zip codes, area codes, etc...) leaving just the raw location search, or mandate that users buy this database. Unfortunately the location search is pretty key...
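For context on why the database matters, here is a minimal sketch of the "near" part of such a search, assuming the database supplies a latitude/longitude centroid for each zip (or area) code. It uses the standard haversine great-circle formula; the code labels and coordinates below are made up for illustration and are not real zip data, and the `near` method is hypothetical rather than taken from the actual job database code.

```java
import java.util.*;

public class NearSearchSketch {
    static final double EARTH_RADIUS_MILES = 3958.8; // mean Earth radius

    // Great-circle distance between two points given in degrees (haversine formula).
    static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    // Codes whose centroid lies within radiusMiles of the given code's centroid, sorted.
    static List<String> near(String code, double radiusMiles, Map<String, double[]> centroids) {
        double[] origin = centroids.get(code);
        if (origin == null) throw new IllegalArgumentException("unknown code: " + code);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, double[]> e : centroids.entrySet()) {
            double[] p = e.getValue();
            if (haversineMiles(origin[0], origin[1], p[0], p[1]) <= radiusMiles) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical centroid table; real values would come from the zip/area code database.
        Map<String, double[]> centroids = new HashMap<>();
        centroids.put("ZIP-A", new double[] {34.00, -118.00});
        centroids.put("ZIP-B", new double[] {34.10, -118.10});
        centroids.put("ZIP-C", new double[] {36.00, -120.00});
        System.out.println(near("ZIP-A", 25.0, centroids)); // prints [ZIP-A, ZIP-B]
    }
}
```

Without the centroid table, only exact matches on the entered location are possible, which is why dropping the database means dropping the "near" search.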

What then should I do?

On a personal note, I'm happy. I like my girlfriend. :) Jen is the reason I get up in the morning now...
