Older blog entries for alvherre (starting at number 10)

20 Jan 2007 (updated 20 Jan 2007 at 01:08 UTC) »

It looks like my McNaught comet photos now made into NowPublic: The comet tail seen round the world. Fact is, I was pretty impressed when I saw it. It was certainly quite amazing.

On other news, I managed to get the whole buildfarm in red for several hours with a pg_regress patch. The fix to the problem was quite simple, but since I'm spending some time researching a new appt. to move to, it took longer than expected. Sadly, all the Windows machines are still failing a test with amazing reproducibility. We need some Windows hacker involved in order to fix it though ... the failure is pretty bizarre.

18 Oct 2006 (updated 18 Oct 2006 at 18:50 UTC) »
a couple o'pics

FYI, I've uploaded some of the pictures I took at the PostgreSQL Anniversary Summit. In doing so I hit the Flickr limit for free accounts, so I'll continue next month :-P

They can be seen here.

A nice gift beats dollars

Got a new book collection today. Yay! Huh, and that has to do with a PostgreSQL-related blog exactly what, you ask?

The story goes like this. Back in 2004, I went to Caracas, Venezuela, for a couple of weeks. During my stay, I was asked to do some consulting for a courier company that was using PostgreSQL for its mission-critical database. It was getting very slow, they whined, and had been approached by DB2 salespeople to replace it. The company that did the Linux support for them was not very fond of ditching their free database for some proprietary stuff, so they asked me if I could have a look and try to "make it faster".

Turns out they were running PostgreSQL 7.1, which was already quite old, and had some horrible queries and very big tables. Most of the queries did not properly use indexes, so their I/O channels were saturated by the constant seqscans. They needed a lot of tuning, caring, and query refining; but what they needed the most was an upgrade. So that's what I focused on at that time.

Of course, I was paid for this work.

When I got home, however, they were still having some issues, and on IM I kept giving suggestions on how to improve queries, pointing the appropriate indexes to create, suggesting FSM configuration improvements and proper vacuum scheduling, etc. This went on for almost two years. In those months I never saw a dime from them and I didn't really care, because it was entertaining most of the time, and a good experience; and I already had Command Prompt's wage, or EnterpriseDB's before that, so it felt a bit unethical to behave like we had a support contract.

A couple of weeks ago, however, they were having problems again, queries were taking too long, customers were starting to be annoyed and a pretty girl was starting to pull her hair out — so one sunny saturday her and I sat down and studied some of the problems and finally figured'em out. Just before I left, I tossed my (shameless plug) Amazon wishlist URL to give her an idea.

Today I got this:

For you the not-curious-enough-to-click, that's "The Complete Calvin and Hobbes". That's right, ten kilogram and ten years of Calvin and Hobbes in the most beautiful hardcover edition I can imagine. Who wants two years of support's money? I'll take a nice gift like this any day. This one will keep me busy for quite a while. Thank you Jacqueline!

Don't get me wrong — I, like most of you, need the dollars anyway for paying the rent and other mundane stuff. I also enjoy being able to eat :-) and will happily continue working on stuff I don't quite enjoy as much as doing PostgreSQL support over Jabber, just to have that privilege. I still have fun by hacking on the challenge that is Mammoth Replicator. But I'm a Calvin and Hobbes lover and things like these cannot be beat by mere checks.

Yesterday I had a meeting with the NIC.cl guys -- they run the Chilean ccTLD. They've been running their stuff using a different open source database and they are not happy with it for two reasons: 1) they discovered that their replication system chooses to skip some updates from time to time, which made them a bit nervous :-) and 2) a competing, closed source database company recently bought part of the first company's infrastructure providers, which makes the business future of that other open source company unclear. (I don't think it's very difficult to guess what database they are using.)

So they contacted me so that I could update them on the "state of the art" in PostgreSQL replication and stuff. I mentioned Afilias and they were very interested; people from Afilias are likely to be contacted soon to figure out how their experience with Slony-I has been. I can't help being excited; I expect this to be a future public success story, and a notorious one, here in Chile. It makes me really happy; soon we _will_ see more of these here, I hope.

Ok, so I think I've postponed writing here for too long. I guess the most important thing that I've omitted is the switch from EnterpriseDB to Command Prompt.

EnterpriseDB was not a bad place for me to work in, I think. There was nothing wrong with them in principle. I just felt somehow awkward. One fine day Josh Drake emailed me and offered to go with him instead; it didn't take me much to say aye. To me it wasn't really much difference either way; one could say that the actual difference is more in how I relate to the "big bosses" in each company, than in any other matter. But this relationship really matters to me, which is why I switched.

(A couple of people has approached me asking how was EDB as an employer. If you are a PostgreSQL hacker looking for a job, I think EnterpriseDB is a good place to be in. If you are a long-term contributor, make sure you'll have time to work on the free stuff!)

CommandPrompt is great in one important regard: they've been alive for a long time and have great customers. They are not going away anytime soon. They have well defined products which they have been selling for some time already. This doesn't mean that their products (particularly the Replicator stuff) do not need further work, :-) particularly when no one from PGDG has been overseeing their internal development. So I have plenty of things to do in their proprietary offerings.

But I didn't enter CommandPrompt just to work on proprietary stuff. I really need to take a break from this pressure and do something "in the open." I have talked to Joshua about this and he knows it, so I hope to get back on track sometime soon.


Another development cycle has come and go. Looking back, I see that I did manage to get some new things, but certainly not as much as I'd like. If I wasn't distracted so easily this release would be better ;-) Looking back a little, here is what I achieved for PostgreSQL 8.1:

  • Shared row locking. I wrote about this one a couple of times before.

  • Shared dependency tracking. This patch was committed at last minute, because it conflicted hard with the roles patch. I managed to get it in however. For readers not in the know, this new functionality ensures that the system refuses to drop any user that owns objects (tables, functions, etc) in some database, or that has permissions (grants) on some objects. This is not a big feature but it means the database is always in a consistent state.

  • Integrated autovacuum. Yes, we now have an integrated autovacuum daemon in the backend. It's far more powerful than the contrib daemon: the various parameters can be set per-table, and it can be disabled per-table too if needed.


The EnterpriseDB guys published a press release stating that they have hired me and a couple of other guys. Truth is, I've been working on their account for a couple months already. It's cool because they let me contribute my work back to the community; in fact, the idea is that my work would be 50% closed and 50% community, though I haven't seen the closed part yet. I'm OK with the split, because I was already doing closed software before, so in no way it's worse than before, but I'd like the closed part to be as small as possible. We'll see how it all turns out.


So what now? I have some plans, and EDB has plans for me:

  1. Shared dependencies still have to go some way. I need to implement a way to facilitate dropping users, because currently you have to drop or give away the objects manually. My proposal is to implement a couple of commands, DROP OWNED BY and REASSIGN OWNED BY.

  2. Autovacuum needs further improvements, such as working without row-level stats. Also, I'd like the ability to track Xid wraparound per-table, so we don't force a database-wide vacuum on anybody.

  3. I want to start working on the optimizer and executor code, just to know how does it work and maybe to do something interesting on the long run. My plan for the first step is to add some more specific optimization to BETWEEN. I had an idea for a second step, but I don't remember right now what it was and I don't have my notes handy. The final grand plan was to implement recursive SQL queries, but I'm a long way from that, so maybe someone will beat me to it.

  4. Slony-II. EDB wants me to lay my hands on Spread, or whatever group communication suit Slony-II ends up using. My first step is to develop some sort of test harness for Spread. I'm still not sure how to do it. I still have some research to do on the subject.

My plans

It's been a month since my last entry, so clearly I'm not a good blogger. Anyway:

I submitted the MultiXactId patch some time ago. Tom reviewed and committed it. We don't have the foreign key problem anymore! I was kinda ashamed by that problem, so I'm happy it's now solved. Fortunately, we figured a way to make the mechanism be used only when it's really needed, that is, only when a row is locked by two or more transactions at the same time. This way we don't even incur in a performance hit — the only point where the mechanism enters the picture is when previous Postgres' versions would block, so it's a net win.

Right now I'm looking at Heikki Linnakangas' two-phase commit patch. With some luck I can help clean it up some before it gets to Tom, so that when he gets it, we can quickly have a fully functional 2PC implementation in time for 8.1.

Additionally, I'm looking at completing my shared dependencies patch before 8.1 freezes. I let it rot some; I partly blame Stephen Frost's SQL role support patch. Of course it's a lame excuse — it'd be cool to have both things in. But they conflict anyway so I'm leaving this behind a little.

It worked!

Ok, I am amazed: the phantom Xid idea worked! Only I didn't name it phantom Xid because that name has a bad history. So I named it MultiXactId. The implementation is finished; it only needs some more testing, making sure it deals correctly with wraparound, and that it correctly truncates SLRU segments.

To recap: the idea is to allow a row to be share-locked. This means somehow storing multiple TransactionIds in the tuple header. This is done by using a shared counter similar to TransactionId, which I have dubbed MultiXactId. Each MultiXactId is associated with variable-length storage in SLRU, on which the set of TransactionIds resides. This MultiXactId is stored in the tuple's Xmax, just like a TransactionId is stored in case of a SELECT FOR UPDATE.

I just posted the patch; probably Tom will have some disagreements with the way some things are done, but he already expressed approval of the general idea, so I expect that eventually this is going to be committed.

I feel somewhat ashamed that it took me two weeks (counting from the last blog entry) to complete the implementation. I hope not everything takes me this much, or I won't be having a lot done.

After this I'll return to the shared dependencies patch, and then I don't know what my next project will be ... maybe I'll continue trying to figure out a way to do the translation of the docs, which is badly needed.

phantoms – a better idea

After wasting several days trying to come up with a way to make Postgres' lock manager spill to disk, I gave up and ICQed Bruce. This was mainly motivated by the fact that several people complained that the idea of having any process sleep on I/O waiting for the lock manager was a loser. After thinking on it for a while I cannot but agree. I even considered writing some sort of lightweight B-Tree handler on top of SLRU, so we could quickly find lock entries on disk by their LOCKTAG; this is better than using a plain sequential structure, but it's difficult to get right, it's lots of additional code, and it will still need I/O for getting locks :-(

So when I talked to Bruce, he simply said it was a bad idea to use the lock manager to lock tuples; we can't afford to have that many lock entries. In a way it was a disheartening idea. He wondered that maybe we should use some sort of Phantom TransactionIds, an idea on which hearing I almost freeze on terror – I had already heard of Phantom Xids from him, during the nested transactions project, and it was very painful (to me) to code and in the end it was an utter failure. (Althought coding that was a very interesting exercise from which I learned a thing or three – the patches and the disappointing comments from Tom are here.)

So I spent the next day thinking on phantom Ids and felt miserable ;-) because I didn't really see how would they apply to this case.

Turns out it's surprinsingly easy to have it all fit together. Phantom Xids in this case are just a means of indicating a set of TransactionIds; and we store the set on SLRU. So when somebody wants to lock a row, he creates a set; and when somebody needs to lock exclusively, he can sleep on every TransactionId of the set. The only tricky part is how to store variable length content on SLRU, but that is easily solved by using two SLRU areas.

So far I have coded a good deal of the whole thing; though it still doesn't do everything yet, I have now confirmed that the idea is perfectly workable and I'm sure it will have perfect performance.

This means we will have shared row locks for 8.1! I can't help but feel good about that. No more deadlocks in foreign keys. Cool!

posted the patch

Yeah, apparently what I had first thought was in fact the right idea. I'm talking about my shared-row-locks project. Due to some misunderstanding on my part, I figured that the simpler idea was wrong; so I tried a lot of other things, and of course, they were also wrong. So eventually I came back and tried the first thing again, and discovered that it works as expected (by me at least). So I posted the patch, and promptly received a comment from Tom which made me notice a gross mistake. Easily solved, but gross anyway ;-)

performance measure stupidity

I ran some performance testing to verify that my patch won't make people too angry at me. I was terrified to discover that it had dropped by 25% or so in pgbench. I spent an hour and a half looking at the patch searching for the culprit (I didn't want to compile with profiling enabled because my machine is somewhat slow) ... And then I realized that I had compiled the whole backend/access/heap directory with -O0. Recompiled, reran pgbench and now I see no measurable difference between pristine sources and my tree. That's fortunate at least. I still have to see how will the lock-spilling code perform.

1 older entry...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!