Interesting project, but I don't understand how to use it correctly with transactions.
I can see that you and/or Sean Chittenden put some thought into this - your README.pgmemcache suggests using an AFTER UPDATE trigger, which runs within the transaction making the change. It further suggests deleting the stale entry from the cache rather than replacing it because the latter is not "transaction safe". By that, I assume it means that if the transaction gets rolled back, the cache will incorrectly hold the new value.
But how do records get added? If they're added on retrieval (as would make sense for a cache), it seems like there's still a potential problem:
conn a conn b
------ ------
begin work;
begin work;
update;
(remove entry)
commit work;
select;
(add entry)
PostgreSQL has MVCC, so I think this schedule is possible, even at set transaction isolation level serializable. At the end, the cache is stale and there doesn't seem to be a mechanism to correct it.
The readme linked to this presentation (PDF) which suggests an ON COMMIT trigger. I think an AFTER COMMIT trigger could mostly prevent this from happening, provided that the add does nothing if there's already an entry in place. But it introduces new problems:
- if there's a transient network outage affecting the memcached server, the entry can't be replaced, and it's too late to fail the transaction.
- other connections may briefly see a stale entry. It will usually be quickly corrected, so it may not be a problem for many applications
- if memcached's LRU ejection happens between the replace and the conditional add, the stale entry will be added back in.
The only easy possible solution I see is to lock the row on any cache miss or update, keeping your current procedure otherwise. I guess that's not so bad really. No real performance hit, select ... for share instead of just select ....


