ER
Most of the evening was spent in the ER, because Max had burned
himself with a pair of tongs. As far as we can tell, he stuck them in
the flame on the stove, then touched them to his cheek. We spent over
two hours waiting, then the doctor looked at him for about 10
seconds. In any case, it was only a partial thickness burn, so with
some antibiotic cream it should heal up nicely.
Stamp idea from Bram
Bram gave me a really nice gift this afternoon: an
optimization for the bandwidth and storage consumed by stamps.
Basically, his idea starts with stamps of value 2^k, then lets
them get split in half up to k times.
In my previous thinking, a stamp contains some random bytes, an issuer
id, connection info for the issuer (ip address and port), and an
expiration time. When a node creates a new stamp, it uses /dev/random
or whatever to make the bytes. Then it stores the stamp in its own
database, and "issues" it to a peer, who may then trade it around the
net. Finally, it gets sent back to the issuer for "redeeming". At that
point, the issuer checks to see that the bytes are in the database
(and haven't been redeemed already by someone else).
Bram's idea is simple (one of those "why didn't I think of that
myself" ones). Stamp entropy is the root of a k-high tree. For each
node in the tree with entropy k, the left child is AES_x(0), and the
right child is AES_x(1). Now, instead of an opaque random block, you
have a tree id (can be random; is shared by all nodes in the tree),
a path (sequence of 0 and 1 bits to choose left and right nodes,
respectively), and the result of the hash.
It's a nice technique, and it'll almost certainly be in the prototype
when I finally release it. But what makes it a nice gift is that Bram
obviously understands what stamps are all about and has been thinking
about them.
Haiku licensing
Aaron Swartz has beautiful haiku explaining free
software licenses (there's another one linked, but it's not relevant
to free software):
MIT: take my code with you / and do whatever you want / but please don't blame me
LGPL: you can copy this / but make modified versions / free in source code form
GPL: if you use this code / you and your children's children / must make your source free
Aaron has placed these in the public domain. In other words:
I could even lie / and say I wrote it myself / the author cares not
Big companies
Somebody (I don't remember who, they've scrolled off the recentlog)
said "that large corporations and free software are interested in"
me. Sure, I can't help the fact that my altruism benefits these kinds
of entities in addition to real people, but the fact doesn't excite
me. When dealing with business entities, I prefer a business
relationship; I provide them with something of value, they pay me
money.
Free software licenses don't promote this goal, but dual-licensed GPL
libraries are consistent with it. I recommend the approach highly, and
plan to do most of my future work under it.
Graphs are not a precise model for software food chains
David McCusker writes:
Another exception related to performance occurs with
caching. A reverse proxy server depends on an origin web server for
content, and yet can serve that content faster than the origin web
site when either of two things occurs. First, the proxy might be on a
better network than the origin web site. And second, the proxy might
have content cached most directly in the form for serving.
Don't get me wrong, I'm not overly attached to my theory, but I'm not
sure this counts as an exception. Yes, the proxy depends on the origin
server for content, but it might also make sense to say that the
origin server depends on the proxy for efficient distribution of that
content. Here, I don't think a graph edge says enough about the
relationship.
Fault-tolerant techniques like RAID are a clear example, though. Here,
the dependency relationship is a clean edge. The disk is clearly
at a lower level than the RAID cluster, yet disk failure is
hidden. Disk bandwidth has a linear relationship with bandwidth at the
higher level, but the cluster has better bandwidth than the individual
disk. When it comes to latency, though, everything is hard-limited by
the disk.
There's another aspect in which simple graphs aren't a powerful enough
model. A simple Unix utility may have only two dependencies: to a
kernel, and to a C implementation. However, the kernel could be Linux
or BSD, and the compiler could be (say) gcc or lcc. So we have four
edges, but they're not all the same. The real structure is two groups
of two.
An analogous situation occurs in food webs. To a first approximation,
food sources are interchangeable. But again, there is structure.
Humans need both starch and protein. It doesn't matter whether the
starch is wheat or rice, but a diet of starch and meat is considerably
more complete than one of two starches.
So a graph is an interesting starting place, I think, but it is an
oversimplification. It would be very interesting to see whether people
trying to apply network theory to other domains are developing more
sophisticated models. I'm not aware of any.
Psyco vs Parrot
I feel that my last entry
was a bit unfair. After all, Psyco and Parrot are both
speculative projects with the potential to vastly improve the
performance of dynamic scripting langauges, but also with a
significant risk of failure.
Even so, my gut feeling is that Parrot has a solid chance of success,
while Psyco is much more speculative. Parrot is based on fairly
conservative ideas. There's a lot of literature on virtual machines,
and the Parrot people seem fluent in it. By contrast, the Psyco webpage and
distribution contain absolutely no references to any related work
(that I could find). Maybe it's just my academic background, but it
strikes me as a red flag.
In any case, I wish both the Parrot and Psyco teams huge success.
Having to choose between multiple high-performance dynamic languages
would be a good problem to have.