Older blog entries for Omnifarious (starting at number 152)

Help! DynDNS has become prohibitively expensive!

They want to charge me $40/yr per domain for secondary DNS! $40/yr! This is completely ridiculous. With the volume of lookups I get, I could probably host all the domains on my own server on a DSL line if I wanted.

Is anybody out there willing to provide secondary DNS for a few domains for me? I'm willing to cough up the equivalent of $10/yr in bitcoins for the service if you really want.

Syndicated 2011-06-10 23:48:36 from Lover of ideas

30 May 2011 (updated 30 May 2011 at 23:08 UTC) »

Session properties

I've been puzzling over a minimal and orthogonal set of properties for a session. At first I thought there were three:

Message boundaries preserved
Whether your messages are delivered in discrete units, or as a stream of bytes in which the original sizes of the send calls have no bearing on how the bytes are chunked together on the other end.
Ordered
Whether or not data arrives in the order you sent it.
Reliable
Well, this has a tricky definition. For TCP it means that failure to deliver is considered a failure of the underlying connection. But after such a failure you can't really be sure about exactly which bytes were delivered and which weren't.

But, as is evidenced by my description of 'reliable', these properties are not as hard-edged as they might seem. I also thought about latency; for example, a connection via email is relatively high latency, and a connection between memory and the CPU is generally pretty low latency. But I'm looking for hard-edged, yes/no type properties that are in some sense fundamental. Latency seems like a property that's rather fuzzy. It exists on a continuum, and isn't really a defining feature of a connection: not something that would drastically alter how you wrote programs that used the connection. In an object model, it would be an object property, not something you'd make a different class for.

But I find TCP's notion of 'reliability' very curious. It isn't really, in any sense, particularly reliable. I've had ssh connections die, but when I reconnected to my screen session, I discovered that a whole bunch of the stuff I was typing had made it through; it just wasn't echoed back.

It also interacts with 'ordered' in an odd way. It might make sense to have an unordered connection that was 'reliable', but what does that really mean then? If it's a TCP notion of reliability, you could just deliver the last message and have the connection drop. Also, what would it mean to have an unreliable, but ordered connection? Would that mean you could send a bunch of messages and have only the first and last ones delivered? And would it make any sense at all to have an unordered, unreliable connection in which message boundaries were not preserved?

So I've come up with a different division...

Message boundaries preserved
Whether your messages are delivered in discrete units, or as a stream of bytes in which the original sizes of the send calls have no bearing on how the bytes are chunked together on the other end.
Ordered
Whether or not data arrives in the order you sent it.
Must not drop
This means that if a message does not make it through, the connection is considered to be in an unrecoverable error state, and no further messages may be sent, though you may not know which message didn't make it through.
Delivery notification
Whether or not you can know that a message made it to the other side.

These are not fully orthogonal. For example, if message boundaries are not preserved, then, for a connection to be at all sensible, it must also have the 'ordered' and 'must not drop' properties. Also, if messages must not be dropped, I'm not sure it would be sensible to have out-of-order delivery.

One of the rules of the system I'm designing is that any property that is not required may be provided anyway. This makes non-orthogonality much easier to deal with. So the prior cases aren't really a problem.
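
To make this concrete, here is a small sketch (hypothetical names, not from any real implementation) of the properties as flag bits, along with that rule:

# Hypothetical sketch: session properties as flag bits.
MESSAGE_BOUNDARIES    = 1 << 0  # send() sizes preserved on the far end
ORDERED               = 1 << 1  # data arrives in the order it was sent
MUST_NOT_DROP         = 1 << 2  # a lost message kills the connection
DELIVERY_NOTIFICATION = 1 << 3  # you can learn that a message arrived

def satisfies(provided, required):
    # Any property that is not required may be provided anyway, so a
    # transport is acceptable whenever its provided set is a superset
    # of the required set.
    return (provided & required) == required

# A TCP-like byte stream: no boundaries preserved, but ordered and
# must-not-drop.
tcp_like = ORDERED | MUST_NOT_DROP
assert satisfies(tcp_like, ORDERED)
assert not satisfies(tcp_like, MESSAGE_BOUNDARIES)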

Can any of you think of a better set of properties, or important properties that I left out?

Some good discussion also happens in this Google Buzz post that mirrors this entry.

Syndicated 2011-05-30 12:48:35 (Updated 2011-05-30 22:55:50) from Lover of ideas

CAKE has reached a small milestone

CAKE reached a new milestone early this morning. It now successfully both generates and parses messages that use the new protocol. It also successfully detected a re-used session id. I think the code that does this is a lot better designed than the old code was. It's easier to see how to put it in the context of a larger system that implements a node that speaks the protocol.

It's also much more extensively tested at a deeper level with tests that are designed to document the inner workings of the system.

Overall, it's in a much better state than I left it when I mostly stopped working on it in 2004. And I'm going to handle the hard problems first: how to maintain the relationship between sessions and transports, and how to have two-way realtime conversations between nodes. This rather than concentrating on the messages that will be traded back and forth at a higher level (which will be done using protobuf). That can come later, especially since I'm not likely to get it right the first time anyway.

I also need to think about getting nodes to participate in a DHT to share assertions (like how to reach a particular node) in a distributed way.

Lastly, the protocol has something of a problem with 'liveness' because I designed it with the idea that conversations can be initiated without any round trips. There is some mitigation for this problem in session ids, but that mitigation is somewhat problematic because it requires the recipient of a conversation initiation to keep track of some stuff for everybody who tries to talk to it.

I'm not really sure how to handle the 'liveness' problem, though, and still preserve the no-round-trips property. I could require that session ids contain an 'hour number' or something similar, though that introduces a requirement for at least very coarse-grained time synchronization across all nodes.
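
As a very rough sketch of the hour-number idea (everything here is hypothetical, names included), embedding a coarse timestamp in the session id would let a recipient bound how long it has to remember ids:

import os
import time

HOUR = 3600
MAX_SKEW_HOURS = 2  # tolerated clock disagreement between nodes

def new_session_id():
    # Embed the current hour number so stale ids can be rejected
    # without remembering every id ever seen.
    return (int(time.time()) // HOUR, os.urandom(16))

def accept_session_id(session_id, recently_seen):
    hour_number, nonce = session_id
    now_hours = int(time.time()) // HOUR
    if abs(now_hours - hour_number) > MAX_SKEW_HOURS:
        return False  # too stale (or too far in the future) to trust
    if session_id in recently_seen:
        return False  # a re-used id within the window
    recently_seen.add(session_id)
    return True

A real implementation would also have to prune recently_seen as hour numbers age out of the window, but the point is that the state a recipient keeps is bounded.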

Syndicated 2011-03-28 16:10:59 (Updated 2011-03-28 16:11:22) from Lover of ideas

Interesting design problem with serialization and deserialization

I have been working on a serialization framework for Python that I'm happy with. I want to be able to describe CAKE protocol messages clearly and succinctly. This will make it easier to tweak the messages without having to rip apart difficult-to-understand code. It will also make it easier to understand if I drop the project again and then come back to it years later, or if (by some miracle) someone else decides to help me with it.

Here is what I've come up with as the interface, along with one implementation of that interface for a simple type:

class Serializer(object):
    """This is class is an abstract base class.  Derived classes, when
    instantiated, create objects that can serialize other objects of a
    particular type to a sequence of bytes, or alternately deserialize
    a sequence of bytes into an object of a particular type."""

    __slots__ = ('__weakref__',)

    def __init__(self):
        super(Serializer, self).__init__()

    def serialize(self, val):
        """x.serialize(value) -> b'serialized value'

        This is implemented in terms of serialize_iter by default.

        It is suggested that derived classes only implement serialize
        or serialize_iter and implement one in terms of the other."""
        if self.__class__ is Serializer:
            raise NotImplementedError("This is an abstract class.")
        return b''.join(x for x in self.serialize_iter(val))

    def serialize_iter(self, val):
        """x.serialize_iter(value) -> an iterator over the bytes
        sequences making up the serialized version of value."""
        if self.__class__ is Serializer:
            raise NotImplementedError("This is an abstract class.")
        return iter((self.serialize(val),))

    def deserialize(self, data, memo=None):
        """x.deserialize(data, [memo]) ->
        (value of the appropriate type, memoryview(remaining_data))

        data must be of type 'bytes', or 'memoryview'.  The memo must
        be a value extracted from a previous NotEnoughDataError.

        It is undefined what happens if you use memo and do not pass
        the same data (plus some possible extra data on the end) into
        deserialize that you originally passed in when you got the
        NotEnoughDataError you extracted the memo from.

        May raise a ParseError if there is a problem with the data.
        If the failure was because the parser ran out of data before
        parsing was finished, this is required to be a
        NotEnoughDataError."""
        return self._deserialize(data if not isinstance(data, bytes) \
                                     else memoryview(data),
                                 memo)

    def _deserialize(self, memview, memo=None):
        """x._deserialize(memoryview) ->
        (value of the appropriate type, memoryview(remaining_data))

        Exactly like deserialize, except a memoryview object is
        required.  deserialize is implemented in terms of
        _deserialize.  Derived classes are expected to override
        _deserialize."""
        raise NotImplementedError("This is an abstract class.")


class SmallInt(Serializer):
    """This class is for integers that are 8, 16, 32, or 64 bits long.
    They may be signed or unsigned.  No other sizes are supported.

    >>> s = SmallInt(2, True)
    Traceback (most recent call last):
        ...
    ValueError: size is 2, must be 8, 16, 32 or 64
    >>> s = SmallInt(8, True)
    >>> b = list(s.serialize_iter(5))
    >>> b == [b'\\x05']
    True
    >>> o = s.deserialize(b''.join(b))
    >>> o = (o[0], o[1].tobytes())
    >>> o == (5, b'')
    True
    >>> o = s.deserialize(b''.join(b) + b'z')
    >>> o = (o[0], o[1].tobytes())
    >>> o == (5, b'z')
    True
    >>> s = SmallInt(8, True)
    >>> b = s.serialize(-5)
    >>> b == b'\\xfb'
    True
    >>> s = SmallInt(8, True)
    >>> s = s.serialize(128)
    Traceback (most recent call last):
        ...
    ValueError: 128 is out of range for an signed 8 bit integer
    >>> s = SmallInt(64, False)
    >>> b = s.serialize(2**64-1)
    >>> b == b'\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff'
    True
    >>> s = SmallInt(64, True)
    >>> b = s.serialize(-2**63)
    >>> b == b'\\x80\\x00\\x00\\x00\\x00\\x00\\x00\\x00'
    True
    """

    _formats = dict((
        ((8, True), '>b'), ((8, False), '>B'),
        ((16, True), '>h'), ((16, False), '>H'),
        ((32, True), '>i'), ((32, False), '>I'),
        ((64, True), '>q'), ((64, False), '>Q')
    ))

    __slots__ = ('_size', '_signed', '_low', '_high', '_format')

    def __init__(self, size, signed):
        if size not in (8, 16, 32, 64):
            raise ValueError("size is %d, must be 8, 16, 32 or 64" % (size,))
        self._size = size
        self._signed = bool(signed)
        self._format = self._formats[(size, signed)]

    def serialize(self, value):
        if not isinstance(value, (int, long)):
            raise TypeError("%r must be an int or long" % (value,))
        value = int(value)
        try:
            ret = _struct.pack(self._format, value)
        except _struct.error:
            raise ValueError("%d is out of range for an %ssigned %d bit "
                             "integer" % (value,
                                          ("un" if not self._signed else ""),
                                          self._size))
        return ret

    def _deserialize(self, memview, memo=None):
        numbytes = self._size // 8
        if len(memview) < numbytes:
            raise _NotEnoughDataError((self._size // 8) - len(memview))
        else:
            data = memview[0:numbytes].tobytes()
            remaining = memview[numbytes:]
            try:
                result = _struct.unpack(self._format, data)[0]
                return result, remaining
            except _struct.error as err:
                raise ParseError(err)

There is also a CompoundNumbered type for representing tuples. This allows you to represent structured messages with multiple fields. Here is an example of how you might represent CAKE new session messages:

cake_newsess_v2 = _serial.CompoundNumbered(
    _serial.Count(), # Version
    _serial.Count(), # Type
    _serial.KeyName(), # Destination key
    _serial.KeyName(), # Source key
    _serial.SmallInt(64, False), # Session serial #
    _serial.CountDelimitedByteString(), # Encryption header
    _serial.CountDelimitedByteString(), # Signature.
    _serial.FixedLengthByteString(32) # Header HMAC
)

There is a problem though. The signature and header HMAC are supposed to be encrypted, but the deserializer can't know the key to use until it's decrypted the encryption header. This means that later parts of the deserialization process need to know about things from previous parts.

I have a way for the deserialization process to save state. This is used so that if deserialization throws a NotEnoughDataError because not enough data is available, the exception may have a memo field. This memo field can then be passed in again to resume close to where deserialization stopped. (Though now I'm sort of wondering if I shouldn't do something generator based instead...)
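
In case the intended use of the memo isn't clear, here is a sketch of a read loop (hedged: beyond the memo field's existence, the details here are my own guesses):

def read_one_message(sock, serializer):
    # Accumulate bytes from a socket, resuming the parse with the memo
    # whenever the deserializer runs out of data before finishing.
    data = b''
    memo = None
    while True:
        try:
            return serializer.deserialize(data, memo)
        except NotEnoughDataError as err:
            memo = err.memo
            more = sock.recv(4096)
            if not more:
                raise  # connection closed mid-message
            # Per the docstring, the same data (plus extra on the end)
            # must be passed back in along with the memo.
            data += more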

But this mechanism does not allow state to be passed forward from a previous deserializer to a new one. And this applies the other way around too: when serializing, there is state that's not really part of the data being serialized (like the current HMAC or encryption state) that the serializer needs to know in order to serialize properly.

I'm thinking of adding an optional context parameter to the serialization and deserialization functions that's just an empty dictionary into which this sort of state can be stuffed. But this seems really messy.
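
For concreteness, here is roughly what I have in mind (a sketch only; the context argument is the proposed change, and nothing here is settled API):

def deserialize_compound(field_serializers, data, context=None):
    # Sketch: deserialize grows an optional 'context' argument, an
    # initially empty dict that each field's deserializer may read and
    # mutate.  The encryption header field would store the negotiated
    # cipher under something like context['cipher'], and the signature
    # and header HMAC fields would read it back to decrypt themselves.
    if context is None:
        context = {}
    values = []
    remaining = memoryview(data)
    for field in field_serializers:
        value, remaining = field.deserialize(remaining, context=context)
        values.append(value)
    return tuple(values), remaining

Can anybody think of any better ways to do this that are fairly general?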

Syndicated 2011-02-02 22:46:39 (Updated 2011-02-02 23:03:41) from Lover of ideas

Protocol buffers?

I have a problem for which protocol buffers seem like a good solution, but I'm reluctant to use them. First, protocol buffers include facilities for handling the addition of new fields in the future. This adds a small amount of overhead to a typical protocol buffer message, and it's a facility I do not need.

Also, I feel the variable sized number encoding is less efficient than it could be, though this is a very minor issue. I also feel like I have a number of special purpose data types that are not adequately represented.
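
For reference, the encoding in question stores seven bits per byte, least-significant group first, with the high bit of each byte acting as a continuation flag. A minimal rendition (my own sketch, not Google's code):

def encode_varint(n):
    # Encode an unsigned integer the way protocol buffers do: seven
    # bits at a time, low bits first, high bit set on every byte
    # except the last.
    if n < 0:
        raise ValueError("only unsigned values in this sketch")
    out = bytearray()
    while True:
        byte = n & 0x7f
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

assert encode_varint(300) == b'\xac\x02'

Every byte spends one bit on the continuation flag, which is part of why I feel the encoding is less space efficient than it could be.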

I'm also not completely pleased with the C++ and/or Python APIs. I think they contain too many googlisms. I would like to see public APIs published that were free of adherence to Google coding standards like do-nothing constructors and no exceptions.

I think, maybe, I will be using protocol buffers for some messages that are sent by applications using CAKE as a transport/session layer. These include some of the sub-protocols that are required to be implemented by a conforming CAKE implementation.

On a different note, I think Google's C++ coding standards are lowering the overall quality of Open Source C++ code. This isn't a huge effect, but it's there.

It happens because Google's good name is associated with a set of published standards for C++ coding that include advice that, while possibly good for Google internally, is of dubious quality as general purpose advice. It also happens because when Google releases code for their internal tools to the Open Source community, these tools follow Google's standards. And some of these standards have the effect of making it hard to use code that doesn't comply with those standards in conjunction with code that does.

Syndicated 2010-12-04 23:26:39 (Updated 2010-12-04 23:28:28) from Lover of ideas

Today's XKCD

Normally XKCD is amusing for very positive reasons. But I frequently feel a lot like the guy with the beard in this cartoon. It's really frustrating. So today's XKCD is darkly amusing to me. Freedom is such a hard sell before people lose it. People choose convenience every time, frequently until it's almost too late to fix the problem, all the while berating the people who were worried in the first place.

Infrastructures

Syndicated 2010-05-22 00:05:25 from Lover of ideas

19 May 2010 (updated 19 May 2010 at 07:10 UTC) »

Eben Moglen Tech Talk at Google

Eben Moglen is one of the principal lawyers behind the GPL. He's also a tireless free software advocate, and significantly more photogenic and diplomatic than Richard Stallman.

He recently gave this interesting tech talk at Google about the perception of Google by entities outside it. It was really well done, and struck a strong chord with me.

I've noticed that people frequently are incapable of believing that some things Google does are for the reasons Google says they're doing them. For example (and I don't really have the time to find references just now), many people seem to think that Google Doodles, those fun, timely modifications to their main search page, are a marketing tool, when in fact they are largely done purely out of whimsy.

I suppose, in one sense, there is a marketing purpose. Google is projecting their image of themselves out into the world. It's brand building. But, on the other hand, there isn't. I doubt that Google Doodles started as an idea for brand building in some marketing department. I'm betting some random small group of people decided one day that it would be fun to do, and the idea sort of caught on and now it's a tradition.

But people seem to want to analyze doodles for the marketing message they contain, despite the fact there generally isn't one. The more enigmatic the doodle is, the more determined people seem to be to find the marketing message in it.

This means there is a disparity in perception between people outside Google and people inside Google, one that might serve Google very poorly in the future. It's very important that Google understand this and respond appropriately. Perception is reality, and people and organizations live up to expectations. Google risks becoming what people perceive them to be unless they act to correct that perception.

Google also frequently doesn't realize how the fact that they are so large and powerful affects people's perceptions of them. Witness the brouhaha over Buzz. Google did do some somewhat wrongheaded things in introducing it, but Buzz was not anywhere near the privacy destroying aggregator that people thought it was. And the fact that people perceived Buzz in this way seemed to mystify people inside Google, even though it was predictable given Google's size and people's perceptions.

Again, this points to a need by Google to better manage people's perceptions of them, and to manage their product releases better in terms of how people perceive them.

Eben Moglen suggests, quite wisely, that one thing Google could do is to change their policy on contributing internal changes back to Open Source projects. I think this is a good idea, but I doubt it will really be enough.

I am a little worried that if Google takes this advice to heart that they will grow a PR arm that does what every other PR arm in the world does, which is to try to make sure that perception stays far more positive than reality instead of simply trying to make perception match reality. But Google should do something, since I think people think far more ill of them than they generally deserve.

Google is, in fact, the only company I know of that has a revenue stream greater than 1 billion dollars a year that I actually have a positive opinion of.

Syndicated 2010-05-18 23:32:06 (Updated 2010-05-19 06:25:53) from Lover of ideas

The evils of Flash

This was a Slashdot comment, but I think it deserves a top level post here. It's in response to Apple’s attack on Adobe Flash, it’s all about online video NOT. (I added the 'NOT' because that's the author's conclusion.)

Pot calls kettle black, kettle complains, but it's just as black.

Flash is a despicable disgrace. Most of the time when I talk to a Flash developer, the thing they're happiest about is the control they get over my computer. This is directly because the Flash player is a piece of garbage closed source tool that purposely caters to developers over end-users. The Open Source gnash (not ganash) player has an option to pause a Flash program. The Adobe player will never, ever end up with that option, ever. Giving me control over my own computer is against Adobe's best interest. That makes Adobe's Flash player little more than a widely deployed trojan horse that, IMHO, is little better than spyware (Flash cookies, anyone? Where's my control over those?).

I wouldn't complain so bitterly about this if the gnash player were actually a decent drop-in replacement for the closed source Flash player, but it isn't. I have to choose between my freedom to have my computer do what I want instead of what some random corporation wants, with Flash that is broken most of the time, or Flash that works while giving up my freedom. I will choose my freedom, thank you very much, but I will be bitter about the stupid choice I'm forced to make.

So, when one maker of a closed, proprietary platform that steals people's freedom purposely does things to the detriment of another closed, proprietary platform that steals people's freedom, I can't help but cheer. And I hope Adobe finds a way to play nasty games with Apple too. The more these two companies can find ways to hurt each other, the more the rest of us benefit.

If Adobe Open Sourced the Flash player (I couldn't care less about the developer tools; they will end up with Open Source implementations no matter what Adobe does if the player is truly open), my objections to Flash would completely disappear. I could realistically choose a fully functional Flash player, and I'm certain I could find one with a pause button, or one that refused to store cookies for longer than a week. I could make it myself if I wanted to.

And lest you tell me that I'm just whining, the majority of large sites out there no longer look right without Flash. By not using Flash, I'm cut off from a significant part of the experience of the web. I shouldn't be forced to give up control of my computer in order to browse the web. That's a completely and utterly ridiculous assertion.

Syndicated 2010-05-07 17:27:02 (Updated 2010-05-07 17:54:09) from Lover of ideas

Walking data structures

It's common programmer tech speak to talk about 'walking' data structures, meaning following all the pointers around to put all the data back together again. I think that 'brachiation' is a more apt metaphor, and fits well with the concept of 'code monkey'.

Syndicated 2010-03-30 16:49:53 (Updated 2010-03-30 16:50:11) from Lover of ideas

I hate perl

Case in point, the Net::IP module. The documentation looks nice. It handles IPv6 and IPv4 addresses. It looks clean and simple.

Then, I decided I would like to be able to have IPv4 mapped IPv6 addresses match the IPv4 address ranges I'm singling out for special treatment. So I look into its tool for extracting an IPv4 address from an IPv6 address.

The call, ip_get_embedded_ipv4, doesn't seem to work on IPv6 addresses created with 'new'. It only works on IPv6 addresses represented as strings. This leads me to dive into the implementation.

I discover that there is no coherent internal representation, just a lot of different attributes that are used at different times for different purposes and are converted from one another as needed.

Additionally, there appears to be no way to import particular symbols of certain classes from the module. You have to import them using the import statements specified in the documentation or take your chances on whether or not it will work. This is because the import mechanism, and which symbols are global, is handled in a fairly ad hoc way and re-implemented in each module according to the whims of the author.

It's really quite surprising the module works at all. And I'm left feeling like I really ought to re-write it if I want something I can count on.

In reality, looking at the module's implementation was a mistake. This is always what happens to me when I look at a perl module. Either it works in a completely mysterious way using language mechanisms I've never seen used before, or it works in a way that's totally broken and practically guaranteed to break for any use that varies from the specific use-cases described in the documentation. Frequently both are the case. Aigh! Run away!

I hope I can convince my new workplace to stop using perl.

Syndicated 2010-03-18 18:14:05 (Updated 2010-03-18 18:16:08) from omnifarious

