Have you hugged your systems administrator lately?
Production vs. Development
I've been doing UNIX administration for quite a few years now, and I know a few other people who would call this their profession. I've been noticing, though, a rather drastic difference in attitude toward systems and users in some of them, and I think I've finally nailed down where that difference might be coming from. Fair warning: as usual, this is written basically as a stream-of-consciousness kind of thing, so I may or may not actually get to the point.
You have a few different priorities as a systems administrator, and your job is to juggle them using your best judgement (weighing business needs, available resources, personal time, etc). One of these priorities is production support: keeping the systems that run your business online and performing whatever task it is that they are supposed to be doing (in the case of an Internet-facing business, for instance, this could mean keeping the web, app, mail, and other servers running; for a development shop, it could mean keeping the build systems working smoothly). Inevitably, though, there is another audience that systems administrators are asked to deal with: users. In some cases, that means developers; in others, it means the sales team. In either case, as a technical lead, your job is to help them get their job done. Some administrators get lucky: they only have to deal with one environment or the other. In most shops, however, the same administrative staff is responsible for both. This sounds fine, until you consider the psychology of the two situations.
Production support generally involves saying no a lot. Change is bad; slow, methodical improvements are fine over time, but radical departures from what is known to work are frowned upon. Taken to the logical extreme, you end up with a heavy-weight change control process where even small system changes are scrutinized heavily. And to those who hate such things (and I'm right there with you), this is a good thing. Production systems are differentiated from others by the fact that the business lives and dies by them functioning correctly. By extension, things that affect those systems ought to be tightly controlled and monitored.
But development (or, in some environments, user) systems are a completely different thing. Joel Spolsky has talked about this kind of thing at length: the systems are a tool to help the users do useful work. If a development group needs a machine or two for testing, then find a couple somewhere. If a user needs access to the Internet to research a sales pitch they're working on, give it to them. This doesn't mean giving them everything they ask for; this means finding out what they need to do their jobs, and working toward that in a collaborative manner. The systems administrator provides the role of technology expert and facilitator in this arena.
Take someone who has done production support for years, and give them a development group to support, and Bad Things Happen. The admin can't understand why on earth they want to do things like arbitrarily shuffle 50GB files around the network, use various outdated tools that only support older insecure protocols, or talk on IRC or an instant messenger with people outside the company. They lack perspective: they don't see how that style of systems management affects real people trying to get their job done.
I won't even get into the mess that can happen if you try the reverse (a development-only systems administrator in a production role).
I'm finding that there are very few administrators who can successfully balance both mindsets. As geeks, we seek out uniformity because it regularizes our workloads, but you simply can't reconcile the two: to be successful at both, you have to treat both as the unique environments that they are. I've talked before about administrative geeks needing to step beyond their job title, and this is part of it: working with a mindset of "value to company" rather than the micro-view of what you happen to be doing today.
Anyone can say "no" all the time.
Systems Administrators != Programmers
While speaking with a co-worker today, I had a good reminder of what Erik Naggum was talking about. He was trying to come up with a "good" solution to a Perl problem, so I had him describe it to me. It turns out that he has an object, which may or may not contain the attribute he's looking for. If it doesn't have it, the object has an attribute referring to another object. This other object, in turn, may or may not have the attribute he's looking for, and also has a reference to another object; this repeats until the "next object" is null.
Astute readers will have recognized this as a basic linked list.
I figure, okay, he hasn't been programming in a long time, so we'll mentally walk him through it. First, a quick nudge to see if he can remember his basic CS from a long time ago: "Why don't you write something that will just recurse through that list?" He nods, but the eyes, they don't have it. He goes away to think about it for a while, and then we wind up on a conference call in a meeting room. While the person we're calling is talking, he's quickly jotting down on the board:
O -> O -> O -> ...
I think, okay, he has the mental idea now, because he's diagramming it correctly. We talk a bit more, and he's obviously trying to come up with a way to iterate over the list. Once again, I suggest a simple recursive function (because that's natural to me, goddamnit, I don't care what you iterative programmers say), and quickly pseudocode something up for him:
    function blah(o)
        if !o: return null
        if o->whatImLookingFor: return o->whatImLookingFor
        return blah(o->nextObject)

    found = blah(myList)
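For anyone who would rather run it than read it, here is a rough Python rendering of that pseudocode. It is only a sketch: the `Node` class and the names `wanted`, `next_object`, and `find_wanted` are made up for illustration and are not the objects from the actual Perl code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for one object in the chain."""
    wanted: Optional[str] = None           # the attribute being searched for
    next_object: Optional["Node"] = None   # link to the next object, or None


def find_wanted(node: Optional[Node]) -> Optional[str]:
    """Return the first non-empty `wanted` value in the chain, or None."""
    if node is None:        # ran off the end of the list
        return None
    if node.wanted:         # this object has what we are looking for
        return node.wanted
    # Each call gets its own local `node`, so this looks at the NEXT object,
    # not the same one over and over.
    return find_wanted(node.next_object)


# Three objects; only the last one carries the attribute.
chain = Node(next_object=Node(next_object=Node(wanted="found it")))
print(find_wanted(chain))
```

The parameter `node` is local to each call, so every level of the recursion is looking one link further down the chain.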
He looks at it, and appears puzzled. This appears to be a new construct to him, and he asks a few questions that confirm this. "But it'll just keep calling itself and looking at the same object all the time, won't it?" My first pseudocode example unfortunately used similar variable names in both the "mainline" and the function itself, but after renaming things, he still didn't seem to understand. Ah! Local vs. global scoping rules, maybe? So I suggest that, in Perl, you'd have something like my ($o) = @_; as the first line of that function. He still doesn't get it, and grunts something about just dumping the objects and grepping for what he's looking for. He just couldn't make himself look at the "CS" solution, because it didn't fit what he was expecting from Perl: a quick hack that did what he wanted without having to think too hard about it.
Systems administrators who spend too much time with "qwiky" languages like Perl are doomed to forget everything they ever knew about Computer Science. I'm convinced of this now. You might think I had this conversation with a newbie programmer or someone who whipped up scripts now and then, but you'd be wrong; this is a fellow who did software development professionally, and moved into systems administration later in his career. He's no idiot.
(I liken the problem to "l337 sp33k". I've known colleagues who could carry on a perfectly professional conversation face-to-face, but the moment they sit down at a keyboard, they immediately regress to "OMG r u 4 r34l?!" and, setting aside perceptions, they actually behave dumber while doing this. Perl has a similar effect on programmers; in the end, you end up with something that's a tangled web of spaghetti code and system() calls, getting the job done but disgusting anyone who has to look at it. Sort of like reading, "omg u pwn3d that bug, yo!". I wonder when someone will spec a programming language called "l337"... never mind, Google to the rescue. Ye ghods.)
Postscript: yeah, I probably should have given him an iterative version of the pseudocode. Something like this:
    function blah(o):
        while o:
            if o->whatImLookingFor: return o->whatImLookingFor
            o = o->nextObject
        return null
Just as simple, conceptually. Maybe some people have trouble wrapping their heads around recursion, but it's always seemed like a very straightforward idea to me.
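The iterative pseudocode translates almost line for line into something runnable. What follows is a sketch in Python, with a made-up `Node` class and made-up names (`wanted`, `next_object`, `find_wanted_iter`) standing in for whatever the real objects look like.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for one object in the chain."""
    wanted: Optional[str] = None           # the attribute being searched for
    next_object: Optional["Node"] = None   # link to the next object, or None


def find_wanted_iter(node: Optional[Node]) -> Optional[str]:
    """Walk the chain with a loop; return the first non-empty `wanted`, or None."""
    while node is not None:
        if node.wanted:
            return node.wanted
        node = node.next_object   # step to the next object in the chain
    return None                   # reached the end without finding it


# Three objects; only the last one carries the attribute.
chain = Node(next_object=Node(next_object=Node(wanted="found it")))
print(find_wanted_iter(chain))
```

One practical point in the loop's favor: it doesn't care how long the chain is, whereas a recursive version in CPython would run into the default recursion limit (about 1000 frames) on a very long list.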
If you can read this (and in some parts of the 'net, you probably can't), then you're reading it through our new network connection via Comcast. After a pretty nasty false start over the weekend, and the cutoff date for Dataflo coming up this weekend, I bit the bullet and cut things over this morning. DNS was set up such that NS records were available for both address assignments for a while now, as well as MX records pointing at both address pools, so the only thing that's really off the air right now in some places is the webserver, and that'll just have to wait. I also managed to break my backup MX along the way, so I'll need to poke at that a bit more and see what I fat-fingered.
Bah. I like renumbering about as much as I like moving.
So, some initial observations: latency is a touch higher than it was with Dataflo, surprisingly enough. First hop averages went from 2-3 ms on the old wireless link to 8-10 ms (with occasional spikes) on Comcast. On the other hand, I'm now seeing download rates around 6Mbps (vs. 1.5Mbps), which is a nice change of pace. Upstream speed is 768kbps now (vs. 1.5Mbps through Dataflo), which may prove to be a problem, but we'll see how it goes for now.
The biggest negative observation I've made so far is Comcast's complete lack of business-oriented customer service; they're obviously incredibly used to dealing with residential customers, and they've used that same contracted-out infrastructure to deal with their commercial customers as well. Phone support is excellent (when you finally get to them; getting a rep on the phone requires navigating about 10 levels of phone menus), but getting physical on-site service is a multi-day waiting game, with the typical "we'll be there sometime between 8:00 AM and 8:00 PM" kind of scheduling that you've come to expect from their residential service. There is no email interface to the helpdesk whatsoever; any kind of technical support inquiry MUST be made by phone, which is incredibly annoying when wanting to do something complicated with DNS. So far, I'm not impressed, but if I never have to deal with their customer service (much like when I was with Speakeasy, who rarely saw an unannounced service outage), then it won't matter to me. We'll see.
Dataflo or Datanoflo?
So, the day after I post a blog entry about being forced to switch to Comcast because of the lackluster service I've received with Dataflo, I have another rather significant service outage. There appeared to be problems last night, although I was having a hard time pinning the issue on Dataflo then. However, they definitely made a mid-day routing change today; at 12:34, all of a sudden, my backup MX starts receiving email for everything (the backup MX, in this case, is my router, which usually sees nothing but spam all day), and traceroutes to the static IP block they've assigned me are dropped on the floor the moment Cogent hands off to Dataflo.
It's not like I'm asking for a lot here. I need a service that stays up, and I'm paying a premium for that. That premium ought to include at least a bit of special attention to making sure the services provided to me are working when a major change is made, and perhaps major changes shouldn't be done in the middle of the workday, in the middle of the week, without any prior notification to the customer base of a potential outage. Screw the SLA; I shouldn't need an SLA to hold over their heads. What I really need is a service that stays up, with planned maintenance windows and some degree of customer notification.
I have no idea why I think Comcast will be any better, but at least they're half the price (actually, a third of the price, if you consider what my renewal contract pricing stated). In the event I get lousy service, at least I won't be crying about the money I'm wasting.
Ugh. I finally bit the bullet, and had Comcast "Workplace" service installed to replace the flakey fixed-point wireless service I've been fighting with for the past two years. To be fair, I should qualify that: my "last mile" link (between my rooftop transmitter and the tower) has been rock solid. The problem has always been mid-day routing and configuration changes on the part of the provider, which have knocked me offline anywhere from a few minutes now and then, to one incident that had me offline for the better part of a day. At this price and service level, that's completely unacceptable. So, since I can't get service from my preferred vendor here due to a complete lack of DSL availability in my area, I'm stuck with the local cable company: Comcast.
So, first impressions (note that I haven't moved ANY services to this yet, until I get a feel for how this will work): 6Mbps download speeds are unbelievably fun. The 768kbps upload speed may end up being a bit of a hindrance; most of my traffic is SMTP, but I do receive a fair bit of HTTP traffic too, and that's where that upload bottleneck is going to suck (specifically, both I and Erica maintain pretty large photo galleries that seem to get a bit of traffic, and I'll often host larger items for people for limited times). But, we'll have no idea how that'll go until I bite the bullet and cut stuff over; I'm hoping I can run fairly well off of both links for a while, with plenty of time for DNS updates to propagate.
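To put a rough number on that upload bottleneck, here's a back-of-the-envelope sketch in Python. The 2 MB photo size is purely an assumption for illustration, the 1.5Mbps figure is taken as the old link's upstream rate, and the calculation ignores protocol overhead and competing traffic, so treat the results as best-case figures.

```python
def transfer_seconds(size_megabytes: float, link_kbps: float) -> float:
    """Best-case seconds to push `size_megabytes` out over a `link_kbps` link."""
    bits = size_megabytes * 8 * 1_000_000   # decimal megabytes -> bits
    return bits / (link_kbps * 1_000)       # kilobits/s -> bits/s


PHOTO_MB = 2.0   # assumed size of one gallery photo (hypothetical)

# Upstream rates: 1.5Mbps on the old wireless link, 768kbps on Comcast.
for name, kbps in (("old link", 1500), ("Comcast", 768)):
    print(f"{name}: {transfer_seconds(PHOTO_MB, kbps):.1f} s per photo")
```

Under those assumptions a single photo goes from roughly 10.7 seconds to roughly 20.8 seconds to serve, i.e. about twice as long per visitor.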
Ugh. I remember how much I hate renumbering now.
Fedora Core 5
My initial impressions: more solid than I thought this release was going to be, given how much has changed. My biggest complaint so far: they upgraded OpenSSL to a new major version without providing a compat release for those of us upgrading. All we needed was an equivalent to the already-existing openssl097a package, which does nothing but package up libssl.so.5 and libcrypto.so.5, and upgrading would have been smooth as silk for me on one machine. Instead, I'm stuck rebuilding packages that I should have been able to leave alone until later. Bah.
More later when I've had a little more time to play with it. Playing with it on a desktop machine is proving difficult, as the only "play" machine I have right now running Fedora Core hangs the console whenever I fire up X. (Not too surprising, that machine is a bit of a hodgepodge of hardware.)
The action with the WRX was captured beautifully with my new toy: a Treo 650. It's a combination PDA and cellular phone that runs PalmOS, and with the data service I picked up with the package, the built-in web browser and email client are actually proving to be fairly handy. Definitely a big step up from the BlackBerry I had while I was at my previous job; the screen and form factor are a huge leap forward, and having a camera built-in (albeit pretty crappy) is a cute gimmicky toy. Battery life is at least as good too (actually, it looks like it's going to be quite a bit better; I should get two or three days out of a charge with my normal usage, which is pretty excessive compared to an average user). Phone and data service coverage with Sprint seems to be pretty good so far, and the data service is a LOT faster than the Nextel service I had the BlackBerry connected to. The SD media slot means this thing can also serve as a portable MP3/OGG/media player; 2GB of audio storage ain't bad. I can bolt the thing up to my desktop via Bluetooth and infrared as well, which means one less cable to muck with. I'm pretty happy with it so far.