Older blog entries for avriettea (starting at number 210)

19 Aug 2007 (updated 22 Aug 2007 at 19:06 UTC) »

Teaching with emphasis


Computational Science Research Assistant Professor

The Computational Materials Science Center seeks a highly qualified computational scientist. The computational scientist will be responsible for design, implementation, and maintenance of data mining and knowledge discovery tools for chemical structure, chemical compounds and properties databases.

The ideal candidate will have an advanced degree in computer science or a Ph.D. in a chemistry-related discipline with significant computational experience, including machine-learning methods, database management and Web interfaces. Experience in cheminformatics, chemical database formats and chemical structure analysis is a plus.

Applications will be received continuously until the position is filled. Qualified candidates should send their CV containing a detailed description of their computational skills, relevant computational work done, list of publications and contact information for three references. Applications should be entered online at http://jobs.gmu.edu by selecting "Computational Materials Science Center" in the department menu.
The position is for two years. Salary will be commensurate with experience, but will not include benefits.
What the fuck, people? This position isn't going to pay more than $85,000 a year. In fact, that's probably the high end of the range, with $65,000 being the bottom. Yet, the position is for an assistant professorship. You're a lackey. For two years. With no benefits. They want somebody highly qualified, which is reasonable, given what they're doing, but they're asking for such a specific skill set that they can't possibly get anyone less than either a doctorate (they do suggest this) or twenty plus years in both chemistry and computer science. Somebody who's going to know Lisp, data architecture, probably filesystem mechanics, and who also understands the chemistry industry from an extremely technical point of view.

Are they looking for somebody retired? Are they looking for somebody who has all these skills but who, for some reason, is unable to pull down the $150k they'd make elsewhere? I really fail to see how anyone could want this position. I mean, sure, they'll probably do great stuff, but you'd be a toady, you'd lose your funding in two years, and "your" work would actually be the work of the tenured prick who you actually work for.

They're a good university. I've said before, and I'm sure I'll say it again: I love teaching, but holy cow is the pay shit. The more I look for a teaching position these days, the more I also find that they have a wholly unrealistic impression of the candidate base (or they're raping grad students; equally possible), and they're not really interested in doing anything but rubbing their academic squishy bits against themselves.

They have so many positions that are assistants to assistants to the semi-provost of the director of human information definition center. I mean, shit that just makes my mind boggle. There are no, as far as I can see, positions that look like:


Instructor, Undergraduate, Programming

Masters degree or ten years industry experience preferred, in addition to pre-vetting by tenured staff of computer science department. Must be able to teach C, Java, and Lisp from provided materials. Additionally, incumbent will be expected to create curricula as required. Strong familiarity with Unix, Windows, and other operating systems required, as well as the ability to teach from any of the above platforms.

Certifications from professional organizations, such as the CISSP or RHCE, will be considered as qualifications and favored on submitted curricula vitae, however interviews with faculty and teaching ability will be given higher preference in hiring.


Now, that looks a lot more like an industry posting than one of these stupid academic postings, and I'm not really sure where the discrepancy comes from. Given it's a teaching position, I'd expect something like $62-97k, depending on experience, for the position. And it would be a full professorship, with tenure at ten or fifteen years. And for heaven's sake, fucking health insurance and life insurance for the new prof.

So, what in the hell is wrong with academia that they can't figure out how to hire people or even train them? We get a new MA or PhD or even just somebody with an AA, and it takes them four fucking years before they're worth a shit. And yet, academia wants more of the same academic fuckers that created the useless twits coming out of colleges today. Seems to me if academia started looking for the people that were, you know, already spun up, that they might be able to produce students who were more useful.

btw, hi Cheryl.

Syndicated 2007-08-19 21:27:00 (Updated 2007-08-22 18:41:49) from Alex J. Avriette

17 Aug 2007 (updated 10 Jun 2011 at 03:17 UTC) »

Look, ma! Backup superblocks!


Given it's 640GB and USB2, it churned for quite some time. Note, however, the "classic" backup superblock of 32. One wonders exactly what Disk Utility.app was thinking when it initialized this drive.

Using newfs(1) is probably not the recommended method per Apple, but it looks to me to be a lot safer than their idiotproof clickety interface from Disk Utility. I wish Apple would provide a "... so you're using a UNIX workstation" guide with their computers. You know, when SGI sold Octanes and Fuels and even the Indys and Indigos, they distributed both the "how to use from the front end" guide and the "how to use from Unix" guide. That is, you could manage your Octane from the GUI, or you could go roboinst and friends.

There's no reason the two have to be exclusive. Even the developer tools (and we developers ostensibly are capable of using you know, newfs) don't include a comprehensive manual for the Unixy end of things. Sure, there's a manual included (see man(1)), but it's not only woefully lacking in places, it's wrong – dangerously wrong – in others. I might be willing to chalk this up to there being a large degree of churn in their Unix backend (e.g., the default shell changing, versions of perl, various netinfo tweaks, filesystem changes – hfs to hfs+case – and so on), but it's just amateur. When I write code (and this isn't to say I'm the standard by which Apple should measure their operating system), I write the documentation first (this is a technique I learned from a guy named saucepan on perlmonks) and make sure that the code fits the documentation and not the other way around. Now, I was doing this years and years ago, but I understand it's now a sort of design paradigm, so I'm not so special as I was back then. However, back in 2001, people looked at you like you were some kind of loon if you wrote docs before code because, well, how could you document something that simply didn't exist? Anyways, I digress substantially. The point here is that Apple could be writing the documentation for their systems and binaries as they are developing their systems and binaries, rather than just wholesale importing them from 4.2BSD (as in the case of newfs.)

Documentation is actually pretty cheap to produce, especially when compared to what it saves you down the line. In this case, I'm bitching about a flaw in their product. In other cases, I might decide that I am simply not going to buy an XServe for my next webapp project because their Unix support sucks. All they'd have to do to fix it is have each developer who is responsible for their neck of the binary woods (e.g., I'm the guy responsible for find(1), therefore I need to make sure it behaves the way the manpage says) sign off on the documentation. In the event they find a discrepancy, while this would seem like a pain in the ass to somebody who was naïve, it's actually a blessing. If you find a bug in your documentation, it probably means that your programmers are assuming that their APIs or binaries are going to act one way when they're really acting another. This is how we get the "grey screen of death" on the Mac.
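To make the sign-off idea concrete, here is a minimal sketch of a check that compares the flags a manpage documents against the flags a binary accepts. Everything here is an assumption for illustration: the manpage excerpt is abbreviated, and HYPOTHETICAL_GETOPT is an invented option string, not the real one from Darwin's fsck.

```python
import re

def documented_flags(manpage_text: str) -> set[str]:
    """Collect single-letter flags that open an option entry, e.g. '-b Use the block ...'."""
    return set(re.findall(r"^[ \t]*-([A-Za-z])\s", manpage_text, flags=re.MULTILINE))

def compare(manpage_text: str, getopt_string: str) -> dict[str, set[str]]:
    """Report flags that are documented but rejected, and accepted but undocumented."""
    accepted = {c for c in getopt_string if c != ":"}
    documented = documented_flags(manpage_text)
    return {
        "documented_but_rejected": documented - accepted,
        "accepted_but_undocumented": accepted - documented,
    }

# Abbreviated excerpt in the style of a manpage options list.
MANPAGE_EXCERPT = """\
-b Use the block specified immediately after the flag as the
   super block for the filesystem.
-c Convert the filesystem to the specified level.
"""

# Hypothetical getopt string; the real binary's option string is not known here.
HYPOTHETICAL_GETOPT = "b:dfnpqy"

report = compare(MANPAGE_EXCERPT, HYPOTHETICAL_GETOPT)
print(sorted(report["documented_but_rejected"]))
```

A check like this only catches flags that exist on one side and not the other. It says nothing about whether a flag behaves the way the prose claims, which still needs a human to sign off.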

So here's a question that's not addressed in the newfs manpage: does newfs create an ffs filesystem? Or maybe ufs? HFS? HFS+? Maybe HFS+ with case sensitivity? It's entirely unclear. My guess is that because tunefs(8) is completely hosed:

that newfs, while it claims to be creating ffs:


M. McKusick, W. Joy, S. Leffler, and R. Fabry, "A Fast File System for
UNIX,", ACM Transactions on Computer Systems 2, 3, pp 181-197, August
1984, (reprinted in the BSD System Manager's Manual).


it is actually creating UFS of some sort or other. Which sucks, since Apple's UFS support is so incredibly slow I'd just rather be flayed and fed to hyenas than use it anyways. It does look like, however, that by using newfs to create your filesystem, and mount(8) to actually, you know, mount them, that you can have backup superblocks, and you can tell your filesystem to not reserve X percent for root (do we need this on an iTunes volume?). We can even tell it the expected average file size, number of files in a directory, and all kinds of things. You might even start to think that, under the hood, thar be Unix. However, if it were Unix, it would frickin have an /etc/vfstab where I could actually set up mounts and options and things like that.

As it is, while my original idea of having all my iTunes media on a 400gb single file was, I think, nominally a good thing, the worst case scenario happened: either the disk tanked or the filesystem deposited a fecal patch to the device driver. At any rate, it resulted in taking nineteen hours to copy 400 gig because it had a hard time finding every file. At least it found them.

This time round, I'm going to keep them separate, but have two volumes, and set up an hourly rsync between them. Sure, it means the disk(s) thrash a lot, but I have redundancy, and I don't have to worry, as I did this time, about a single disk losing its mind and having it take 400gb of media with it.
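A minimal sketch of the hourly mirror, assuming two hypothetical mount points (/Volumes/Media and /Volumes/MediaMirror) and an rsync binary on the PATH. The -a and --delete flags are standard rsync options; the volume names and the wrapper functions are invented for illustration.

```python
import subprocess

def rsync_command(source: str, destination: str, delete: bool = True) -> list[str]:
    """Build the argv for a one-way mirror of source into destination."""
    # A trailing slash on the source tells rsync to copy the directory's
    # contents rather than the directory itself.
    src = source.rstrip("/") + "/"
    cmd = ["rsync", "-a"]
    if delete:
        # Remove files from the mirror that no longer exist on the primary.
        cmd.append("--delete")
    cmd += [src, destination]
    return cmd

def mirror(source: str, destination: str) -> int:
    """Run the mirror once and return rsync's exit status."""
    return subprocess.run(rsync_command(source, destination), check=False).returncode

# Hypothetical volume names; print the command instead of running it.
print(" ".join(rsync_command("/Volumes/Media", "/Volumes/MediaMirror")))
```

Scheduling is a separate matter: a cron entry with "0 * * * *" in the time fields would run it at the top of every hour. One caveat with --delete is that a primary that loses files will faithfully lose them on the mirror at the next run, so dropping that flag may be the safer choice for a media archive.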

So after all this dicking around with filesystems for two days, we get this:

There's no easy way to tell what sort of filesystems you've got mounted (mount(1) won't tell you, and neither will df(1)). But, df -T fstype will in fact tell you which filesystems of fstype you have mounted. Never mind that a df -Ta might be useful, because we have the lovely perl construct,


perl -le 'print map { qq,FSTYPE: $_ $/,.qx, df -aignT $_, } map { /^([a-z]+)/; chop and $1 } qx{lsvfs}'


Which only sort of does what I want it to do, and is largely opaque to anyone but me. So, long story short, I've managed to reformat the drive that lost its mind (and redundant superblock), re-fill it with the goodies I want on it (Steve hates DRM, so none of the stuff I have has DRM; nooooo, it's all just, uh, protected for my own good), and I'm now using rsync to make sure that the primary volume – that is the one that the shitfairy blessed recently – has somewhere to leave data, should its coprogenic tendencies emerge again. Only now it's UFS instead of ffs or HFS or HFS+ so it will be slow as hell. Which is what I've come to expect from having a couple hundred gigs of stuff in iTunes anyways. It's just that before, I had iTunes to blame. Now I have to blame their stupid broken Unix. Argh.
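For anyone who finds the one-liner opaque, here is a rough sketch of the same idea in a more readable form: pull the filesystem type names out of lsvfs output, then build one df -T command per type. The SAMPLE text is an assumed approximation of what lsvfs prints, not captured output, and the helper names are invented.

```python
import re

def fstypes_from_lsvfs(lsvfs_output: str) -> list[str]:
    """Return the type name at the start of each line, skipping header and rule lines."""
    names = []
    for line in lsvfs_output.splitlines():
        # Type names start with a lowercase letter; the header line
        # ("Filesystem ...") and the dashed rule do not.
        match = re.match(r"[a-z][a-z0-9_]*", line)
        if match:
            names.append(match.group(0))
    return names

def df_commands(fstypes: list[str]) -> list[list[str]]:
    """Build one 'df -T <type>' argv per filesystem type."""
    return [["df", "-T", fstype] for fstype in fstypes]

# Assumed shape of lsvfs output, for illustration only.
SAMPLE = """\
Filesystem                        Refs Flags
-------------------------------- ----- ---------------
ufs                                  1 local
hfs                                  2 local, dirty
devfs                                1
"""

for argv in df_commands(fstypes_from_lsvfs(SAMPLE)):
    print(" ".join(argv))
```

On a real machine the commands would be handed to subprocess.run and the output grouped under a heading per type, which is roughly what the perl is doing with its "FSTYPE:" prefix.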

And so the score:
Alex: 1
Apple: 1
Shitfairy: 2

Lastly, I could use a tiramisu. Anyone feeling generous, stop on by. All this stress gets my coffee liquor and mascarpone nerves wiggling.

Syndicated 2007-08-13 14:52:00 (Updated 2007-08-13 23:17:34) from Alex J. Avriette

How to work with IT recruiters

I occasionally spend time looking over the Google Analytics profile of this site. In general, there are three things people are interested in:

  • Futanari
  • Broken ribs
  • Recruiters ("rate my recruiter")
Now, I think the first one kind of goes without much explanation. The second one really surprised me. There are a lot of people out there with broken ribs from various accidents (usually automobile accidents from what I can tell) and they're not getting adequate treatment from their physicians. But that's not really what I want to get into today, as I discussed that at great length previously. No, today I am going to talk about recruiters, and specifically IT recruiters.

Let's start with the basics. I've been working as a consultant, largely through recruiters, for ten or fifteen years (fifteen years ago, recruiters weren't exactly the same sort of people they are today, so there's some ambiguity). I've worked with I think thirteen different "head hunters" (although I am surely forgetting some). More importantly, I have never (with two exceptions) gotten a job that I applied for. The recruiters come to me.

My first position with a recruiter, I drastically underbid myself. Instead of asking for money based upon what I was capable of doing, I asked for money based upon my pedigree. I'm not ashamed to say that I asked for $40,000 a year to hack perl for a defense organization. The hiring manager chuckled and offered me $48,000. I was ecstatic. However, this didn't work out because I was too young and my clearance didn't go through, so I was desperate when the next recruiter came along. They asked me what I wanted, and terrified, I said "35." They didn't even blink, and said okay. So we went through the rest of the paperwork, W-4, I-9, and so on, and when it got down to how much I was making, they said, "so, we have you down as $35 an hour?"

Now, I choked, deep inside, as I had meant $35,000 a year. Converting an hourly rate to a salary is of course a factor of 2,000 (the working hours in a year): they had me making $70,000 a year, double what I had meant to ask for. It took everything I had to not lose my composure, and I simply smiled and said that yes, that was fair.

Here's lesson number one when it comes to recruiters: always ask for more than you think you're worth. Chances are, they're willing to pay it. When I was making $35 an hour, that same headhunting company was charging the principal $60 an hour. Imagine if I'd just asked for $45/hr. I'd have been making $90,000 a year at 22 years old. So, if you have been making $50 an hour, or $75,000 a year, or whichever, take that number, and add a liberal increase to it. Pick a number that sounds just too high to be plausible, and almost without fail they will take it. I won't get into what I currently make, but for most people with solid skills, making $60-70 an hour shouldn't be hard.
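The arithmetic behind those numbers, as a small sketch. The 2,000-hour working year is the rule of thumb used above (roughly 40 hours a week for 50 weeks); the function names are made up for illustration.

```python
HOURS_PER_YEAR = 2_000  # rule of thumb: about 40 hours a week for 50 weeks

def annual_from_hourly(hourly_rate: float) -> float:
    """Convert an hourly rate to an approximate annual salary."""
    return hourly_rate * HOURS_PER_YEAR

def agency_markup(pay_rate: float, bill_rate: float) -> float:
    """Fraction the agency adds on top of what the contractor is paid."""
    return (bill_rate - pay_rate) / pay_rate

print(annual_from_hourly(35))               # 70000
print(annual_from_hourly(45))               # 90000
print(round(agency_markup(35, 60) * 100))   # 71 (percent on top of the pay rate)
```

Seen this way, asking for $45 an hour against a $60 bill rate would still have left the agency a markup of about a third, which is why the higher number was never really implausible.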

The other thing you have to worry about with recruiters is the lying, slimy type of people they tend to have running the joint. If they want you to show up at their office, so they can show you their swanky view of the Potomac or whichever, chances are there's not much to the business but lies and a trophy office. They'll promise you the world, tell you that they have five companies that want you, and you'll never hear from them again. So, lesson number two is trust a recruiter who sounds like he knows what he's talking about on the phone. The twenty-something chicks in low-cut halter tops in trophy offices are sure nice to look at, but they're so dense as to pronounce "BAE Systems" as "Bay Systems, what I like to call it." Run away. They are a complete waste of time. A good recruiter will take you out to lunch – of your choice – and ask you not only about your professional background, but also about your personal background, to make sure that you're a good fit for the team they're trying to put you in. If they're really good, they'll even tell you a bit about themselves, so you don't feel like you're being interrogated, and you can get a sense of who they are, and who their company is. These are real important things to know when it gets down to offer time and you're weighing numbers (salary/bene's) against perks (office, environment, challenges, etc). Meeting at their office is usually done, but if they're serious, they'll take you out somewhere informal so that you can interact more naturally.

Chances are if you are dealing with a lot of head hunters in email (because you're on dice/thingamajob/monster/etc), you will run across the email that looks like this:

Please list in number of years your experience with the following products:
Sun Solaris:
User Management:
Network File System version 3:
NIS:
Active Directory:
Secure Shell:

This should set off red flags for a number of reasons. First, the recruiter who is asking these questions has no knowledge whatsoever about what these technologies actually are. What they're doing is taking up all the "scores" they get, ranking them by the sum of the number of years, and then the ones at the top get the first interviews. The problem is, you or I or kermit the frog can lie on any of these and get the interview. In fact, most people can lie their way through "solaris experience" or any of the above. This makes lesson number three of recruiters if they can't actually engage you in conversation about technology and require number lists like this, they won't get you a job (or if they do, you won't want it), they're going to be a pain in the ass to work with because they're retarded, and they're wasting your time by having you quantify the number of years you've worked with something rather than qualify the actual level of skill you have in a given discipline.
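The scoring scheme described above is simple enough to sketch, which is rather the point. This is an assumed reconstruction of the ranking (sum the years, sort descending), with invented candidates; nobody outside the recruiting shop knows what their spreadsheet really does.

```python
def rank_by_total_years(responses: dict[str, dict[str, int]]) -> list[str]:
    """Order candidates by the sum of their self-reported years, highest first."""
    return sorted(responses, key=lambda name: sum(responses[name].values()), reverse=True)

# Invented answers to the questionnaire above.
responses = {
    "honest_expert": {"Sun Solaris": 6, "NIS": 4, "Secure Shell": 6},
    "padder": {"Sun Solaris": 10, "NIS": 10, "Secure Shell": 10},
}

print(rank_by_total_years(responses))
```

Nothing in the ranking can tell a padded answer from a true one, so whoever types the biggest numbers gets the first call.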

This next one bothers me because it feels racist. However, the experience is true enough. I get a lot of recruiters from India. They get VOIP lines in places like Boston and New York, so that while you're actually talking to Ranesh in Hyderabad, it looks like you're talking to RTH Consulting in New York. These guys generally have their hearts in the right place, and I think they generally want to connect people who are looking for jobs with people who are looking for consultants. The problem is, they don't know anything about the technology industry here in the states. You'll get people who will ask you if you have worked with LAMP, and then subsequently ask you if you've worked with Linux or Apache. I can see how there might be some subtlety there, but it's not the only time they'll do that. So unfortunately, lesson four of working with recruiters is if the guy's name is Ranjesh, or his accent is so thick you can't distinguish SQL from "perl," you need to just tell him that it's not going to work out and move on. I wish there were a better way for those guys to get paid, but I think the industry is screwing them, and our playing along with them is wasting their time and screwing them even harder.

Lesson five of working with recruiters is that they will almost always want you to have a clearance of some sort. Frequently, having had a clearance, and recently, is good enough. Don't just assume that because they want TS/SCI with a Lifestyle Poly and Umbra that they won't hire you. It never hurts to ask.

And the last rule of working with recruiters is that they will always try to screw you. They're going to skim anywhere from 20% to 65% on top of what you take home, and they work on commission.

I have worked with recruiters I like, and I'll list them here, as a) they'll like a reference from me (no I don't get paid) and b) they're always looking for good people.

There are a couple I'd avoid. In particular, The Computer Merchant, LTD, of Boston actually requested that I bill the client for hours in which I was not in the office. This sounds like a sweet deal initially, until you realize the enormous liability you have when the customer is the federal government.

Syndicated 2007-08-13 00:16:00 (Updated 2007-08-13 01:01:15) from Alex J. Avriette

The fscking problem with pretending Darwin is FreeBSD



from fsck(8) (note: No entry for fsck in section 1 of the manual)

  
-b Use the block specified immediately after the flag as the
super block for the filesystem. Block 32 is usually an
alternate super block.



The problem is that, of course, block 32 is no such thing. Furthermore, on some other filesystems, when you initialize them you can essentially tell them whether you want lots of backup superblocks or only a few. I don't have a running Linux machine here, or I'd show you. The point is, the next thing in the Darwin fsck manpage is thus:

  
-c Convert the filesystem to the specified level. Note that the
level of a filesystem can only be raised. There are cur-
rently four levels defined:

0 The filesystem is in the old (static table)
format.

1 The filesystem is in the new (dynamic table)
format.

2 The filesystem supports 32-bit uid's and gid's,
short symbolic links are stored in the inode,
and directories have an added field showing the
file type.

3 If maxcontig is greater than one, build the
free segment maps to aid in finding contiguous
sets of blocks. If maxcontig is equal to one,
delete any existing segment maps.



The problem herein is of course:

  
bling% sudo fsck -c 3 /dev/disk3s3
fsck: illegal option -- c
fsck: ? option?



And then there's the good old "block 32" fix:

  
bling% sudo fsck -b 32 /dev/disk3s3
Alternate super block location: 32
** /dev/rdisk3s3
BAD SUPER BLOCK: MAGIC NUMBER WRONG

LOOK FOR ALTERNATE SUPERBLOCKS? [yn] y

SEARCH FOR ALTERNATE SUPER-BLOCK FAILED. YOU MUST USE THE
-b OPTION TO FSCK TO SPECIFY THE LOCATION OF AN ALTERNATE
SUPER-BLOCK TO SUPPLY NEEDED INFORMATION; SEE fsck(8).



And of course, it tells me to check out fs(5) which is cool, except that we know that 5 is the programmer's section of the manual, and not the binaries section (or even the miscy or errata section). So while

  
#include <sys/types.h>
#include <ufs/fs.h>
#include <ufs/inode.h>



(yes, it says ufs because, uh, everyone on the mac uses ufs, right?) is useful to somebody, it's absolutely useless to somebody trying to figure out why the fuck their volume is toast. So I have now this silly mac Mini with four 250gb drives in a raid 0, backing up a single 640gb drive that contains all my music and other media so that I can then blow away the 640 drive and hopefully give it some better options from newfs. I don't think I can trust Disk Utility anymore.

Who QA's this shit, coming out of Apple? I mean, if people are building clusters on XServes and they're doing evil things with eight-core xeon towers, why is it we can't have a reasonable, robust filesystem with redundancy, journaling, and oh yeah, performance?

Even on OpenBSD when we had ffs and we liked it, or when we had to choose between jfs, ext2, ext3, or even XFS on Linux, we had reasonable assurances that our manual pages were correct, and that our filesystems, while they may crap out, would crap out in ways that weren't "the crap fairy visited, she was pissed, and she didn't leave a note."

It's almost enough to make me think I oughtta get rid of the mini, build a shuttle pc with about a bazillion 1394 and usb2 ports, as well as an 802.11N card, and have it essentially be NAS for the media. Even run leenucks on it. At least then, when shit went splode, I'd know where to go looking. Not any of this "totally fucking wrong manpage" garbage.

Syndicated 2007-08-12 23:02:00 (Updated 2007-08-12 23:32:18) from Alex J. Avriette

8 Aug 2007 (updated 10 Jun 2011 at 03:24 UTC) »
7 Aug 2007 (updated 10 Jun 2011 at 03:25 UTC) »

the show goes on

I'm receiving ideas from myriad places containing the traditional wisdom of how to treat this sort of head injury. It's not a conventional injury, for one. It's got some features of damaged bits, but other parts seem to be fine. Still others are really worrisome, such as the intermittent hallucinations. It looks like I'll be doing inpatient stuff for a day or so (maybe 3-4) on an EKG and EEG, and get my MRI redone. It would be so nice if they found something. Of course, "something" is going to be like "You have a four centimeter mass on your prefrontal cortex. While it's probably operable, there's a margin of morbidity."

Fortunately for all concerned, it's never lupus. So, we're safe there.

Syndicated 2007-08-01 23:41:00 (Updated 2007-08-01 23:52:40) from Alex J. Avriette

The iTunes disk

I've been ruminating on this iTunes disk notion. We already have this silly iDisk thing which is a wrapper around webdav. At one point, we were able to mount ftp connections as volumes (although I think this has been turned off by default; it was a pain in the ass with the early copies of Safari, but I just don't use ftp anymore – or Safari, for that matter). Anyways, why not have an "iTunes Disk" functionality? It's really not that hard (in the sense that it's frequently easier to scale up and out code than to start from scratch. Sometimes it isn't. I'd bet, though, that the QA and vetting process for such a feature would take longer than the coding) as most of the code is written already.

Live preview and columns

The Finder has column display, so hierarchies are not hard to figure out. There's live previewing in the Finder already, both for small icons and "click-hovering" (in column display, tap the icon once, it shows up enormous in the rightmost column – you can actually play m4p and m4b files from here, if iTunes has the right credentials!). All they'd have to do is make it pretty (making it simple/elegant is not very hard, as the users are already using the finder, and the display in iTunes wouldn't change). The functionality is there.

Browsing through iTunes data, including rendering of data through iTunes tokens (so they're not decoupled)

What reason would they have for not extending this any further? There are other examples of this, such as the "burn folder" functionality. Furthermore, Microsoft has kind of taken the idea and run with it in Vista (and to a lesser extent, XP). There's a whole lot of stuff you can do in explorer.exe without really launching an application. Isn't this usually the other way around? I'm not saying Vista has a slicker-n-guano human interface, just that there are features that seem sensible to have which aren't in MacOS (to be fair, there's a lot of garbage I don't want in Windows).

What seems to be missing from the equation, though, is that performance-tuned backend. As we've now hit twenty times the storage of the original iPod, we're still using the same stuff. I wholeheartedly agree with the notion of having it be hacker-friendly, and XML is a reasonable way to do that. However, when you provide somebody with XML they have to either write their own lexer/parser or grab an API from somebody. Why not, as a couple people suggested to me, use something like SQLite? I wouldn't be having these irritated "seek...seek...seek" times between book parts, gapless CDs, and so on. Seems to me that SQL is fairly agnostic as an interface goes, similar to XML's being an agnostic data exchange format. So what would be wrong with providing either APIs (I am sure they would make sure the snake is kitted out) or SQL hooks via their ODBC stuff, or even AppleShare hooks? If you really, really want XML, there's no reason you couldn't have a select_as_xml() function built into the API. I suppose you could probably spit out an iTunes library on an interval, or when things were changed (changes tend to be not-often and come in groups) for applications that insist on frobbing the XML.
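As a rough sketch of what that could look like, here is a toy library kept in SQLite with a select_as_xml() helper on top. The schema, the index, the sample rows and the XML element names are all invented for illustration; none of it reflects how iTunes actually stores its library.

```python
import sqlite3
import xml.etree.ElementTree as ET

def open_library(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a toy media library with a hypothetical schema."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS tracks (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               album TEXT,
               kind TEXT,
               location TEXT
           )"""
    )
    # An index lets "all the parts of this book" be a lookup, not a scan.
    db.execute("CREATE INDEX IF NOT EXISTS tracks_album ON tracks(album)")
    return db

def select_as_xml(db: sqlite3.Connection, album: str) -> str:
    """Return one album's tracks as an XML string, for tools that insist on XML."""
    root = ET.Element("tracks")
    rows = db.execute(
        "SELECT id, title, location FROM tracks WHERE album = ? ORDER BY id",
        (album,),
    )
    for track_id, title, location in rows:
        track = ET.SubElement(root, "track", id=str(track_id))
        ET.SubElement(track, "title").text = title
        ET.SubElement(track, "location").text = location
    return ET.tostring(root, encoding="unicode")

db = open_library()
db.executemany(
    "INSERT INTO tracks (title, album, kind, location) VALUES (?, ?, ?, ?)",
    [
        ("Part 1", "Some Audiobook", "book", "/media/book/part1.m4b"),
        ("Part 2", "Some Audiobook", "book", "/media/book/part2.m4b"),
    ],
)
print(select_as_xml(db, "Some Audiobook"))
```

The XML export becomes a view over the data rather than the data itself, so anything that only reads can keep reading, while writes go through SQL where they can be transactional.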

As I was mulling all this over, it occurred to me that it's not unlike the NeXT vision (that is, plugins to the finder, rather than heavy applications), and of course OS X is full of NeXT relics. So why wouldn't they have done this already? I could write software that generated a library for iTunes to read, but I have no idea how to handle writes, which of course the software would have to do.

Syndicated 2007-07-31 15:37:00 (Updated 2007-07-31 16:43:57) from Alex J. Avriette

Getting serious about media


Apple has been courting the music devotee for quite some time. While some of us may think this goes back to, say, the first generation iPod, it of course goes much further. Step back even further and you see Altivec, which was essentially a means to render floats on the proc (because back then, graphics cards weren't the hyperactive reality-engine-in-a-chip that they are today). Further yet, we have the 'av' models (660av, 840av, 6100/8100av, etc), which were the company's first efforts at producing a real graphics machine for consumers (the other choice for the 'prosumer' market was spending a couple hundred grand on a deskside machine). We can even extrapolate and point out that Quicktime was a step in this direction, as well (and I had it on my SE/30, to give you an idea of how far back this goes).

And yet, Apple has no serious approach for storage. You can now get 2tb of storage in your Mac, built-to-order. In fact, as soon as the deal with Dell/Alienware expires, Hitachi will be selling you 1tb drives for your Mac, pushing that number up to 4tb, in the chassis alone.

What about the XServe RAID? The problem you'll encounter here is precisely the same problem you'll have with 4tb in your tower. Because Apple is taking a bottom-up approach, adapting consumer hardware to professional uses, they run into ugly issues like the OS (or the controller) sleeping the drives when they're not in use – even when they're part of a volume group. I have in front of me five 250GB firewire 800 drives. Unfortunately, even if I were to make a RAID 5 out of them, I expose myself to substantial risk of data corruption. I could instead go with RAID 0, but the problem there is of course that the risk I mitigate by switching from 5 to 0 is offset by the increased risk of losing everything, since RAID 0 has no redundancy at all.
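The trade-off for those five drives can be put in numbers. This is a small sketch using the textbook capacity rules for RAID 0 and RAID 5 only; it ignores filesystem overhead and says nothing about the drive-sleep problem, and the function names are made up.

```python
def usable_capacity_gb(level: int, drives: int, size_gb: int) -> int:
    """Usable space for equal-sized drives under the textbook RAID rules."""
    if level == 0:
        return drives * size_gb           # striping: every drive holds data
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) * size_gb     # one drive's worth goes to parity
    raise ValueError("only RAID 0 and RAID 5 are covered in this sketch")

def drive_failures_tolerated(level: int) -> int:
    """How many drives can die before the whole volume is lost."""
    return {0: 0, 5: 1}[level]

for level in (0, 5):
    print(level, usable_capacity_gb(level, 5, 250), drive_failures_tolerated(level))
```

So the choice is 1,250GB that dies with the first bad drive against 1,000GB that survives one, which only pays off if the controller can be trusted to keep the set consistent.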

The other problem that bothers me is the absolute, glaring failure of Apple to actually support the two-percent (you could call this hyperparetotic if you like, although it might be more applicable to ask Benford for a reacharound) folks in the media market. Because their product lines encourage people to expand their storage needs at a rate much faster than the rest of the consumer market, and the baseline of what they consider normal (Apple's selling 80GB iPods, and I reckon we'll make it to 100 or 120gb before Apple changes the form factor in some way), that hyper-extended 2% (it becomes much more dilute if we extend out to the traditional 80/20 Pareto principle) will be consuming seemingly exponentially larger storage real estate, and are going to need novel ways to manage them. How plausible is this? Well, they've managed to get most of us carrying around accelerometers. How far away could it be that we begin to understand logical versus physical volumes and volume groups? (the bad news: iSCSI is already taken, they'll need a new product name)

We wouldn't need novel forms of storage if everyone understood how to manage an FCAL loop or could trouble themselves to memorize what RAID levels 0, 1, 5, 10, and 15 are. But, we do need that technology, and Apple is in a unique place to provide it to a segment of the market that doesn't have a problem spending a thousand or two more for a laptop, every year.

Let me change directions a bit here. It's very unusual to find a database that is greater in size than a terabyte. Further, the data contained therein is generally smaller than its footprint by a factor of six to ten. So it's fair to say that for most individuals – indeed most organizations – their data footprint is smaller than 250GB, and probably smaller than 100GB. However, when running through indices for data that large, when we want subsecond response times, most organizations that are serious about data (Oracle, SGI, and RHAT, for example) realize that the filesystem very much gets in the way.

Consider this. When I moved all my iTunes data off of the primary filesystem and into a logical partition therein, I essentially said to the operating system that I didn't care too much about the niceties of file systems. Instead, I wanted portability, scalability, and containment. But why not add to that performance? What are the needs of your average iTunes user? It seems like an obvious answer at first, but really, it's quite complicated.

  • Performance (prefetch; no gaps between gapless tracks or video segments)
  • Redundancy (durability; with "thousands of songs" in my pocket, at $1-$20 each, I don't want them to disappear)
  • Containment (protection from commingling; Sandy's media should remain separate from mine)
  • Portability (the ability to move media from one machine or device to another; not necessarily the ability to duplicate or "share")
  • Scalability (reformatting or upgrading devices is inherently dangerous; I would like the ability to simply add storage when I run out of room, be it with a physical or logical device, or expanding a logical device)
But we don't have anything from Apple that addresses even one of these items. My current inventory looks to be about 30,000 music items (this includes iTunes U and music videos), 200 movies, 250 TV items, 300 podcasts, 200 book items (misleading, as most of them are split into 3-5 pieces). All told, it's about 300GB (incidentally, the library is about 3 million [logical] words, or about 60MB of XML... more on this in a minute). What I'm getting at is that all of these things have different needs.

Colons? Never! We're running Unix under the hood!

Apple has given us the ability to denote where a library lives. That's half the code you need to support multiple libraries. Of course, from that point, you're two-thirds of the way to defining kinds of libraries in multiple locations. I could specify ten gigs for gapless music, a hundred gigs for television, a hundred gigs for "normal" music, and twenty-five gigs for low-quality audio like iTunes U and podcasts. Because the hardware requirements of all four of those types of media are different, why not specify them in different places (e.g., on different physical volumes)? Moreover, why not specify them as separate libraries so that I could take parts with me and leave others at home? Do I really need to keep two hundred movies on me? Probably not. Of course, I can come up with smart playlists, but I can't manage them manually without spending substantial time on just that. It becomes necessary to have the software understand them in some way. Even extremely rudimentary functionality in this regard would be a significant improvement.
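
To make that concrete, here is a rough sketch in Python of what "kinds of libraries in multiple locations" might look like as data. Nothing here is an iTunes API. The `Library` class, the `/Volumes/...` paths, the `portable` flag, and the choice of which libraries travel are all hypothetical. Only the four categories and their sizes come from the paragraph above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Library:
    """One media library living at its own location (hypothetical model)."""
    name: str
    root: str        # assumed mount point, purely illustrative
    quota_gb: int    # sizes taken from the paragraph above
    portable: bool   # assumption: should this library travel with me?


# Hypothetical layout: each library could sit on a different physical volume.
LIBRARIES = {
    "gapless": Library("gapless", "/Volumes/Fast/gapless", 10, True),
    "tv": Library("tv", "/Volumes/Big/tv", 100, False),
    "music": Library("music", "/Volumes/Main/music", 100, True),
    "spoken": Library("spoken", "/Volumes/Slow/spoken", 25, True),
}

# Which library each kind of media is filed under (illustrative mapping).
KIND_TO_LIBRARY = {
    "gapless_album": "gapless",
    "tv_show": "tv",
    "song": "music",
    "podcast": "spoken",
    "itunes_u": "spoken",
}


def library_for(kind: str) -> Library:
    """Return the library that should hold media of the given kind."""
    return LIBRARIES[KIND_TO_LIBRARY[kind]]


def portable_libraries() -> List[Library]:
    """Libraries worth taking on the road; the rest stay at home."""
    return [lib for lib in LIBRARIES.values() if lib.portable]
```

With something like this, "take parts with me and leave others at home" is a filter over `portable_libraries()` rather than a pile of hand-maintained smart playlists.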

The other thing here is the filesystem getting in the way. Oracle and other database vendors get past this problem by having what they call "raw mode." Essentially, the database owns the physical disk, rather than having the operating system format it and manage it. Why would you want to do this, right? The answer is simple. The database has its own set of users and advanced ACLs. Why does it need to have the operating system managing permissions on that disk? Just let "oracle" own it. Since the system administrator is making sure nobody's looking up Oracle's skirt, and Oracle is making sure that the data it's sending out is going to the right, vetted people, Oracle gets the benefit of not having to ask the filesystem for permission to do everything.

Consider the 4MB "song" versus the 750+MB movie or 350MB television show. Among songs, we have albums like Dark Side of the Moon, at 100MB, but with individual components ranging from 2MB to 20MB. DSOTM in particular is intended to be one single piece rather than ten individual pieces. So, when we're at 1.95MB on track A, we need to be reading 0.25MB into track B. This is governed by the filesystem. With RAIDs, we can set the "block size" to optimize this process. The notion is that for bigger files, we don't want to have to go back to the disk a bunch of times to read a file that's a gig in length. So we set the block size to very high numbers like 64MB or even higher. The corollary to this of course is that for very small files, we don't want to sit there waiting to read 64MB when the file itself is 1MB in length. In this case, we can set block sizes down as small as a few KB.
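
The trade-off is easy to put numbers on. What follows is a deliberately naive back-of-the-envelope model in Python, not a description of how any real filesystem or RAID controller behaves. It assumes every read fetches exactly one whole block and ignores caching, readahead, and seek times. The function names are mine.

```python
KB = 1024
MB = 1024 * KB


def reads_needed(file_size: int, block_size: int) -> int:
    """Number of whole-block reads required to cover the file."""
    return -(-file_size // block_size)  # integer ceiling division


def overread_bytes(file_size: int, block_size: int) -> int:
    """Bytes fetched beyond the end of the file in this naive model."""
    return reads_needed(file_size, block_size) * block_size - file_size


# A one-gig movie: big blocks mean few trips to the disk.
movie = 1024 * MB
movie_reads_big = reads_needed(movie, 64 * MB)    # 16 reads
movie_reads_small = reads_needed(movie, 4 * KB)   # 262144 reads

# A 1MB file: a 64MB block drags in 63MB we never asked for.
small = 1 * MB
waste_big = overread_bytes(small, 64 * MB)        # 63MB of overread
waste_small = overread_bytes(small, 4 * KB)       # no overread
```

Sixteen reads against a quarter of a million for the movie, and 63MB of wasted reading against none for the small file: that is the whole argument for letting different kinds of media live on differently tuned storage.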

Apple is in a unique position to help these customers out. First, Apple has been culturing a userbase (in the "imma growin me sum pleghs" sense of the word culture, not as in "high"** culture) of people with enormous storage needs, who spend lots of money on their products, and whose storage needs are generally easily grouped into a few narrow categories. How hard would these things be to understand?

  • An "iTunes Disk" function in the preferences (or in Disk Utility.app), or "Let iTunes manage this disk".
  • A "multiple libraries" function under Advanced... .
  • Different types of media storage by location.
  • "Add volume to library" to extend the available logical space.
  • "Mirror my data here" for one-click redundancy.
  • iTunes Prosumer Edition (or iTunes Pro, or iTunes Enterprise Edition, etc).
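
For the "Add volume to library" and "Mirror my data here" items, the underlying idea is just a logical volume sitting on top of physical ones. Here is a toy model in Python under my own assumptions: `Volume`, `LogicalLibrary`, and their methods are invented for illustration and correspond to no real Apple or LVM interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Volume:
    """A physical (or logical) device with some capacity. Hypothetical."""
    name: str
    capacity_gb: int


@dataclass
class LogicalLibrary:
    """A library that spans several volumes and may be mirrored."""
    volumes: List[Volume] = field(default_factory=list)
    mirrors: List[Volume] = field(default_factory=list)

    @property
    def capacity_gb(self) -> int:
        # The user sees one pool of space, not the individual disks.
        return sum(v.capacity_gb for v in self.volumes)

    def add_volume(self, volume: Volume) -> None:
        """'Add volume to library': grow the pool without reformatting."""
        self.volumes.append(volume)

    def mirror_to(self, volume: Volume) -> None:
        """'Mirror my data here': refuse targets too small to hold a copy."""
        if volume.capacity_gb < self.capacity_gb:
            raise ValueError("mirror target is smaller than the library")
        self.mirrors.append(volume)

    @property
    def is_redundant(self) -> bool:
        return bool(self.mirrors)
```

Running out of room then means calling `add_volume` with a new disk, and redundancy is one call to `mirror_to`, which is roughly the level of complexity a preferences pane could hide.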

So getting back to how this helps the consumer, and how it helps Apple, let me say that Apple's great strength is their ability to abstract away complicated ideas behind simple interfaces through the use of clever algorithms and other tricks of software and hardware. Of late, they've even had a problem bringing their employees up to speed, technically. Because they're not hiring PhDs to work at the local "genius bar," they have to not only explain technical concepts to their employees in a manner that is technical, but not too technical, but of course also explain those products in lay terms for their consumers.

It seems to me that Apple has the opportunity to mine their own customer base for new customers. That's pretty win-win. Of course, for every product Apple brings to market, they manage to fuck up two that are already available and abort three more for lack of management and vision.

Syndicated 2007-07-25 19:11:00 (Updated 2007-07-25 21:34:10) from Alex J. Avriette
