Older blog entries for nbm (starting at number 109)

Google I/O: Five random bytes from the Thursday Keynote

  1. In a particular period of time, 50% of the projects going live were "20% time projects"
  2. Occam's Razor applied to design - the simplest design is probably best
  3. Don't let the urgent drown out the important
  4. Have a healthy disrespect for "the impossible"
  5. The imagination is a muscle (which needs regular exercise to function well)

Syndicated 2008-05-29 22:57:23 from Cosmic Seriosity Balance

Google I/O: Google Doctype

The Hitchhiker's Guide to the Web is what Google Doctype would be called if it could be called that, said Mark Pilgrim. It's an encyclopedia of the open web - definitions and compatibility of various HTML, CSS, and JavaScript features, plus articles on how best to achieve certain effects on the web (and avoid pitfalls).

The goal of Doctype is to be the #1 hit for any search term about the open web.

The "open web" is the DOM, HTML, and CSS - "just about anything that you can do on Firefox on 64bit Linux" (ie, no Flash or Silverlight, and not XUL either).

In scope are those "open web" topics, security and accessibility concerns that affect the open web, and closed-web features insofar as you need to know about them when using open web technologies.

The text content is licensed under Creative Commons attribution-only ("the good one"), and the code under the BSD license.

For offline use, you can check it out with Subversion from Google Code Project Hosting.

Progress so far (about two weeks): 75 articles, 2800 pages of reference documentation (ie, "background-color", "orange"), and 10k test cases.

Syndicated 2008-05-29 03:14:34 from Cosmic Seriosity Balance

Google I/O: Underneath the covers at Google

This session deserves a much longer post, but I just wanted to get the most interesting stuff down quickly. Basically, it was a back-end developer's guide to how Google is put together - from how a request made in a browser gets a response, to how those responses are assembled from multiple sources, to how those sources are built up.

Everyone knows Google's love of lots of commodity hardware for their servers, but it was interesting to hear some other things - they use reasonably low-end networking gear too. Also, they're back where they started in terms of machines without cases shoved into in-house-designed racks. The scale has changed dramatically, of course.

"If you have 10k servers, expect to lose 10 a day..."

GFS's masters run on the same server hardware as the slaves, and take part in master election like any other machine. Google puts "millions" of pages together in a single GFS "file", since GFS uses 64MB chunks. There are 200+ clusters, many of them with 1000s of machines, serving pools of 1000s of clients, with 4+PB filesystems and 40GB/s read/write load (even while hardware is failing constantly).
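To make the 64MB chunk size a little more concrete, here is some back-of-the-envelope arithmetic. In the published GFS design a client turns a byte offset within a file into a chunk index before asking the master where that chunk lives. This is a minimal sketch, assuming binary megabytes (64 × 2^20 bytes); the function names are made up for illustration and this is not GFS client code.

```python
# Assumption: "64MB" means 64 * 2**20 bytes.
CHUNK_SIZE = 64 * 1024 * 1024


def chunk_location(offset: int) -> tuple[int, int]:
    """Map a byte offset within a file to (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_needed(file_size: int) -> int:
    """Number of chunks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // CHUNK_SIZE)


# A 1GB file spans 16 chunks, and byte 100,000,000 falls in the second chunk (index 1).
print(chunks_needed(1024 ** 3))
print(chunk_location(100_000_000))
```

Large chunks keep the number of chunks (and so the master's metadata) small. Even a 4PB filesystem works out to only about 67 million chunks under this assumption, which is one reason packing many pages into one big file makes sense.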

MapReduce usage within Google is growing fast - 700 new applications in a recent month at peak, currently around 10k applications. It went from 171k MapReduce jobs in March 2006 to 2.2 million jobs in September 2007. MapReduce is heavily optimised to keep jobs near the data they need, to conserve precious network bandwidth within the datacentre.
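For anyone who hasn't met the model, here is a toy, single-process illustration of MapReduce's map, shuffle, and reduce phases, using the classic word-count example. It only shows the programming model. None of the distribution, fault tolerance, or data-locality scheduling described in the talk is here, and the function names are purely illustrative.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Emit an intermediate (word, 1) pair for every word in the document.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["the web", "the cloud and the client"]))
```

In the real system, the map and reduce functions are roughly all the application writer supplies. The framework handles splitting the input, running map tasks near the data, and moving intermediate pairs to the reducers.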

Google still has one large shared source base(!), from low-level libraries used by everything, to domain-specific libraries, to applications. The benefits are that it's easy to find examples of how something is used, so you can use it correctly, and easy to reuse code (ie, as a library). The drawback is that such reuse leads to some fairly tangled dependencies.

Language usage at Google: C++ for all high-performance, commonly-accessed web stuff.  Java is used for less-performance-oriented and/or lower-volume applications.  Python is used behind the scenes for things like configuration, administration, &c.

Syndicated 2008-05-29 01:39:39 from Cosmic Seriosity Balance

Announcements from the Google I/O keynote, Google App Engine opens signups

Some interesting news was delivered during the Google I/O keynote.

In terms of Google App Engine, the announcement that got the biggest applause was that it was now open to all signups - no waiting list and a few tens of thousands of developers.

Beyond that, two new APIs were announced - the memcache API and the Image API.

Some expected prices for usage beyond the free quota were given:

  • CPU: 5 million "average" page views free, 10-12c per core-hour thereafter
  • Storage: 500MB free, 15-18c per GB-month thereafter
  • Incoming traffic: 5 million "average" page views free, 11-13c per GB thereafter
  • Outgoing traffic: 5 million "average" page views free, 9-11c per GB thereafter
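To get a feel for these numbers, here is a small estimator that turns the quoted ranges into a low/high monthly cost. It is a sketch under some loud assumptions. These were pricing *expectations* from the keynote, not a price list. The free CPU and traffic quotas were expressed in page views rather than core-hours or GB, so the function takes usage that is already above the free quota. The resource keys and the function name are made up for illustration.

```python
from decimal import Decimal

# Quoted (low, high) prices in US cents per unit beyond the free quota.
# Assumption: these keynote expectations are what would actually be charged.
PRICES_CENTS = {
    "cpu_core_hours": (10, 12),
    "storage_gb_months": (15, 18),
    "incoming_gb": (11, 13),
    "outgoing_gb": (9, 11),
}

FREE_STORAGE_GB = Decimal("0.5")  # 500MB of free storage


def overage_cost(usage: dict[str, Decimal]) -> tuple[Decimal, Decimal]:
    """Return the (low, high) cost in dollars for usage already above the free quota."""
    low = high = Decimal(0)
    for resource, amount in usage.items():
        low_cents, high_cents = PRICES_CENTS[resource]
        low += amount * low_cents
        high += amount * high_cents
    return low / 100, high / 100


# Example: 10GB stored for a month (9.5GB billable) plus 20 billable CPU core-hours.
print(overage_cost({
    "storage_gb_months": Decimal(10) - FREE_STORAGE_GB,
    "cpu_core_hours": Decimal(20),
}))
```

`Decimal` is used instead of floats so that the cent arithmetic stays exact.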

The Google Web Toolkit 1.5 release candidate was released today, which brings Java 5 language features.

In terms of OpenSocial, the 0.8 specification was released yesterday, and AOL has joined the OpenSocial initiative.

Syndicated 2008-05-28 22:19:54 from Cosmic Seriosity Balance

Google I/O keynote - moving the web forward

The Google I/O keynote was entitled Client, Connectivity, and the Cloud.  The message was that "Google cares about moving the web forward".

The obvious question is why, and four reasons were given:

  • Google is a company that has only existed and could only have existed because of the web.  Thus, moving the web forward is improving what they can deliver.
  • It's a virtuous cycle - richer web apps can reach more users, and this means more usage, and this means more revenue.
  • More softly, Google is a company of the web generation, and that's how the web was made - through consensus and partnership.
  • Again softly, Google feels a debt of gratitude to the open source and web communities.

How will they move the web forward (and, perhaps, how should the web be moved forward)? The challenges and benefits of past and current models of computing were outlined:

The mainframe had a lot of power (for the time), but it was not easy to get your hands on one. Deployment was easy, since you installed your software on one computer and accessed it through dumb terminals.

The PC meant accessibility, but less power. It lost the ease of deployment because you had to support a variety of hardware, operating systems, and applications/libraries.

The web brought back easy deployment, since you deploy to servers you control and users access them through the (relatively) dumb browser. However, supporting scale in your application means needing "cloud computing" (perhaps a stretch for most current applications, but certainly true if you're aiming high), and the "accessibility" of clouds is currently not that great.

So, how should it move forward?

  • Making the client more powerful
  • Keeping connectivity pervasive
  • Making the cloud (ie, resources/power) more accessible 

In terms of Client, Google Gears was explained and demonstrated. Basically, Gears is about extending the current browser to enable richer applications. It's not just about offline use and caching - Allen Hurff from MySpace showed how asynchronous threads, an SQL database, and full-text search let their message view support sorting and (more impressively) search without leaving the client. That kind of speed isn't what we expect from the web today.

In terms of Connectivity, Android was brought out. Something that wasn't obvious to me before is that Android is a full stack for mobile phones, not just an OS, framework, libraries, and so forth. The demo wasn't all that impressive, since only one phone was used, but the idea of multiple phones with different abilities sharing a consistent set of applications and interactions is compelling.

The Cloud discussion was perhaps of most interest to me, since it discussed Google App Engine.  It was described as a way of making aspects of the Google infrastructure available to developers (ie, not just "machines").  It is supposed to take care of all the problems you have outside of your own application - ie, not having to worry about setting up machines, installing the OS, maintaining the OS and applying security updates, logs, monitoring, and so forth.

The key goals of Google App Engine are to be easy to develop for, easy to scale, and free to start.

Beyond or within these three main ways forward, three other projects were given attention: Google's GData APIs, Google Web Toolkit, and OpenSocial.

Syndicated 2008-05-28 22:06:40 from Cosmic Seriosity Balance

In Sunny San Francisco, at Google I/O

I won't say anything about the trip up, but I've been in San Francisco since Sunday afternoon (local time). Monday was Memorial Day, and the members of the SynthaSite team who were there when I joined last November decided that we should get together for brunch. We then headed out to see some sights around the area, taking a quick trip over the Golden Gate Bridge and into Sausalito.

It was an overcast day, but I did get a glimpse of how beautiful the city can be.

Tuesday I pretended I had the capacity to do work and visited the SynthaSite offices in San Francisco.  Didn't get much done, but felt the office here in San Francisco shared a similar vibe to the one in Cape Town (although open plan feels weird after being in a three/four-person room for so long).

I've made it to Google I/O, which is way bigger than I expected it to be (and, according to one of the shirted staff-members, more than they originally expected too).  Hundreds of people were registering when I arrived, and by the time I got to the front of the A-B table (which was a good 75 people long), the A-B queue was longer than when I arrived.

I'll try to write something after each session, assuming the wireless works better than it does now...

Syndicated 2008-05-28 21:35:42 from Cosmic Seriosity Balance

In San Francisco next week, Google I/O, Sebastopol Pylons/WSGI Sprint

On Saturday, I'm heading off to San Francisco to attend Google I/O and also spend some time with my colleagues at SynthaSite in our US office. Of most interest at the conference (at least in my personal capacity) is Google App Engine, but pretty much everything sounds interesting (with GWT being the big exception), and I imagine deciding which sessions to attend will be hard. (And, you know, I guess I'm supposed to keep an eye out for things that might be useful to the company, or something...)

Over the weekend, I'll hopefully be heading to Sebastopol (in California Wine Country) for the Pylons/WSGI Sprint being held at O'Reilly Media's headquarters there. There are two days of sprints, and I'm hoping to be there for most of both days - but it depends on travel arrangements. If I get the time, I hope I can pop out and see a bit of the surrounding country and maybe one or two of those "places of interest".

In between the gatherings and travel, and before I head back, I'll spend time at the SynthaSite offices, doing what I'd generally be doing in Cape Town, but with better connectivity and less rainy cold winter.

If you want to catch me while I'm in San Francisco (or in London for the half-day I'll be there on the trip back), send me an email or leave a comment.

Syndicated 2008-05-22 23:03:13 from Cosmic Seriosity Balance

CTPUG in the Global Python Sprint weekend

On Saturday (May 10th) the Cape Town Python User Group held a Python Sprint meeting as part of the Global Python Sprint weekend. About 8 of us got together, on and off, from 10:30am until about 9:30pm around a table at the SynthaSite offices, and worked through 10 or so issues in the Python issue database.

Thanks to The Other Neil and Simon for most of the organisation effort, and to them and Adrianna, Russell, Jonathan, Jeremy, Brad, and David for coming through and taking part.

And thanks to SynthaSite for coffee, coke, crisps, chocolates, and other goodies.

According to The Other Neil, we worked on:

Syndicated 2008-05-12 15:28:46 from Cosmic Seriosity Balance

A team apart

For about two weeks, ending about two weeks ago, we had a full house of current employees at the SynthaSite offices in Cape Town - which allowed everyone to get to know everyone else both at work and at play. Over the past two weeks, and continuing for another week or so, people have been heading back to the US office or heading there to work for the foreseeable future.

The time together was great and necessary, and the time apart is necessary also, but it's hard to not want to see my new and old friends at the office.  The offices feel too quiet (although we've got new friends starting next week).

It is early days yet, but I know from previous experience how distance can allow one to treat people unfairly - it is easier to disappoint and easier to pretend to forget and easier to believe that the other is being stupid or lazy when you don't see each other regularly.  Yes, even geeks.

I'm quite interested in the challenge of making this not happen, and I'm hoping to see how our experiments in project management and communication and structure turn out.

I identified tools, process and people as our main strengths that will help us get through this new period, and then realised they were also our greatest challenges.  It's amazing how much your outlook can affect how you feel about a prospect like this.  If you start out, like I did, with "We've always been good with tools, but...", it leaves you feeling like you're entering a big unknown without much help.  But if you say "This might mean having to retool somewhat, but we've learned a lot about getting tools right", it makes you feel up for the fight.

I'll try to write up my observations as they happen - although this recent three-week break wasn't for lack of things to write, but for lack of the energy to write them. (I'll try to catch up, but no promises...)

Syndicated 2008-04-30 08:56:55 from Cosmic Seriosity Balance

Traffic accounting with ulogd, by Stefano

When I first started at the Bandwidth Barn, the traffic accounting that such an environment required just wasn't available off-the-shelf or in the open source world. I've often been asked for the hacked-together combination of scripts and pmacct that maintains the Bandwidth Barn traffic system - which includes "buying" more monthly traffic, setting traffic limits per month per person, up-to-date graphs of usage per protocol and per client available to each company in the Barn, and months of historical data in case of queries or complaints about the billing.

Looks like ulogd, some iptables rules, and a few simple cronned SQL scripts make this a lot easier these days, thanks to this post about ulogd for bandwidth accounting by Stefano.
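To give a flavour of the "cronned SQL script" side of this, here is a rollup that sums each client's traffic per month and flags anyone over their limit. It is a minimal sketch only. The `traffic` and `limits` tables are a hypothetical, simplified schema, not ulogd's actual output tables and not what Stefano's post uses, and SQLite stands in for whatever database ulogd logs to.

```python
import sqlite3

# Hypothetical schema: one row per logged packet/flow, attributed to a client
# and a month, plus a per-client monthly limit. ulogd's real tables differ.
SCHEMA = """
CREATE TABLE traffic (client TEXT, month TEXT, bytes INTEGER);
CREATE TABLE limits (client TEXT, month TEXT, limit_bytes INTEGER);
"""

# Sum usage per client per month and keep only rows that exceed the limit.
MONTHLY_OVERAGE = """
SELECT t.client, t.month, SUM(t.bytes) AS used, l.limit_bytes
FROM traffic AS t
JOIN limits AS l ON l.client = t.client AND l.month = t.month
GROUP BY t.client, t.month, l.limit_bytes
HAVING SUM(t.bytes) > l.limit_bytes
ORDER BY t.client, t.month
"""


def over_limit(conn: sqlite3.Connection) -> list[tuple[str, str, int, int]]:
    """Return (client, month, bytes used, limit) for every client over its limit."""
    return conn.execute(MONTHLY_OVERAGE).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO traffic VALUES (?, ?, ?)", [
    ("acme", "2008-04", 700),
    ("acme", "2008-04", 500),
    ("widgets", "2008-04", 300),
])
conn.executemany("INSERT INTO limits VALUES (?, ?, ?)", [
    ("acme", "2008-04", 1000),
    ("widgets", "2008-04", 1000),
])
print(over_limit(conn))
```

Run from cron, a query like this could feed limit enforcement or billing reports. Keeping the raw rows around is what makes the "months of historical data" for billing queries possible.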

Syndicated 2008-04-07 13:49:14 from Cosmic Seriosity Balance
