Older blog entries for bagder (starting at number 734)

curling the metalink

Back in 2005 Anthony Bryan started to work with his metalink idea, as can be read in this early 2006 article. Very simplified, Metalink is a way to tell a client how download the same identical file from many places potentially in parallel. Anthony tells me he had the idea much earlier than so, going back to a bad experience trying to download a Fedora ISO from a download mirror…metalink_logo

Anthony’s and my discussions about metalink started in September 2006 and we’ve bounced countless of mails and ideas back and forth since then. Even more, we’ve become friends and we’ve worked together on several related subjects as well, including several Internet Drafts within the IETF.

We had a metalink discussion on the libcurl mailing list back in April 2008 about whether to have libcurl support it natively or not, but we (I) ended up with the conclusion that it wasn’t fit for libcurl. Basically because metalink is a layer on top of the application protocols that libcurl supports.

I wasn’t quite prepared at that time to accept the patches for the curl tool since I didn’t like all the XML stuff it would bring in and as I recall it I felt that I wasn’t prepared to deal with that extra work load at the time. I think I told the guys I wanted to wait and see and try it more at a later point.

In September that same year I blogged about Anthony’s work on getting an internet draft done for metalink. That would later in 2010 get released as RFC5854 and a year later RFC6249 came out with a way to provide all the info in HTTP headers instead of XML as the previous document was for. (Both RFCs contain acknowledgement to yours truly as contributor.)

Today

While I said metalink wasn’t really fit for libcurl, it was always fit for curl – the command line client that uses libcurl but is more of a transfer tool. During the spring 2012 Anthony and super-hacker Tatsuhiro Tsujikawa approached me and asked if perhaps we were ready for metalink in curl this time?

Yes!

Since the last time, metalink has developed as a standard and there’s now a libmetalink project to use and I felt it was a good time development wise as well. Tatsuhiro whipped up a refreshed patch in no time and soon we were polishing off the last little edges around the corners and the metalink patch set was merged into curl 7.27.0! Anthony’s and Tatsuhiro’s persistence and patience over the years are impressive. Thanks a lot my friends! That’s a little over five and a half years since the first approach until it got merged into the mainline sources. That’s nothing but pure dedication.

Usage

So, starting with curl 7.27.0 and assuming you built curl with the correct set of prereqs installed, this is how you use it:

curl --metalink [URL]

Where the URL is a URL that points to a metalink file, and then curl will download the file from one of the URLs mentioned. curl will at this point try them serially if there are multiple ones specified and not in parallel. Room for future improvements.

curl 7.27.0 will probably be released in the end of July 2012, but you can already get an early test version as a daily snapshot. We’ll appreciate all feedback you can give us!

Syndicated 2012-06-02 22:00:33 from daniel.haxx.se

300M users

Ok, so here’s a little ego game. The rules are very simple: try to figure out all things I’ve written code in (to any noticeable degree) and count how many users the products that use such code might have. Then estimate the total amount of humans that may in fact use my code from time to time.

I’ve been doing software both for fun and professionally for over 20 years (my first code I made available to others was written in 1986 on the C64). But as I look back on what I’ve done at my day job for all this time, most of my labour have been hidden into some sort of devices or equipment that never really were distributed to many customers. I don’t think I’ve ever done software professionally for consumer stuff. My open source code however has found its way into all sorts of things so I decided I could limit this count to open source code I’ve done. It is also slightly easier. Or perhaps less hard. And when it comes to open source, none of my other projects is as popular and widely used as curl. Counting curl users will drown all others.

First some basic stats: the curl.haxx.se web site gets more than 12000 unique visitors every weekday. curl packages are downloaded from there at a rate of roughly 1 million times/year. The site sends over 200GB of data every month. We have no idea how large share of users who get curl from the main site, but a guess is that it is far less than half of the user base. But of course the number of downloads says nothing about how many users there are.

Mac OS X ships with curl (and libcurl?) by default. There are perhaps 86 million macs in the world.

libcurl is used in television sets and Bluray players made by at least five major brands (LG, Panasonic, Philips, Sony and Toshiba). I’m convinced they don’t use it in all models but probably just a few of their higher end internet-connected ones. 10% of the total? It seems in 2009 there were 35 million flat panel TVs sold in the US with a forecast of the sales growing slightly over the years. I figure that would mean perhaps 100 million ones sold in the US the three last years possibly made by these brands (and lets assume that includes some blu-rays too), and lets say that is half the world market for them, it would make libcurl shipping in 20 million something TVs.

curl and libcurl are installed by default in some Linux distributions but not in all. In Debian it is an optional extra and the popcon overview shows perhaps 70% of Debian users install libcurl (and 56% use libssh2). Lets assume that’s a suitable average for all desktop Linux users. How many are we? Let’s for the sake of the argument say that 3% of all computers using the internet run Linux. Some numbers say there are 2.3 billion internet users. It would make 70 million Linux computers and thus 49 million libcurl installations. Roughly.

Open Office and the recent spin-off libreoffice are both using libcurl. Open Office said they have 100 million users now in May 2012.

Games: Second Life, Warhammer 40000, Ghost Recon, Need for speed world, Game Face and “Saints Row: The third” all use libcurl. The first game alone boasts over 20 million registered users. I couldn’t find any numbers for any other game I know uses libcurl.

Other embedded uses: libcurl and libssh2 are both announced as supported packages of Wind River Linux, the perhaps most dominant provider of embedded Linux and another leading provider is Montavista which also offers curl and libcurl. How many users? I have absolutely no idea. I’d say more than just a few, but how many? Impossible to tell so let’s ignore that possibly huge install base. Spotify uses or at least used libcurl, and early 2012 they had 15 million users.

libcurl is not installed by default in neither Android nor iOS, but is in WebOS and it is used by RIM for some (to me) unknown purposes. Lots of applications on Android and iOS still build and use libcurl, c-ares and libssh2 for their apps but it is just impossible to estimate how many users they get. Millions perhaps. I’d say no more than single-digit millions. Let’s call it 5 million.

Infrastructure. libcurl is used in the Tornado web server made by Friendfeed/Facebook and it is used by significant services at Yahoo.com. How many users of said services? Surely many millions. But really, that would be users of just 2 libcurl users so let’s not rush ahead and count those as direct users!

libcurl powers the very popular PHP/CURL extension that a large amount of PHP-running sites have enabled and use. How many? In 2008, 33% of all internet sites run PHP. Let’s say the share has decreased to 30% since then and the total amount of active sites is now 200M. That makes 60M PHP sites, and if there’s 10% of them using PHP/CURL we’re talking 6 million users.

Development. git, darcs, bazaar and Mercurial are all children of the distribution version control systems (some of them very popular) and they all use libcurl. How many users do they have? Since they’re all working on multiple platforms I would estimate the number of users of them collectively to be in the tens of millions range. Let’s say 10 million.

86 + 20 + 49 + 100 + 20 + 15 + 5 + 6 + 10 = 311 million users

300 million

And yes, of course a lot of these users will be the same actual human. But I may also just have counted all the numbers completely wrong to start with. I would say I’m probably within the correct magnitude!

300 million users out of the world’d 2.3 billion internet users. 1 out of 7 are using something that runs code I wrote. Kind of cool!

Sweden has a population of less than 10 million. 300 million is slightly less than the entire USA, twice the population of Russia or almost four times the population of Germany… As a comparison to some big browsers, a recent article claims Google Chrome has 200 million users in April 2012 which may be around 25% of the browser market and showing that basically none of the individual browsers have a lot more users than 300 million…

Of course I know that every single person who reads this is a knowing or unknowing user… Can you think of any other major users?

Syndicated 2012-05-16 14:11:33 from daniel.haxx.se

shorter HTTP requests for curl

Starting in curl 7.26.0 (due to be released at the end of May 2012), we will shrink the User-agent: header that curl sends by default in HTTP(S) requests to something much shorter! I suspect that this will raise some eyebrows out there so even though I’ve emailed about it to the curl-users list before I thought I’d better write it up and elaborate.

A default ‘curl localhost’ on Debian Linux makes 170 bytes get sent in that single request:

GET / HTTP/1.1
User-Agent: curl/7.24.0 (i486-pc-linux-gnu) libcurl/7.24.0 OpenSSL/1.0.0g zlib/1.2.6 libidn/1.23 libssh2/1.2.8 librtmp/2.3
Host: localhost
Accept: */*

As you can see, the user-agent description takes up a large portion of that request, and this for really no good reason at all. Without sacrificing any functionality I shrunk the same request down to 71 bytes:

GET / HTTP/1.1
User-Agent: curl/7.24.0
Host: localhost
Accept: */*

That means we shrunk it down to 41% of the original size. I’ll admit the example is a bit extreme and most other normal use cases will use longer host names and longer paths, but even for a URL like “http://daniel.haxx.se/docs/curl-vs-wget.html” we’re down to 50% of the original request size (100 vs 199).

Can we shrink it even more? Sure, we could leave out the version number too. I left it in there now only to allow some kind of statistics to get extracted. We can’t remote the entire header, we need to include a user-agent in requests since there are too many servers who won’t function properly otherwise.

And before anyone asks: this change is only for the curl command line tool and not for libcurl, the library. libcurl does in fact not send any user-agent at all by default…

Syndicated 2012-05-12 20:22:08 from daniel.haxx.se

NFS has many meanings

Today I learned that Need for speed World (I first had to google what “NFS-world” acutally means) uses curl when I received this email:

From: [removed]
Subject: NFS-world

I can not go into the game for 4 months my nickname ”[removed]“. it writes the error “Login failed, please try again.” Please solve this problem. Support Group does not help.

But no, I don’t know why this guy emailed me…

Syndicated 2012-05-10 19:57:38 from daniel.haxx.se

Digging the fiber

Finally the installation of my open fiber is moving along.

Roughly two weeks ago the team responsible for getting the thing from the boundary of my estate to my house arrived. They spent a great deal of time trying to piggyback the existing tube already running under my driveway for the telephone cable – until they gave up and had to use their shovels to dig a ditch through my garden. Apparently the existing tube was too tight and already too filled up with the existing cables. A little strike of bad luck I think since now they instead had to make a mess of my garden. Here’s a little picture of the dig work they did:

a ditch for the fiber through the garden

They aim at a depth of 25 cm for the cable while going through people’s estate, while outside of my garden they need 50 cm depth underneath the road and sidewalk down my little suburb street.

Once they were done we could see this orange cable sticking up next to my mailbox:

the outer end of the cable by my mailbox

… and the other end is sticking up here next to my front door. I expect the next team to get here and do the installation from here and pull it in through my wall and install the media converter etc possibly in the closet next to my front door. We’ll see…

the end of the cable next to the stairs by my front door

Today, when I arrived home after work the team that were digging up the sidewalk had already connected the cable side that was previously sticking out next to my mailbox (the middle picture).

Of course, they did their best at putting things (like soil) back as it was but I’ll admit that my better half used some rather colorful expressions to describe her sentiments about getting the garden remade like this.

I’ll get back with more reports later on when I get things installed internally and when the garden starts to repair.

Syndicated 2012-05-02 19:24:08 from daniel.haxx.se

Back to China

As the plan is currently, I’m going to Beijing China the last week in May for work. It’s now been something like 4.5 years since I was in China the last time, and I’m really looking forward to see how things have changed. This time I expect to get a slightly different insight as well since I’ll be visiting and talking to a bunch of Chinese employers of my customer.

ma dao cheng gong

This picture is hanging in my house, and apparently means “gain an immediate victory“, as I was told

Also, this brings back the chance for me to show you all the picture of this awesome power socket we had in our hotel room the last time, allowing basically any plug to get inserted:

chinese-socket

In comparison to Jordan where I recently spent a week vacationing, where my hotel room had the British style of sockets, but in other places in the same (fancy) hotel they had euro plugs…

Syndicated 2012-04-20 22:01:19 from daniel.haxx.se

Linux kernel code on TV

In one of the fast-moving early scenes in episode 16 of Person of Interest at roughly 2:05 into the thing I caught this snapshot:

person of interest s01e16

(click the image to see a slightly bigger version)

It is only in sight for a fraction of a second. What is seen in the very narrow terminal screen on the right is source code scrolling by. Which source code you say? Take a look again. That my friends is kernel/groups.c from around line 30 in a recent Linux kernel. I bet that source file never had so many viewers before, although perhaps not that many actually appreciated this insight! ;-)

And before anyone asks: no, there’s absolutely no point or relevance in showing this source code in this section. It is just a way for the guys to look techy. And to be fair, in my mind kernel code is fairly techy!

Syndicated 2012-03-19 23:14:34 from daniel.haxx.se

No summer of Rockbox 2012

For the first summer in many years I’m not doing any admin or mentor work for an organization for Google’s Summer of Code program this year.

I’ve been mentoring, co-mentoring and admined within the Rockbox project the last… 4-5(?) summers and as a result I now have a good collection of t-shirts. :-) This year, the project sadly came to the conclusion that there was not a good enough number of mentors and projects ideas gathered for it to apply to become a mentor organization.

Taking care of a student for full-time work during many weeks is not something to take lightly. To do it properly you need a dedicated and qualified mentor. To provide a good starting point for students to figure out and come up with a good project proposal you need an really good and detailed list of ideas.

The gsoc task is hard enough as it is with many mentors and many good ideas, so when there’s a sign of us not being able to fill up both lists we thought it better not to waste anyone’s’ time or energy. We also value and treasure Google’s very fine help with open source over the years thanks to gsoc, and we would hate to end up looking like we try to just take advantage of our role of having been accepted as mentor organization for many years in a row in the past.

In the other end, I was very happy to see that my friends in the metalink project finally after having applied many years got accepted as a mentor organization. I’d like to think that perhaps we (as in the Rockbox project) by standing back this year can let others get the chance to shine and join in the fun.

There is nothing said or planned for Rockbox for next year. If people want to mentor and if we manage to get a good pile of ideas I’m sure we will apply to be a mentor organization again. If not, well then I’m sure other organizations will still participate in the program and possibly I will find myself involved in there via another project. I am involved in a bunch of other open source projects, but none of the ones I’m very active in have applied nor participated as mentor org in gsoc so far.

Syndicated 2012-03-18 14:59:23 from daniel.haxx.se

Travel for fun or profit

As a protocol geek I love working in my open source projects curl, libssh2, c-ares and spindly. I also participate in a few related IETF working groups around these protocols, and perhaps primarily I enjoy the HTTPbis crowd.

Meanwhile, I’m a consultant during the day and most of my projects and assignments involve embedded systems and primarily embedded Linux. The protocol part of my life tends to be left to get practiced during my “copious” amount of spare time – you know that time after your work, after you’ve spent time with your family and played with your kids and done the things you need to do at home to keep the household in a decent shape. That time when the rest of the family has gone to bed and you should too but if you did when would you ever get time to do that fun things you really want to do?

IETF has these great gatherings every now and then and they’re awesome places to just drown in protocol mumbo jumbo for several days. They’re being hosted by various cities all over the world so often I deem them too far away or too awkward to go to, also a lot because I rarely have any direct monetary gain or compensation for going but rather I’d have to do it as a vacation and pay for it myself.

IETF 83 is going to be held in Paris during March 25-30 and it is close enough for me to want to go and HTTPbis and a few other interesting work groups are having scheduled meetings. I really considered going, at least to meet up with HTTP friends.

Something very rare instead happened that prevents me from going there! My customer (for whom I work full-time since about six months and shall remain nameless for now) asked me to join their team and go visit the large embedded conference ESC in San Jose, California in the exact same week! It really wasn’ t a hard choice for me, since this is my job and being asked to do something because I’m wanted is a nice feeling and position – and they’re paying me to go there. It will also be my first time in California even though I guess I won’t get time to actually see much of it.

I hope to write a follow-up post later on about what I’m currently working with, once it has gone public.

Syndicated 2012-03-15 13:15:05 from daniel.haxx.se

The updated web scraping howto

webbots-spiders-and-screen-scrapers

Web scraping is a practice that is basically as old as the web. The desire to extract contents or to machine- generate things from what perhaps was primarily intended to be presented to a browser and to humans pops up all the time.

When I first created the first tool that would later turn into curl back in 1997, it was for the purpose of scraping. When I added more protocols beyond the initial HTTP support it too was to extend its abilities to “scrape” contents for me.

I’ve not (yet!) met Michael Schrenk in person, although I’ve communicated with him back and forth over the years and back in 2007 I got a copy of his book Webbots, Spiders and Screen Scrapers in its 1st edition. Already then I liked it to the extent that I posted this positive little review on the curl-and-php mailing list saying:

this book is a rare exception and previously unmatched to my knowledge in how it covers PHP/CURL. It explains to great details on how to write web clients using PHP/CURL, what pitfalls there are, how to make your code behave well and much more.

Fast-forward to the year 2011. I was contacted by Mike and his publisher at Nostarch, and I was asked to review the book with special regards to protocol facts and curl usage. I didn’t hesitate but gladly accepted as I liked the first edition already and I believe an updated version could be useful to people.

Now, in the early 2012 Mike’s efforts have turned out into a finished second edition of his book. With updated contents and a couple of new chapters, it is refreshed and extended. The web has changed since 2007 and so has this book! I hope that my contributions didn’t only annoy Mike but possibly I helped a little bit to make it even more accurate than the original version. If you find technical or factual errors in this edition, don’t feel shy to tell me (and Mike of course) about them!

Syndicated 2012-03-06 18:41:57 from daniel.haxx.se

725 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!