Older blog entries for bagder (starting at number 732)

shorter HTTP requests for curl

Starting in curl 7.26.0 (due to be released at the end of May 2012), we will shrink the User-agent: header that curl sends by default in HTTP(S) requests to something much shorter! I suspect that this will raise some eyebrows out there so even though I’ve emailed about it to the curl-users list before I thought I’d better write it up and elaborate.

A default ‘curl localhost’ on Debian Linux sends 170 bytes in that single request:

GET / HTTP/1.1
User-Agent: curl/7.24.0 (i486-pc-linux-gnu) libcurl/7.24.0 OpenSSL/1.0.0g zlib/1.2.6 libidn/1.23 libssh2/1.2.8 librtmp/2.3
Host: localhost
Accept: */*

As you can see, the user-agent description takes up a large portion of that request, and for no good reason at all. Without sacrificing any functionality I shrank the same request down to 71 bytes:

GET / HTTP/1.1
User-Agent: curl/7.24.0
Host: localhost
Accept: */*

That means we shrank it down to 41% of the original size. I’ll admit the example is a bit extreme and most normal use cases will use longer host names and longer paths, but even for a URL like “http://daniel.haxx.se/docs/curl-vs-wget.html” we’re down to 50% of the original request size (100 vs 199).
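To see where the saving comes from, here is a minimal Python sketch that builds both requests byte for byte and measures the difference. The helper name build_request is made up for this illustration and is not part of curl. Exact totals depend on what you count (the blank line that ends the headers, for instance), but the number of bytes saved is the same for any host and path, since only the User-Agent line changes.

```python
# Build both requests byte for byte and measure what the shorter header saves.
# build_request is a hypothetical helper for this sketch, not a curl API.

LONG_UA = ("curl/7.24.0 (i486-pc-linux-gnu) libcurl/7.24.0 OpenSSL/1.0.0g "
           "zlib/1.2.6 libidn/1.23 libssh2/1.2.8 librtmp/2.3")
SHORT_UA = "curl/7.24.0"


def build_request(user_agent, host="localhost", path="/"):
    """Return a GET request as bytes, with CRLF line endings and the
    blank line that terminates the header block."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"User-Agent: {user_agent}",
        f"Host: {host}",
        "Accept: */*",
        "",  # together with the next entry, yields the final CRLF CRLF
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


saved = len(build_request(LONG_UA)) - len(build_request(SHORT_UA))
print(saved)  # 99

# The saving does not depend on the URL, only on the User-Agent line.
saved_long_url = (
    len(build_request(LONG_UA, "daniel.haxx.se", "/docs/curl-vs-wget.html"))
    - len(build_request(SHORT_UA, "daniel.haxx.se", "/docs/curl-vs-wget.html"))
)
print(saved_long_url)  # 99
```

The 99 bytes match the gap between both pairs of figures quoted above (170 vs 71 and 199 vs 100).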

Can we shrink it even more? Sure, we could leave out the version number too. I left it in there now only to allow some kind of statistics to get extracted. We can’t remove the entire header: we need to include a user-agent in requests since there are too many servers that won’t function properly otherwise.

And before anyone asks: this change is only for the curl command line tool and not for libcurl, the library. libcurl does in fact not send any user-agent at all by default…

Syndicated 2012-05-12 20:22:08 from daniel.haxx.se

NFS has many meanings

Today I learned that Need for Speed World (I first had to google what “NFS-world” actually means) uses curl, when I received this email:

From: [removed]
Subject: NFS-world

I can not go into the game for 4 months my nickname ”[removed]“. it writes the error “Login failed, please try again.” Please solve this problem. Support Group does not help.

But no, I don’t know why this guy emailed me…

Syndicated 2012-05-10 19:57:38 from daniel.haxx.se

Digging the fiber

Finally the installation of my open fiber is moving along.

Roughly two weeks ago the team responsible for getting the thing from the boundary of my estate to my house arrived. They spent a great deal of time trying to piggyback on the existing tube already running under my driveway for the telephone cable – until they gave up and had to use their shovels to dig a ditch through my garden. Apparently the existing tube was too tight and already too filled up with the existing cables. A little stroke of bad luck I think, since now they instead had to make a mess of my garden. Here’s a little picture of the dig work they did:

a ditch for the fiber through the garden

They aim at a depth of 25 cm for the cable while going through people’s estate, while outside of my garden they need 50 cm depth underneath the road and sidewalk down my little suburb street.

Once they were done we could see this orange cable sticking up next to my mailbox:

the outer end of the cable by my mailbox

… and the other end is sticking up here next to my front door. I expect the next team to get here and do the installation from here and pull it in through my wall and install the media converter etc possibly in the closet next to my front door. We’ll see…

the end of the cable next to the stairs by my front door

Today, when I arrived home after work the team that were digging up the sidewalk had already connected the cable side that was previously sticking out next to my mailbox (the middle picture).

Of course, they did their best at putting things (like soil) back as it was but I’ll admit that my better half used some rather colorful expressions to describe her sentiments about getting the garden remade like this.

I’ll get back with more reports later on when I get things installed internally and when the garden starts to repair.

Syndicated 2012-05-02 19:24:08 from daniel.haxx.se

Back to China

As the plan is currently, I’m going to Beijing, China the last week in May for work. It’s now been something like 4.5 years since I was in China the last time, and I’m really looking forward to seeing how things have changed. This time I expect to get a slightly different insight as well since I’ll be visiting and talking to a bunch of Chinese employees of my customer.

ma dao cheng gong

This picture is hanging in my house, and apparently means “gain an immediate victory”, as I was told.

Also, this brings back the chance for me to show you all the picture of this awesome power socket we had in our hotel room the last time, allowing basically any plug to get inserted:

chinese-socket

Compare that to Jordan, where I recently spent a week vacationing: my hotel room had the British style of sockets, but in other places in the same (fancy) hotel they had euro plugs…

Syndicated 2012-04-20 22:01:19 from daniel.haxx.se

Linux kernel code on TV

In one of the fast-moving early scenes in episode 16 of Person of Interest at roughly 2:05 into the thing I caught this snapshot:

person of interest s01e16

It is only in sight for a fraction of a second. What is seen in the very narrow terminal screen on the right is source code scrolling by. Which source code, you say? Take a look again. That, my friends, is kernel/groups.c from around line 30 in a recent Linux kernel. I bet that source file never had so many viewers before, although perhaps not that many actually appreciated this insight! ;-)

And before anyone asks: no, there’s absolutely no point or relevance in showing this source code in this section. It is just a way for the guys to look techy. And to be fair, in my mind kernel code is fairly techy!

Syndicated 2012-03-19 23:14:34 from daniel.haxx.se

No summer of Rockbox 2012

For the first summer in many years I’m not doing any admin or mentor work for an organization for Google’s Summer of Code program this year.

I’ve mentored, co-mentored and done admin work within the Rockbox project the last… 4-5(?) summers and as a result I now have a good collection of t-shirts. :-) This year, the project sadly came to the conclusion that there was not a good enough number of mentors and project ideas gathered for it to apply to become a mentor organization.

Taking care of a student for full-time work during many weeks is not something to take lightly. To do it properly you need a dedicated and qualified mentor. To provide a good starting point for students to figure out and come up with a good project proposal you need a really good and detailed list of ideas.

The gsoc task is hard enough as it is with many mentors and many good ideas, so when there’s a sign of us not being able to fill up both lists we thought it better not to waste anyone’s time or energy. We also value and treasure Google’s very fine help with open source over the years thanks to gsoc, and we would hate to end up looking like we try to just take advantage of our role of having been accepted as mentor organization for many years in a row in the past.

On the other hand, I was very happy to see that my friends in the metalink project finally, after having applied for many years, got accepted as a mentor organization. I’d like to think that perhaps we (as in the Rockbox project) by standing back this year can let others get the chance to shine and join in the fun.

There is nothing said or planned for Rockbox for next year. If people want to mentor and if we manage to get a good pile of ideas I’m sure we will apply to be a mentor organization again. If not, well then I’m sure other organizations will still participate in the program and possibly I will find myself involved in there via another project. I am involved in a bunch of other open source projects, but none of the ones I’m very active in have applied or participated as mentor org in gsoc so far.

Syndicated 2012-03-18 14:59:23 from daniel.haxx.se

Travel for fun or profit

As a protocol geek I love working in my open source projects curl, libssh2, c-ares and spindly. I also participate in a few related IETF working groups around these protocols, and perhaps primarily I enjoy the HTTPbis crowd.

Meanwhile, I’m a consultant during the day and most of my projects and assignments involve embedded systems and primarily embedded Linux. The protocol part of my life tends to be left to get practiced during my “copious” amount of spare time – you know that time after your work, after you’ve spent time with your family and played with your kids and done the things you need to do at home to keep the household in a decent shape. That time when the rest of the family has gone to bed and you should too, but if you did, when would you ever get time to do those fun things you really want to do?

IETF has these great gatherings every now and then and they’re awesome places to just drown in protocol mumbo jumbo for several days. They’re being hosted by various cities all over the world so often I deem them too far away or too awkward to go to, also a lot because I rarely have any direct monetary gain or compensation for going but rather I’d have to do it as a vacation and pay for it myself.

IETF 83 is going to be held in Paris during March 25-30 and it is close enough for me to want to go and HTTPbis and a few other interesting work groups are having scheduled meetings. I really considered going, at least to meet up with HTTP friends.

Something very rare instead happened that prevents me from going there! My customer (for whom I have worked full-time for about six months and who shall remain nameless for now) asked me to join their team and go visit the large embedded conference ESC in San Jose, California in the exact same week! It really wasn’t a hard choice for me, since this is my job and being asked to do something because I’m wanted is a nice feeling and position – and they’re paying me to go there. It will also be my first time in California even though I guess I won’t get time to actually see much of it.

I hope to write a follow-up post later on about what I’m currently working with, once it has gone public.

Syndicated 2012-03-15 13:15:05 from daniel.haxx.se

The updated web scraping howto

webbots-spiders-and-screen-scrapers

Web scraping is a practice that is basically as old as the web. The desire to extract contents or to machine-generate things from what perhaps was primarily intended to be presented to a browser and to humans pops up all the time.

When I first created the first tool that would later turn into curl back in 1997, it was for the purpose of scraping. When I added more protocols beyond the initial HTTP support it too was to extend its abilities to “scrape” contents for me.

I’ve not (yet!) met Michael Schrenk in person, although I’ve communicated with him back and forth over the years and back in 2007 I got a copy of his book Webbots, Spiders and Screen Scrapers in its 1st edition. Already then I liked it to the extent that I posted this positive little review on the curl-and-php mailing list saying:

this book is a rare exception and previously unmatched to my knowledge in how it covers PHP/CURL. It explains to great details on how to write web clients using PHP/CURL, what pitfalls there are, how to make your code behave well and much more.

Fast-forward to the year 2011. I was contacted by Mike and his publisher at No Starch Press, and I was asked to review the book with special regard to protocol facts and curl usage. I didn’t hesitate but gladly accepted as I liked the first edition already and I believe an updated version could be useful to people.

Now, in early 2012, Mike’s efforts have turned into a finished second edition of his book. With updated contents and a couple of new chapters, it is refreshed and extended. The web has changed since 2007 and so has this book! I hope that my contributions didn’t only annoy Mike but possibly I helped a little bit to make it even more accurate than the original version. If you find technical or factual errors in this edition, don’t feel shy to tell me (and Mike of course) about them!

Syndicated 2012-03-06 18:41:57 from daniel.haxx.se

The first month of Spindly

Let me entertain you with some info and updates from the Spindly project. (Unfortunately we don’t have any logo yet so I don’t get to show it off here.)

Since I announced my intention to proceed and write the SPDY library on my own instead of waiting for libspdy to get back to life, I have worked on a number of infrastructure details.

I converted the build to use autotools and libtool to help us really make it a portable library. I made all test cases run without memory leaks and this took some amount of changes to libspdy, since it was clearly not written with careful memory checking in mind and there were also a lot of unnecessarily small mallocs(). Anyone who does malloc() of 8 bytes should reconsider what they’re doing.

Since I’ve had to bugfix the libspdy so much, change structs and APIs and add new functions that were missing I decided that there’s no point in us trying to keep the original libspdy code or code style intact anymore so I’ve re-indented the whole code base to a style I like better than the original style.

I’ve started to write the fundamentals of a client and server demo application that is meant to use the Spindly API to implement both sides. They don’t really do much yet but the basics are in place. I’ve worked more on my idea of what the spindly API should look like. I’ve written the code for a few functions from that API and I’ve also added a few tests for them.

Most of this work has been done by me and me alone with no particular feedback or help from others. I continue to push my changes to github without delay and I occasionally announce stuff on the mailing list to keep interested people up to date. Hopefully this will lead to someone else joining in sooner or later.

The progress has not been very fast, not only because I’ve had to do a lot of thinking about how the API should ideally work to be really useful, but also because I have quite a lot of commitments in other open source projects (primarily curl and libssh2) that require their amount of time, not to mention that my day job of course needs proper attention.

We offer a daily snapshot of the code if you can’t use or don’t want to use git.

Upcoming

I intend to add more functions from the API document, one by one and test cases for each as I go along. In parallel I hope to get the demo client and server to run so that the API proves to actually work properly.

I also want the demo client and server to be able to run interop tests against other implementations and I want them to be able to speak SPDY with SSL switched off – for debugging reasons. Later on, I hope to be able to use the demo server in the curl test suite so that I can test that the curl SPDY integration works correctly.

We need to either fix “check” (the unit test suite) to be C89 compatible or replace it with something else.

Want to help?

If you want to help, please subscribe to the mailing list, get familiar with the code base, study the API doc and see if it makes sense to you and then help me get that API turned into code…

Syndicated 2012-02-11 18:48:24 from daniel.haxx.se

Sloppily using SSL_OP_ALL

This story begins with a security flaw in OpenSSL. OpenSSL is truly a fundamental piece of software these days and I would go so far as to say that lots of our critical infrastructure today is using it and needs it. Flaws in OpenSSL literally affect entire societies or at least risk doing so if the flaws can be exploited.

SSL/TLS is a rather old and well used protocol with many different implementations, both client and server side. In order to enhance how OpenSSL works with older SSL implementations or just those that have different views on how to implement things, OpenSSL provides an API call to tweak behaviors: the SSL_CTX_set_options function. In the curl project we’ve found good use of it for this purpose, and we use the generic define SSL_OP_ALL to switch on all “rather harmless” workarounds that OpenSSL offers. Rather harmless, that’s what the comment in the header file says.

Ok, enough background and dancing around the issue. The flaw that ignited my idea to write this blog post is a mistake made a long time ago in the SSL 3.0 and TLS 1.0 protocols, one that matters when speaking the protocol with a peer that can select the plain-text (see this explanation) – the problem is a generic one with the protocol, so different SSL libraries approach it differently. Ok, so OpenSSL fixed the flaw back in the days of 0.9.6d (we’re talking May 9th 2002). As a user of a library such as OpenSSL it always feels good to see them being on top of security problems and releasing fixes. It makes you feel that you’re being looked after to some extent.

Shortly thereafter, the OpenSSL developers discovered that some broken server implementations didn’t work with the work-around they had done…

So, on July 30th 2002 the OpenSSL team released version 0.9.6e which offered a way for programs to disable this particular work-around. Switching it off would of course make the protocol less secure again, but it would inter-operate better with faulty servers. How do you switch off this security measure? By calling the SSL_CTX_set_options function with the bit SSL_OP_DONT_INSERT_EMPTY_FRAGMENTS set.

Ok, so far so good. But the next step is what changed everything from fine to not so fine anymore: they then added that new bit to the SSL_OP_ALL define.

Yes. In one blow every single application out there that uses SSL_OP_ALL suddenly started switching off this security measure as soon as it was recompiled against this version of OpenSSL. This change was made in 2002 and it is still like this today. OpenSSL fixed the security problem from its own perspective, but by later adding the bit to the SSL_OP_ALL define it instead transferred the problem to the many programs that use that define.
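The effect is easy to reproduce with plain bitmasks. The Python sketch below uses stand-in values chosen only for illustration (the real definitions live in OpenSSL’s header file and differ between versions), and the two workaround names and the countermeasure_active helper are hypothetical. It shows how an application that passes the catch-all mask changes behavior without changing a line of its own code, and one way such an application might clear the unwanted bit again. It is not taken from curl’s or OpenSSL’s source.

```python
# Stand-in bit values for illustration only; they are NOT OpenSSL's real values.
SSL_OP_WORKAROUND_A = 0x001  # hypothetical "rather harmless" workaround
SSL_OP_WORKAROUND_B = 0x002  # hypothetical "rather harmless" workaround
SSL_OP_DONT_INSERT_EMPTY_FRAGMENTS = 0x800  # setting it turns the measure OFF

# The catch-all define before and after the new bit was folded into it.
SSL_OP_ALL_BEFORE = SSL_OP_WORKAROUND_A | SSL_OP_WORKAROUND_B
SSL_OP_ALL_AFTER = SSL_OP_ALL_BEFORE | SSL_OP_DONT_INSERT_EMPTY_FRAGMENTS


def countermeasure_active(options):
    """True if the empty-fragment security measure stays enabled."""
    return not (options & SSL_OP_DONT_INSERT_EMPTY_FRAGMENTS)


# The application still just passes "all"; only the library headers changed.
print(countermeasure_active(SSL_OP_ALL_BEFORE))  # True
print(countermeasure_active(SSL_OP_ALL_AFTER))   # False

# Keep the other workarounds but clear the one bit that weakens security.
safe_options = SSL_OP_ALL_AFTER & ~SSL_OP_DONT_INSERT_EMPTY_FRAGMENTS
print(countermeasure_active(safe_options))       # True
```

Masking the bit out keeps the compatibility workarounds while leaving the security measure switched on, at the cost of failing against the broken servers that the bit was introduced for.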

In curl’s case, we were alerted about this flaw on January 19th 2012 and it resulted in a security advisory. I did a quick search for SSL_OP_ALL on koders.com and it is obvious that there are hundreds of programs out there still using this bitmask as-is. In the curl project we enabled the SSL_OP_ALL approach for the first time in the 7.10.6 release we did in July 2003. It was wrong already at the time we started using it. It turns out we’ve been enabling this flaw for almost nine years.

In the GnuTLS camp however, they simply stopped doing their work-around for this as soon as they started supporting TLS 1.1, due to the problems the work-around caused to some servers and since TLS 1.1 isn’t vulnerable to the problem. OpenSSL 1.0.1 beta was released on January 3 2012 and is the first OpenSSL version ever released to support TLS newer than 1.0… The browsers/NSS seem to have mitigated this problem in a different way and there’s a patch available for OpenSSL to implement the same work-around but there’s been no feedback on how or if it will be used.

Syndicated 2012-01-27 22:10:31 from daniel.haxx.se
