Better pipelining in libcurl 7.30.0
Back in October 2006, we added support for HTTP pipelining to libcurl. The implementation was naive and simple: it basically preferred to pipeline everything on the single connection to a given host if it could. It works only with the multi interface and if you do a second request to the same host it will try to pipeline that.
Over the years the feature was bugfixed and improved slightly, which proved that at least a couple of applications actually used it – but it was never any particularly big hit among libcurl’s vast amount of features.
Related background information that gives details on some of the problems with pipelining in the wild can be found in Mark Nottingham’s Making HTTP Pipelining Usable on the Open Web internet-draft, Mozilla’s bug report “HTTP pipelining by default” and Chrome’s pipelining docs.
Now, more than six years later, Linus Nielsen Feltzing (a colleague and friend at Haxx) strikes back with a much improved and almost completely revamped HTTP pipelining support (merged into master just hours before the new-feature window closed for the pending 7.30.0 release). This time, the implementation features and provides:
- a configurable number of connections and pipelines to each unique host name
- a round-robin approach that favors starting new connections first, and then pipeline on existing connections once the maximum number of connections to the host is reached
- a max-depth value that when filled makes the code not add any more requests on that connection/pipeline
- a pipe penalization system that avoids adding new requests to pipes that are known to be receiving very large contents and thus possibly would stall subsequent requests for an extended period of time
- a server blacklist that allows the application to specify a list of servers for which HTTP pipelining should not be attempted – real world tests has proven that some servers are too broken to be allowed to play the game
The code also adds a feature that helps out applications to do massive amount of requests in a controlled manner:
A hard maximum amount of connections, that when reached makes libcurl queue up easy handles internally until they can create a new connection or re-use a previously used one. That allows an application to for example set the limit to 50 and then add 400 handles to the multi handle but it will still only use 50 connections as a maximum so over time when requests get completed it will start new transfers on the requests that are waiting in line and thus shrinking the queue and keeping the maximum amount of connections until there’s less than 50 left to do…
Previously that kind of queuing had to be done by the application itself, but now with the much more extensive pipelining support it really isn’t as easy for an application to know when the new request can get pipelined or create a new connection so this logic is now provided by libcurl itself. It is likely going to be appreciated and used also by non-pipelining applications…
This implementation is accompanied with a bunch of new test cases for libcurl (and even a new HTTP test server for the purpose), and it has been tested in the wild for a while with libcurl as the engine in a web browser implementation (the company doing that has requested to remain anonymous). We believe it is in fairly decent state, but as this is a large step and the first release it is shipped with I expect there to be some hiccups along the way.
Two things to take note of:
- pipelining is only available for users of libcurl’s multi interface, and only if explicitly enabled with CURLMOPT_PIPELINING
- the curl command line tool does not use the multi interface and thus it will not use pipelining