Recent blog entries for bagder

Why curl defaults to stdout

(Recap: I founded the curl project, I am still the lead developer and maintainer)

When asking curl to get a URL it’ll send the output to stdout by default. You can of course easily change this behavior with options or just using your shell’s redirect feature, but without any option it’ll spew it out to stdout. If you’re invoking the command line on a shell prompt you’ll immediately get to see the response as soon as it arrives.

I decided curl should work like this, and it was a natural decision I made already when I worked on the predecessors during 1997 or so that later would turn into curl.

On Unix systems there’s a common mantra that “everything is a file” but also in fact that “everything is a pipe”. You accomplish things on Unix by piping the output of one program into the input of another program. Of course I wanted curl to work as good as the other components and I wanted it to blend in with the rest. I wanted curl to feel like cat but for a network resource. And cat is certainly not the only pre-curl command that writes to stdout by default; they are plentiful.

And then: once I had made that decision and I released curl for the first time on March 20, 1998: the call was made. The default was set. I will not change a default and hurt millions of users. I rather continue to be questioned by newcomers, but now at least I can point to this blog post! :-)

About the wget rivalry

cURLAs I mention in my curl vs wget document, a very common comment to me about curl as compared to wget is that wget is “easier to use” because it needs no extra argument in order to download a single URL to a file on disk. I get that, if you type the full commands by hand you’ll use about three keys less to write “wget” instead of “curl -O”, but on the other hand if this is an operation you do often and you care so much about saving key presses I would suggest you make an alias anyway that is even shorter and then the amount of options for the command really doesn’t matter at all anymore.

I put that argument in the same category as the people who argue that wget is easier to use because you can type it with your left hand only on a qwerty keyboard. Sure, that is indeed true but I read it more like someone trying to come up with a reason when in reality there’s actually another one underneath. Sometimes that other reason is a philosophical one about preferring GNU software (which curl isn’t) or one that is licensed under the GPL (which wget is) or simply that wget is what they’re used to and they know its options and recognize or like its progress meter better.

I enjoy our friendly competition with wget and I seriously and honestly think it has made both our projects better and I like that users can throw arguments in our face like “but X can do Y”and X can alter between curl and wget depending on which camp you talk to. I also really like wget as a tool and I am the occasional user of it, just like most Unix users. I contribute to the wget project well, both with code and with general feedback. I consider myself a friend of the current wget maintainer as well as former ones.

Syndicated 2014-11-17 09:11:12 from daniel.haxx.se

Keyboard key frequency

A while ago I wrote about my hunt for a new keyboard, and in my follow-up conversations with friends around that subject I quickly came to the conclusion I should get myself better analysis and data on how I actually use a keyboard and the individual keys on it. And if you know me, you know I like (useless) statistics.

Func KB-460 keyboardSo, I tried out the popular and widely used Linux key-logger software ‘logkeys‘ and immediately figured out that it doesn’t really support the precision and detail level I wanted so I forked the project and modified the code to work the way I want it: keyfreq was born. Code on github. (I forked it because I couldn’t find any way to send back my modifications to the upstream project, I don’t really feel a need for another project.)

Then I fired up the logging process and it has been running in the background for a while now, logging every key stroke with a time stamp.

Counting key frequency and how it gets distributed very quickly turns into basically seeing when I’m active in front of the computer and it also gave me thoughts around what a high key frequency actually means in terms of activity and productivity. Does a really high key frequency really mean that I was working intensely or isn’t that purpose more a sign of mail sending time? When I debug problems or research details, won’t those periods result in slower key activity?

In the end I guess that over time, the key frequency chart basically says that if I have pressed a lot of keys during a period, I was working on something then. Hours or days with a very low average key frequency are probably times when I don’t work as much.

The weekend key frequency is bound to be slightly wrong due to me sometimes doing weekend hacking on other computers where I don’t log the keys since my results are recorded from a single specific keyboard only.

Conclusions

So what did I learn? Here are some conclusions and results from 1276614 keystrokes done over a period of the most recent 52 calendar days.

I have a 105-key keyboard, but during this period I only pressed 90 unique keys. Out of the 90 keys I pressed, 3 were pressed more than 5% of the time – each. In fact, those 3 keys are more than 20% of all keystrokes. Those keys are: <Space>, <Backspace> and the letter ‘e’.

<Space> stands out from all the rest as it has been used more than 10%.

Only 29 keys were used more than 1% of the presses, giving this a really long tail with lots of keys hardly ever used.

Over this logged time, I have registered key strokes during 46% of all hours. Counting only the hours in which I actually used the keyboard, the average number of key strokes were 2185/hour, 36 keys/minute.

The average week day (excluding weekend days), I registered 32486 key presses. The most active sinngle minute during this logging period, I hit 405 keys. The most active single hour I managed to do 7937 key presses. During weekends my activity is much lower, and then I average at 5778 keys/day (7.2% of all activity were weekends).

When counting most active hours over the day, there are 14 hours that have more than 1% activity and there are 5 with less than 1%, leaving 5 hours with no keyboard activity at all (02:00- 06:59). Interestingly, the hour between 23-24 at night is the single most busy hour for me, with 12.5% of all keypresses during the period.

Random “anecdotes”

Longest contiguous time without keys: 26.4 hours

Longest key sequence without backspace: 946

There are 7 keys I only pressed once during this period; 4 of them are on the numerical keypad and the other three are F10, F3 and <Pause>.

More

I’ll try to keep the logging going and see if things change over time or if there later might end up things that can be seen in the data when looked over a longer period.

Syndicated 2014-11-12 08:19:13 from daniel.haxx.se

Changing networks on Mac with Firefox

Not too long ago I blogged about my work to better deal with changing networks while Firefox is running. That job was basically two parts.

A) generic code to handle receiving such a network-changed event and then

B) a platform specific part that was for Windows that detected such a network change and sent the event

Today I’ve landed yet another fix for part B called bug 1079385, which detects network changes for Firefox on Mac OS X.

mac miniI’ve never programmed anything before on the Mac so this was sort of my christening in this environment. I mean, I’ve written countless of POSIX compliant programs including curl and friends that certainly builds and runs on Mac OS just fine, but I never before used the Mac-specific APIs to do things.

I got a mac mini just two weeks ago to work on this. Getting it up, prepared and my first Firefox built from source took all-in-all less than three hours. Learning the details of the mac API world was much more trouble and can’t say that I’m mastering it now either but I did find myself at least figuring out how to detect when IP addresses on the interfaces change and a changed address is a pretty good signal that the network changed somehow.

Syndicated 2014-10-30 21:46:20 from daniel.haxx.se

daniel.haxx.se episode 8

Today I hesitated to make my new weekly video episode. I looked at the viewers number and how they basically have dwindled the last few weeks. I’m not making this video series interesting enough for a very large crowd of people. I’m re-evaluating if I should do them at all, or if I can do something to spice them up…

… or perhaps just not look at the viewers numbers at all and just do what think is fun?

I decided I’ll go with the latter for now. After all, I enjoy making these and they usually give me some interesting feedback and discussions even if the numbers are really low. What good is a number anyway?

This week’s episode:

Personal

Firefox

Fun

HTTP/2

TALKS

  • I’m offering two talks for FOSDEM

curl

  • release next Wednesday
  • bug fixing period
  • security advisory is pending

wget

Syndicated 2014-10-16 21:01:58 from daniel.haxx.se

What a removed search from Google looks like

Back in the days when I participated in the starting of the Subversion project, I found the mailing list archive we had really dysfunctional and hard to use, so I set up a separate archive for the benefit of everyone who wanted an alternative way to find Subversion related posts.

This archive is still alive and it recently surpassed 370,000 archived emails, all related to Subversion, for seven different mailing lists.

Today I received a notice from Google (shown in its entirety below) that one of the mails received in 2009 is now apparently removed from a search using a name – if done within the European Union at least. It is hard to take this seriously when you look at the page in question, and as there aren’t that very many names involved in that page the possibilities of which name it is aren’t that many. As there are several different mail archives for Subversion mails I can only assume that the alternative search results also have been removed.

This is the first removal I’ve got for any of the sites and contents I host.


Notice of removal from Google Search

Hello,

Due to a request under data protection law in Europe, we are no longer able to show one or more pages from your site in our search results in response to some search queries for names or other personal identifiers. Only results on European versions of Google are affected. No action is required from you.

These pages have not been blocked entirely from our search results, and will continue to appear for queries other than those specified by individuals in the European data protection law requests we have honored. Unfortunately, due to individual privacy concerns, we are not able to disclose which queries have been affected.

Please note that in many cases, the affected queries do not relate to the name of any person mentioned prominently on the page. For example, in some cases, the name may appear only in a comment section.

If you believe Google should be aware of additional information regarding this content that might result in a reversal or other change to this removal action, you can use our form at https://www.google.com/webmasters/tools/eu-privacy-webmaster. Please note that we can’t guarantee responses to submissions to that form.

The following URLs have been affected by this action:

http://svn.haxx.se/users/archive-2009-08/0808.shtml

Regards,

The Google Team

Syndicated 2014-10-12 11:56:12 from daniel.haxx.se

internal timers and timeouts of libcurl

wall clockBear with me. It is time to take a deep dive into the libcurl internals and see how it handles timeouts and timers. This is meant as useful information to libcurl users but even more as insights for people who’d like to fiddle with libcurl internals and work on its source code and architecture.

socket activity or timeout

Everything internally in libcurl is using the multi, asynchronous, interface. We avoid blocking calls as far as we can. This means that libcurl always either waits for activity on a socket/file descriptor or for the time to come to do something. If there’s no socket activity and no timeout, there’s nothing to do and it just returns back out.

It is important to remember here that the API for libcurl doesn’t force the user to call it again within or at the specific time and it also allows users to call it again “too soon” if they like. Some users will even busy-loop like crazy and keep hammering the API like a machine-gun and we must deal with that. So, the timeouts are mostly to be considered advisory.

many timeouts

A single transfer can have multiple timeouts. For example one maximum time for the entire transfer, one for the connection phase and perhaps even more timers that handle for example speed caps (that makes libcurl not transfer data faster than a set limit) or detecting transfers speeds below a certain threshold within a given time period.

A single transfer is done with a single easy handle, which holds a list of all its timeouts in a sorted list. It allows libcurl to return a single time left until the nearest timeout expires without having to bother with the remainder of the timeouts (yet).

Curl_expire()

… is the internal function to set a timeout to expire a certain number of milliseconds into the future. It adds a timeout entry to the list of timeouts. Expiring a timeout just means that it’ll signal the application to call libcurl again. Internally we don’t have any identifiers to the timeouts, they’re just a time in the future we ask to be called again at. If the code needs that specific time to really have passed before doing something, the code needs to make sure the time has elapsed.

Curl_expire_latest()

A newcomer in the timeout team. I figured out we need this function since if we are in a state where we need to be called no later than a certain specific future time this is useful. It will not add a new timeout entry in the timeout list in case there’s a timeout that expires earlier than the specified time limit.

This function is useful for example when there’s a state in libcurl that varies over time but has no specific time limit to check for. Like transfer speed limits and the like. If Curl_expire() is used in this situation instead of Curl_expire_latest() it would mean adding a new timeout entry every time, and for the busy-loop API usage cases it could mean adding an excessive amount of timeout entries. (And there was a scary bug reported that got “tens of thousands of entries” which motivated this function to get added.)

timeout removals

We don’t remove timeouts from the list until they expire. Like for example if we have a condition that is timing dependent, then we set a timeout with Curl_expire() and we know we should be called again at the end of that time.

If we wouldn’t add the timeout and there’s no socket activity on the socket then we may not be called again – ever.

When an internal state transition into something else and we therefore don’t need a previously set timeout anymore, we have no handle or identifier to the timeout so it cannot be removed. It will instead lead to us getting called again when the timeout triggers even though we didn’t really need it any longer. As we’re having an API that allows this anyway, this is already handled by the logic and getting called an extra time is usually very cheap and is not considered a problem worth addressing.

Timeouts are removed automatically from the list of timers when they expire. Timeouts that are in passed time are removed from the list and the timers following will then get moved to the front of the queue and be used to calculate how long the single timeout should be next.

The only internal API to remove timeouts that we have removes all timeouts, used when cleaning up a handle.

many easy handles

I’ve mentioned how each easy handle treats their timeouts above. With the multi interface, we can have any amount of easy handles added to a single multi handle. This means one list of timeouts for each easy handle.

To handle many thousands of easy handles added to the same multi handle, all with their own timeout (as each easy handle only show their closest timeout), it builds a splay tree of easy handles sorted on the timeout time. It is a splay tree rather than a sorted list to allow really fast insertions and removals.

As soon as a timeout expires from one of the easy handles and it moves to the next timeout in its list, it means removing one node (easy handle) from the splay tree and inserting it again with the new timeout timer.

Syndicated 2014-10-10 06:29:38 from daniel.haxx.se

844 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!