13 Jul 2014 Stevey   » (Master)

A brief twitter experiment

So I've recently posted a few links on Twitter, and I can see followers clicking them. But I also see random hits.

Tonight I posted a link to http://transient.email/, a domain I use for "anonymous" emailing, specifically to see which bots hit the URL.

Within two minutes I had 15 visitors, the first few of which were:

Request           User-Agent
GET /robots.txt   Twitterbot/1.0
GET /robots.txt   Twitterbot/1.0
HEAD /            python-requests/1.2.3 CPython/2.7.2+ Linux/3.0.0-16-virtual
GET /             Mozilla/5.0 ()
HEAD /            Google-HTTP-Java-Client/1.17.0-rc (gzip)
HEAD /            Google-HTTP-Java-Client/1.17.0-rc (gzip)
GET /robots.txt   Twitterbot/1.0
GET /             Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)
GET /             MetaURI API/2.0 +metauri.com
GET /robots.txt   Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
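The post doesn't say how the list above was collected. One way to produce a summary like it is to tally requests by method, path and user-agent from the web server's access log. This is a minimal sketch assuming the Apache/nginx "combined" log format; the sample lines (documentation IP addresses, invented timestamps) are made up for illustration and are not the author's actual logs.

```python
import re
from collections import Counter

# Assumed Apache/nginx "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def tally(lines):
    """Count (method, path, user-agent) triples in combined-format log lines."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m:  # silently skip lines that don't match the assumed format
            counts[(m["method"], m["path"], m["agent"])] += 1
    return counts

# Hypothetical sample lines: documentation IPs, invented timestamps.
sample = [
    '192.0.2.1 - - [01/Jan/2000:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 26 "-" "Twitterbot/1.0"',
    '192.0.2.1 - - [01/Jan/2000:00:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 26 "-" "Twitterbot/1.0"',
    '198.51.100.7 - - [01/Jan/2000:00:00:03 +0000] "HEAD / HTTP/1.1" 200 - "-" "Google-HTTP-Java-Client/1.17.0-rc (gzip)"',
]

for (method, path, agent), n in tally(sample).most_common():
    print(f"{n}  {method} {path}  {agent}")
```

Because `tally()` just iterates over lines, an open log file can be passed to it directly instead of the sample list.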

So what jumps out? Twitterbot makes several requests for /robots.txt but never actually fetches the page itself. That is interesting, because the supplied /robots.txt does indeed contain a prohibition, so Twitterbot appears to be honouring it.
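To make the robots.txt behaviour concrete, here is a minimal sketch of the check a polite crawler performs before fetching a page, using Python's standard `urllib.robotparser`. The post doesn't show the actual contents of the /robots.txt on transient.email, so the blanket `Disallow: /` rule below is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the real file on transient.email isn't shown in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string rather than fetching over HTTP

# A well-behaved bot asks this question before requesting the page itself.
for agent in ("Twitterbot/1.0", "Yahoo! Slurp"):
    allowed = parser.can_fetch(agent, "http://transient.email/")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

With a blanket disallow, every agent is refused, which would be consistent with a bot that fetches /robots.txt and then stops.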

A surprise was that both Google and Yahoo seem to follow Twitter links in almost real time. Yahoo's Slurp crawler fetched and honoured /robots.txt, whereas the Google client, identifying itself as Google-HTTP-Java-Client, only ever made HEAD requests and never actually fetched the content or the robots file.

In addition, a number of hosts in Amazon's EC2 address space made requests, which was perhaps not a surprise: automated processing and classification, no doubt.
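The post doesn't say how the EC2 hosts were identified. One plausible approach, sketched here as an assumption, is to match visitor addresses against the ranges in the `ip-ranges.json` file that AWS publishes. The sample mirrors that file's `prefixes` list, but the prefixes are placeholder documentation ranges, not real AWS ranges, and the visitor addresses are made up.

```python
import ipaddress
import json

# Shape modelled on the "prefixes" list in AWS's ip-ranges.json.
# These prefixes are documentation ranges used as placeholders, NOT real AWS ranges.
SAMPLE_RANGES = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"},
    {"ip_prefix": "203.0.113.0/25",  "region": "eu-west-1", "service": "EC2"}
  ]
}
""")

def ec2_networks(ranges):
    """Return the networks tagged as EC2 in an ip-ranges style document."""
    return [
        ipaddress.ip_network(p["ip_prefix"])
        for p in ranges["prefixes"]
        if p["service"] == "EC2"
    ]

def is_ec2(ip, networks):
    """True if the visitor address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

nets = ec2_networks(SAMPLE_RANGES)
for visitor in ("198.51.100.7", "203.0.113.200", "192.0.2.1"):
    print(visitor, "EC2" if is_ec2(visitor, nets) else "other")
```

Building the network list once and reusing it keeps the per-visitor check cheap when scanning a whole log.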

Anyway, beer. It's been a rough weekend.

Syndicated 2014-07-13 19:08:04 from Steve Kemp's Blog
