<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Advogato blog for danbri</title>
    <link>http://www.advogato.org/person/danbri/</link>
    <description>Advogato blog for danbri</description>
    <language>en-us</language>
    <generator>mod_virgule</generator>
    <pubDate>Fri, 10 Feb 2012 15:51:50 GMT</pubDate>
    <item>
      <pubDate>Wed, 25 Jan 2012 14:27:08 GMT</pubDate>
      <title>MAMP / MySQL config notes</title>
      <link>http://www.advogato.org/person/danbri/diary.html?start=199</link>
      <guid>http://danbri.org/words/2012/01/25/768</guid>
      <description>&lt;p&gt;Problem: MySQL taking forever to load some large data dumps. Forever or longer.&lt;/p&gt;
&lt;p&gt;&#x201C;mysql&amp;gt; show processlist;&#x201D; shows it wedged at &#x201C;Repair with keycache&#x201D; and &#x201C;Waiting for table metadata lock&#x201D;.&lt;/p&gt;
&lt;p&gt;According to a handy &lt;a href="http://stackoverflow.com/questions/1067367/how-to-avoid-repair-with-keycache" &gt;Stack Overflow article&lt;/a&gt;, this is a known and dreaded &lt;a href="http://dev.mysql.com/doc/refman/5.5/en/metadata-locking.html" &gt;condition&lt;/a&gt;. It can be addressed by making sure the tmp dir has plenty of space, and by increasing &lt;a href="http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_myisam_max_sort_file_size" &gt;myisam_max_sort_file_size&lt;/a&gt; from 2G (2146435072) to 30G (32212254720). Using MAMP 1.9.6 it took some &lt;a href="http://stackoverflow.com/questions/678645/does-mysql-included-with-mamp-not-include-a-config-file" &gt;more digging&lt;/a&gt; to find out how to add a local my.cnf settings file for MySQL. This now lives in /Applications/MAMP/conf/my.cnf (in its [mysqld] section I added the line &#x2018;myisam_max_sort_file_size = 30G&#x2019;).&lt;/p&gt;
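&lt;p&gt;For reference, here is a minimal Python sketch of the arithmetic behind those two figures. It relies on MySQL&#x2019;s documented convention that the K, M and G suffixes on option values mean multiples of 1024. It parses an assumed two-line [mysqld] fragment, not a real MAMP file, and the function name is made up for this example.&lt;/p&gt;
&lt;pre&gt;
```python
import configparser

# MySQL option values accept K, M and G suffixes meaning 1024, 1024**2 and 1024**3.
MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def mysql_size_to_bytes(value: str) -> int:
    """Convert a size such as '30G' or '2146435072' to a byte count (hypothetical helper)."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in MULTIPLIERS:
        return int(value[:-1]) * MULTIPLIERS[suffix]
    return int(value)


# Assumed my.cnf fragment; a real file usually holds more settings.
MY_CNF = """\
[mysqld]
myisam_max_sort_file_size = 30G
"""

config = configparser.ConfigParser(allow_no_value=True)
config.read_string(MY_CNF)
setting = config["mysqld"]["myisam_max_sort_file_size"]

print(setting, "=", mysql_size_to_bytes(setting), "bytes")  # 30G = 32212254720 bytes
print(2146435072 // 1024 ** 2, "MiB")  # 2047 MiB: the old limit is just under 2G
```
&lt;/pre&gt;
&lt;p&gt;configparser copes with this simple fragment. However, MySQL option files also support directives such as !include that it does not interpret, so treat this as a quick check rather than a validator.&lt;/p&gt;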
&lt;p&gt;Does this work? Well, I don&#x2019;t know yet. But I&#x2019;ve searched around and found my own notes enough times before that I thought I should at least write this much down for my future self to find :)&lt;/p&gt;</description>
    </item>
    <item>
      <pubDate>Thu, 8 Dec 2011 17:27:19 GMT</pubDate>
      <title>Building R&#x2019;s RGL library for OSX Snow Leopard</title>
      <link>http://www.advogato.org/person/danbri/diary.html?start=198</link>
      <guid>http://danbri.org/words/2011/12/08/761</guid>
      <description>&lt;p&gt;&lt;a href="http://rgl.neoscientists.org/about.shtml" &gt;RGL&lt;/a&gt; is needed for &lt;a href="http://www.statmethods.net/graphs/scatterplot.html" &gt;nice&lt;/a&gt; interactive 3d plots in R, but a pain to find out how to build on a modern OSX machine.&lt;/p&gt;
&lt;p&gt;
  &lt;em&gt;&#x201C;The rgl package is a visualization device system for&#xA0;&lt;a href="http://www.r-project.org/" &gt;R&lt;/a&gt;, using OpenGL as the rendering backend. An rgl device at its core is a real-time 3D engine written in C++. It provides an interactive viewpoint navigation facility (mouse + wheel support) and an R programming interface.&#x201D;&lt;/em&gt;
&lt;/p&gt;
&lt;p&gt;The following commands worked for me in OSX Snow Leopard:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;svn checkout svn://svn.r-forge.r-project.org/svnroot/rgl&lt;/li&gt;
&lt;li&gt;R CMD INSTALL ./rgl/pkg/rgl --configure-args="--disable-carbon" rgl&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Here&#x2019;s a test that should give an interactive 3D display if all went well, using a &lt;a href="http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/mtcars.html" &gt;built-in dataset&lt;/a&gt;:&lt;/p&gt;
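&lt;pre&gt;library(rgl)
# Centre each column of mtcars (dropping column 1, mpg) and convert to a matrix
cars.data &amp;lt;- as.matrix(sweep(mtcars[, -1], 2, colMeans(mtcars[, -1]))) # &lt;a href="http://www.ats.ucla.edu/stat/r/code/svd_demos.htm" &gt;cargo cult'd&lt;/a&gt;
# SVD of the symmetric Gram matrix X X^T gives its eigenvectors (v) and eigenvalues (d)
xx &amp;lt;- svd(cars.data %*% t(cars.data))
# Principal coordinates: eigenvectors scaled by the square roots of the eigenvalues
xxd &amp;lt;- xx$v %*% sqrt(diag(xx$d))
# Use the first three coordinates as x, y and z
x1 &amp;lt;- xxd[, 1]
y1 &amp;lt;- xxd[, 2]
z1 &amp;lt;- xxd[, 3]
# Plot green points, then label each one with its car model name
&lt;a href="http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/plot3d.html" &gt;plot3d&lt;/a&gt;(x1, y1, z1, col = "green", size = 4)
&lt;a href="http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/text3d.html" &gt;text3d&lt;/a&gt;(x1, y1, z1, row.names(mtcars))&lt;/pre&gt;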
&lt;p&gt;
  &lt;a href="http://danbri.org/words/wp-content/uploads/2011/12/rgltest.png" &gt;
    &lt;img class="size-full wp-image-766 alignnone" title="RGL demo" src="http://danbri.org/words/wp-content/uploads/2011/12/rgltest.png" alt="" width="598" height="621"/&gt;&lt;/a&gt;
&lt;/p&gt;</description>
    </item>
    <item>
      <pubDate>Thu, 3 Nov 2011 15:03:35 GMT</pubDate>
      <title>Dilbert schematics</title>
      <link>http://www.advogato.org/person/danbri/diary.html?start=197</link>
      <guid>http://danbri.org/words/2011/11/03/753</guid>
      <description>&lt;p&gt;How can we package, manage, mix and merge graph datasets that come from different contexts, without getting our data into a terrible mess?&lt;/p&gt;
&lt;p&gt;During the last W3C RDF Working Group meeting, we were discussing approaches to packaging up &#x2018;graphs&#x2019; of data into useful chunks that can be organized and combined. A related question, one always lurking in the background, was also discussed: how do we deal with data that goes out of date? Sometimes it is better to talk about events rather than changeable characteristics of something. So you might know my date of birth, and that is useful forever; with a bit of math and knowledge of today&#x2019;s date, you can figure out my current age, whenever needed. So &#x2018;date of birth&#x2019; on this measure has an attractive characteristic that isn&#x2019;t shared by &#x2018;age in years&#x2019;.&lt;/p&gt;
&lt;p&gt;At any point in time, I have at most one &#x2018;age in years&#x2019; property; however, you can take two descriptions of me that were at some time true, and merge them to form a messy, self-contradictory description. With this in mind, how far should we be advocating that people model using time-invariant idioms, versus working on better packaging for our data so it is clearer when it was supposed to be true, or which parts might be more volatile?&lt;/p&gt;
&lt;p&gt;The following scenario was &lt;a href="http://lists.w3.org/Archives/Public/public-rdf-wg/2011Oct/0232.html" &gt;posted to the RDF group&lt;/a&gt; as a way of exploring these tradeoffs. I repeat it here almost unaltered. I often say that RDF describes a simplified &#x2013; and sometimes over-simplified &#x2013; cartoon universe. So why not describe a real cartoon universe? Pat Hayes &lt;a href="http://lists.w3.org/Archives/Public/public-rdf-wg/2011Nov/0019.html" &gt;posted an interesting proposal&lt;/a&gt; that explores an approach to these problems; since he cited this scenario, I wrote it up as a blog post.&lt;/p&gt;
&lt;h2&gt;Describing Dilbert: theory and practice&lt;/h2&gt;
&lt;p&gt;Consider an RDF vocabulary for describing office assignments in the cartoon universe inhabited by Dilbert. Beyond the name, the examples here aren&#x2019;t tightly linked to the Dilbert cartoon. First I describe the universe, then some ways in&#xA0;which we might summarise what&#x2019;s going on using RDF graph descriptions. I would love to get a sense for any &#x2018;best practice&#x2019; claims here. Personally I see no single best way to deal with this, only different and annoying tradeoffs.&lt;/p&gt;
&lt;p&gt;So, this is a fictional, highly simplified company in which each worker is assigned exactly one cubicle, and in which every cubicle has at most one assigned worker. Cubicles&#xA0;may also sometimes be empty.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Every 3 months, the Pointy-haired boss&#xA0;has a strategic re-organization, and re-assigns workers to cubicles.&lt;/li&gt;
&lt;li&gt;He does this in a memo dictated to Dogbert, who will take the boss&#x2019;s&#xA0;vague and forgetful instructions and compare them&#xA0;to an Excel spreadsheet. This, cleaned up, eventually becomes an&#xA0;emailed Word .doc sent to the all-staff@ mailing list.&lt;/li&gt;
&lt;li&gt;The Word document is basically a table of room moves. It is headed&#xA0;with a date and, in bold type, &#x201C;EFFECTIVE&#xA0;IMMEDIATELY&#x201D;; it is usually mailed out mid-evening and read by staff the&#xA0;next morning.&lt;/li&gt;
&lt;li&gt;In practice, employees move their stuff to the new cubicles over the&#xA0;course of a few days; longer if they&#x2019;re&#xA0;on holiday or off sick. Phone numbers are fixed later, hopefully, as&#xA0;are name badges etc.&lt;/li&gt;
&lt;li&gt;But generally the move takes place the day after the word file is&#xA0;circulated, and at any one point, a given&#xA0;cubicle can be fairly said to have at most one official occupant worker.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;So let&#x2019;s try to model this in RDF/RDFS/OWL.&lt;/p&gt;
&lt;p&gt;First, we can talk about the employees. Let&#x2019;s make a class, &#x2018;Employee&#x2019;.&lt;/p&gt;
&lt;p&gt;In the company systems, each employee has an ID, which is &#x2018;e-&#x2019; plus an&#xA0;integer. Once assigned, these are&#xA0;never re-assigned, even if the employee leaves or dies.&lt;/p&gt;
&lt;p&gt;We also need to talk about the office space units, the cubes or&#xA0;&#x2018;Cubicles&#x2019;. Let&#x2019;s forget for now that&#xA0;the furniture is movable, and treat each Cubicle as if it lasts&#xA0;forever. Maybe they are even somehow symbolic&#xA0;cubicle names, and the furniture that embodies them can be moved&#xA0;around to different office locations. But we&#xA0;don&#x2019;t try modelling that for now.&lt;/p&gt;
&lt;p&gt;In the company systems, each cubicle has an ID, which is &#x2018;c-&#x2019; plus an&#xA0;integer. Once assigned, these are&#xA0;never re-assigned, even if the cubicle becomes in any sense de-activated.&lt;/p&gt;
&lt;p&gt;Let&#x2019;s represent these as IRIs. Three employees, three cubicles.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;http://example.com/e-1&lt;/li&gt;
&lt;li&gt;http://example.com/e-2&lt;/li&gt;
&lt;li&gt;http://example.com/e-3&lt;/li&gt;
&lt;li&gt;http://example.com/c-1000&lt;/li&gt;
&lt;li&gt;http://example.com/c-1001&lt;/li&gt;
&lt;li&gt;http://example.com/c-1002&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;We can describe the names of employees. Cubicles also have informal&#xA0;names. Let&#x2019;s say that neither changes, ever.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;e-1 name &#x2018;Alice&#x2019;&lt;/li&gt;
&lt;li&gt;e-2 name &#x2018;Bob&#x2019;&lt;/li&gt;
&lt;li&gt;e-3 name &#x2018;Charlie&#x2019;&lt;/li&gt;
&lt;li&gt;c-1000 name &#x2018;The Einstein Suite&#x2019;&lt;/li&gt;
&lt;li&gt;c-1001 name &#x2018;The doghouse&#x2019;&lt;/li&gt;
&lt;li&gt;c-1002 name &#x2018;Helpdesk&#x2019;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Describing these in RDF is pretty straightforward.&lt;/p&gt;
&lt;p&gt;Let&#x2019;s now describe room assignments.&lt;/p&gt;
&lt;p&gt;At the beginning of 2011 Alice (e-1) is in c-1000; Bob (e-2) is in c-1001; Charlie (e-3) is in c-1002. How can&#xA0;we represent this in RDF?&lt;/p&gt;
&lt;p&gt;We define an RDF/RDFS/OWL relationship type (aka property) called eg:hasCubicle.&lt;/p&gt;
&lt;p&gt;Let&#x2019;s say our corporate ontologist comes up with this schematic&#xA0;description of cubicle assignments:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;eg:hasCubicle has a domain of eg:Employee, a range of eg:Cubicle. It is an owl:FunctionalProperty, because any Employee has at most&#xA0;one Cubicle related via hasCubicle.&lt;/li&gt;
&lt;li&gt;It is an owl:InverseFunctionalProperty, because any Cubicle is the value of hasCubicle for no more than one Employee.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;So&#x2026; at the beginning of 2011 it would be truthy to assert these RDF claims:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt; &amp;lt;&lt;a href="http://example.com/e-1" &gt;http://example.com/e-1&lt;/a&gt;&amp;gt; &amp;lt;&lt;a href="http://example.com/hasCubicle" &gt;http://example.com/hasCubicle&lt;/a&gt;&amp;gt; &amp;lt;&lt;a href="http://example.com/c-1000" &gt;http://example.com/c-1000&lt;/a&gt;&amp;gt; .&lt;/li&gt;
&lt;li&gt; &amp;lt;&lt;a href="http://example.com/e-2" &gt;http://example.com/e-2&lt;/a&gt;&amp;gt; &amp;lt;&lt;a href="http://example.com/hasCubicle" &gt;http://example.com/hasCubicle&lt;/a&gt;&amp;gt; &amp;lt;&lt;a href="http://example.com/c-1001" &gt;http://example.com/c-1001&lt;/a&gt;&amp;gt; .&lt;/li&gt;
&lt;li&gt; &amp;lt;&lt;a href="http://example.com/e-3" &gt;http://example.com/e-3&lt;/a&gt;&amp;gt; &amp;lt;&lt;a href="http://example.com/hasCubicle" &gt;http://example.com/hasCubicle&lt;/a&gt;&amp;gt; &amp;lt;&lt;a href="http://example.com/c-1002" &gt;http://example.com/c-1002&lt;/a&gt;&amp;gt; .&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Now, come March 10th, everyone at the company receives an all-staff&#xA0;email from Dogbert, with cubicle reassignments.&#xA0;Amongst other changes, Alice and Bob are swapping cubicles, and&#xA0;Charlie stays in c-1002.&lt;/p&gt;
&lt;p&gt;Within a week or so (let&#x2019;s say by March 20th, to be sure) the cubicle&#xA0;moves are all made real, in terms&#xA0;of where people are supposed to be based, where they are, and where&#xA0;their stuff and phone line routings are.&lt;/p&gt;
&lt;p&gt;The fictional world by March 20th 2011 is now truthily described by&#xA0;the following claims:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt; &amp;lt;&lt;a href="http://example.com/e-1" &gt;http://example.com/e-1&lt;/a&gt;&amp;gt; &amp;lt;&lt;a href="http://example.com/hasCubicle" &gt;http://example.com/hasCubicle&lt;/a&gt;&amp;gt;&#xA0;&amp;lt;&lt;a href="http://example.com/c-1001" &gt;http://example.com/c-1001&lt;/a&gt;&amp;gt; .&lt;/li&gt;
&lt;li&gt; &amp;lt;&lt;a href="http://example.com/e-2" &gt;http://example.com/e-2&lt;/a&gt;&amp;gt; &amp;lt;&lt;a href="http://example.com/hasCubicle" &gt;http://example.com/hasCubicle&lt;/a&gt;&amp;gt;&#xA0;&amp;lt;&lt;a href="http://example.com/c-1000" &gt;http://example.com/c-1000&lt;/a&gt;&amp;gt; .&lt;/li&gt;
&lt;li&gt; &amp;lt;&lt;a href="http://example.com/e-3" &gt;http://example.com/e-3&lt;/a&gt;&amp;gt; &amp;lt;&lt;a href="http://example.com/hasCubicle" &gt;http://example.com/hasCubicle&lt;/a&gt;&amp;gt;&#xA0;&amp;lt;&lt;a href="http://example.com/c-1002" &gt;http://example.com/c-1002&lt;/a&gt;&amp;gt; .&lt;/li&gt;
&lt;/ul&gt;&lt;h3&gt;Questions / view from Named Graphs.&lt;/h3&gt;
&lt;p&gt;1. Was it a mistake, bad modelling style etc., to describe things with&#xA0;&#x2018;hasCubicle&#x2019;? Should we instead have&#xA0;described a date-stamped &#x2018;CubicleAssignmentEvent&#x2019; that mentions, for example, the roles of Dogbert, Alice,&#xA0;and some Cubicle? Is there a &#x2018;better&#x2019; way to describe things? Is this&#xA0;an acceptable way to describe things?&lt;/p&gt;
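&lt;p&gt;Here is a minimal Python sketch that makes the event-style alternative concrete. It is not a proposed vocabulary. Each hypothetical event records only an effective date, an employee ID and a cubicle ID; a fuller &#x2018;CubicleAssignmentEvent&#x2019; would also name the memo, its author (Dogbert) and so on. The January dates are assumed, since the scenario only says &#x201C;the beginning of 2011&#x201D;. Replaying the log up to a given day recovers the same kind of snapshot as the hasCubicle triples above.&lt;/p&gt;
&lt;pre&gt;
```python
from datetime import date

# Hypothetical event log: (effective date, employee ID, cubicle ID).
# January dates are assumed; March 10th is the date of Dogbert's memo.
EVENTS = [
    (date(2011, 1, 1), "e-1", "c-1000"),
    (date(2011, 1, 1), "e-2", "c-1001"),
    (date(2011, 1, 1), "e-3", "c-1002"),
    (date(2011, 3, 10), "e-1", "c-1001"),
    (date(2011, 3, 10), "e-2", "c-1000"),
]


def assignments_on(events, day):
    """Replay events in date order up to and including day; the latest assignment wins."""
    state = {}
    for effective, employee, cubicle in sorted(events):
        if effective > day:
            break  # events are sorted, so everything after this is still in the future
        state[employee] = cubicle
    return state


print(assignments_on(EVENTS, date(2011, 2, 1)))
# {'e-1': 'c-1000', 'e-2': 'c-1001', 'e-3': 'c-1002'}
print(assignments_on(EVENTS, date(2011, 3, 20)))
# {'e-1': 'c-1001', 'e-2': 'c-1000', 'e-3': 'c-1002'}
```
&lt;/pre&gt;
&lt;p&gt;This log records the official assignment from the memo&#x2019;s date, whereas the scenario treats March 20th as the date the moves were complete. An event model has room to keep both dates; a bare hasCubicle triple does not.&lt;/p&gt;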
&lt;p&gt;2. How, then, should we express the notion that each employee has at&#xA0;most one cubicle and vice versa? Is this appropriate material to try to capture in OWL?&lt;/p&gt;
&lt;p&gt;3. How should a SPARQL store or TriG++ document capture the different&#xA0;graphs describing the evolving state of the&#xA0;company&#x2019;s office-space allocations?&lt;/p&gt;
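&lt;p&gt;One possible answer, sketched in Python rather than TriG, is to keep each snapshot as its own named graph. A small manifest then records the date from which each graph is meant to be true. The graph IRIs and the valid-from manifest are hypothetical. The March graph is dated March 20th, when the scenario says the moves were complete. A consumer then picks the graph for a given date instead of querying the union of all graphs.&lt;/p&gt;
&lt;pre&gt;
```python
from datetime import date

EX = "http://example.com/"
HAS_CUBICLE = EX + "hasCubicle"


def snapshot(assignments):
    """Build (subject, predicate, object) triples from employee -> cubicle IDs."""
    return frozenset((EX + e, HAS_CUBICLE, EX + c) for e, c in assignments.items())


# Hypothetical named graphs, one per snapshot of the office.
GRAPHS = {
    EX + "graphs/2011-01-01": snapshot({"e-1": "c-1000", "e-2": "c-1001", "e-3": "c-1002"}),
    EX + "graphs/2011-03-20": snapshot({"e-1": "c-1001", "e-2": "c-1000", "e-3": "c-1002"}),
}

# Hypothetical manifest: the date from which each graph is meant to be true.
VALID_FROM = {
    EX + "graphs/2011-01-01": date(2011, 1, 1),
    EX + "graphs/2011-03-20": date(2011, 3, 20),
}


def graph_for(day):
    """Return the name of the most recent graph already valid on day, or None."""
    candidates = [g for g, start in VALID_FROM.items() if day >= start]
    return max(candidates, key=VALID_FROM.get) if candidates else None


def cubicle_of(employee, day):
    """Look up an employee's cubicle in the graph that applies on day."""
    name = graph_for(day)
    if name is None:
        return None
    for s, p, o in GRAPHS[name]:
        if s == EX + employee and p == HAS_CUBICLE:
            return o
    return None


print(graph_for(date(2011, 3, 15)))         # http://example.com/graphs/2011-01-01
print(cubicle_of("e-1", date(2011, 4, 1)))  # http://example.com/c-1001
```
&lt;/pre&gt;
&lt;p&gt;Between March 10th and 20th this returns the January answer. Whether that is right depends on whether the manifest should carry the memo date or the move date. That choice is exactly the kind of store-level metadata question 5 asks about.&lt;/p&gt;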
&lt;p&gt;4. Can we offer any practical but machine-readable metadata that helps&#xA0;indicate to consuming applications the potential problems that might come from merging different graphs&#xA0;that use this modelling style? For example, can we write any useful definition for a class of&#xA0;property &#x201C;TimeVolatileProperty&#x201D; that could help&#xA0;people understand the risk of merging different RDF graphs using &#x2018;hasCubicle&#x2019;?&lt;/p&gt;
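&lt;p&gt;This small illustration shows the risk. Under OWL&#x2019;s open-world semantics, merging the January and March graphs would not simply be flagged as an error. Because hasCubicle is functional, a reasoner would infer that c-1000 and c-1001 are the same cubicle (owl:sameAs). Via the inverse-functional axiom, it would also infer that Alice and Bob are the same person. Only if the cubicles are declared owl:differentFrom does the merge become inconsistent. The Python sketch below instead does a closed-world check of the two cardinality constraints; its function and variable names are made up for this example.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import defaultdict

EX = "http://example.com/"
HAS_CUBICLE = EX + "hasCubicle"


def snapshot(assignments):
    """Build (subject, predicate, object) triples from employee -> cubicle IDs."""
    return {(EX + e, HAS_CUBICLE, EX + c) for e, c in assignments.items()}


def violations(triples, prop):
    """Closed-world cardinality check for one property.

    Returns (functional, inverse): subjects with more than one value, and
    objects that are the value for more than one subject.
    """
    values = defaultdict(set)
    subjects = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
            subjects[o].add(s)
    functional = {s: objs for s, objs in values.items() if len(objs) > 1}
    inverse = {o: subjs for o, subjs in subjects.items() if len(subjs) > 1}
    return functional, inverse


jan = snapshot({"e-1": "c-1000", "e-2": "c-1001", "e-3": "c-1002"})
march = snapshot({"e-1": "c-1001", "e-2": "c-1000", "e-3": "c-1002"})

print(violations(jan, HAS_CUBICLE))    # ({}, {})
print(violations(march, HAS_CUBICLE))  # ({}, {})

functional, inverse = violations(jan.union(march), HAS_CUBICLE)
print(sorted(functional))  # ['http://example.com/e-1', 'http://example.com/e-2']
print(sorted(inverse))     # ['http://example.com/c-1000', 'http://example.com/c-1001']
```
&lt;/pre&gt;
&lt;p&gt;Each snapshot passes on its own and only the merge fails. That is the practical sense in which hasCubicle is &#x2018;time-volatile&#x2019;.&lt;/p&gt;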
&lt;p&gt;5. Can the &#x2018;snapshot of the world-as-it-now-is&#x2019; view and the&#xA0;&#x2018;transaction / event log view&#x2019; be equal citizens, stored in the same&#xA0;RDF store, and can metadata / manifest / table of contents info for&#xA0;that store be used to make the information usefully exploitable and&#xA0;reasonably truthy?&lt;/p&gt;</description>
    </item>
    <item>
      <pubDate>Tue, 11 Oct 2011 12:03:30 GMT</pubDate>
      <title>Linked Literature, Linked TV &#x2013; Everything Looks like a Graph</title>
      <link>http://www.advogato.org/person/danbri/diary.html?start=196</link>
      <guid>http://danbri.org/words/2011/10/11/720</guid>
      <description>&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6230460521/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6173/6230460521_6680742cce_m.jpg" alt="cloud" width="240" height="240"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;&lt;a href="http://twitter.com/#!/ben_fry" &gt;Ben Fry&lt;/a&gt; in &#x2018;&lt;a href="http://www.amazon.com/Visualizing-Data-Explaining-Processing-Environment/dp/0596514557" &gt;Visualizing Data&lt;/a&gt;&#x2018;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Graphs can be a powerful way to represent relationships between data, but they are also a very abstract concept, which means that they run the danger of meaning something only to the creator of the graph. Often, simply showing the structure of the data says very little about what it&#xA0;actually means, even though it&#x2019;s a perfectly accurate means of  representing the data. &lt;em&gt;Everything looks like a graph, but almost nothing should ever be drawn as one&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;There is a tendency when using graphs to become smitten with one&#x2019;s own data. Even though a graph of a few hundred nodes quickly becomes unreadable,&#xA0;it is often satisfying for the creator because the resulting figure is elegant and complex and may be subjectively beautiful, and the notion that the&#xA0;creator&#x2019;s data is &#x201C;complex&#x201D; fits just fine with the creator&#x2019;s own interpretation of it. Graphs have a tendency of making a data set look&#xA0;sophisticated and important, without having solved the problem of enlightening the viewer.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6230977880/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6051/6230977880_a78e467d3a.jpg" alt="markets" width="320" height="165"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Ben Fry is entirely correct.&lt;/p&gt;
&lt;p&gt;I suggest two excuses for this indulgence. First, if the visuals are meaningful&#xA0;only to the creator of the graph, then let&#x2019;s make everyone a graph curator. Second, if the things the data attempts to describe (&lt;em&gt;for example, 14 million books and the world they in turn describe&lt;/em&gt;) are&#xA0;complex and beautiful and under-appreciated in their complexity and interconnectedness, then perhaps it is OK to indulge ourselves. When do graphs become maps?&lt;/p&gt;
&lt;p&gt;I report here on some experiments that stem from two collaborations around Linked Data. All the visuals in the post are views of bibliographic data, based on similarity measures derived from book / subject keyword associations, with visualization and a little additional analysis using &lt;a href="http://gephi.org/" &gt;Gephi&lt;/a&gt;. Click through to Flickr to see larger versions of any image.&lt;/p&gt;
&lt;p&gt;Firstly, in my ongoing work in the &lt;a href="http://notube.tv" &gt;NoTube project&lt;/a&gt;,&#xA0;we have been working with TV-related data, ranging across &#x2018;social Web&#x2019; activity streams, user profiles, TV archive catalogues, and classification systems like &lt;a href="http://en.wikipedia.org/wiki/Lonclass" &gt;Lonclass&lt;/a&gt;. Secondly, over the&#xA0;summer I have been working with the&#xA0;&lt;a href="http://librarylab.law.harvard.edu/" &gt;Library Innovation Lab&lt;/a&gt; at Harvard. There I have been looking at ways of opening up bibliographic catalogues to the Web as Linked Data, and at&#xA0;ways of cross-linking Web materials (e.g. video materials) to a Webbified notion of &#x2018;&lt;a href="http://librarylab.law.harvard.edu/dpla/demo/" &gt;bookshelf&lt;/a&gt;&#x2019;.&lt;/p&gt;
&lt;p&gt;In NoTube we have been making use of the &lt;a href="http://mahout.apache.org" &gt;Apache Mahout&lt;/a&gt; toolkit,&#xA0;which provided us with software for collaborative filtering recommendations, clustering and automatic classification. We&#x2019;ve barely scratched the surface of&#xA0;what it can do, but here I show some initial results from applying Mahout to a 100,000-record subset of Harvard&#x2019;s 14-million-entry catalogue. Mahout is built to scale, and the experiments here use datasets that are tiny from Mahout&#x2019;s perspective.&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6230457119/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6101/6230457119_00ba958baa.jpg" alt="gothic_idol" width="320" height="197"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;In NoTube, we used Mahout to compute similarity measures between each pair of items in a catalogue of BBC TV programmes for which we had privileged access to&#xA0;subjective viewer ratings. This was a sparse matrix of around 20,000 viewers and 12,500 broadcast items, with around 1.2 million ratings linking viewer to item. From these,&#xA0;after a few rather-too-casual tests using Mahout&#x2019;s evaluation measure system, we picked its most promising similarity measure for our data (&lt;code&gt;LogLikelihoodSimilarity&lt;/code&gt; or &lt;code&gt;Tanimoto&lt;/code&gt;).&#xA0;Then, for the most similar items, we simply dumped out a huge data file that contained pairs of item numbers, plus a weight.&lt;/p&gt;
&lt;p&gt;There are many, many smarter things we could&#x2019;ve tried, but&#xA0;in the spirit of &#x2018;&lt;a href="http://en.wikipedia.org/wiki/Minimum_viable_product" &gt;minimum viable product&lt;/a&gt;&#x2019;, we haven&#x2019;t tried them yet. The first is making use of additional metadata &lt;a href="http://www.bbc.co.uk/programmes/" &gt;published by the BBC&lt;/a&gt; in RDF. That way we could help Mahout along: when Alice loves item_62 and Bob loves item_82127, we also know via RDF that both items are in the same TV series and Brand. Why use fancy machine learning to rediscover things&#xA0;we already know, and that have been shared in the Web as data? We could make smarter use of metadata here. Secondly, we could have used data-derived or publisher-supplied metadata to explore whether &lt;em&gt;different&lt;/em&gt; Mahout techniques work better for different segments of the content (factual vs fiction). Since we also have some demographic data, we could even have tested them against different&#xA0;groups of users.&lt;/p&gt;
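&lt;p&gt;For readers unfamiliar with the Tanimoto measure: for two items, it divides the number of viewers who rated both by the number who rated either, ignoring the rating values. Below is a minimal pure-Python sketch of that coefficient and of dumping &#x201C;item, item, weight&#x201D; lines. It is not Mahout&#x2019;s code (Mahout&#x2019;s class is &lt;code&gt;TanimotoCoefficientSimilarity&lt;/code&gt;), and the toy ratings and threshold parameter are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient: shared raters divided by all raters of either item."""
    union = a.union(b)
    if not union:
        return 0.0
    return len(a.intersection(b)) / len(union)


def similar_pairs(raters_by_item: dict, threshold: float = 0.0):
    """Yield (item_a, item_b, weight) for every item pair whose similarity exceeds threshold."""
    for a, b in combinations(sorted(raters_by_item), 2):
        weight = tanimoto(raters_by_item[a], raters_by_item[b])
        if weight > threshold:
            yield a, b, weight


# Hypothetical toy data: item ID -> set of viewers who rated it (values ignored).
raters_by_item = {
    "item_62": {"alice", "bob", "carol"},
    "item_82127": {"alice", "bob"},
    "item_9": {"dave"},
}

# One tab-separated line per similar pair, like the weighted pairs file described above.
for a, b, w in similar_pairs(raters_by_item):
    print(f"{a}\t{b}\t{w:.3f}")  # item_62  item_82127  0.667
```
&lt;/pre&gt;
&lt;p&gt;The pairwise loop is quadratic in the number of items. That is fine for a toy, but it is one reason to lean on Mahout for a 12,500-item catalogue.&lt;/p&gt;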
&lt;p&gt;Anyway, Mahout gave us item-to-item similarity measures for TV. &lt;a href="http://notube.tv/2011/10/10/n-screen-a-second-screen-application-for-small-group-exploration-of-on-demand-content/" &gt;Libby has written already&lt;/a&gt; about how we used these in &#x2018;second screen&#x2019; (or &#x2018;N-th&#x2019; screen, aka N-Screen) prototypes&#xA0;showing the impact that new Web standards might make on tired and outdated notions of &#x201C;TV remote control&#x201D;.&lt;/p&gt;
&lt;p&gt;
  &lt;em&gt;What if your remote control could personalise a view of&#xA0;some content collection? What if it could show you similar things based on your viewing behavior, and that of others? What if you could explore the ever-growing&#xA0;space of TV content using simple drag-and-drop metaphors, sending items to your TV or to your friends with simple tablet-based interfaces?&lt;/em&gt;
&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6230458933/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6153/6230458933_70253b3f3b.jpg" alt="medieval_society" width="400" height="206"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;So that&#x2019;s what we&#x2019;ve been up to in NoTube. There are prototypes using BBC content (sadly not viewable by everyone due to rights restrictions), but also some&#xA0;experiments with TV materials from the Internet Archive. There are also some explorations that look at &lt;a href="http://www.ted.com/" &gt;TED&#x2019;s&lt;/a&gt; video collection as an example of Web-based content that (via ted.com and YouTube) is more generally viewable. Since every item in the BBC&#x2019;s Archive is catalogued using a library-based classification system (Lonclass, itself&#xA0;based on UDC), the topic of cross-referencing books and TV has cropped up a few times.&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6230979642/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6102/6230979642_ecb8c4505f.jpg" alt="new_colonialism" width="400" height="207"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Meanwhile, in (the digital Public Library of) America, &#x2026; the Harvard Library Innovation Lab team have a huge and fantastic dataset describing 14 million&#xA0;bibliographic records. I&#x2019;m not sure exactly how many are &#x2018;books&#x2019;; libraries hold all kinds of objects these days. With the Harvard folk I&#x2019;ve been trying to help&#xA0;figure out how we could cross-reference their records with other &#x201C;Webby&#x201D; sources, such as online video materials. Again we used TED as an example, because it&#xA0;is high quality but has very different metadata from the library records. So we&#x2019;ve been looking at various tricks and techniques that could help us associate book records with those videos. For example, we can find tags for TED videos on the TED site, but also on Delicious and on YouTube. However, taggers and librarians tend to describe things&#xA0;quite differently. Tags like &#x201C;todo&#x201D;, &#x201C;inspirational&#x201D;, &#x201C;design&#x201D;, &#x201C;development&#x201D; or &#x201C;science&#x201D; don&#x2019;t help us pin-point the exact library shelf where a viewer might&#xA0;go to read more on the topic. Conversely, they don&#x2019;t help the library sites understand where within their online catalogues they could embed useful and engaging &#x201C;related link&#x201D;&#xA0;pointers off to TED.com or YouTube.&lt;/p&gt;
&lt;p&gt;So we turned to other sources. Matching TED speaker names against Wikipedia allows us to find more information&#xA0;about many TED speakers. Take, for example, the &lt;a href="http://en.wikipedia.org/wiki/Tim_Berners-Lee" &gt;Tim Berners-Lee&lt;/a&gt; entry: its Linked Data &lt;a href="http://dbpedia.org/page/Tim_Berners-Lee" &gt;form&lt;/a&gt; helpfully tells us that this TED speaker is in the categories&#xA0;&#x2018;Japan_Prize_laureates&#x2019;, &#x2018;English_inventors&#x2019;, &#x2018;1955_births&#x2019; and &#x2018;Internet_pioneers&#x2019;. All good to know, but it&#x2019;s hard to tell which categories tell us most about our&#xA0;speaker or video. At least now that we&#x2019;re in the Linked Data space, we can navigate around to Freebase, VIAF and a growing Web of data sources. It should be possible at least to&#xA0;associate TimBL&#x2019;s TED talks with library records for &lt;a href="http://openlibrary.org/books/OL38986M/Weaving_the_Web" &gt;his book&lt;/a&gt; (so we annotate one bibliographic entry, from 14 million! &#x2026;can&#x2019;t we map areas, not items?).&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6232014056/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6175/6232014056_090b4a4392.jpg" alt="tv" width="350" height="181"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Can we do better? What if we also associated Tim&#x2019;s two TED talk videos with other things in the library that had the same subject classifications or keywords? What if&#xA0;we could build links between the two collections based not only on published authorship, but on topical information (tags, full text analysis of TED talk transcripts)? Can we&#xA0;plan for a world where libraries have access not only to MARC records, but also to the full text of each of millions of books?&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6233467501/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6100/6233467501_7075bc2eb0.jpg" alt="Screen%20shot%202011-10-11%20at%2010.15.07%20AM" width="400" height="206"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;I&#x2019;ve been exploring some of these ideas with David Weinberger, Paul Deschner and Matt Phillips at Harvard, and in NoTube with Libby Miller, Vicky Buser and others.&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6230976348/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6158/6230976348_2a58015f51.jpg" alt="edu" width="400" height="207"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Yesterday I took the time to make a visual sanity check of the bibliographic data as processed into a &#x2018;similarity space&#x2019; in some Mahout experiments. This is a messy&#xA0;first pass at everything, but I figured it is better to blog something and look for collaborations and feedback than to chase perfection. For me, the big story is in linking TV materials to the gigantic back-story of context, discussion and debate curated by the world&#x2019;s libraries. Imagine our TV content catalogues,&#xA0;and our libraries, viewed as visual maps with items clustered by similarity. NoTube has shown that we can build such maps into the smartphones and tablets that are&#xA0;increasingly being used as TV remote controls.&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6233990814/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6160/6233990814_0b90daaeaa.jpg" alt="Screen%20shot%202011-10-11%20at%2010.12.25%20AM" width="400" height="251"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;And if the device you&#x2019;re using to pause/play/stop or rewind your TV also has access to these vast archives as they&#xA0;open up as Linked Data (as well as GPS location data and your Facebook password), all kinds of possibilities arise for linked, annotated and fact-checked TV, as well as for showing a path for libraries to continue to serve as maps of the entertainment, intellectual and scientific terrain around us.&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6233467669/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6038/6233467669_416946749f.jpg" alt="Screen%20shot%202011-10-11%20at%2010.16.46%20AM" width="400" height="206"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;A brief technical description. Everything you see here was made with Gephi, Mahout and experimental data from the Library Innovation Lab at Harvard, plus a few scripts to glue it all together.&lt;/p&gt;
&lt;p&gt;Mahout was given 100,000 extracts from the Harvard collection: just main and sub-title, a local ID, and a list of topical phrases (mostly drawn from Library of&#xA0;Congress Subject Headings, with some local extensions). I don&#x2019;t do anything clever with these, their sub-structure, or their library-documented inter-relationships. They are treated as atomic codes, and flattened into long pseudo-words. Examples include &#x2018;occupational_diseases_prevention_control&#x2019;, &#x2018;french_literature_16th_century_history_and_criticism&#x2019;, &#x2018;motion_pictures_political_aspects&#x2019;, &#x2018;songs_high_voice_with_lute&#x2019;, &#x2018;dance_music_czechoslovakia&#x2019; and &#x2018;communism_and_culture_soviet_union&#x2019;. All of human life is there.&lt;/p&gt;
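&lt;p&gt;The flattening step can be sketched in a few lines of Python. This is an assumption about how such pseudo-words might be produced, not the exact script used here. It assumes headings arrive as text with LCSH-style &#x201C;--&#x201D; subdivisions. It lowercases them and turns every run of non-alphanumeric characters into a single underscore.&lt;/p&gt;
&lt;pre&gt;
```python
import re


def flatten_heading(heading: str) -> str:
    """Collapse a subject heading into one lowercase pseudo-word joined by underscores."""
    # Any run of characters other than a-z and 0-9 becomes a single underscore;
    # leading and trailing underscores are then stripped.
    return re.sub(r"[^a-z0-9]+", "_", heading.lower()).strip("_")


# Hypothetical input strings, written with LCSH-style "--" subdivision separators.
headings = [
    "French literature--16th century--History and criticism",
    "Motion pictures--Political aspects",
    "Dance music--Czechoslovakia",
]
for h in headings:
    print(flatten_heading(h))
# french_literature_16th_century_history_and_criticism
# motion_pictures_political_aspects
# dance_music_czechoslovakia
```
&lt;/pre&gt;
&lt;p&gt;Under this rule a heading such as &#x201C;Occupational diseases--Prevention &amp;amp; control&#x201D; would come out as &#x2018;occupational_diseases_prevention_control&#x2019;. Non-ASCII letters, however, would also be turned into underscores.&lt;/p&gt;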
&lt;p&gt;David Weinberger has been calling this gigantic scope our problem of the &#x2018;Taxonomy of Everything&#x2019;, and the label fits. By mushing phrases into fake words, I get to re-use some&#xA0;Mahout tools and avoid writing code. The result is a matrix of 100,000 bibliographic entities by 27,684 unique topical codes. Initially I made the simple test of&#xA0;feeding this as input to Mahout&#x2019;s &lt;a href="http://en.wikipedia.org/wiki/K-means_clustering" &gt;K-Means clustering&lt;/a&gt; implementation. I then manually inspected the most popular&#xA0;topical codes for each cluster, both with k=12 (putting all books in 12 clusters) and with k=1000 (for more fine-grained groupings). I was impressed by the initial results.&lt;/p&gt;
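&lt;p&gt;For anyone who has not met K-Means: it alternates between assigning each item to its nearest cluster centre and moving each centre to the mean of its members. Below is a toy pure-Python version of that loop (Lloyd&#x2019;s algorithm), not Mahout&#x2019;s distributed implementation. It uses squared Euclidean distance on 0/1 topic-code vectors and simply seeds the centres with the first k items. The four &#x201C;books&#x201D; are hypothetical and borrow codes from the clusters listed below.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Toy Lloyd's k-means seeded with the first k points; returns a cluster label per point."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical records: book ID -> flattened topical codes.
books = {
    "book_a": {"brain_physiology", "biological_rhythms"},
    "book_b": {"museums_history", "archives"},
    "book_c": {"brain_physiology", "oscillations"},
    "book_d": {"museums_history", "museums_acquisitions"},
}
vocab = sorted(set().union(*books.values()))
ids = list(books)
# One 0/1 vector per book, with a column for every topical code in the vocabulary.
vectors = [[1.0 if code in books[b] else 0.0 for code in vocab] for b in ids]

labels = kmeans(vectors, k=2)
print(dict(zip(ids, labels)))  # {'book_a': 0, 'book_b': 1, 'book_c': 0, 'book_d': 1}

# Mimic the manual inspection: the most frequent codes in each cluster.
for c in range(2):
    counts = Counter(code for b, label in zip(ids, labels) if label == c for code in books[b])
    print(c, [code for code, _ in counts.most_common(3)])
```
&lt;/pre&gt;
&lt;p&gt;Real runs need a better seeding strategy than &#x201C;first k items&#x201D;. They also need a distance better suited to sparse data, such as cosine or Tanimoto distance; Mahout lets you plug in other distance measures for exactly this reason.&lt;/p&gt;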
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6233467903/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6100/6233467903_550ba3fa3f.jpg" alt="Screen%20shot%202011-10-11%20at%2010.22.37%20AM" width="400" height="274"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;I only have these in crude text-file form: see &lt;a href="http://danbri.org/2011/mahout/hv/_k1000.txt" &gt;hv/_k1000.txt&lt;/a&gt; and &lt;a href="http://danbri.org/2011/mahout/hv/_twelve.txt" &gt;hv/_twelve.txt&lt;/a&gt;, plus the dictionary in the (big) file &lt;a href="http://danbri.org/2011/mahout/hv/_harv_dict.txt" &gt;_harv_dict.txt&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For example, in the 1000-cluster version, we get: &#x2018;medical_policy_united_states&#x2019;, &#x2018;health_care_reform_united_states&#x2019;, &#x2018;health_policy_united_states&#x2019;, &#x2018;medical_care_united_states&#x2019;, &#x2018;delivery_of_health_care_united_states&#x2019;, &#x2018;medical_economics_united_states&#x2019;, &#x2018;politics_united_states&#x2019;, &#x2018;health_services_accessibility_united_states&#x2019;, &#x2018;insurance_health_united_states&#x2019;, &#x2018;economics_medical_united_states&#x2019;.&lt;/p&gt;
&lt;p&gt;Or another cluster: &#x2018;brain_physiology&#x2019;, &#x2018;biological_rhythms&#x2019;, &#x2018;oscillations&#x2019;.&lt;/p&gt;
&lt;p&gt;How about: &#x2018;museums_collection_management&#x2019;, &#x2018;museums_history&#x2019;, &#x2018;archives&#x2019;, &#x2018;museums_acquisitions&#x2019;, &#x2018;collectors_and_collecting_history&#x2019;?&lt;/p&gt;
&lt;p&gt;Another, conceptually nearby (though that proximity isn&#x2019;t visible through this simple clustering approach): &#x2018;art_thefts&#x2019;, &#x2018;theft_from_museums&#x2019;, &#x2018;archaeological_thefts&#x2019;, &#x2018;art_museums&#x2019;, &#x2018;cultural_property_protection_law_and_legislation&#x2019;, &#x2026;&lt;/p&gt;
&lt;p&gt;Ok, I am cherry-picking. There is some nonsense in there too, but surprisingly little, and probably some associations that might cause offense. But it shows that the tooling&#xA0;is capable (just by looking at book/topic associations) of picking out similarities that are significant. Maybe all of this is also available in LCSH SKOS form already, but I doubt it.&#xA0;(A side-goal here is to publish these clusters for re-use elsewhere&#x2026;).&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6233991710/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6100/6233991710_74faca926e.jpg" alt="Screen%20shot%202011-10-11%20at%2010.23.22%20AM" width="400" height="279"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;So, what if we take this, and instead compute (a bit like we did in NoTube from ratings data) similarity measures between books?&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6233468607/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6179/6233468607_c0756682ae.jpg" alt="Screen%20shot%202011-10-11%20at%2010.24.12%20AM" width="400" height="272"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;I tried that, without using much of Mahout&#x2019;s sophistication. I used its &#x2018;rowsimilarityjob&#x2019; facility to generate similarity measures for each book, then threw out all but the top 5 (later the top 3) similarities for each book. From this point, I moved things over into the Gephi toolkit (&#x201C;photoshop for graphs&#x201D;), as I wanted to see how things looked.&lt;/p&gt;
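&lt;p&gt;The prune-then-export step can be sketched in a few lines of Python. This is not what RowSimilarityJob does internally, since that runs as distributed jobs and supports several similarity measures. The sketch makes three assumptions: plain cosine similarity on 0/1 topic vectors, a few hypothetical toy books, and an edge list with Source/Target/Weight columns of the kind Gephi&#x2019;s spreadsheet import accepts.&lt;/p&gt;

```python
import csv
import io
import math


def cosine(a, b):
    # Cosine similarity between two books given as sets of topic codes
    # (the same as cosine similarity on 0/1 vectors)
    if not a or not b:
        return 0.0
    return len(a.intersection(b)) / math.sqrt(len(a) * len(b))


def top_k_edges(books, k=3):
    # Score every other book, keep only the k best non-zero neighbours per book
    edges = []
    for src, topics in books.items():
        scored = [(cosine(topics, other), dst)
                  for dst, other in books.items() if dst != src]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (-s[0], s[1]))
        edges.extend((src, dst, round(score, 4)) for score, dst in scored[:k])
    return edges


def to_gephi_csv(edges):
    # Edge list with Source/Target/Weight header row, for spreadsheet import
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return out.getvalue()


# Hypothetical toy input, already flattened into pseudo-words
books = {
    "book1": {"health_policy_united_states", "medical_care_united_states"},
    "book2": {"health_policy_united_states", "medical_economics_united_states"},
    "book3": {"museums_history", "archives"},
    "book4": {"museums_history", "museums_acquisitions"},
}
edges = top_k_edges(books, k=1)
print(to_gephi_csv(edges))
```

&lt;p&gt;Keeping only a handful of neighbours per book is what makes the graph sparse enough to lay out. Without pruning, every book would link to every other book sharing even one topic.&lt;/p&gt;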
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6233468419/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6180/6233468419_cf2c45b3d8.jpg" alt="Screen%20shot%202011-10-11%20at%2010.37.06%20AM" width="400" height="276"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;First results are shown here. Nodes are books; links are strong similarity measures. Node labels are titles, or sometimes title + subtitle. Some maps (the black-background ones) use Gephi&#x2019;s &#x201C;modularity detection&#x201D; analysis of the link graph. For others (white background), I imported the 1000 clusters from the earlier Mahout experiments. I tried various Gephi metrics and mapped them to node size. This might fairly be called &#x2018;playing around&#x2019; at this stage, but there is at least a pipeline from raw data (eventually Linked Data, I hope) through Mahout to Gephi, and some visual maps of literature.&lt;/p&gt;
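&lt;p&gt;As far as I know, Gephi&#x2019;s modularity analysis is an implementation of the Louvain method. That method searches for a partition of the graph that scores well on Newman&#x2019;s modularity Q. The scoring half is small enough to sketch in Python, under three assumptions: an undirected, unweighted edge list, a hypothetical toy graph, and no attempt at the search itself.&lt;/p&gt;

```python
from collections import defaultdict


def modularity(edges, community):
    # Newman's modularity for an undirected, unweighted graph:
    #   Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    # m: number of edges, L_c: edges inside c, d_c: total degree of nodes in c
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles joined by the single edge 3-4
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
lumped = {n: "A" for n in range(1, 7)}
print(round(modularity(edges, split), 4))   # 0.3571, i.e. 5/14
print(modularity(edges, lumped))            # 0.0
```

&lt;p&gt;Splitting the two triangles scores higher than lumping everything together. A community-detection run looks for partitions of that kind.&lt;/p&gt;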
&lt;p&gt;
  &lt;a href="http://www.flickr.com/photos/danbri/6233990550/" &gt;
    &lt;img class="alignright" src="http://farm7.static.flickr.com/6056/6233990550_ffb261f053.jpg" alt="1k_overview" width="400" height="400"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;What does all this show?&lt;/p&gt;
&lt;p&gt;That if we can find a way to open up bibliographic datasets, there are solid open-source tools out there that can give new ways of exploring the items described in the data. That those tools (e.g. Mahout, Gephi) provide many different ways of computing similarity, clustering, and presenting. There is no single &#x2018;right answer&#x2019; for how to present literature or TV archive content as a visual map, clustering &#x201C;like with like&#x201D;, or arranging neighbourhoods. And there is also no restriction that we must work dataset-by-dataset, either. Why not use what we know from movie/TV recommendations to arrange the similarity space for books? Or vice-versa?&lt;/p&gt;
&lt;p&gt;I must emphasise (to return to Ben Fry&#x2019;s opening remark) that this is a proof-of-concept. It shows some potential, but it is neither a user interface, nor particularly informative. Gephi as a tool for making such visualizations is powerful, but it too is not a viable interface for navigating TV content. However these tools do give us a glimpse of what is hidden in giant and dull-sounding databases, and some hints for how patterns extracted from these collections could help guide us through literature, TV or more.&lt;/p&gt;
&lt;p&gt;Next steps? There are many things that could be tried; more than I could attempt. I&#x2019;d like to get some variant of these 2D maps onto iPad/Android tablets, loaded with TV content. I&#x2019;d like to continue exploring the bridges between content (e.g. TED) and library materials, on tablets and PCs. I&#x2019;d like to look at Mahout&#x2019;s &#x201C;collocated terms&#x201D; extraction tools in more detail. These allow us to pull out recurring phrases (e.g. &#x201C;Zero Sum&#x201D;, &#x201C;climate change&#x201D;, &#x201C;golden rule&#x201D;, &#x201C;high school&#x201D;, &#x201C;black holes&#x201D; were found in &lt;a href="http://danbri.org/2011/mahout/_sorted_ted_filtered.txt" &gt;TED transcripts&lt;/a&gt;). I&#x2019;ve also tried extracting &lt;a href="http://danbri.org/2011/mahout/_sorted_harv_2gram.txt" &gt;bi-gram phrases from book titles&lt;/a&gt; using the same utility. Such tools offer some prospect of bulk-creating links not just between single items in collections, but between &lt;em&gt;neighbourhood regions&lt;/em&gt; in maps such as those shown here. The cross-links will never be perfect, but then what&#x2019;s a little serendipity between friends?&lt;/p&gt;
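&lt;p&gt;As I understand it, Mahout&#x2019;s collocation tool ranks candidate n-grams by Dunning&#x2019;s log-likelihood ratio. A high score means a pair of words occurs together far more often than their separate counts would predict. Below is a minimal bigram version in Python. It assumes whitespace-tokenised text, and the four titles are invented for illustration; it is not Mahout&#x2019;s implementation.&lt;/p&gt;

```python
import math
from collections import Counter


def bigram_llr(token_lists):
    # Score adjacent word pairs by Dunning's log-likelihood ratio (G-squared)
    pairs = Counter()
    for tokens in token_lists:
        pairs.update(zip(tokens, tokens[1:]))
    n = sum(pairs.values())
    first = Counter()    # how often each word starts a pair
    second = Counter()   # how often each word ends a pair
    for (a, b), c in pairs.items():
        first[a] += c
        second[b] += c
    scores = {}
    for (a, b), k11 in pairs.items():
        k12 = first[a] - k11     # a followed by something other than b
        k21 = second[b] - k11    # b preceded by something other than a
        k22 = n - k11 - k12 - k21
        # Each cell of the 2x2 table with its row and column totals
        cells = [(k11, first[a], second[b]), (k12, first[a], n - second[b]),
                 (k21, n - first[a], second[b]), (k22, n - first[a], n - second[b])]
        g2 = 0.0
        for k, row, col in cells:
            if k:  # treat 0 * log(0) as 0
                g2 += k * math.log(k * n / (row * col))
        scores[(a, b)] = 2 * g2
    return scores


# Hypothetical titles, made up for this example
titles = [
    "climate change and the oceans",
    "the politics of climate change",
    "climate change a very short introduction",
    "the oceans of the world",
]
scores = bigram_llr([t.split() for t in titles])
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(" ".join(pair), round(score, 2))
```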
&lt;p&gt;As full text access to book data looms, and TV archives are &lt;a href="http://www.bbc.co.uk/blogs/bbcinternet/2011/10/digital_public_space_partnersh.html" &gt;finding their way&lt;/a&gt; &lt;a href="http://blog.archive.org/2011/08/24/understanding-911/" &gt;online&lt;/a&gt;, we&#x2019;ll need to find ways of combining user interface, bibliographic and data science skills if we&#x2019;re really going to make the most of the treasures being shared on the Web. Since I have only fragments of each, I&#x2019;m always drawn back to thinking of this in terms of collaborative work.&lt;/p&gt;
&lt;p&gt;A few years ago, &lt;a href="http://www.netflixprize.com/" &gt;Netflix&lt;/a&gt; had the vision and cash to pretty much buy the attention of the entire machine learning community for a measly million dollars. Researchers love to have substantive datasets to work with, and the (now retracted) Netflix dataset is still widely sought after. Without a budget to match Netflix&#x2019;s, could we still somehow offer prizes to help get such attention directed towards analysis and exploitation of linked TV and library data? We could offer free access to the world&#x2019;s literature via a global network of libraries? Except everyone gets that for free already. Maybe we don&#x2019;t need prizes.&lt;/p&gt;
&lt;p&gt;Nearby in the Web: &lt;a href="http://notube.tv/2011/10/10/n-screen-a-second-screen-application-for-small-group-exploration-of-on-demand-content/" &gt;NoTube N-Screen&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <pubDate>Mon, 20 Jun 2011 16:04:21 GMT</pubDate>
      <title>Rorschach test: hidden structure or noise?</title>
      <link>http://www.advogato.org/person/danbri/diary.html?start=195</link>
      <guid>http://danbri.org/words/2011/06/20/717</guid>
      <description>&lt;p&gt;
  &lt;a href="http://danbri.org/words/wp-content/uploads/2011/06/Figure2.png" &gt;
    &lt;img class="size-large wp-image-718 alignnone" title="Splurg" src="http://danbri.org/words/wp-content/uploads/2011/06/Figure2-1024x768.png" alt="" width="640" height="480"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.archive.org/details/dw_griffith_birth_of_a_nation" &gt;Birth of a nation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.archive.org/details/his_girl_friday" &gt;His girl friday&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.archive.org/details/nosferatu" &gt;Nosferatu&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.archive.org/details/meet_john_doe" &gt;Meet John Doe&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.archive.org/details/The_Killer_Shrews" &gt;Killer Shrews&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.archive.org/details/The_Amazing_Transparent_Man" &gt;The Amazing Transparent Man&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.archive.org/details/teenagers_from_outerspace" &gt;Teenagers from Outer Space&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.archive.org/details/last_woman_on_earth1960" &gt;Last Woman on Earth&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.archive.org/details/VoyagetothePlanetofPrehistoricWomen" &gt;Voyage to the Planet of Prehistoric Women&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
    </item>
    <item>
      <pubDate>Sun, 19 Jun 2011 15:03:43 GMT</pubDate>
      <title>K-means test in Octave</title>
      <link>http://www.advogato.org/person/danbri/diary.html?start=194</link>
      <guid>http://danbri.org/words/2011/06/19/711</guid>
      <description>&lt;p&gt;Matlab comes with K-means clustering &#x2018;out of the box&#x2019;. The GNU Octave work-a-like system doesn&#x2019;t, and there seem to be quite a few implementations floating around. I picked the &lt;a href="http://www.christianherta.de/kmeans.html" &gt;first&lt;/a&gt; from Google, pretty carelessly, saving as myKmeans.m. These are notes from trying to reproduce this &lt;a href="http://www.youtube.com/watch?v=aYzjenNNOcc" &gt;Matlab demo&lt;/a&gt; with Octave. Not rocket science but worth writing down so I can find it again.&lt;/p&gt;
&lt;p&gt;
  &lt;a href="http://danbri.org/words/wp-content/uploads/2011/06/mykmeans.png" &gt;
    &lt;img class="alignright size-medium wp-image-712" title="mykmeans" src="http://danbri.org/words/wp-content/uploads/2011/06/mykmeans-300x234.png" alt="" width="300" height="234"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;pre&gt;% Blob settings: scale factor M, x/y offsets W and H, S points per blob
M = 4;
W = 2;
H = 4;
S = 500;
% Four Gaussian blobs centred at M*(+/-W, +/-H), plus one at the origin
a = M * [randn(S,1)+W, randn(S,1)+H];
b = M * [randn(S,1)+W, randn(S,1)-H];
c = M * [randn(S,1)-W, randn(S,1)+H];
d = M * [randn(S,1)-W, randn(S,1)-H];
e = M * [randn(S,1), randn(S,1)];
all_data = [a;b;c;d;e];   % stack into one 2500x2 matrix
% Plot each blob in its own colour
plot(a(:,1), a(:,2), '.');
hold on;
plot(b(:,1), b(:,2), 'r.');
plot(c(:,1), c(:,2), 'g.');
plot(d(:,1), d(:,2), 'k.');
plot(e(:,1), e(:,2), 'c.');
% using http://www.christianherta.de/kmeans.html as myKmeans.m
[centroid, pointsInCluster, assignment] = myKmeans(all_data, 5)
% Overlay the five estimated centroids as crosses
scatter(centroid(:,1), centroid(:,2), 'x');&lt;/pre&gt;</description>
    </item>
    <item>
      <pubDate>Wed, 11 May 2011 14:06:40 GMT</pubDate>
      <title>Querying Linked GeoData with R SPARQL client</title>
      <link>http://www.advogato.org/person/danbri/diary.html?start=193</link>
      <guid>http://danbri.org/words/2011/05/11/701</guid>
      <description>&lt;div id="_mcePaste"&gt;Assuming you already have the R statistics toolkit installed, this should be easy.&lt;/div&gt;
&lt;div&gt;Install &lt;a href="http://www.few.vu.nl/~wrvhage/" &gt;Willem van Hage&lt;/a&gt;&amp;#8217;s &lt;a href="http://www.few.vu.nl/~wrvhage/R/" &gt;R SPARQL client&lt;/a&gt;. I followed the instructions and it worked, although I also had to install the XML library, which was compiled and installed when I typed &lt;em&gt;install.packages("XML", repos = "http://www.omegahat.org/R")&lt;/em&gt; within the R interpreter.&lt;/div&gt;
&lt;div&gt;&lt;a href="http://swig.xmlhack.com/2011/05/10/2011-05-10.html#1305021973.421976" &gt;Yesterday I set up&lt;/a&gt; a simple SPARQL endpoint using Benjamin Nowack&amp;#8217;s &lt;a href="https://github.com/semsol/arc2/" &gt;ARC2&lt;/a&gt; and RDF data from the &lt;a href="http://www.lieber-ravensburg.de/developer/" &gt;Ravensburg&lt;/a&gt; dataset. The data includes category information about many points of interest in a German town. We can type the following five statements into R and watch it consume SPARQL results from the Web:&lt;/div&gt;
&lt;pre&gt;library(SPARQL)
endpoint = "http://foaf.tv/hypoid/sparql.php"
q = "PREFIX vcard: &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt;
PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;
PREFIX rv: &amp;lt;http://www.wifo-ravensburg.de/rdf/semanticweb.rdf#&amp;gt;
PREFIX gr: &amp;lt;http://purl.org/goodrelations/v1#&amp;gt;
SELECT ?poi ?l ?lon ?lat ?k
WHERE {
  GRAPH &amp;lt;http://www.heppresearch.com/dev/dump.rdf&amp;gt; {
    ?poi vcard:geo ?l .
    ?l vcard:longitude ?lon .
    ?l vcard:latitude ?lat .
    ?poi foaf:homepage ?hp .
    ?poi rv:kategorie ?k .
  }
}"
res &amp;lt;- SPARQL(endpoint, q)
pie(table(res$k))&lt;/pre&gt;
&lt;p&gt;This is the simplest thing that works to show the data flow. When combined with richer server-side support (e.g. OWL tools, or &lt;a href="http://www.swi-prolog.org/pldoc/package/space.html" &gt;spatial reasoning&lt;/a&gt;) and the capabilities of R plus its other extensions, there is a lot of potential here. A pie chart doesn&amp;#8217;t capture all that, but it does show how to get started&amp;#8230;&lt;/p&gt;
&lt;div&gt;&lt;a href="http://danbri.org/words/wp-content/uploads/2011/05/rpie.png" &gt;&lt;img class="size-full wp-image-703 aligncenter" title="rpie" src="http://danbri.org/words/wp-content/uploads/2011/05/rpie.png" alt="" width="300" height="211" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;Note also that you can send any SPARQL query you like, so long as the server understands it and responds using W3C&amp;#8217;s &lt;a href="http://www.w3.org/TR/rdf-sparql-XMLres/" &gt;standard XML response&lt;/a&gt;. The R library doesn&amp;#8217;t try to interpret the query, so you&amp;#8217;re free to make use of any special features or experimental extensions understood by the server.&lt;/div&gt;
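&lt;p&gt;Under the hood, a client like this has to turn the standard XML results format into a table. A rough Python equivalent using only the standard library is sketched below; it is not the R package&amp;#8217;s code. To stay self-contained, it builds a tiny results document in memory rather than calling the endpoint, and the category values are made up. Counting the k column with Counter is roughly what table(res$k) does before the pie chart.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET
from collections import Counter

SPARQL_NS = "http://www.w3.org/2005/sparql-results#"


def sparql_tag(local):
    # ElementTree names namespaced elements as {namespace}local
    return "{" + SPARQL_NS + "}" + local


def rows_from_sparql_xml(text):
    # One {variable: value} dict per result; uri, literal and bnode values
    # are all reduced to their text content
    root = ET.fromstring(text)
    rows = []
    for result in root.iter(sparql_tag("result")):
        row = {}
        for binding in result.findall(sparql_tag("binding")):
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    return rows


def example_document(categories):
    # Builds a small results document (variable k only) in place of an HTTP response
    root = ET.Element(sparql_tag("sparql"))
    head = ET.SubElement(root, sparql_tag("head"))
    ET.SubElement(head, sparql_tag("variable"), name="k")
    results = ET.SubElement(root, sparql_tag("results"))
    for category in categories:
        result = ET.SubElement(results, sparql_tag("result"))
        binding = ET.SubElement(result, sparql_tag("binding"), name="k")
        ET.SubElement(binding, sparql_tag("literal")).text = category
    return ET.tostring(root, encoding="unicode")


# Hypothetical category values, not taken from the Ravensburg data
doc = example_document(["restaurant", "hotel", "restaurant"])
rows = rows_from_sparql_xml(doc)
print(Counter(row["k"] for row in rows))
```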
</description>
    </item>
    <item>
      <pubDate>Tue, 10 May 2011 20:08:35 GMT</pubDate>
      <title>Exploring Linked Data with Gremlin</title>
      <link>http://www.advogato.org/person/danbri/diary.html?start=192</link>
      <guid>http://danbri.org/words/2011/05/10/675</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/tinkerpop/gremlin/wiki" &gt;Gremlin&lt;/a&gt; is an open-source Java/&lt;a href="http://en.wikipedia.org/wiki/Groovy_(programming_language)" &gt;Groovy&lt;/a&gt; system for traversing &lt;a href="http://www.tinkerpop.com" &gt;graphs&lt;/a&gt;, including but not limited to RDF graphs.&#xA0;This post is just a log of running some examples from &lt;a href="http://twitter.com/twarko" &gt;@twarko&lt;/a&gt; and the Gremlin wiki and mailing list. The test run below goes pretty slowly, since it uses the Web as its database, via entry-by-entry fetches. In this case it&amp;#8217;s fetching from DBpedia, but I&amp;#8217;ve run it happily against Freebase too. The on-demand RDF is handled by the &lt;a href="https://github.com/tinkerpop/gremlin/wiki/LinkedData-Sail" &gt;Linked Data Sail&lt;/a&gt;; the same thing would work directly against a graph database.&lt;/p&gt;
&lt;p&gt;Why is this interesting? Let me see if I can spell out what it&amp;#8217;s doing. I&amp;#8217;ll edit this post if I screw up &amp;#8230;&lt;/p&gt;
&lt;p&gt;Ok, so the basic idea is that we start exploring the graph from one vertex, &amp;#8216;v&amp;#8217;, representing Stephen Fry&amp;#8217;s DBpedia entry.&lt;/p&gt;
&lt;p&gt;From here, everything else is in one line, the core of which is:&lt;/p&gt;
&lt;p&gt;v.&lt;strong&gt;inE&lt;/strong&gt;('dbpedia-owl:starring').&lt;strong&gt;outV&lt;/strong&gt;.&lt;strong&gt;outE&lt;/strong&gt;('dbpedia-owl:starring').&lt;strong&gt;inV&lt;/strong&gt;.&lt;strong&gt;groupCount&lt;/strong&gt;(m).loop(5){it.loops &amp;lt; 3}&lt;/p&gt;
&lt;p&gt;This is a series of steps (which map to TinkerPop / Pipes API calls behind the scenes).&lt;/p&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;inE&lt;/strong&gt; &amp;#8216;starring&amp;#8217;: from v, a vertex, we step onto the edges that come in to &amp;#8216;v&amp;#8217; if they are labelled &amp;#8216;dbpedia-owl:starring&amp;#8217;&lt;/li&gt;
&lt;li&gt;from those edges, we step to the vertices they come out of (&amp;#8216;&lt;strong&gt;outV&lt;/strong&gt;&amp;#8217;); these are the films etc. that Stephen Fry stars in&lt;/li&gt;
&lt;li&gt;from those, we step out (&amp;#8216;&lt;strong&gt;outE&lt;/strong&gt;&amp;#8217;) onto outgoing edges with that same &amp;#8216;starring&amp;#8217; label (we don&amp;#8217;t try filtering out Stephen here, but we could)&lt;/li&gt;
&lt;li&gt;from these edges, we step to the vertices that the &amp;#8216;starring&amp;#8217; edges enter (&amp;#8216;&lt;strong&gt;inV&lt;/strong&gt;&amp;#8217;); since a film &amp;#8216;stars&amp;#8217; its actors, these are actor vertices: Stephen Fry&amp;#8217;s co-stars, plus Fry himself&lt;/li&gt;
&lt;li&gt;we then call &lt;strong&gt;groupCount&lt;/strong&gt; and pass it our bookkeeping hashtable, m. I believe it increments a counter keyed on the current vertex (or edge), so each time the traversal revisits the same vertex, that entity&amp;#8217;s total goes up.&lt;/li&gt;
&lt;li&gt;from this point, &lt;strong&gt;loop&lt;/strong&gt;(5) goes back 5 steps (to inE) and repeats the pattern, up to 3 times, as decided by &amp;#8220;{ it.loops &amp;lt; 3 }&amp;#8221; (this last is a closure; we can drop any code in here&amp;#8230;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
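&lt;p&gt;To make the traversal concrete without a live DBpedia connection, here is a rough Python emulation over a hypothetical in-memory graph; the film and actor names are placeholders, not DBpedia data. Each round mimics inE/outV/outE/inV, and a Counter plays the part of groupCount(m). The exact number of passes that loop(5){ it.loops &amp;lt; 3 } makes is not pinned down here, so the number of rounds is a parameter.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical toy graph: film -> cast, standing in for dbpedia-owl:starring edges
starring = {
    "film_1": ["fry", "laurie"],
    "film_2": ["fry", "laurie", "atkinson"],
    "film_3": ["atkinson", "robinson"],
}


def co_star_counts(start, starring, rounds=3):
    counts = Counter()   # plays the role of the groupCount map m
    frontier = [start]   # one entry per traverser, duplicates kept
    for _ in range(rounds):
        next_frontier = []
        for actor in frontier:
            # inE('starring').outV: films whose cast includes this actor
            films = [f for f, cast in starring.items() if actor in cast]
            for film in films:
                # outE('starring').inV: everyone starring in that film
                for co_star in starring[film]:
                    counts[co_star] += 1
                    next_frontier.append(co_star)
        frontier = next_frontier   # loop(5): send every traverser round again
    return counts


print(co_star_counts("fry", starring, rounds=1))
print(co_star_counts("fry", starring, rounds=2).most_common(3))
```

&lt;p&gt;Because duplicate traversers are kept, well-connected actors are reached along many paths and pile up high counts. The starting actor scores highly too, much as Stephen_Fry tops the real results below.&lt;/p&gt;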
&lt;p&gt;I&amp;#8217;m not sure this rushed explanation is 100% right, but maybe it gives some flavour. See the &lt;a href="https://github.com/tinkerpop/gremlin/wiki/LinkedData-Sail" &gt;Gremlin Wiki&lt;/a&gt; for the real goods.&lt;/p&gt;
&lt;p&gt;From an application and data perspective, this system is interesting as it allows quantitatively minded graph explorations to be used alongside classically factual SPARQL. The results below show that it can dig out an actor&amp;#8217;s co-stars (and then take account of their co-stars, and so on). This sort of neighbourhood exploration helps balance out the messiness of much Linked Data: rather than relying only on facts explicitly asserted in the dataset, we can also add in derived data that comes from counting things expressed in dozens or hundreds of pages.&lt;/p&gt;
&lt;pre&gt;gremlin danbri$ sh gremlin.sh
\,,,/
(o o)
-----oOOo-(_)-oOOo-----&lt;/pre&gt;
&lt;pre&gt;gremlin&amp;gt; g = new LinkedDataSailGraph(new MemoryStoreSailGraph())
==&amp;gt;sailgraph[linkeddatasail]
gremlin&amp;gt; v = g.v('http://dbpedia.org/resource/Stephen_Fry')
==&amp;gt;v[http://dbpedia.org/resource/Stephen_Fry]
gremlin&amp;gt; g.addNamespace('dbpedia-owl', 'http://dbpedia.org/ontology/')
==&amp;gt;null
gremlin&amp;gt; rand = new Random()
==&amp;gt;java.util.Random@594560cf
gremlin&amp;gt; m = [:]
gremlin&amp;gt; v.inE('dbpedia-owl:starring').outV.outE('dbpedia-owl:starring').inV.groupCount(m).loop(5){ it.loops &amp;lt; 3 }&lt;/pre&gt;
&lt;p&gt;In the background we can see the various DBpedia links being fetched (try &amp;#8216;tail -f ripple.log&amp;#8217;).&lt;/p&gt;
&lt;pre&gt;gremlin&amp;gt; m2 = m.sort{ a,b -&amp;gt; b.value &amp;lt;=&amp;gt; a.value }
[...]
gremlin&amp;gt; m2.subMap((m2.keySet() as List)[0..15])&lt;/pre&gt;
&lt;pre&gt;==&amp;gt;v[http://dbpedia.org/resource/Stephen_Fry]=8160
==&amp;gt;v[http://dbpedia.org/resource/Hugh_Laurie]=3641
==&amp;gt;v[http://dbpedia.org/resource/Rowan_Atkinson]=2481
==&amp;gt;v[http://dbpedia.org/resource/Tony_Robinson]=2168
==&amp;gt;v[http://dbpedia.org/resource/Miranda_Richardson]=1791
==&amp;gt;v[http://dbpedia.org/resource/Tim_McInnerny]=1398
==&amp;gt;v[http://dbpedia.org/resource/Emma_Thompson]=1307
==&amp;gt;v[http://dbpedia.org/resource/Robbie_Coltrane]=1303
==&amp;gt;v[http://dbpedia.org/resource/Tony_Slattery]=911
==&amp;gt;v[http://dbpedia.org/resource/Colin_Firth]=854
==&amp;gt;v[http://dbpedia.org/resource/John_Lithgow]=732
==&amp;gt;v[http://dbpedia.org/resource/Emily_Watson]=673
==&amp;gt;v[http://dbpedia.org/resource/John_Hurt]=516
==&amp;gt;v[http://dbpedia.org/resource/John_Cleese]=495
==&amp;gt;v[http://dbpedia.org/resource/Michael_Gambon]=477
==&amp;gt;v[http://dbpedia.org/resource/Helen_Mirren]=472&lt;/pre&gt;
</description>
    </item>
    <item>
      <pubDate>Tue, 1 Feb 2011 20:07:40 GMT</pubDate>
      <title>Video Linking: Archives and Encyclopedias</title>
      <link>http://www.advogato.org/person/danbri/diary.html?start=191</link>
      <guid>http://danbri.org/words/2011/02/01/658</guid>
      <description>&lt;p&gt;This is a quick visual teaser for some &lt;a href="http://archive.org/" &gt;archive.org&lt;/a&gt;-related work I&amp;#8217;m doing with NoTube colleagues, and a collaboration &lt;a href="http://twitter.com/#!/kidehen/status/32470388189429760" &gt;with Kingsley Idehen&lt;/a&gt; on navigating it.&lt;/p&gt;
&lt;p&gt;In NoTube we are trying to match people and TV content by using rich linked data representations of both. I love &lt;a href="http://archive.org/" &gt;Archive.org&lt;/a&gt; and with &lt;a href="http://danbri.org/words/2010/10/27/565" &gt;their help&lt;/a&gt; have crawled an experimental subset of the video-related metadata for the Archive. I&amp;#8217;ve also used a couple of other sources: Sean P. Aune&amp;#8217;s &lt;a href="http://tech.blorge.com/Structure:%20/2010/08/11/top-40-best-free-legal-movies-you-can-download-right-now/" &gt;list&lt;/a&gt; of 40 great movies, and the Wikipedia page listing &lt;a href="http://en.wikipedia.org/wiki/List_of_films_in_the_public_domain_in_the_United_States" &gt;US public domain films&lt;/a&gt;. I fixed, merged and scraped until I had a reasonable &lt;a href="http://buttons.notube.tv/moredata/archive.org/films/_archivemeta.nt" &gt;sample dataset&lt;/a&gt; for testing. I wanted to test the &lt;a href="http://en.wikipedia.org/wiki/Microsoft_Live_Labs_Pivot" &gt;Microsoft Pivot Viewer&lt;/a&gt; (a Silverlight control), and since OpenLink&amp;#8217;s Virtuoso package now has built-in support, I got talking with Kingsley and we ended up with the following demo. Since not everyone has Silverlight, and this is just a rough prototype that may be offline, I&amp;#8217;ve made a few screenshots. The real thing is very visual, with animated zooms and transitions, but screenshots give the basic idea.&lt;/p&gt;
&lt;p&gt;Notes: the core dataset for now is just &lt;em&gt;links&lt;/em&gt; between archive.org entries and Wikipedia/dbpedia pages. In NoTube we&amp;#8217;ll also try the &lt;a href="http://lupedia.ontotext.com/" &gt;Lupedia&lt;/a&gt;, &lt;a href="http://www.zemanta.com/" &gt;Zemanta&lt;/a&gt; and Reuters&amp;#8217; &lt;a href="http://www.opencalais.com/" &gt;OpenCalais&lt;/a&gt; services on the Archive.org descriptions to see if they suggest other useful links and categories, as well as any other enrichment sources (delicious tags, machine learning) we can find. There is also more metadata from the Archive that we should be using.&lt;/p&gt;
&lt;p&gt;This preview simply shows how &lt;em&gt;one extra fact&lt;/em&gt; per archived item creates new opportunities for navigation, discovery and understanding. Note that the UI is in no way tuned to be TV, video or archive specific; rather, it just lets you explore a group of items by their &amp;#8216;facets&amp;#8217; or common properties. It also reveals that wiki data is rather chaotic; however, some fields (release date, runtime, director, star etc.) are reliably present. And of course, since the data is from Wikipedia, users can always fix the data.&lt;/p&gt;
&lt;p&gt;You often hear Linked Data enthusiasts talk about data &amp;#8220;silos&amp;#8221;, and the need to interconnect them. All that means here is that when collections are linked, improvements to information on one side of the link automatically bring improvements to the other. When a Wikipedia page about a director, actor or movie is improved, it now also improves our means of navigating Archive.org&amp;#8217;s wonderful collection. And when someone contributes new video or new HTML5-powered players to the Archive, they&amp;#8217;re enriching the Encyclopedia too.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://danbri.org/words/wp-content/uploads/2011/02/archive-pivot-releasedate.png" &gt;&lt;img class="alignnone size-large wp-image-660" title="archive-pivot-releasedate" src="http://danbri.org/words/wp-content/uploads/2011/02/archive-pivot-releasedate-1024x515.png" alt="Archive.org films on a timeline by release date according to Wikipedia." width="640" height="321" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;One thing to mention is that everything here comes from the Wikipedia data that is automatically extracted by &lt;a href="http://dbpedia.org/" &gt;DBpedia&lt;/a&gt;, and that the extractors currently do not work perfectly on all films, so it should get better in the future. I also added a lot of the image links myself, semi-automatically. For now, this navigation is much more factual than topical; however, we do have Wikipedia categories for each film, director, studio etc., and these have been mapped to other category systems (formal and informal), so there are many other directions to explore.&lt;/p&gt;
&lt;p&gt;What else can we do? How about flipping the tiled bar chart to organize by the film&amp;#8217;s &lt;em&gt;distributor&lt;/em&gt;, and constraining the &amp;#8216;&lt;em&gt;release date&lt;/em&gt;&amp;#8217; facet to the 1940s:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://danbri.org/words/wp-content/uploads/2011/02/distributors-1940s.png" &gt;&lt;img class="alignnone size-large wp-image-662" title="distributors-1940s" src="http://danbri.org/words/wp-content/uploads/2011/02/distributors-1940s-1024x447.png" alt="" width="640" height="279" /&gt;&lt;/a&gt;&lt;/p&gt;
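&lt;p&gt;Underneath the animation, a view like this is a filter on one facet followed by a count on another. Here is a small Python sketch of that idea over hypothetical records (placeholder titles and distributors, not the demo data). Records missing a facet are counted separately, since wiki-derived data often has gaps.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical records: placeholder titles and distributors, not real films
films = [
    {"title": "film_a", "release_year": 1942, "distributor": "dist_x"},
    {"title": "film_b", "release_year": 1947, "distributor": "dist_y"},
    {"title": "film_c", "release_year": 1947, "distributor": "dist_x"},
    {"title": "film_d", "release_year": 1953, "distributor": "dist_x"},
    {"title": "film_e", "release_year": 1944},  # no distributor recorded
]


def facet_counts(items, facet, constraints):
    # Keep the items that pass every constraint, then count them by one facet
    selected = [item for item in items
                if all(test(item.get(name)) for name, test in constraints.items())]
    return Counter(item.get(facet, "(missing)") for item in selected)


in_1940s = {"release_year": lambda year: year in range(1940, 1950)}
print(facet_counts(films, "distributor", in_1940s))
```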
&lt;p&gt;That&amp;#8217;s nice. But remember that with Linked Data, you&amp;#8217;re always dealing with a subset of data. It&amp;#8217;s hard to know (and it&amp;#8217;s hard for the interface designers to show us) when you have all the relevant data in hand. In this case, we can see what this is telling us about the videos currently available within the demo. But does it tell us anything interesting about all the films in the Archive? All the films in the world? Maybe a little, but interpretation is difficult.&lt;/p&gt;
&lt;p&gt;Next: zoom in to a specific item. The legendary &lt;a href="http://www.archive.org/details/Plan_9_from_Outer_Space_1959" &gt;Plan 9 from Outer Space&lt;/a&gt; (&lt;a href="http://en.wikipedia.org/wiki/Plan_9_from_Outer_Space" &gt;wikipedia&lt;/a&gt; / &lt;a href="http://dbpedia.org/page/Plan_9_from_Outer_Space" &gt;dbpedia&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note the HTML-based info panel on the right hand side. In this case it&amp;#8217;s automatically generated by Virtuoso from properties of the item. A TV-oriented version would be less generic.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://danbri.org/words/wp-content/uploads/2011/02/plan9.png" &gt;&lt;img class="alignnone size-large wp-image-663" title="plan9" src="http://danbri.org/words/wp-content/uploads/2011/02/plan9-1024x449.png" alt="" width="640" height="280" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Finally, we can explore the collection by constraining the timeline to show us items organized according to release date, for some facet. Here we show it picking out the career of one Edward J. Kay, at least as far as he shows up as composer of items in this collection:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://danbri.org/words/wp-content/uploads/2011/02/when-edward-kay.png" &gt;&lt;img class="alignnone size-large wp-image-664" title="when-edward-kay" src="http://danbri.org/words/wp-content/uploads/2011/02/when-edward-kay-1024x450.png" alt="" width="640" height="281" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Now turning back to Wikipedia to learn about &amp;#8216;Edward J. Kay&amp;#8217;, I find he has no entry (beyond these passing mentions of his name) in the English Wikipedia, despite his work on &lt;a href="http://en.wikipedia.org/wiki/The_Ape_Man" &gt;The Ape Man&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/The_Fatal_Hour" &gt;The Fatal Hour&lt;/a&gt;, and other films. While the German Wikipedia does honour him with &lt;a href="http://de.wikipedia.org/wiki/Edward_J._Kay" &gt;an entry&lt;/a&gt;, I wonder whether this kind of &lt;a href="http://en.wikipedia.org/wiki/Linked_Data" &gt;Linked Data&lt;/a&gt; navigation will change the dynamics of the &amp;#8216;&lt;a href="http://meta.wikimedia.org/wiki/Deletionism" &gt;deletionism&lt;/a&gt;&amp;#8217; debates at Wikipedia. Firstly, by showing that structured data managed elsewhere can enrich Wikipedia (and vice-versa), it removes some of the pressure for a single wiki to cover everything. Secondly, it provides a tool to stand further back from the data and view things in a larger context; a context where, for example, Edward J. Kay&amp;#8217;s achievements become clearer. Much like &lt;a href="http://www.freebase.com/labs/parallax/" &gt;Freebase Parallax&lt;/a&gt;, the Pivot viewer hints at a future in which we explore data by navigating from &lt;em&gt;sets of things&lt;/em&gt; to other &lt;em&gt;sets of things&lt;/em&gt;. Pivot doesn&amp;#8217;t yet offer this, but it does very vividly present the potential for this kind of navigation, showing that navigation of films, TV shows and actors may be richer when it embraces more general mechanisms.&lt;/p&gt;
</description>
    </item>
    <item>
      <pubDate>Sat, 1 Jan 2011 12:11:12 GMT</pubDate>
      <title>A Penny for your thoughts: New Year wishes from mechanical turkers</title>
      <link>http://www.advogato.org/person/danbri/diary.html?start=190</link>
      <guid>http://danbri.org/words/2011/01/01/650</guid>
      <description>&lt;p&gt;I wanted to learn more about &lt;a href="http://mturk.amazon.com" &gt;Amazon&amp;#8217;s Mechanical Turk&lt;/a&gt; service (&lt;a href="http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk" &gt;wikipedia&lt;/a&gt;), and perhaps also figure out how I feel about it.&lt;/p&gt;
&lt;p&gt;Named after a historical &lt;a href="http://en.wikipedia.org/wiki/The_Turk" &gt;faked chess-playing machine&lt;/a&gt;, it uses the Web to allow people around the world to work on short, low-paid &amp;#8216;micro-tasks&amp;#8217;.  It&amp;#8217;s a disturbing capitalist fantasy come true, echoing Frederick Taylor&amp;#8217;s &amp;#8216;&lt;a href="http://en.wikipedia.org/wiki/Scientific_management" &gt;Scientific Management&lt;/a&gt;&amp;#8217; of the 1880s. Workers can be assigned tasks at the touch of a button (or through software automation), and rewarded or punished at the touch of other buttons.&lt;/p&gt;
&lt;p&gt;Mechanical Turk has become popular for outsourcing large-scale data cleanup tasks, image annotation, and other topics where human judgement outperforms brainless software. It&amp;#8217;s also popular with spammers. For more background see &amp;#8216;&lt;a href="http://webupon.com/browsers/try-a-week-as-a-turker/" &gt;try a week as a turker&lt;/a&gt;&amp;#8217; or this &lt;a href="http://www.salon.com/technology/feature/2006/07/24/turks" &gt;Salon article&lt;/a&gt; from 2006. Turk is not alone: other sites either build on it or offer similar facilities. See for example &lt;a href="http://crowdflower.com/" &gt;crowdflower&lt;/a&gt;, &lt;a href="http://txteagle.com/" &gt;txteagle&lt;/a&gt;, or Panos Ipeirotis&amp;#8217;&#xA0;&lt;a href="http://behind-the-enemy-lines.blogspot.com/2010/10/explosion-of-micro-crowdsourcing.html" &gt;list of micro-crowdsourcing services&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Crowdflower &lt;a href="http://crowdflower.com/about" &gt;describe themselves&lt;/a&gt; as offering &amp;#8220;multiple labor channels&amp;#8230; &#xA0;[using] crowdsourcing to harness a round-the-clock workforce that spans more than 70 countries, multiple languages, and can access up to half-a-million workers to dispatch diverse tasks and provide near-real time answers.&amp;#8221;&lt;/p&gt;
&lt;p&gt;Txteagle &lt;a href="http://txteagle.com/?q=about/overview" &gt;focuses&lt;/a&gt; on the explosion of mobile access in the developing world, claiming that &amp;#8220;&lt;em&gt;txteagle&#x2019;s GroundSwell mobile engagement platform provides clients with the ability to communicate and incentivize over 2.1 billion people&lt;/em&gt;&amp;#8221;.&lt;/p&gt;
&lt;p&gt;Something is clearly happening here. As someone who works with data and the Web, it&amp;#8217;s hard to ignore the potential. As someone who doesn&amp;#8217;t like treating others as interchangeable, replaceable and disposable software components, it&amp;#8217;s hard to feel very comfortable. Classic liberal guilt territory. So I made an account, both as a worker and as a &amp;#8216;requester&amp;#8217; (an awkward term, but it&amp;#8217;s clear why &amp;#8216;employer&amp;#8217; is not being used).&lt;/p&gt;
&lt;p&gt;I tried a few tasks. I wrote 25-30 words for a blog on some medieval prophecies. I wrote 50 words as fast as I could on &amp;#8220;things I would change in my apartment&amp;#8221;. I tagged some images with keywords. I failed to pass a &amp;#8216;qualification&amp;#8217; test sorting scanned photos into scratched, blurred and OK. I &amp;#8216;like&amp;#8217;d some hopeless Web site on Facebook for 2 cents. In all I made 18 US cents. As a way of passing the time, I can see the appeal. This can compete with daytime TV or Farmville or playing Solitaire or Sudoku. I quite enjoyed the mini creative-writing tasks. As a source of income, it&amp;#8217;s quite another story, and the awful word &amp;#8216;&lt;em&gt;incentivize&lt;/em&gt;&amp;#8217; doesn&amp;#8217;t do justice to the human reality.&lt;/p&gt;
&lt;p&gt;Then I tried the other role: requester. After a little more liberal-guilt navel-gazing (&amp;#8220;would it be &lt;em&gt;inappropriate&lt;/em&gt; to offer to buy people&amp;#8217;s immortal souls? etc.&amp;#8221;), I decided to offer a penny (well, 2 cents) for up to 100 people&amp;#8217;s New Year wishes, or whatever thoughts on them they felt like sharing for the price.&lt;/p&gt;
&lt;p&gt;I copy the results below, stripped of what little detail (eg. time in seconds taken) each result came with. I don&amp;#8217;t present this as any deep insight or sociological analysis or arty meditation. It&amp;#8217;s just what 100 people somewhere else on the Web responded with, when asked what they wish for 2011. If you want arty, check out &lt;a href="http://www.thesheepmarket.com/" &gt;the sheep market&lt;/a&gt;. If you want more from &amp;#8216;turkers&amp;#8217; in their own voice, do visit the &lt;a href="http://turkers.proboards.com/" &gt;&amp;#8216;Turker Nation&amp;#8217; forum&lt;/a&gt;. Also &lt;a href="http://turkopticon.differenceengines.com/" &gt;Turkopticon&lt;/a&gt; is essential reading: &amp;#8220;&lt;em&gt;watching out for the crowd in crowdsourcing because nobody else seems to be&lt;/em&gt;.&amp;#8221;&lt;/p&gt;
&lt;p&gt;The exact text used was &amp;#8220;Make a wish for 2011. Anything you like, just describe it briefly. Answers will be made public.&amp;#8221;, and the question was asked with a simple Web form, &amp;#8220;Make a wish for 2011, &amp;#8230; any thought you care to share&amp;#8221;.&lt;/p&gt;
&lt;hr /&gt;&lt;p&gt;Here&amp;#8217;s what they said:&lt;/p&gt;
&lt;p&gt;When you&amp;#8217;re lonely, I wish you Love! When you&amp;#8217;re down, I wish you Joy! When you&amp;#8217;re troubled, I wish you Peace! When things seem empty, I wish you Hope! Have a Happy New Year!&lt;/p&gt;
&lt;p&gt;wish u a happy new year&amp;#8230;&amp;#8230;&amp;#8230;&amp;#8230;&lt;/p&gt;
&lt;p&gt;happy new year 2011. may this year bring joy and peace in your life&lt;/p&gt;
&lt;p&gt;My wish for 2011 is i want to mary my Girlfriend this year.&lt;/p&gt;
&lt;p&gt;I wish I will get pregnant in 2011!&lt;/p&gt;
&lt;p&gt;i wish juhi becomes close to me&lt;/p&gt;
&lt;p&gt;wish you a wonderful happy new year&lt;/p&gt;
&lt;p&gt;wish you happy new year&lt;/p&gt;
&lt;p&gt;for new year 2011 I wish Love of God must fill each human heart&lt;br /&gt;
Food inflation must be wiped off quickly&lt;br /&gt;
corruption must be rooted out smartly&lt;br /&gt;
Terrorism must be curtailed quickly&lt;br /&gt;
All People must get love, care, clothes, shelter &amp;amp; food&lt;br /&gt;
Love of God must fill each human heart&amp;#8230;&lt;/p&gt;
&lt;p&gt;Happy life.All desires to be fulfilled.&lt;/p&gt;
&lt;p&gt;wish to be best entrepreneur of the year 2011&lt;/p&gt;
&lt;p&gt;dont work hard if it is possible to do the same smarter way..&lt;br /&gt;
Be happy!&lt;/p&gt;
&lt;p&gt;New year is the time to unfold new horizons,realise new dreams,rejoice in simple pleasures and gear up for new challenges.wishing a fulfilling 2011.&lt;/p&gt;
&lt;p&gt;Remember that the best relationship is one where your love for each other is greater than your need for each other. Happy New Year&lt;/p&gt;
&lt;p&gt;To get a newer car, and have less car problems. and have more income&lt;/p&gt;
&lt;p&gt;I wish that my son&amp;#8217;s health problems will be answered&lt;/p&gt;
&lt;p&gt;Be it Success &amp;amp; Prosperity, Be it Fun and Frolic&amp;#8230;&lt;/p&gt;
&lt;p&gt;A new year is waiting for you. Go and enjoy the New Year on New Thought,&amp;#8221;Rebirth of My Life&amp;#8221;.&lt;/p&gt;
&lt;p&gt;Let us wish for a world as one family, then we can overcome all the problems man made and otherwise.&lt;/p&gt;
&lt;p&gt;My wish is to gain/learn more knowledge than in 2010&lt;/p&gt;
&lt;p&gt;My new years wish for 2011 is to be happier and healthier.&lt;/p&gt;
&lt;p&gt;I wish that I would be cured of heartache.&lt;/p&gt;
&lt;p&gt;I am really very happy to wish you all very happy new year&amp;#8230;..I wish you all the things to be success in your life and career&amp;#8230;&amp;#8230;.. Just try to quit any bad habit within you. Just forgot all the bad incidents happen within your friends and try to enjoy this new year with pleasant&amp;#8230;&amp;#8230;&lt;/p&gt;
&lt;p&gt;Wish you a happy and prosperous new year.&lt;/p&gt;
&lt;p&gt;I wish for a job.&lt;/p&gt;
&lt;p&gt;I would hope that people will end the wars in the world.&lt;/p&gt;
&lt;p&gt;Discontinue smoking and restrict intake of alcohol&lt;/p&gt;
&lt;p&gt;I wish that my retail store would get a bigger client base so I can expand.&lt;/p&gt;
&lt;p&gt;I Wish a wish for You Dear.Sending you Big bunch of Wishes from the Heart close to where.Wish you a Very Very Happy New Year&lt;/p&gt;
&lt;p&gt;I wish for 2011 to be filled with more love and happiness than 2010.&lt;/p&gt;
&lt;p&gt;Everything has the solution Even IMPOSSIBLE Makes I aM POSSIBLE. Happy Journey for New Year.&lt;/p&gt;
&lt;p&gt;May each day of the coming year be vibrant and new bringing along many reasons for celebrations &amp;amp; rejoices. Happy New year&lt;/p&gt;
&lt;p&gt;I have just moved and want to make some great new friends!  Would love to meet a special senior (man!!) to share some wonderful times with!!!&lt;/p&gt;
&lt;p&gt;My wish is that i wanna to live with my &amp;#8220;Pretty girl&amp;#8221; forever and also wanna to meet her as well,please god please, finish my this wish, no more aspire from me only once.&lt;/p&gt;
&lt;p&gt;that people treat each other more nicely and with greater civility, in both their private and public lives.&lt;/p&gt;
&lt;p&gt;that we would get our financial house in order&lt;/p&gt;
&lt;p&gt;Year&amp;#8217;s end is neither an end nor a beginning but a going on, with all the wisdom that experience can instill in us. Wish u very happy new year and take care&lt;/p&gt;
&lt;p&gt;Wish you a very happy And prosperous new year 2011&lt;/p&gt;
&lt;p&gt;Tom Cruise&lt;br /&gt;
Angelina Jolie&lt;br /&gt;
Aishwarya Rai&lt;br /&gt;
Arnold&lt;br /&gt;
Jennifer Lopez&lt;br /&gt;
Amitabh Bachhan&lt;br /&gt;
&amp;amp; me..&lt;br /&gt;
All the Stars wish u a Very Happy New Year.&lt;/p&gt;
&lt;p&gt;Oh my Dear, Forget ur Fear,&lt;br /&gt;
Let all ur Dreams be Clear,&lt;br /&gt;
Never put Tear, Please Hear,&lt;br /&gt;
I want to tell one thing in ur Ear&lt;br /&gt;
Wishing u a very Happy &amp;#8220;NEW YEAR&amp;#8221;!&lt;/p&gt;
&lt;p&gt;May The Year 2011 Bring for You&amp;#8230;. Happiness,Success and filled with Peace,Hope n Togetherness of your Family n Friends&amp;#8230;.&lt;/p&gt;
&lt;p&gt;i want to be happy&lt;/p&gt;
&lt;p&gt;Good health for my family and friends&lt;/p&gt;
&lt;p&gt;I wish my husband&amp;#8217;s children would stop being so mean and violent and act like normal children. I want to love my husband just as much as before we got full custody.&lt;/p&gt;
&lt;p&gt;to get wonderful loving girl for me.. :))&lt;/p&gt;
&lt;p&gt;Keep some good try. Wish u happy new year&lt;/p&gt;
&lt;p&gt;happy new year to all&lt;/p&gt;
&lt;p&gt;My wish is to find a good job.&lt;/p&gt;
&lt;p&gt;i wish i get a big outsourcing contract this year that i can re-set up my business and get back on track.&lt;/p&gt;
&lt;p&gt;I wish that I be firm in whatever I do. That I can do justice to all my endeavors. That I give my 100%, my wholehearted efforts to each and every minutest work I do.&lt;/p&gt;
&lt;p&gt;My wish for 2011, is a little patience and understanding for everyone, empathy always helps.&lt;/p&gt;
&lt;p&gt;To be able to afford a new house&lt;/p&gt;
&lt;p&gt;&amp;#8220;NEW YEAR 2011&amp;#8221;&lt;br /&gt;
+NEW AIM + NEW ACHIEVEMENT + NEW DREAM  +NEW IDEA + NEW THINKING +NEW AMBITION =NEW LIFE+SUCCESS   HAPPY NEW YEAR!&lt;/p&gt;
&lt;p&gt;let this year be terrorist free world&lt;/p&gt;
&lt;p&gt;Wish the world walk forward in time with all its innocence and beauty where prevails only love, and hatred no longer found in the dictionary.&lt;/p&gt;
&lt;p&gt;no&lt;/p&gt;
&lt;p&gt;Wish u a very happy New Year Friends and make this year as a pleasant days&amp;#8230;&lt;/p&gt;
&lt;p&gt;I wish the economy would get better, so people can afford to pay their bills and live more comfortably again.&lt;/p&gt;
&lt;p&gt;i wish, god makes life beautiful and very simple to all of us. and happy new year to world.&lt;/p&gt;
&lt;p&gt;Be always at war with your vices, at peace with your neighbors, and let each new year find you a better man and I wish a very very prosperous new year.&lt;/p&gt;
&lt;p&gt;i wish i would buy a house and car for my mom&lt;/p&gt;
&lt;p&gt;I wish to have a new car.&lt;br /&gt;
This new year will be full of expectation in the field of  investment.We concerned about US dollar. Hope this year will be a good for US dollar.&lt;/p&gt;
&lt;p&gt;this year is very enjoyment life&lt;/p&gt;
&lt;p&gt;Cheers to a New Year and another chance for us to get it right&lt;/p&gt;
&lt;p&gt;to get married&lt;/p&gt;
&lt;p&gt;Wishing all a meaningful,purposeful,healthier and prosperous New Year 2011.&lt;/p&gt;
&lt;p&gt;WISH YOU A HAPPY NEW YEAR 2011 MAY BRING ALL HAPPINESS TO YOU&lt;/p&gt;
&lt;p&gt;RAKKIMUTHU&lt;/p&gt;
&lt;p&gt;In 2011 I wish for my family to get in a better spot financially and world peace.&lt;/p&gt;
&lt;p&gt;Wish that economic conditions improve to the extent that the whole spectrum of society can benefit and improve themselves.&lt;/p&gt;
&lt;p&gt;I want my divorce to be final and for my children to be happy.&lt;/p&gt;
&lt;p&gt;This 2011 year is very good year for All with Health &amp;amp; Wealth.&lt;/p&gt;
&lt;p&gt;I wish that things for my family would get better. We have had a terrible year and I am wishing that we can look forward to a better and brighter 2011.&lt;/p&gt;
&lt;p&gt;This year bring peace and prosperity to all. Everyone attain the greatest goal of life. May god gives us meaning of life to all.&lt;/p&gt;
&lt;p&gt;This new year will bring happy in everyone&amp;#8217;s life and peace among countries.&lt;/p&gt;
&lt;p&gt;I hope for bipartisanship and for people to realize blowing up other people isn&amp;#8217;t the best way to get their point across.  It just makes everyone else angry.&lt;/p&gt;
&lt;p&gt;A better economy would be nice too&lt;/p&gt;
&lt;p&gt;I wish that in 2011 the government will work together as a TEAM for the betterment of all.  Peace in the world.&lt;/p&gt;
&lt;p&gt;i wish you all happy new year. may god bless all&amp;#8230;&amp;#8230;&lt;/p&gt;
&lt;p&gt;no i wish for you&lt;/p&gt;
&lt;p&gt;I wish that my family will move into our own house and we can be successful in getting good jobs for our future.&lt;/p&gt;
&lt;p&gt;I wish my girl comes back to me&lt;/p&gt;
&lt;p&gt;Wish You Happy New Year for All, especially to the workers and requester&amp;#8217;s of Mturk.&lt;/p&gt;
&lt;p&gt;Greetings!!!&lt;/p&gt;
&lt;p&gt;Wishing you and your family a very happy and prosperous NEW YEAR &#x2013; 2011&lt;/p&gt;
&lt;p&gt;May this New Year bring many opportunities your way, to explore every joy of life and may your resolutions for the days ahead stay firm, turning all your dreams into reality and all your efforts into great achievements.&lt;/p&gt;
&lt;p&gt;Wish u a Happy and Prosperous New Year 2011&amp;#8230;.&lt;/p&gt;
&lt;p&gt;Wishing u lots of happiness..Success..and Love&lt;/p&gt;
&lt;p&gt;and Good Health&amp;#8230;&amp;#8230;.&lt;/p&gt;
&lt;p&gt;Wish you a very very happy new year&lt;/p&gt;
&lt;p&gt;WISHING YOU ALL A VERY HAPPY &amp;amp; PROSPEROUS NEW YEAR&amp;#8230;&amp;#8230;.&lt;/p&gt;
&lt;p&gt;I wish in this 2011 is to be happy,have a good health and also my family.&lt;/p&gt;
&lt;p&gt;I pray that the coming year should bring peace, happiness and good health.&lt;/p&gt;
&lt;p&gt;I wish for my family to continue to be healthy, for my cars to continue running, and for no 10th Anniversary attacks this upcoming September.&lt;/p&gt;
&lt;p&gt;be a good and help full for my family .&lt;/p&gt;
&lt;p&gt;Happy and Prosperous New Year&lt;/p&gt;
&lt;p&gt;New day new morning new hope new efforts new success and new feeling,a new year a new begening, but old friends are never forgotten, i think all who touched my life and made life meaningful with their support, i pray god to give u a verry &amp;#8220;HAPPY AND SUCCESSFUL NEW YEAR&amp;#8221;.&lt;/p&gt;
&lt;p&gt;Be a good person,as good as no one&lt;/p&gt;
&lt;p&gt;wish this new year brings cheers and happiness to one and all.&lt;/p&gt;
&lt;p&gt;For the year 2011 I simply wish for the ability to support my family properly and have a healthier year.&lt;/p&gt;
&lt;p&gt;I wish I have luck with getting a better job.&lt;/p&gt;
&lt;p&gt;Greater awareness of climate change, and a recovering US economy.&lt;/p&gt;
&lt;p&gt;this new year 2011 brings you all prosperous and happiness in your life&amp;#8230;&amp;#8230;.&lt;/p&gt;
&lt;p&gt;happy newyear wishes to all the beautiful hearts in the world in the world.god bless you all.&lt;/p&gt;
&lt;p&gt;wishing every happy new year to all my pals and relatives and to all my lovely countrymen&lt;/p&gt;
</description>
    </item>
  </channel>
</rss>

