Rorschach test: hidden structure or noise?
K-means test in Octave
Matlab comes with K-means clustering ‘out of the box’. The GNU Octave work-alike system doesn’t, and there seem to be quite a few implementations floating around. I picked the first from Google, pretty carelessly, saving it as myKmeans.m. These are notes from trying to reproduce this Matlab demo with Octave. Not rocket science, but worth writing down so I can find it again.
M=4
W=2
H=4
S=500
a = M * [randn(S,1)+W, randn(S,1)+H];
b = M * [randn(S,1)+W, randn(S,1)-H];
c = M * [randn(S,1)-W, randn(S,1)+H];
d = M * [randn(S,1)-W, randn(S,1)-H];
e = M * [randn(S,1), randn(S,1)];
all_data = [a;b;c;d;e];
plot(a(:,1), a(:,2),'.');
hold on;
plot(b(:,1), b(:,2),'r.');
plot(c(:,1), c(:,2),'g.');
plot(d(:,1), d(:,2),'k.');
plot(e(:,1), e(:,2),'c.');
% using http://www.christianherta.de/kmeans.html as myKmeans.m
[centroid,pointsInCluster,assignment] = myKmeans(all_data,5)
scatter(centroid(:,1),centroid(:,2),'x');
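For comparison outside Octave, here is a minimal Python sketch of plain Lloyd’s-algorithm k-means on the same five-blob data. It is not a port of the downloaded myKmeans.m; the initialisation from randomly chosen data points, the seeds and the iteration cap are assumptions.

```python
import math
import random


def make_blobs(S=500, M=4, W=2, H=4, seed=1):
    """Five Gaussian blobs laid out as in the Octave script:
    each point is M * (standard normal noise + offset)."""
    rng = random.Random(seed)
    offsets = [(W, H), (W, -H), (-W, H), (-W, -H), (0, 0)]
    return [(M * (rng.gauss(0, 1) + dx), M * (rng.gauss(0, 1) + dy))
            for dx, dy in offsets
            for _ in range(S)]


def kmeans(points, k, max_iters=500, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid-update steps
    until no point changes cluster (or max_iters is reached)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k randomly chosen data points
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_assignment = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                          for p in points]
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
        if new_assignment == assignment:
            break  # converged: assignments (and therefore centroids) are stable
        assignment = new_assignment
    return centroids, assignment
```

`centroids` plays the role of `centroid` in the Octave call, and `assignment` the role of `assignment`. As with any k-means run, a different seed can land in a different local optimum, so it’s worth trying a few.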
Querying Linked GeoData with R SPARQL client
This is the simplest thing that works to show the data flow. When combined with richer server-side support (eg. OWL tools, or spatial reasoning) and the capabilities of R plus its other extensions, there is a lot of potential here. A pie chart doesn’t capture all that, but it does show how to get started…
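The same data flow can be sketched outside R. Here is a minimal Python version using only the standard library, assuming an endpoint that returns the W3C SPARQL JSON results format; the endpoint URL, the query and the variable name are placeholders, not LinkedGeoData specifics. The network fetch is left out so the two pieces (build the request, tally the answers) stand alone.

```python
import json
from collections import Counter
from urllib.parse import urlencode


def sparql_get_url(endpoint, query):
    """SPARQL 1.1 Protocol: a query sent via GET travels in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def tally(results_json, var):
    """Count the values bound to `var` in a SPARQL JSON results document,
    e.g. to feed a pie chart. Rows where `var` is unbound are skipped."""
    doc = json.loads(results_json)
    return Counter(row[var]["value"]
                   for row in doc["results"]["bindings"] if var in row)
```

Fetching is then one `urllib.request.urlopen` call on that URL, asking for `application/sparql-results+json` in the Accept header; which result formats a given endpoint honours is worth checking.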
Exploring Linked Data with Gremlin
Gremlin is an open source Java/Groovy system for traversing graphs, including but not limited to RDF graphs. This post is just a log of running some examples from @twarko and the Gremlin wiki and mailing list. The test run below goes pretty slowly, since it uses the Web as its database, via entry-by-entry fetches. In this case it’s fetching from DBpedia, but I’ve run it with Freebase happily too. The on-demand RDF is handled by the Linked Data Sail; the same thing would work directly against a graph database.
Why is this interesting? Let me see if I can spell out what it’s doing. I’ll edit this post if I screw up …
OK, so the basic thing is that we start exploring the graph from one vertex, ‘v’, representing Stephen Fry’s DBpedia entry.
From here, everything else is in one line, the core of which is:
v.inE('dbpedia-owl:starring').outV.outE('dbpedia-owl:starring').inV.groupCount(m).loop(5){it.loops < 3}
This is a series of steps (which map to TinkerPop / Pipes API calls behind the scenes).
I’m not sure this rushed explanation is 100% right, but maybe gives some flavour. See the Gremlin Wiki for the real goods.
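To make the counting concrete, here is a minimal Python sketch of what the one-liner computes, over a toy in-memory list of (film, actor) `starring` edges rather than live DBpedia. It assumes `loop(5){it.loops < 3}` amounts to three passes of the film hop; the exact loop-counter semantics are worth checking against the Gremlin docs for your version. Paths are not de-duplicated, which is why the starting actor tops the list: every pass sends traversers back through each of his own films.

```python
from collections import Counter, defaultdict


def costar_group_count(starring, start, passes=3):
    """Emulate v.inE(starring).outV.outE(starring).inV.groupCount(m), repeated.

    `starring` is an iterable of (film, actor) pairs, one per film -starring-> actor
    edge. Traversers are kept with multiplicity (vertex -> number of paths), so
    the same actor reached by several paths is counted once per path.
    """
    films_of = defaultdict(list)  # actor -> films: the inE / outV hop
    cast_of = defaultdict(list)   # film -> actors: the outE / inV hop
    for film, actor in starring:
        films_of[actor].append(film)
        cast_of[film].append(actor)

    counts = Counter()               # plays the role of the groupCount map m
    traversers = Counter({start: 1})
    for _ in range(passes):
        arrived = Counter()
        for actor, paths in traversers.items():
            for film in films_of[actor]:
                for costar in cast_of[film]:
                    arrived[costar] += paths
        counts.update(arrived)       # groupCount sees every arriving path
        traversers = arrived         # loop back for the next pass
    return counts
```

`counts.most_common(16)` then does the job of the sort and subMap steps in the session below.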
From an application and data perspective, this system is interesting as it allows quantitatively minded graph explorations to be used alongside classically factual SPARQL. The results below show that it can dig out an actor’s co-stars (and then take account of their co-stars, and so on). This sort of neighbourhood exploration helps balance out the messiness of much Linked Data; rather than relying on explicitly asserted facts from the dataset, we can also add in derived data that comes from counting things expressed in dozens or hundreds of pages.
gremlin danbri$ sh gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = new LinkedDataSailGraph(new MemoryStoreSailGraph())
==>sailgraph[linkeddatasail]
gremlin> v = g.v('http://dbpedia.org/resource/Stephen_Fry')
==>v[http://dbpedia.org/resource/Stephen_Fry]
gremlin> g.addNamespace('dbpedia-owl', 'http://dbpedia.org/ontology/')
==>null
gremlin> rand = new Random()
==>java.util.Random@594560cf
gremlin> m = [:]
gremlin> v.inE('dbpedia-owl:starring').outV.outE('dbpedia-owl:starring').inV.groupCount(m).loop(5){ it.loops < 3 }
In the background we can see the various DBpedia links being fetched (try ‘tail -f ripple.log’).
gremlin> m2 = m.sort{ a,b -> b.value <=> a.value }
[...]
gremlin> m2.subMap((m2.keySet() as List)[0..15])
==>v[http://dbpedia.org/resource/Stephen_Fry]=8160
==>v[http://dbpedia.org/resource/Hugh_Laurie]=3641
==>v[http://dbpedia.org/resource/Rowan_Atkinson]=2481
==>v[http://dbpedia.org/resource/Tony_Robinson]=2168
==>v[http://dbpedia.org/resource/Miranda_Richardson]=1791
==>v[http://dbpedia.org/resource/Tim_McInnerny]=1398
==>v[http://dbpedia.org/resource/Emma_Thompson]=1307
==>v[http://dbpedia.org/resource/Robbie_Coltrane]=1303
==>v[http://dbpedia.org/resource/Tony_Slattery]=911
==>v[http://dbpedia.org/resource/Colin_Firth]=854
==>v[http://dbpedia.org/resource/John_Lithgow]=732
==>v[http://dbpedia.org/resource/Emily_Watson]=673
==>v[http://dbpedia.org/resource/John_Hurt]=516
==>v[http://dbpedia.org/resource/John_Cleese]=495
==>v[http://dbpedia.org/resource/Michael_Gambon]=477
==>v[http://dbpedia.org/resource/Helen_Mirren]=472
Video Linking: Archives and Encyclopedias
This is a quick visual teaser for some archive.org-related work I’m doing with NoTube colleagues, and a collaboration with Kingsley Idehen on navigating it.
In NoTube we are trying to match people and TV content by using rich linked data representations of both. I love Archive.org and with their help have crawled an experimental subset of the video-related metadata for the Archive. I’ve also used a couple of other sources; Sean P. Aune’s list of 40 great movies, and the Wikipedia page listing US public domain films. I fixed, merged and scraped until I had a reasonable sample dataset for testing. I wanted to test the Microsoft Pivot Viewer (a Silverlight control), and since OpenLink’s Virtuoso package now has built-in support, I got talking with Kingsley and we ended up with the following demo. Since not everyone has Silverlight, and this is just a rough prototype that may be offline, I’ve made a few screenshots. The real thing is very visual, with animated zooms and transitions, but screenshots give the basic idea.
Notes: the core dataset for now is just links between archive.org entries and Wikipedia/DBpedia pages. In NoTube we’ll also try the Lupedia, Zemanta and Reuters OpenCalais services on the Archive.org descriptions to see if they suggest other useful links and categories, as well as any other enrichment sources (delicious tags, machine learning) we can find. There is also more metadata from the Archive that we should be using.
This preview simply shows how one extra fact per archived item creates new opportunities for navigation, discovery and understanding. Note that the UI is in no way tuned to be TV-, video- or archive-specific; rather, it just lets you explore a group of items by their ‘facets’ or common properties. It also reveals that wiki data is rather chaotic; however, some fields (release date, runtime, director, star etc.) are reliably present. And of course, since the data is from Wikipedia, users can always fix the data.
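The facet idea itself is simple enough to sketch. Here is a minimal Python version, assuming each item is just a dict of properties (the field names and toy films in the usage note are hypothetical, not the demo’s data): constrain some facets, then count the values of another. Items missing a property simply drop out, which is one place the chaotic wiki data shows through.

```python
from collections import Counter


def facet_counts(items, facet, constraints=None):
    """Count values of `facet` among items satisfying every constraint.

    `constraints` maps a property name to a predicate on that property's value;
    items lacking a constrained property, or the facet itself, are left out.
    """
    constraints = constraints or {}
    selected = [item for item in items
                if all(name in item and test(item[name])
                       for name, test in constraints.items())]
    return Counter(item[facet] for item in selected if facet in item)
```

For example, `facet_counts(films, "distributor", {"release year": lambda y: 1940 <= y < 1950})` gives a per-distributor count restricted to the 1940s, which is roughly the bar-chart view described next.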
You often hear Linked Data enthusiasts talk about data “silos”, and the need to interconnect them. All that means here is that when collections are linked, improvements to information on one side of the link automatically bring improvements to the other. When a Wikipedia page about a director, actor or movie is improved, it now also improves our means of navigating Archive.org’s wonderful collection. And when someone contributes new video or new HTML5-powered players to the Archive, they’re enriching the Encyclopedia too.
One thing to mention is that everything here comes from Wikipedia data that is automatically extracted by DBpedia, and that currently the extractors are not working perfectly on all films, so it should get better in the future. I also added a lot of the image links myself, semi-automatically. For now, this navigation is much more fact-based than topical; however, we do have Wikipedia categories for each film, director, studio etc., and these have been mapped to other category systems (formal and informal), so there are a lot of other directions to explore.
What else can we do? How about flipping the tiled bar chart to organize by the film’s distributor, and constraining the ‘release date’ facet to the 1940s:
That’s nice. But remember that with Linked Data, you’re always dealing with a subset of data. It’s hard to know (and it’s hard for the interface designers to show us) when you have all the relevant data in hand. In this case, we can see what this is telling us about the videos currently available within the demo. But does it tell us anything interesting about all the films in the Archive? All the films in the world? Maybe a little, but interpretation is difficult.
Next: zoom in to a specific item. The legendary Plan 9 from Outer Space (wikipedia / dbpedia).
Note the HTML-based info panel on the right hand side. In this case it’s automatically generated by Virtuoso from properties of the item. A TV-oriented version would be less generic.
Finally, we can explore the collection by constraining the timeline to show us items organized according to release date, for some facet. Here we show it picking out the career of one Edward J. Kay, at least as far as he shows up as composer of items in this collection:
Now turning back to Wikipedia to learn about ‘Edward J. Kay’, I find he has no entry (beyond these passing mentions of his name) in the English Wikipedia, despite his work on The Ape Man, The Fatal Hour, and other films. While the German Wikipedia does honour him with an entry, I wonder whether this kind of Linked Data navigation will change the dynamics of the ‘deletionism’ debates at Wikipedia: firstly, by showing that structured data managed elsewhere can enrich Wikipedia (and vice versa), removing some pressure for a single wiki to cover everything; and secondly, by providing a tool to stand further back from the data and view things in a larger context, a context where, for example, Edward J. Kay’s achievements become clearer. Much like Freebase Parallax, the Pivot viewer hints at a future in which we explore data by navigating from sets of things to other sets of things. Pivot doesn’t yet offer this, but it does very vividly present the potential for this kind of navigation, showing that navigation of films, TV shows and actors may be richer when it embraces more general mechanisms.
A Penny for your thoughts: New Year wishes from mechanical turkers
I wanted to learn more about Amazon’s Mechanical Turk service (wikipedia), and perhaps also figure out how I feel about it.
Named after a historical faked chess-playing machine, it uses the Web to allow people around the world to work on short, low-pay ‘micro-tasks’. It’s a disturbing capitalist fantasy come true, echoing Frederick Taylor’s ‘Scientific Management’ of the 1880s. Workers can be assigned tasks at the touch of a button (or through software automation), and rewarded or punished at the touch of other buttons.
Mechanical Turk has become popular for outsourcing large-scale data cleanup tasks, image annotation, and other topics where human judgement outperforms brainless software. It’s also popular with spammers. For more background see ‘try a week as a turker’ or this Salon article from 2006. Turk is not alone; other sites either build on it or offer similar facilities. See for example crowdflower, txteagle, or Panos Ipeirotis’ list of micro-crowdsourcing services.
Crowdflower describe themselves as offering “multiple labor channels… [using] crowdsourcing to harness a round-the-clock workforce that spans more than 70 countries, multiple languages, and can access up to half-a-million workers to dispatch diverse tasks and provide near-real time answers.”
Txteagle focuses on the explosion of mobile access in the developing world, claiming that “txteagle’s GroundSwell mobile engagement platform provides clients with the ability to communicate and incentivize over 2.1 billion people”.
Something is clearly happening here. As someone who works with data and the Web, it’s hard to ignore the potential. As someone who doesn’t like treating others as interchangeable, replaceable and disposable software components, it’s hard to feel very comfortable. Classic liberal guilt territory. So I made an account, both as a worker and as a ‘requester’ (an awkward term, but it’s clear why ‘employer’ is not being used).
I tried a few tasks. I wrote 25-30 words for a blog on some medieval prophecies. I wrote 50 words as fast as I could on “things I would change in my apartment”. I tagged some images with keywords. I failed to pass a ‘qualification’ test sorting scanned photos into scratched, blurred and OK. I ‘like’d some hopeless Web site on Facebook for 2 cents. In all I made 18 US cents. As a way of passing the time, I can see the appeal. This can compete with daytime TV or Farmville or playing Solitaire or Sudoku. I quite enjoyed the mini creative-writing tasks. As a source of income, it’s quite another story, and the awful word ‘incentivize’ doesn’t do justice to the human reality.
Then I tried the other role: requester. After a little more liberal-guilt navel-gazing (“would it be inappropriate to offer to buy people’s immortal souls? etc.”), I decided to offer a penny (well, 2 cents) for up to 100 people’s new year wish thoughts, or whatever of those they felt like sharing for the price.
I copy the results below, stripped of what little detail (e.g. time in seconds taken) each result came with. I don’t present this as any deep insight or sociological analysis or arty meditation. It’s just what 100 people somewhere else in the Web responded with, when asked what they wish for 2011. If you want arty, check out the sheep market. If you want more from ‘turkers’ in their own voice, do visit the ‘Turker Nation’ forum. Also Turkopticon is essential reading, “watching out for the crowd in crowdsourcing because nobody else seems to be.”
The exact text used was “Make a wish for 2011. Anything you like, just describe it briefly. Answers will be made public.”, and the question was asked with a simple Web form, “Make a wish for 2011, … any thought you care to share”.
When you’re lonely, I wish you Love! When you’re down, I wish you Joy! When you’re troubled, I wish you Peace! When things seem empty, I wish you Hope! Have a Happy New Year!
wish u a happy new year…………
happy new year 2011. may this year bring joy and peace in your life
My wish for 2011 is i want to mary my Girlfriend this year.
I wish I will get pregnant in 2011!
i wish juhi becomes close to me
wish you a wonderful happy new year
wish you happy new year
for new year 2011 I wish Love of God must fill each human heart
Food inflation must be wiped off quickly
corruption must be rooted out smartly
Terrorism must be curtailed quickly
All People must get love, care, clothes, shelter & food
Love of God must fill each human heart…
Happy life.All desires to be fulfilled.
wish to be best entrepreneur of the year 2011
dont work hard if it is possible to do the same smarter way..
Be happy!
New year is the time to unfold new horizons,realise new dreams,rejoice in simple pleasures and gear up for new challenges.wishing a fulfilling 2011.
Remember that the best relationship is one where your love for each other is greater than your need for each other. Happy New Year
To get a newer car, and have less car problems. and have more income
I wish that my son’s health problems will be answered
Be it Success & Prosperity, Be it Fun and Frolic…
A new year is waiting for you. Go and enjoy the New Year on New Thought,”Rebirth of My Life”.
Let us wish for a world as one family, then we can overcome all the problems man made and otherwise.
My wish is to gain/learn more knowledge than in 2010
My new years wish for 2011 is to be happier and healthier.
I wish that I would be cured of heartache.
I am really very happy to wish you all very happy new year…..I wish you all the things to be success in your life and career…….. Just try to quit any bad habit within you. Just forgot all the bad incidents happen within your friends and try to enjoy this new year with pleasant……
Wish you a happy and prosperous new year.
I wish for a job.
I would hope that people will end the wars in the world.
Discontinue smoking and restrict intake of alcohol
I wish that my retail store would get a bigger client base so I can expand.
I Wish a wish for You Dear.Sending you Big bunch of Wishes from the Heart close to where.Wish you a Very Very Happy New Year
I wish for 2011 to be filled with more love and happiness than 2010.
Everything has the solution Even IMPOSSIBLE Makes I aM POSSIBLE. Happy Journey for New Year.
May each day of the coming year be vibrant and new bringing along many reasons for celebrations & rejoices. Happy New year
I have just moved and want to make some great new friends! Would love to meet a special senior (man!!) to share some wonderful times with!!!
My wish is that i wanna to live with my “Pretty girl” forever and also wanna to meet her as well,please god please, finish my this wish, no more aspire from me only once.
that people treat each other more nicely and with greater civility, in both their private and public lives.
that we would get our financial house in order
Year’s end is neither an end nor a beginning but a going on, with all the wisdom that experience can instill in us. Wish u very happy new year and take care
Wish you a very happy And prosperous new year 2011
Tom Cruise
Angelina Jolie
Aishwarya Rai
Arnold
Jennifer Lopez
Amitabh Bachhan
& me..
All the Stars wish u a Very Happy New Year.
Oh my Dear, Forget ur Fear,
Let all ur Dreams be Clear,
Never put Tear, Please Hear,
I want to tell one thing in ur Ear
Wishing u a very Happy “NEW YEAR”!
May The Year 2011 Bring for You…. Happiness,Success and filled with Peace,Hope n Togetherness of your Family n Friends….
i want to be happy
Good health for my family and friends
I wish my husband’s children would stop being so mean and violent and act like normal children. I want to love my husband just as much as before we got full custody.
to get wonderful loving girl for me.. :))
Keep some good try. Wish u happy new year
happy new year to all
My wish is to find a good job.
i wish i get a big outsourcing contract this year that i can re-set up my business and get back on track.
I wish that I be firm in whatever I do. That I can do justice to all my endeavors. That I give my 100%, my wholehearted efforts to each and every minutest work I do.
My wish for 2011, is a little patience and understanding for everyone, empathy always helps.
To be able to afford a new house
“NEW YEAR 2011”
+NEW AIM + NEW ACHIEVEMENT + NEW DREAM +NEW IDEA + NEW THINKING +NEW AMBITION =NEW LIFE+SUCCESS HAPPY NEW YEAR!
let this year be terrorist free world
Wish the world walk forward in time with all its innocence and beauty where prevails only love, and hatred no longer found in the dictionary.
no
Wish u a very happy New Year Friends and make this year as a pleasant days…
I wish the economy would get better, so people can afford to pay their bills and live more comfortably again.
i wish, god makes life beautiful and very simple to all of us. and happy new year to world.
Be always at war with your vices, at peace with your neighbors, and let each new year find you a better man and I wish a very very prosperous new year.
i wish i would buy a house and car for my mom
I wish to have a new car.
This new year will be full of expectation in the field of investment.We concerned about US dollar. Hope this year will be a good for US dollar.
this year is very enjoyment life
Cheers to a New Year and another chance for us to get it right
to get married
Wishing all a meaningful,purposeful,healthier and prosperous New Year 2011.
WISH YOU A HAPPY NEW YEAR 2011 MAY BRING ALL HAPPINESS TO YOU
RAKKIMUTHU
In 2011 I wish for my family to get in a better spot financially and world peace.
Wish that economic conditions improve to the extent that the whole spectrum of society can benefit and improve themselves.
I want my divorce to be final and for my children to be happy.
This 2011 year is very good year for All with Health & Wealth.
I wish that things for my family would get better. We have had a terrible year and I am wishing that we can look forward to a better and brighter 2011.
This year bring peace and prosperity to all. Everyone attain the greatest goal of life. May god gives us meaning of life to all.
This new year will bring happy in everyone’s life and peace among countries.
I hope for bipartisanship and for people to realize blowing up other people isn’t the best way to get their point across. It just makes everyone else angry.
A better economy would be nice too
I wish that in 2011 the government will work together as a TEAM for the betterment of all. Peace in the world.
i wish you all happy new year. may god bless all……
no i wish for you
I wish that my family will move into our own house and we can be successful in getting good jobs for our future.
I wish my girl comes back to me
Wish You Happy New Year for All, especially to the workers and requester’s of Mturk.
Greetings!!!
Wishing you and your family a very happy and prosperous NEW YEAR – 2011
May this New Year bring many opportunities your way, to explore every joy of life and may your resolutions for the days ahead stay firm, turning all your dreams into reality and all your efforts into great achievements.
Wish u a Happy and Prosperous New Year 2011….
Wishing u lots of happiness..Success..and Love
and Good Health…….
Wish you a very very happy new year
WISHING YOU ALL A VERY HAPPY & PROSPEROUS NEW YEAR…….
I wish in this 2011 is to be happy,have a good health and also my family.
I pray that the coming year should bring peace, happiness and good health.
I wish for my family to continue to be healthy, for my cars to continue running, and for no 10th Anniversary attacks this upcoming September.
be a good and help full for my family .
Happy and Prosperous New Year
New day new morning new hope new efforts new success and new feeling,a new year a new begening, but old friends are never forgotten, i think all who touched my life and made life meaningful with their support, i pray god to give u a verry “HAPPY AND SUCCESSFUL NEW YEAR”.
Be a good person,as good as no one
wish this new year brings cheers and happiness to one and all.
For the year 2011 I simply wish for the ability to support my family properly and have a healthier year.
I wish I have luck with getting a better job.
Greater awareness of climate change, and a recovering US economy.
this new year 2011 brings you all prosperous and happiness in your life…….
happy newyear wishes to all the beautiful hearts in the world in the world.god bless you all.
wishing every happy new year to all my pals and relatives and to all my lovely countrymen
XMPP untethered – serverless messaging in the core?
In the XMPP session at last February’s FOSDEM I gave a brief demo of some NoTube work on how TV-style remote controls might look with XMPP providing their communication link. For the TV part, I showed Boxee, with a tiny Python script exposing some of its localhost HTTP API to the wider network via XMPP. For the client, I have a ‘my first iPhone app’ approximation of a remote control that speaks a vapourware XMPP remote control protocol, “Buttons”.
The point of all this is about breaking open the Web-TV environment, so that different people and groups get to innovate without having to be colleagues or close-knit business partners. Control your Apple TV with your Google Android phone, or your Google TV with your Apple iPad, or your Boxee box with either. Write smart linking, bookmarking and annotation apps that improve TV for all viewers, rather than only those who’ve bought from the same company as you. I guess I managed to communicate something of this, because people clapped generously when my iPhone app managed to pause Boxee. This post is about how we might get from evocative but toy demos to a useful and usable protocol, and about one of our largest obstacles: XMPP’s focus on server-mediated communications.
So what happened when I hit the ‘pause’ button on the iPhone remote app? Well, the app was already connected to the XMPP network, e.g. signed in as bob.notube@gmail.com via Google Talk’s servers. And so an XMPP stanza flowed out from the room we were in, across to Google somewhere, and then via the XMPP server-to-server protocol over to my self-run XMPP server (an ejabberd hosted somewhere in Amazon EC2’s US East zone). And from there, the message finally returned to Brussels, flowing through whichever Python library I was using to Boxee (signed in as buttons@foaf.tv), causing the video to pause. This happened quickly, and generally it does; but sometimes it can take more than a second. This can be very frustrating, and while there are workarounds (keep-alive messages, smart code that ignores sequences of buffered ‘Pause!’ messages, apps that download metadata and bring more UI to the second screen, …), the problem has a simple cause: it just doesn’t make sense for a ‘pause’ message to cross the Atlantic twice, and pass through two XMPP servers, on its way across the living room from remote control to TV.
But first, why are we even using XMPP at all, rather than, say, HTTP? Partly because XMPP lets us easily address devices on home networks that aren’t publicly exposed as running a Web server. Partly for the symmetry of the protocol, since iPads, touch tables, smartphones, TVs and media centres can all host and play media items on their own displays, and we may have several such devices in a home setting that need to be in touch with one another. There’s also a certain laziness: XMPP already defines lots of useful pieces, like buddy-list rosters, pubsub notifications and group chats; it has an active and friendly community; and it comes with a healthy collection of tools and libraries. My own interests are around exploring and collectively annotating the huge archives of content that are slowly coming online, and I expect this could be a more shared experience, so I’m following an intuition that XMPP provides more useful ‘raw materials’ for social content exploration than raw HTTP. That said, many elements of remote control can be defined and implemented in either environment. But for today, I’m concentrating on the XMPP side.
So back at FOSDEM I raised a couple of concerns, as a long-term XMPP well-wisher but non-insider.
The first was that the technology presents itself as a daunting collection of extensions, each of which might or might not be supported in some toolkit. To this someone (likely Dave Cridland) responded with the reassuring observation that most of these could be implemented by third-party app developers simply reading and writing XMPP stanzas, and that in fact pretty much the only ‘core’ piece of XMPP that wasn’t treated as core in most toolkits was the serverless, point-to-point XEP-0174 ‘serverless messaging’ mode. Everything else, the rest of us mortals could hack in application code. For serverless messaging we are left waiting and hoping for the toolkit maintainers to wire things in, as it generally requires fairly intimate knowledge of the relevant XMPP library.
My second point was in fact related: that if XMPP tools offered better support for serverless operation, it would open up lots of interesting application options, and that we certainly need it for the TV remotes use case to be a credible use of XMPP. Beyond TV remotes, there are obvious applications in the area of open, decentralised social networking. The recent buzz around things like StatusNet, GNU Social, Diaspora*, WebID and OneSocialWeb, alongside older work like FOAF, shows serious interest in letting users take more decentralised control of their online social behaviour. Whether the two parties are in the same room on the same LAN, or halfway around the world from each other, XMPP and its huge collection of field-tested, code-supported extensions is relevant, even when those parties prefer to communicate directly rather than via servers.
With XMPP, app developers have a well-defined framework into which they can drop ad-hoc stanzas of information, whether it’s a vCard or details of recently played music. This seems too useful a system to reserve solely for communications that are mediated by a server. And indeed, XMPP in theory is not tied to servers; the XEP-0174 spec tells us both how to do local-network Bonjour-style discovery, and how to layer XMPP on top of any communication channel that allows XML stanzas to flow back and forth.
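As a rough illustration of dropping an ad-hoc payload into a stanza, here is a Python sketch that builds a remote-control `<message>` with the standard library’s ElementTree. The `urn:example:buttons` namespace and the `<command>` element are hypothetical stand-ins; the real “Buttons” protocol is, as mentioned earlier, still vapourware.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for a remote-control payload; not a registered XEP.
BUTTONS_NS = "urn:example:buttons"


def remote_command(to_jid, command):
    """Build a chat <message> carrying a remote-control command.

    The <message> has no xmlns of its own: inside an XMPP stream it inherits
    the stream's default namespace (jabber:client).
    """
    msg = ET.Element("message", {"to": to_jid, "type": "chat"})
    # Human-readable fallback for ordinary chat clients.
    ET.SubElement(msg, "body").text = command
    # Machine-readable payload in its own namespace, declared as a default xmlns.
    ET.SubElement(msg, "command", {"xmlns": BUTTONS_NS, "name": command})
    return ET.tostring(msg, encoding="unicode")
```

A client that doesn’t understand the payload still shows the `<body>` text, which is the usual XMPP way of letting new extensions degrade gracefully.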
From the abstract,
This specification defines how to communicate over local or wide-area networks using the principles of zero-configuration networking for endpoint discovery and the syntax of XML streams and XMPP messaging for real-time communication. This method uses DNS-based Service Discovery and Multicast DNS to discover entities that support the protocol, including their IP addresses and preferred ports. Any two entities can then negotiate a serverless connection using XML streams in order to exchange XMPP message and IQ stanzas.
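The discovery half of this rests on DNS-SD, where a service advertises small key=value attributes in a TXT record. XEP-0174 publishes presence this way under the `_presence._tcp` service type, with keys such as `txtvers` and `status`; the spec has the full list. A sketch of just the TXT wire format (RFC 6763, section 6), assuming a plain dict of string attributes:

```python
def encode_txt(pairs):
    """Encode key=value pairs as DNS-SD TXT record data: each entry is a
    single length byte followed by at most 255 bytes of 'key=value'."""
    out = bytearray()
    for key, value in pairs.items():
        entry = f"{key}={value}".encode("utf-8")
        if len(entry) > 255:
            raise ValueError("TXT entry too long: " + key)
        out.append(len(entry))
        out += entry
    return bytes(out)


def decode_txt(data):
    """Decode TXT record data back into a dict.

    Only the first occurrence of a key is kept, as RFC 6763 requires. Entries
    without '=' come back with an empty value (a simplification: the RFC
    treats 'key' and 'key=' as different)."""
    pairs, i = {}, 0
    while i < len(data):
        n = data[i]
        entry = data[i + 1:i + 1 + n].decode("utf-8")
        key, _, value = entry.partition("=")
        pairs.setdefault(key, value)
        i += 1 + n
    return pairs
```

The record itself would be published and browsed by an mDNS/DNS-SD stack (Bonjour, Avahi and friends); this only shows what the attribute bytes look like.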
But somehow this remains a niche use of XMPP. Many of the toolkits have some support for it, perhaps as work-in-progress or a patch, but it remains somewhat ‘out there’ rather than core to the XMPP approach. I’d love to see this change in 2011. The 0174 spec combines a few themes; it talks a lot about discovery, motivated in part by trade-fair and conference scenarios. When your Apple laptop finds people on the local network to chat with via “Bonjour”, it’s doing more or less XEP-0174. For the TV remote scenario, I’m interested in having nodes from a normal XMPP network drop down and “re-discover” themselves in a (hopefully lower-latency) point-to-point mode, within some LAN, across the Internet, or between NAT-protected home LANs. There are lots of scenarios where having a server in the loop isn’t needed, or adds cost and risk (latency, single point of failure, privacy concerns).
XEP-0174 continues,
6. Initiating an XML Stream
In order to exchange serverless messages, the initiator and recipient MUST first establish XML streams between themselves, as is familiar from RFC 3920.
First, the initiator opens a TCP connection at the IP address and port discovered via the DNS lookup for an entity and opens an XML stream to the recipient, which SHOULD include 'to' and 'from' address. [...]
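For a feel of how little is involved once the TCP connection exists, here is a Python sketch of the stream-opening header described in that passage. The addresses are hypothetical, and a real session would go on to negotiate TLS and SASL, as the spec’s Security Considerations recommend; that part is left out here.

```python
from xml.sax.saxutils import quoteattr


def stream_header(to_addr, from_addr):
    """Opening <stream:stream> tag an initiator sends over the new TCP connection.

    Includes the 'to' and 'from' addresses the spec says SHOULD be present.
    The element is deliberately left open: it is closed only when the session ends.
    """
    return ("<?xml version='1.0'?>"
            "<stream:stream xmlns='jabber:client'"
            " xmlns:stream='http://etherx.jabber.org/streams'"
            f" from={quoteattr(from_addr)} to={quoteattr(to_addr)}"
            " version='1.0'>")
```

Sending it is a matter of writing these bytes to a socket (e.g. one from `socket.create_connection`) opened to the address and port found via discovery, then reading the recipient’s own stream header in reply.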
This sounds pretty precise: point-to-point communication is over TCP. The Security Considerations section discusses some of the different constraints for XMPP in serverless mode, and states that …
To secure communications between serverless entities, it is RECOMMENDED to negotiate the use of TLS and SASL for the XML stream as described in RFC 3920
Having stumbled across Datagram TLS (wikipedia, design writeup), I wonder whether that might also be an option for the layer providing the XML stream between entities. For example, the chownat tool shows a UDP-based trick for establishing bidirectional communication between entities, even when they’re both behind NAT. I can’t help but wonder whether XMPP could be layered somehow on top of that (OpenSSL libraries have Datagram TLS support already, apparently). There are also other mechanisms I’ve been discussing with Mo McRoberts and Libby Miller lately, e.g. Mo’s dynamic dns / pubkeys idea, or his trick of running an XMPP server in the home, and opening it up via UPnP. But that’s for another time.
So back to my main theme: XMPP is holding itself back by always emphasising the server-mediated role. XEP-0174 has the feel of an afterthought rather than a core part of what the XMPP community offers to the wider technology scene, and support for it in toolkits lags similarly. I’d love to hear from ‘live and breathe XMPP’ folk what exactly they think is needed before it can become a more central part of the XMPP world.
From the TV remotes use case we have a few constraints, such as the need to associate identities established in different environments (eg. via public key). If xmpp:danbri-ipad@danbri.org is already on the server-based XMPP roster of xmpp:nevali-tv@nevali.net, can pubkey info in their XMPP vCards be used to help re-establish trusted communications when the devices find themselves connected in the same LAN? It seems just plain nuts to have a remote control communicate with another box in the same room via transatlantic links through Google Talk and Amazon EC2, and yet that’s the general pattern of normal XMPP communications. What would it take to have more out-of-the-box support for XEP-0174 from the XMPP toolkits? Some combination of beer, money, or a shared sense that this is worth doing and that XMPP has huge potential beyond the server-based communications model it grew from?
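One way to picture the identity question is key pinning: remember a fingerprint of the peer’s public key learned over the server-based session, then check the key a device presents on the LAN against it. A minimal Python sketch; carrying the fingerprint in a vCard is the hypothetical from the question above, not an established convention, and a real design would bind the check to the TLS handshake rather than to raw key bytes.

```python
import hashlib
import hmac


def fingerprint(public_key_bytes):
    """SHA-256 fingerprint of a public key's encoded bytes, as lowercase hex."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def matches_pinned(presented_key, pinned_fingerprint):
    """Check a key seen on the LAN against a fingerprint learned earlier,
    e.g. (hypothetically) from the peer's vCard via the server-based roster.

    compare_digest avoids leaking, through timing, how many characters matched;
    it expects ASCII strings, which hex fingerprints are.
    """
    return hmac.compare_digest(fingerprint(presented_key), pinned_fingerprint.lower())
```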
How to tell you’re living in the future: bacterial computers, HTML and RDF
Clue no.1. Papers like “Solving a Hamiltonian Path Problem with a bacterial computer” barely raise an eyebrow.
Clue no.2. Undergraduates did most of the work.
And the clincher, …
Clue no.3. The paper is shared nicely in the Web, using HTML, Creative Commons document license, and useful RDF can be found nearby.
From the those-crazy-eggheads dept, … bacterial computers solving graph data problems. Can’t wait for the javascript API. Except that the thing of interest here isn’t so much the mad science as what they say about how they did it. But the paper is pretty fun stuff too.
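For anyone who hasn’t met it, the Hamiltonian Path Problem asks whether a graph has a path that visits every node exactly once. The sketch below is a conventional brute-force search in ordinary software, nothing like how the bacteria do it, run on a hypothetical three-node directed graph (not the graph from the paper). It tries every ordering of the intermediate nodes, and that factorial growth is why the problem gets hard so quickly.

```python
from itertools import permutations


def hamiltonian_paths(nodes, edges, start, end):
    """Yield every path from start to end that visits each node exactly once."""
    edge_set = set(edges)
    middle = [n for n in nodes if n not in (start, end)]
    for order in permutations(middle):  # n! orderings to try
        path = [start, *order, end]
        # every consecutive pair of nodes must be a directed edge
        if all((a, b) in edge_set for a, b in zip(path, path[1:])):
            yield path


# Hypothetical toy graph, for illustration only.
nodes = ["1", "2", "3"]
edges = [("1", "2"), ("2", "3"), ("1", "3"), ("3", "2")]
print(list(hamiltonian_paths(nodes, edges, "1", "3")))
```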
The successful design and construction of a system that enables bacterial computing also validates the experimental approach inherent in synthetic biology. We used new and existing modular parts from the Registry of Standard Biological Parts [17] and connected them using a standard assembly method [18]. We used the principle of abstraction to manage the complexity of our designs and to simplify our thinking about the parts, devices, and systems of our project. The HPP bacterial computer builds upon our previous work and upon the work of others in synthetic biology [19-21]. Perhaps the most impressive aspect of this work was that undergraduates conducted every aspect of the design, modeling, construction, testing, and data analysis.
…undergraduates! Meanwhile, over on partsregistry.org you can read more about the bits and pieces they squished together. It’s like a biological CPAN. And in fact the analogy is being actively pursued: see openwetware.org’s work on an RDF description of the catalogue.
I grabbed an RDF file from that site and confirmed that simple queries like
select * from <SemanticSBOLv0.13_BioBrick_Data_v0.13.rdf> where {<http://sbol.bhi.washington.edu/rdf/sbol.owl#BBa_I715022> ?p ?v }
and
select * from <SemanticSBOLv0.13_BioBrick_Data_v0.13.rdf> where {?x ?p <http://sbol.bhi.washington.edu/rdf/sbol.owl#BBa_I715022> }
… do navigate me around the graph that describes the pieces described in their paper.
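The two queries differ only in which end of the triple is fixed: the first lists arcs leading out of the part, the second lists arcs pointing into it. Following both directions is what lets you walk around the graph. As a hedged illustration of that idea, here is a toy in-memory triple matcher in which None plays the role of a SPARQL variable. The triples and the example.org predicates are invented for the sketch and are not taken from the real SBOL file.

```python
SBOL = "http://sbol.bhi.washington.edu/rdf/sbol.owl#"
EX = "http://example.org/"  # hypothetical namespace for the toy predicates

# Toy triples for illustration only; NOT taken from the real SBOL data.
triples = [
    (SBOL + "BBa_I715022", EX + "label", "5' RFP"),
    (SBOL + "BBa_I715042", EX + "relatedTo", SBOL + "BBa_I715022"),
    (SBOL + "BBa_I715042", EX + "label", "ABC HPP construct"),
]


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


part = SBOL + "BBa_I715022"
outgoing = match(s=part)  # like { <part> ?p ?v }
incoming = match(o=part)  # like { ?x ?p <part> }

for s, p, o in outgoing:
    print("out:", p, o)
for s, p, o in incoming:
    print("in: ", s, p)
```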
Here’s what the HTML paper says right now,
We designed and built all the basic parts used in our experiments as BioBrick compatible parts and submitted them to the Registry of Standard Biological Parts [17]. Key basic parts and their Registry numbers are: 5′ RFP (BBa_I715022), 3′ RFP (BBa_I715023), 5′ GFP (BBa_I715019), and 3′ GFP (BBa_I715020). All basic parts were DNA sequence verified. The basic parts hixC(BBa_J44000), Hin LVA (BBa_J31001) were used from our previous experiments [8]. The parts were assembled by the BioBrick standard assembly method [18] yielding intermediates and devices that were also submitted to the Registry. Important intermediate and devices constructed are: Edge A (BBa_S03755), Edge B (BBa_S03783), Edge C (BBa_S03784), ABC HPP construct (BBa_I715042), ACB HPP construct (BBa_I715043), and BAC HPP construct (BBa_I715044). We previously built the Hin-LVA expression cassette (BBa_S03536) [8].
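Since the Registry numbers in the paper line up with the URIs used in the SPARQL queries above, it takes very little code to get from the prose to the data. The following is a minimal sketch that assumes IDs always look like "BBa_" followed by a letter and digits (an assumption based only on the IDs in this paragraph). It tolerates a stray space after the underscore, which can creep in when text is copied from a web page, and builds URIs in the same sbol.owl# pattern.

```python
import re

SBOL = "http://sbol.bhi.washington.edu/rdf/sbol.owl#"

# Assumed ID shape: "BBa_" + one capital letter + digits, optional stray space.
PART_ID = re.compile(r"BBa_\s?([A-Z]\d+)")


def part_uris(passage):
    """Return a sbol.owl# URI for each Registry ID found in the passage."""
    return [SBOL + "BBa_" + m.group(1) for m in PART_ID.finditer(passage)]


# Excerpt of the quoted paragraph.
text = ("Key basic parts and their Registry numbers are: 5′ RFP (BBa_I715022), "
        "3′ RFP (BBa_I715023), 5′ GFP (BBa_I715019), and 3′ GFP (BBa_I715020).")

for uri in part_uris(text):
    print(uri)
```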
How nice to have a scholarly publication in HTML format, open-access published under creative commons license, and backed by machine-processable RDF data. Never mind undergrads getting bacteria to solve NP-hard graph problems, it’s the modern publishing and collaboration machinery described here that makes me feel I’m living in the future…
(World Wide Web – Let’s Share What We Know…)
ps. thanks to Dan Connolly for nudging me to get this shared with the planetrdf.com-reading community. Maybe it’ll nudge Kendall into posting something too.
‘Republic of Letters’ in R / Custom Widgets for Second Screen TV navigation trails
As ever, I write one post that perhaps should’ve been two. This is about the use and linking of datasets that aid ‘second screen’ (smartphone, tablet) TV remotes, and it takes as a quick example a navigation widget and underlying dataset that show us how we might expect to navigate TV archives, in some future age when TV lives more fully in the World Wide Web. I’ll argue that access to the ‘raw data’ and frameworks for embedding visualisation apps are of equal importance when thinking about innovative ways of exploring the ever-growing archives. All of this comes from many discussions with my NoTube colleagues and other collaborators; the rambling scribbliness is all my own.
Ben Hammersley points us at a lovely Flash visualization of correspondence patterns, “Mapping the Republic of Letters”.
Mapping the Republic of Letters has at its center a multidimensional data set which spans 300 years and nearly 100,000 letters. We use computing tools that help us to measure and analyze data quantitatively, though that will not take us to our goal. While we use software and computing techniques that were designed for scientific and statistical methods, we are seeking to develop computing tools to enhance humanistic methods, to help us to explore qualitative aspects of the Republic of Letters. The subject of our study and the nature of the material require it. The collections of correspondence and records of travel from this period are incomplete. Of that incomplete material only a fraction has been digitized and is available to us. Making connections and resolving ambiguities in the data is something that can only be done with the help of computing, but cannot be done by computing alone. (from ‘methods and philosophy’)
See their detailed writeup for more on this fascinating and quite beautiful work. As I’m working lately on linking TV content more deeply into the Web, and on ‘second screen’ navigation, this struck me as just the kind of interface which it ought to be possible to re-use on a tablet PC to explore TV archives. Forgetting for the moment difficulties with Flash on iPads and so on, the idea roughly is that it would be great to embed such a visualization within a TV watching environment, such that when the ‘republic of letters’ widget is focussed on some person, place, or topic, we should have the opportunity to scan the available TV archives for related materials to show.
So a glance at Chrome’s ‘developer tools’ panel gave me a link to the underlying data used by the visualisation. I don’t know exactly whose it is, nor how they want it used, so please treat it with respect. Still, there it is, sat in the Web, in tab-separated format, begging to be used. There’s a lot you can do with the Flash application that I’ve barely touched, but I’m intrigued by the underlying dataset. In particular, where they have the string “Tonson, Jacob”, the data linker in me wants to see a Wikipedia or DBpedia link, since they provide explanation, context, related people, places and themes; all precious assets when trying to scrape together related TV materials to inform, educate or entertain someone with. From a few test searches, it turns out that (many? most?) of the correspondents are quite easily matched to Wikipedia: William Congreve; Montagu, 1st earl of Halifax, Charles; Hough, bishop of Worcester, John; Stanyan, Abraham; … Voltaire and others. But what about the data?
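A first pass at that matching could simply reorder the dataset’s “Surname, titles…, Forename” strings into a guessed Wikipedia title, then build the corresponding DBpedia resource URI (DBpedia resource names follow English Wikipedia titles, with spaces as underscores). The splitting rule below is a hypothetical heuristic, not a documented property of the dataset, and its output is only a candidate. Capitalisation and disambiguation differ from the real titles (“1st earl” versus “1st Earl”, bishops disambiguated in brackets), so each guess still needs checking against a Wikipedia search.

```python
from urllib.parse import quote


def candidate_title(name):
    """Guess a Wikipedia title from a 'Surname, titles..., Forename' string.

    Hypothetical heuristic: first part = surname, last part = forename,
    anything in between = titles appended after the name.
    """
    parts = [p.strip() for p in name.split(",")]
    if len(parts) == 1:
        return parts[0]  # single names such as "Voltaire"
    surname, *titles, forename = parts
    title = f"{forename} {surname}"
    if titles:
        title += ", " + ", ".join(titles)
    return title


def dbpedia_uri(title):
    """Build a DBpedia resource URI from a Wikipedia-style title."""
    return "http://dbpedia.org/resource/" + quote(title.replace(" ", "_"), safe="_,()'")


for n in ["Tonson, Jacob", "Montagu, 1st earl of Halifax, Charles", "Voltaire"]:
    print(candidate_title(n), "->", dbpedia_uri(candidate_title(n)))
```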
Lately I’ve been learning just a little about R, a language used mainly for statistics and related analysis. Here’s what it’ll do ‘out of the box’, in untrained hands:
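# Load the tab-separated data; stringsAsFactors=TRUE so summary() counts each country (the default before R 4.0)
letters <- read.csv('data.txt', sep='\t', header=TRUE, stringsAsFactors=TRUE)
# Logical index of rows whose Author is "Voltaire"; which() below skips any NA comparisons
v_author <- letters$Author == "Voltaire"
v_letters <- letters[which(v_author), ]
# One-column table of Voltaire's letters per destination country
cbind(summary(v_letters$dest_country))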
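The same per-country count is not tied to R. As a hedged cross-check, here is a Python sketch using only the standard library. It assumes, as the R snippet does, a header row with “Author” and “dest_country” columns, and it runs on a few made-up rows rather than the real file.

```python
import csv
import io
from collections import Counter


def letters_per_country(tsv_file, author):
    """Count an author's letters by destination country.

    Assumes a tab-separated file whose header includes 'Author' and 'dest_country'.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row["dest_country"] for row in reader if row["Author"] == author)


# Hypothetical sample rows, standing in for the real data file.
sample = io.StringIO(
    "Author\tdest_country\n"
    "Voltaire\tFrance\n"
    "Voltaire\tEngland\n"
    "Voltaire\tFrance\n"
    "Tonson, Jacob\tEngland\n"
)
for country, n in letters_per_country(sample, "Voltaire").most_common():
    print(country, n)
```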
The requirements of our project are very much in sync with current work being done in the linked-data/ semantic web community and in the data visualization community, which is why collaboration with computer science has been critical to our project from the start.
Lonclass and RDF