Testing post formats
Move along, nothing to see here.
Why Growstuff is Open Source
This was originally posted on the new Growstuff blog, which I set up the other day. I also set up a fortnightly newsletter, to which you should subscribe if you want to keep up with what’s happening with Growstuff as we count down to our public launch, in (eep!) about 2-and-a-bit months.
My background is in open source software, and I’ve been using and producing it for almost twenty years. Sometimes it’s easy to live in the open source bubble, and fail to notice that there are areas where open source software is not common or standard. Over the past few months, working on Growstuff, I’ve attended a number of events for social enterprises and sustainability, and checked out dozens of websites aimed at food gardeners or people trying to live more sustainable lives. Venturing outside my former bubble, I’ve found that open source software is the exception rather than the rule in these areas, so I thought it would be a good idea to talk about why Growstuff is open source, and why we think it’s important.
It’d be traditional at this point to talk about what open source software is, and to give a quick definition. But open source is at least three things, and each needs its own explanation.
First of all, open source is a political movement that aims to change the power balance between software creators and software users. When you use traditional software, you have to take it as-is. If you don’t like it, you have few options. Software makers can change the software any way they like, charge you what they want for it, or withdraw their support for it at any time. You’re locked in an unequal relationship with them, where they hold all the power. Open source software gives power back to the users, letting them — us — understand how it works, use the software how we want, modify it if we need to, and access it regardless of who we are, where we’re from, or how rich we happen to be.
It does this through special software licenses. You’ve probably clicked “Accept” on a lot of software licenses in your time, and open source licenses are just like this, except that they offer you (as a software user) a bunch of rights, where other licenses typically take them away. An open source license says that you have the right to use the software for any purpose whatsoever. It says that you’re allowed to read the source code — the underlying program that makes the software run — and to change it if you need to, to suit your needs. It says that you can share the software freely, passing it on to friends or colleagues without having to pay license fees or worry that the software creator will come after you. In some cases (as in the license Growstuff uses) it says that if you modify the software and share it with others, you must use the same open source license, to make sure that people down the line have the same rights you do, and to share the love as widely as possible.
Finally, by changing the balance of power between software creators and users, and enshrining that greater equality in a formal document, we open ourselves up to a more collaborative way of working. Software creators and users are able to come together to build the software they need, and users can even contribute directly to the software itself, by modifying the source code and offering their changes back to the original creator. Over the years, open source software developers have learned all kinds of effective ways to work together as distributed, often international teams, and to engage their user communities in developing something that they really want to use and in which they feel a sense of ownership.
So what’s this got to do with social enterprise, sustainability, and Growstuff? In my mind, open source, sustainability, and social enterprise are closely intertwined, to the point where I feel that choosing open source is a vital part of the whole picture.
When we talk about social enterprises — businesses that hope to achieve a social good through their business activities — we seldom look at their software practices. But the choice of software to use, or decision to develop software under a closed or open model, has a social impact, just as do the choice of environmentally friendly materials for physical manufacturing, or the decision to employ people from disadvantaged backgrounds. We expect social enterprises to follow ethical business practices; why not expect them to follow software practices that support equal access, transparency, and accountability?
When it comes to sustainability, it’s about more than changing your light bulbs or using a fancy water bottle. Sustainability’s about developing communities and ways of living and working that can survive and thrive in the long term. Open source is a sustainable way of building software. If a company that writes closed software goes under, the software dies with it, but an open source software project can live long beyond the people or institutions that started it. Since there’s a broad community of people familiar with the software, who know how to read and modify its source code, new developers can step up. An open source project is one that builds community and resilience against all kinds of change: exactly what sustainability is about!
These are the reasons why we think it’s important that Growstuff be open source. We want to work openly and ethically, in collaboration with our members, building a community that feels a sense of ownership and deep involvement in the software that runs our website. We want other projects, especially those working in similar areas, to be able to look at what we’re doing and learn from us, through reading or re-using our source code. We want to know that if something happens to Growstuff itself, a new Growstuff — or a hundred new Growstuffs — could sprout up, and that people could continue to benefit from what we’ve built far into the future.
Hurrah, I’m $37 richer!
Just a quick post to note that Growstuff (my open source project for food gardeners) was selected as one of the winners of Pinboard’s satirical startup incubator program. I get $37 in funding, woohoo!
While the $37 won’t pay for much of anything — that’s the point, after all — I’m looking forward to Maciej’s advice and help with getting our name out there, and to getting to know the other winners. I’m pleased to see another food startup on the list (home baked goods via the Internet!), would love to be able to use the pre-hardened machine images for AWS, and can’t help but be excited that a sailing-related startup is amongst the winners. While I don’t play board games much, nor have a kid in school, both those projects sound useful and likely to succeed, too. Congrats to my co-selectees!
I just asked the Internet to crowdsource a professional bio for me, figuring that literally anything would be better than having to write one myself. The results aren’t bad, though the process was far messier than that would suggest. (Etherpad link will disappear in 30 days, may get messed up before that. I’ve saved a copy offline for posterity.)
My favourite quote from the process, from Sumana:
She reinvents herself so frequently that any given moment is an inflection point, unextrapolatable.
I don’t know where I can possibly use that, but I love it, so I’m posting it here.
A couple of weeks ago on Twitter, prompted mostly by Maciej’s Pinboard Investment Co-Prosperity Cloud, I asked whether there was any sort of discussion/community/nexus of information around tech startups that don’t follow the VC-funded Silicon Valley model, but look for alternative/more sustainable ways to do things. I got a few answers with links to things of interest, but nothing that really made me say “Yes! There is a thing here!”
Still, I thought it was worth collecting links somewhere. So this post is just to say that I’ve put together a reading list of sorts, and I’m going to keep tagging stuff there as I find it. So far it includes things about tech co-ops, criticism of Silicon Valley’s “disruptive” business models, thoughtful posts about business models, and some examples of alt startups that I really like.
If you were going to start reading anywhere, I’d recommend Anil Dash’s To Less Efficient Startups. I think what he’s saying is really important.
If you have any other good links, please let me know.
No resolutions this year
Just wanted to note the new year and say, yes, it is indeed 2013. I didn’t feel moved to make any resolutions this time round. I figure I’ll be busy enough with Growstuff and if I can do a good job of that, that’s achievement enough.
A number of my friends have made or renewed resolutions to read more books by people of colour. I was at the public library yesterday and found myself looking at the shelves with that in mind. I wasn’t looking for anything in particular, so I just started on the nearest shelves, which happened to cover the history of the Middle East, Asia, and Africa. Very few authors’ names struck me as being other than Anglo. Sigh. I did find two books about Egypt though, and a couple of books on Australian Aboriginal history in the next row. I’m glad that other people’s resolutions made me more mindful of this.
What are your resolutions this year? Or have you punted like me?
Global Shifts conference
Tomorrow I’m off to Global Shifts, a three day social enterprise conference being hosted at RMIT. I’m very glad someone happened to mention it to me last week, just in time for me to register.
I’ve started describing Growstuff, in appropriate circles, as a social enterprise. Lots of people don’t know what the term means, so I’ll just quickly define it: a social enterprise is a business which hopes to achieve a social good, but does so through its business practices rather than the fundraising/donations model that most charities use.
I consider Growstuff to be a social enterprise on several levels. The first is that by helping people grow their own food, we are addressing food insecurity and promoting environmental sustainability. The second is that by aggregating data about people’s food growing activities and releasing it under a Creative Commons license, along with our open source code, we’re freely providing technology to help other people build tools and services for food growers, or to help researchers understand how people are growing food. The third way that Growstuff works as a social enterprise is through our community and development processes: as a non-traditional software project, we offer training/mentoring and a supportive environment for people from non-traditional technology backgrounds or who are marginalised in the technology industry to learn, improve their skills, and take leadership roles.
I’ve been to an uncountable number of tech conferences over the past decade or so, but Global Shifts will be my first social enterprise one. I keep remembering something someone said in an intro session the one time I attended SXSW: “Don’t attend sessions about things you already know. You’ll only sit there being annoyed they’re not covering your favourite topics, and thinking you could do better. Instead, go to sessions about things you know nothing about.” Some of the best conferences I’ve been to have been the ones where I’m stepping outside my usual field — I’m thinking especially about the museum/library/archive events and digital humanities “THATCamps” I attended in 2010-2011 — and I’m hoping that Global Shifts is going to have the same effect: lots of new subjects to fill my brain, and very few where I doze off because I’ve heard it all before.
Here are some of the sessions I’m hoping to attend:
So, that’s my plan for Global Shifts. I’ll probably be tweeting from there (hashtag #globalshifts). If anyone reading this is attending and wants to meet up, drop me a line.
My Name Is Me is back
A few people have contacted me lately asking where “My Name Is Me” (previously at http://my.nameis.me/) had got to. Well, the domain registration expired, the WordPress site that I didn’t log in to very often got malwared to hell and back, and when I asked around, nobody wanted to take it over.
However, I recently set up WordPress Multisite (and wow, that was easier than I thought it would be — recommended!) and I’m in the process of moving all my various blogs to it. Among them, since I had an archive sitting around, is MNIM.
And so, in “celebration” (a ha ha) of Google+ releasing a “community” feature that excludes LGBTQ people; abuse survivors; refugees; whistleblowers; people in the military, medical, legal, political, education, or social work fields; people from countries which commonly use mononyms or mixed character sets for names; people who want to chat with their gaming, open source, fandom, or SCAdian buddies; nuns and monks; performers known by their stage names; authors known by their pen names; activists and political dissidents… oh look, just go see the site. In recognition of all these people and their exclusion from G+ and similar social networks, MNIM is now back at mynameisme.org.
Note that it’s in “archival” mode — I’m not actively soliciting new people to list on the site, and the forms for submitting stories have been removed. It took a team of hard workers slogging away at all the editorial work for MNIM, and we’re no longer up for that. Hopefully the work we did last year will still be useful as it stands.
Importing data is hard: a rant about integrating open data projects
A few times on the Growstuff mailing list or IRC channel, someone’s excitedly suggested that we should import data from another CC-licensed data set. Each time, I say, “Trust me, that’s pretty complicated,” but I’ve never actually sat down and explained the full gory details of why.
The following is something I wrote up for our wiki so that I could point people at it next time the subject comes up. I thought it might be interesting to a wider audience, too, so that’s why I’m posting it here.
This is a bit of a rant by Skud, who used to work on Freebase, a large open-licensed data repository which imported data in bulk from a range of sources, including Wikipedia, Netflix, the Open Library Project, and many more. She’s had a lot of experience in this area, and learnt a lot about the weird complications of mass data imports.
They have a database. You have a database. Your fields are the same. Their API is easy to use and their license is compatible.
What if the fields aren’t quite equivalent? For instance, let’s say they have measurements in imperial and we use metric. We’ll need to have ways to convert them. That’s actually a really simple example. Import incompatibilities are more often at a semantic/ontological level. Growstuff has the idea of “crops” and “varieties” but what if the other database only has “plants” with no distinction? Or what if they have crops and varieties but draw the line somewhere slightly different to where we do? These sorts of incompatibilities are more common than not, and massively complicate any import effort.
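To make the easy case concrete, here’s a minimal sketch in Python of mapping one row from an imperial-unit source onto a metric schema. It isn’t Growstuff’s actual code: the `CropRecord` class and the `plant_name` and `spacing_in` field names are made up for illustration. Note that it only handles the unit mismatch; the semantic mismatches (crops versus varieties versus plain “plants”) can’t be fixed with a conversion factor.

```python
from dataclasses import dataclass

INCHES_TO_CM = 2.54  # exact by definition


@dataclass
class CropRecord:
    """Hypothetical metric-unit record on our side of the import."""
    name: str
    spacing_cm: float


def from_imperial_source(row: dict) -> CropRecord:
    """Map one row from a hypothetical imperial-unit source onto our schema."""
    return CropRecord(
        # Normalise the name so " Tomato " and "tomato" end up the same.
        name=row["plant_name"].strip().lower(),
        # Convert inches to centimetres, rounded to one decimal place.
        spacing_cm=round(float(row["spacing_in"]) * INCHES_TO_CM, 1),
    )
```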
Nothing against that other database: some of everyone’s data is bogus! But we need to check it. What “bogus” means will vary from place to place, but it might be spam entries, duplicate records, simple errors, or it might be cruft from their own broken imports. We need to look carefully at every import and make sure we’re skipping as much of this as possible. And this is largely a manual process, since the bogosity will never be the same twice. You can do this by sampling, of course, but you still need to look at something on the order of hundreds of records, and know what you’re looking for. Could you spot a mixed-up scientific name on a randomly chosen herb? I couldn’t.
Let’s say we want to import from a database of plant life that lists 10,000 edible plants and their nutritional content. Growstuff has 300 crops at present. We import everything! Now we have 9,700 pages with nothing but nutritional data. Nobody on Growstuff is using them, they have no pictures, they have no planting data, they have no discussions (except maybe spam comments that nobody cleans up because nobody notices). Our “newest crops” page, usually a source of interest, is now just a wasteland of grey placeholder images.
Should we have imported all 10,000 plants, or just the nutritional data of the 300 we already have? Or something in between? The answer is usually “something in between” — you might want that data if and only if you can get other partial data from other imports to make it more interesting.
The best way to do this is to import the 300 and make a note of the 9,700. Then later, you can cross-correlate the notes you’ve made from various data imports and re-import those that have, say, at least 3 useful data sources and a picture. But that’s pretty complicated. (Also, see the discussion of repeated imports, below.)
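Here’s roughly what “import what we have and make a note of the rest” might look like, as a hedged Python sketch. Everything in it is an assumption for illustration: rows are plain dicts with a `name` and an optional `picture_url`, and the notes live in an in-memory dict rather than a real database.

```python
from collections import defaultdict


def new_deferred_notes():
    """Notes about crops we skipped: which sources had them, and any picture."""
    return defaultdict(lambda: {"sources": set(), "has_picture": False})


def split_import(source_name, rows, known_crops, deferred):
    """Return the rows for crops we already have; note the rest in `deferred`."""
    to_import = []
    for row in rows:
        if row["name"] in known_crops:
            to_import.append(row)
        else:
            note = deferred[row["name"]]
            note["sources"].add(source_name)
            if row.get("picture_url"):
                note["has_picture"] = True
    return to_import


def ready_to_import(deferred, min_sources=3):
    """Names of skipped crops that now have enough sources and a picture."""
    return sorted(
        name
        for name, note in deferred.items()
        if len(note["sources"]) >= min_sources and note["has_picture"]
    )
```

You’d call `split_import` once per data source, then check `ready_to_import` every so often to see which of the skipped crops have become worth bringing in.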
Let’s assume that their data is licensed compatibly — that means CC-BY-SA or CC-BY in our case, since we’re CC-BY-SA and none of the other clauses (ND, NC) are compatible with us. (Ignoring CC-0 and public domain stuff for now — those don’t need attribution at all.)
So by importing, we have to credit them. Now we need some way to represent that in the database. If we do this at the object level, it’s fairly simple: each thing in the database (crop, etc) has many licensors, each of which includes a name for the work (eg. “Katie’s Plants”), a license (eg. CC-BY), a licensor name (eg. Katie Smith), and a URL to link to the original data.
Now we have to display them on the page. Where? Probably at the bottom somewhere: “Some information on this page came from: Katie’s Plants (that would be a link) — CC-BY Katie Smith; SuperPlantDB under CC-BY-SA SuperPlants Inc; etc.”
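As a rough sketch of the object-level version (in Python, and not Growstuff’s real schema), each licensor record carries the four pieces of information listed above, and the page footer is built from them. The `Licensor` class, the `credit_line` helper, and the example.org URLs are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licensor:
    """One attribution attached to a crop (hypothetical structure)."""
    work: str      # name of the work, e.g. "Katie's Plants"
    license: str   # e.g. "CC-BY"
    licensor: str  # who to credit, e.g. "Katie Smith"
    url: str       # link back to the original data


def credit_line(licensors):
    """Build the plain-text footer crediting every licensor for a page."""
    if not licensors:
        return ""
    parts = [f"{l.work} ({l.url}), {l.license} {l.licensor}" for l in licensors]
    return "Some information on this page came from: " + "; ".join(parts)


credits = credit_line([
    Licensor("Katie's Plants", "CC-BY", "Katie Smith",
             "https://example.org/katies-plants"),
    Licensor("SuperPlantDB", "CC-BY-SA", "SuperPlants Inc",
             "https://example.org/superplantdb"),
])
```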
Now imagine that the data on those sites came from other sites. For instance, let’s say Katie’s Plants previously did an import from Freebase.com, and SuperPlantDB did one from Wikipedia. We not only need to credit Katie’s Plants and SuperPlantDB but also those places.
Some questions to consider:
Sure, we could just choose not to chain licenses, or to do it in some restricted way… but the moral high road here is to respect everyone’s license and attribution, and besides, if you only attribute some contributors, where do you decide to draw the line?
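If we do chain attributions, the credits for a page become a walk over everything upstream of it. Here’s a minimal Python sketch, assuming (and it’s a big assumption) that every source can tell us which sources it imported from. The `Source` class and its `upstream` field are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A data source and the sources it imported from (hypothetical)."""
    name: str
    license: str
    upstream: list = field(default_factory=list)


def attribution_chain(source, seen=None):
    """Return `source` plus every ancestor, depth-first, each listed once."""
    seen = set() if seen is None else seen
    if source.name in seen:
        # Already credited; this also protects against import loops.
        return []
    seen.add(source.name)
    chain = [source]
    for parent in source.upstream:
        chain.extend(attribution_chain(parent, seen))
    return chain


freebase = Source("Freebase", "CC-BY")
wikipedia = Source("Wikipedia", "CC-BY-SA")
katie = Source("Katie's Plants", "CC-BY", [freebase])
superplant = Source("SuperPlantDB", "CC-BY-SA", [wikipedia])
growstuff = Source("Growstuff", "CC-BY-SA", [katie, superplant])
names = [s.name for s in attribution_chain(growstuff)]
```

In practice the hard part isn’t the traversal, it’s getting other sites to publish their upstream sources in the first place.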
This is a subset of license chaining problems. Let’s say Growstuff (a commercial entity using CC-BY-SA) imports from Katie’s Plants (a non-profit entity using CC-BY), which in turn imports from Hippie Herbs (a non-profit entity using CC-BY-NC; note the “non-commercial” clause).
Katie’s fine — she imports from Hippie Herbs’ data with impunity because she’s non-profit. She attributes them on her site, and Hippie Herbs is happy. She doesn’t have to use the same license as them because they don’t have a “SA” (Share Alike) clause.
Now Growstuff comes along and wants to import data from Katie’s Plants. Katie’s Plants is CC-BY which is compatible with Growstuff… but what about the data that originally came from Hippie Herbs? We’re commercial, so we’re not meant to use it.
But how do we tell what’s what? Katie probably doesn’t attribute HH at the level of individual bits of data, so we can’t extract just the ok-for-commercial-use bits.
Basically, if you believe in license chaining (and as I said, it’s definitely the moral high road to take, so I think we should) then you have to be constantly vigilant for the taint of NC-licensed data anywhere in the sprawling tree of ancestors to your data.
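That vigilance could be partly automated, if (and only if) each source records where its data came from. Below is a hedged Python sketch that walks a source’s ancestors looking for NC or ND clauses. The dict layout with `name`, `license`, and `upstream` keys is an assumption for illustration, and it works at the level of whole sources, not individual bits of data, which is exactly the limitation described above.

```python
def license_clauses(license_name):
    """Split a name like 'CC-BY-NC' into its clauses: {'BY', 'NC'}."""
    return set(license_name.upper().split("-")[1:])


def tainted_sources(source, blocked=frozenset({"NC", "ND"}), seen=None):
    """Names of `source` or any ancestor whose license has a blocked clause."""
    seen = set() if seen is None else seen
    if source["name"] in seen:
        return []
    seen.add(source["name"])
    found = []
    if license_clauses(source["license"]) & blocked:
        found.append(source["name"])
    for parent in source.get("upstream", []):
        found.extend(tainted_sources(parent, blocked, seen))
    return found


hippie_herbs = {"name": "Hippie Herbs", "license": "CC-BY-NC"}
katies_plants = {
    "name": "Katie's Plants",
    "license": "CC-BY",
    "upstream": [hippie_herbs],
}
problems = tainted_sources(katies_plants)
```

An empty result only means no declared ancestor is a problem; it says nothing about sources nobody wrote down.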
The simple case is fine for a green-field import with no existing data, which is described above. But let’s say we’re importing data into an area where we already have some contributions from Growstuff members.
Let’s say we decide to adjudicate. We now need to build an app to let people vote on which one is “correct” — probably best of three or something like that. Freebase did this (multiple times) and I was involved in some of it. We called them “data games” and had leaderboards for who’d voted the most. We couldn’t get enough throughput, though, and sometimes by the time something had been adjudicated, another community member had edited the field on our site, thus invalidating the whole thing. We ended up paying people in developing countries to churn through these votes for us (we used ODesk, but you could use Amazon’s Mechanical Turk or whatever). However, they needed training, and weren’t cheap — even after all the work of setting up the voting queue, there was still considerable expense.
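For the curious, the core of a “best of three” vote is tiny; it’s everything around it that costs money. This is a minimal Python sketch, not what Freebase actually ran. The `question` dict, the `"ours"`/`"theirs"` vote labels, and the staleness check are assumptions for illustration.

```python
from collections import Counter


def adjudicate(votes, needed=2):
    """Return the option with at least `needed` votes, or None if undecided."""
    if not votes:
        return None
    option, count = Counter(votes).most_common(1)[0]
    return option if count >= needed else None


def resolve(question, votes, current_value):
    """Apply the vote only if our value hasn't been edited in the meantime."""
    if current_value != question["ours"]:
        # Someone edited the field while the question sat in the queue,
        # so the votes were cast on out-of-date information.
        return None
    return adjudicate(votes)


question = {"field": "colour", "ours": "red", "theirs": "red, yellow"}
decided = resolve(question, ["theirs", "ours", "theirs"], current_value="red")
stale = resolve(question, ["theirs", "theirs"], current_value="crimson")
```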
This came up quite often with Freebase because sometimes they would import from “authoritative sources” who licensed their work specially to Freebase but didn’t generally have a CC license or an open community editing process. For instance, the time when I was talking to some people from the BBC, and one (an older dude) said, “If we gave you our programme data, we wouldn’t want anyone to edit it because we are the experts on our programmes.” This was pretty silly of course (another, younger BBC dude immediately turned to him and said “Ha ha ha, I’ve got two words for you: Doctor Who.”), but sadly these situations are common when you’re dealing with closed/non-community-based/“authoritative” data sources who don’t understand the power of crowdsourcing information.
But even when dealing with compatibly CC-licensed sources with open developer communities, there can still be some problems around the “authority” of the data and how it’s attributed.
Take the case where the Katie’s Plants community has spent heaps of time editing their data and is very proud of it. We import it to Growstuff, then our community looks at it, decides that bits of it are wrong, and changes it.
Do we leave the license link to Katie’s Plants intact? Most likely yes, because our data has theirs in its DNA, so to speak. But what if we essentially deleted all the data from there? This might happen if, for example, we’d imported a picture from Wikimedia Commons then found that the picture was incorrect or inappropriate, so we blew it away. Now we should probably remove the license note. But how do you tell when data has been completely removed as opposed to modified or built upon?
In the Katie’s Plants example, what if Katie’s high quality medicinal plant information gets mixed up with ($DEITY forbid!) low-quality data from less experienced Growstuff members or from yet another import? What implications does this have for Katie’s site and their reputation? Under the license we’re allowed to mess it up because there is no “No Derivatives” (ND) clause, but socially/culturally they’ll be pretty unhappy if we do, and we can expect some backlash.
Great news! Katie got a government grant and some fantastic press coverage, and her database has expanded enormously. We want to re-run the import. But now consider this case:
When we first imported, we put it to adjudication and found that Growstuff’s data was better, so we went with that.
Now we re-import, and Katie’s data has changed:
So of course we put it through adjudication again. The correct answer is probably a union of the two sets.
Now, Katie’s database is growing fast, and so is Growstuff. We want to do a regular import from there — perhaps monthly. But somehow along the way, we’ve ended up with different ideas of tomato colour. Every month, their data is different to ours, and we have to keep re-adjudicating the same question: what colour/s are tomatoes? Boring. Our community is tired of playing the voting game, and/or it’s costing us money with our Mechanical Turk people.
So we decide to implement a check: if nothing’s changed on either side since the last adjudication, leave it. But now we have to implement change tracking, not just on Growstuff, but on Katie’s Plants as well. We need to keep a history of changes for every site we import. This is in addition to the infrastructure we’ve had to build to automatically run imports at regular time intervals.
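The check itself might look something like this Python sketch, which remembers a fingerprint of both sides’ values at the time of the last adjudication. It’s a simplification under stated assumptions: values are JSON-serialisable, and the fingerprints are kept in a plain dict rather than the full change history described above.

```python
import hashlib
import json


def fingerprint(value):
    """Stable hash of a JSON-serialisable value."""
    canonical = json.dumps(value, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_adjudication(ours, theirs):
    """Remember what both sides looked like when we last adjudicated."""
    return {"ours": fingerprint(ours), "theirs": fingerprint(theirs)}


def needs_adjudication(ours, theirs, last):
    """True only if the values differ and something changed since `last`."""
    if ours == theirs:
        return False
    if last is None:
        return True
    return (fingerprint(ours), fingerprint(theirs)) != (last["ours"], last["theirs"])


ours = ["red"]
theirs = ["red", "yellow"]
last = record_adjudication(ours, theirs)
```

With that in place, the monthly import only queues up the tomato colour question again when one side has actually changed its answer.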
Obviously we have an API for people to access our data under CC-BY-SA. But keep in mind the license-chaining effect: if anyone uses data from Growstuff, they will also be constrained by the licenses of all the data sources we import. We will need to make that license information available in the API alongside our data, and make sure all our API docs and related materials explain the necessity of license chaining.
Take a look at Freebase’s Attribution Policy. They use CC-BY, but because of attribution chaining, they can’t just say that: they need a whole page with a wizard to help people figure out how to attribute something on the site. It’s incomplete, too: Freebase decided that they would only require license chaining for “content” as opposed to “facts” (a complicated issue in itself), which means images and Wikipedia-based descriptions. They don’t require chained license information for other data sources. This is dubious in terms of the legality and culture of how Creative Commons works; there are no really firm guidelines on this, but in my opinion the most moral/ethical stance is to always chain your attributions, and Freebase has chosen otherwise. In the past, this has caused some concern from the owners of other data sources that were imported to Freebase. Even Wikipedians have complained that Freebase doesn’t enforce their Wikipedia attributions strongly enough. This sort of thing can lead to reputation problems, if not legal ones.
One final complication. Various courts have ruled that “facts” aren’t copyrightable. For instance, the fact that the crop “Corn” has the scientific name “Zea mays” can’t be copyrighted. Even if you have thousands of these facts all together, they can’t be copyrighted, because they’re not a “creative work”. They’re just a statement of fact.
This actually throws the whole idea of CC-licensing collections of data into doubt. And yet we have nothing better, so we do it anyway.
Some data projects have come up with various justifications for this. For instance, Freebase says that the arrangement of the facts is a creative work — that what’s CC-licensed is their schema. That’s pretty creative in itself! The thing is, none of this has really been tested. And so most open data projects have some kind of Terms of Service which explains what they think the CC-license is for and how it’s meant to be used. These generally say, “By accessing our data via our website or API, you agree to behave as if this CC license applied to it (even if there’s not a very strong legal basis for that outside this TOS).”
The original idea of CC licenses was to stop people having to write their own terms and conditions of use for their work, and standardise in such a way that people could easily re-use creative content. Yet for data projects, we end up having to make up our own TOS just to apply a CC license, and we’re back where we started — having to peer at a bunch of legalese and figure out what the hell it means.
Of course once you get into the complexities of license chaining described above, you now also have TOS chaining — if Growstuff uses Katie’s data under their TOS, and Katie uses Hippie Herbs’ under their TOS, is Growstuff now subject to Hippie Herbs’ TOS? No idea! I am not a lawyer! I don’t want to be one! I just want to make a website about growing food!
Importing data is hard! That doesn’t mean we shouldn’t do it, but we should go into it with an awareness of the potential potholes, and carefully weigh up whether importing something is the best choice for us at any given time.
Katie’s Plants, Hippie Herbs, and SuperPlantDB are all made-up examples. Any resemblance to actual open data projects is coincidental. Freebase, Wikimedia Commons, and the BBC are real, though.
The joys of jobseeking
Technically, for most of the last year or so since leaving Google, I’ve been unemployed. I didn’t receive unemployment benefits, though, because I didn’t really need them and because the paperwork overhead seemed higher than I was prepared to deal with. (Plus of course the periods when I was studying or overseas.) But now I’m working on Growstuff and I’d like to get onto the New Enterprise Incentive Scheme, which offers small business training and mentoring and some small amount of funding for a year while you work on your new thing. Thing is, you need to be on unemployment benefits to qualify, and so I recently applied for Newstart.
As a Newstart recipient, I’m required to search for jobs, even though my goal is to run my own business. Whatever, I can play the game. I applied for a number of jobs online, figuring I’m probably over-qualified for most of them, but it fulfils the requirements. Today I got an email back from one of them, asking me to fill in an online questionnaire. Obviously, to show good faith in my “job search” I need to do this, but I have to admit that it sapped my will to live.
The first part of the process was an 80 question “IQ” test which included the following questions:
The idea that the Earth is the centre of the universe is
Such things as language, clothing, customs, color, idicate[sic]:
Which of the following is a trait of personality?
The second part of the questionnaire, about the “product” of my most recent work (i.e. my year at Google) was even worse. Luckily their web app crashed and I couldn’t actually complete it.
My prospects as a trainee fleet co-ordinator seem less than stellar.