GitTorrent, The Movie

Posted 4 Dec 2008 at 15:56 UTC (updated 4 Dec 2008 at 18:22 UTC) by lkcl

Git promises to be a distributed software management tool, where a repository can be distributed. Yet, the mechanisms used to date to actually "distribute", such as rsync, http and ssh, are very much "single path" and centralised.

GitTorrent makes Git truly distributed. The initial plans are for reducing mirror loading; the full plans, however, include totally distributed development: no central mirrors whatsoever. PGP signing and other web-of-trust-based mechanisms will take over from protocols on ports (e.g. ssh) as the access control "clearing house".

The implications of a truly distributed revision control system are staggering: unrestricted software freedom; the playing field is levelled in so many ways, as "the web site" is no longer the central choke-point of control. This article will explain more fully some of these implications, not only from a technical perspective but also including the political implications for Software Freedom.

What is GitTorrent?

From the gittorrent page: The GitTorrent Protocol (GTP) is a protocol for collaborative git repository distribution across the Internet.

Straight from the homepage, as it's put so succinctly:

GitTorrent is a first step towards applying decentralizing Peer to Peer concepts to Git. If you decentralize the download layer, it's just another small step before you decentralize the push rights and tie it to a web of trust such as PGP, and then you don't actually need discrete mirror sites. Every mirror can track the git repositories the owners want it to carry, and those authorized to sign updates can make signed updates to push the cloud forward. Your local mirror can become a one-stop git push and pull stop depot, and the source code is preserved in many more places, increasing resilience, availability and download performance for all.

(Hurrah. Wouldn't it be nice if money and real goods could be exchanged and distributed as easily, we'd be living in paradise...)

Why is GitTorrent so important?

The possibilities that GitTorrent opens up are just mind-blowing. Here are a few:

  • Imagine that an entire project - its web site, documentation, wiki, bug-tracker, source code and binaries - is managed and stored in a peer-to-peer distributed git repository.
    • To view the web site, you either go to the main site, http://web-site.org, or, if you are offline or want faster access, you go to the locally checked out copy.
    • To read the documentation, you likewise either go to the main site or you go to the locally checked out copy.
    • To contribute to the wiki, you either go to the main web site, or you edit the local pages and git push them to the cloud.
    • To report a bug, you either go to the main web site, or you run your own local web server that duplicates the bug database. Once you've submitted the report, it ends up in the locally checked out git repository which, on a push, gets uploaded into the cloud - and ends up on the main web site.
    • To contribute to the project's source, you already understand that you do local edits and then git push them.
    • To upgrade to the latest release, you do a git checkout (of the binaries). They're digitally signed; they're pulled not from "a mirror", they're pulled from gittorrent peers. Only the components that you don't already have will be pulled. Documentation need not be separately included with the binary distribution because that can already be obtained direct from the source repository.
  • Imagine that you want to fork a project, but you feel intimidated doing so because of the "centralisation". You have no control over the "central web site".
    • With GitTorrent-distributed projects, there is no "central" web site: the PGP keys are far more important.
    • Abandoned projects can easily be revived, through a simple process of a new developer announcing their PGP public key identity, and for Users to start pulling in code that's tagged with that PGP key.
    • Users will be able to decide whom to trust based on who contributes, not on who controls the project's web site.
  • Imagine what would happen if you made a git-based filesystem on top of a distributed GitTorrent repository, and a Linux Distribution was actually placed into the Git Repository.
    • In combination with automount, there would be no more "downloading" and "installing": you would simply endeavour to run an application and, on finding that the application did not exist, the git-based filesystem would automatically go hunting, through the GPG-Digitally-signed peer-to-peer cloud, looking for the binary.
    • Upgrades would be a matter of "git checkout -b debian-testing; git pull"

Here's the very strange thing about all of these ideas: they are not new; they are all in development, or exist in one form or another - they just haven't been tied in behind GitTorrent. Yet.

GPG-signed distributed distribution

git tag provides the means to digitally-sign a release. It's therefore possible to make GitTorrent aware of this by specifying whose GPG keys you trust, as part of the "pull" process. update-hook in tracker shows the principle (using the cogito command, cg tag).
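As a rough illustration of "specifying whose GPG keys you trust", here is a minimal Python sketch that checks a signed tag against a list of trusted fingerprints. It is not part of GitTorrent: the function names and the trusted-key set are assumptions made for the example. It relies on `git verify-tag --raw`, which prints GnuPG's machine-readable status lines, and on the `VALIDSIG` line that GnuPG emits for a good signature.

```python
import subprocess


def valid_fingerprints(gpg_status: str) -> set:
    """Collect key fingerprints from GnuPG '[GNUPG:] VALIDSIG' status lines."""
    found = set()
    for line in gpg_status.splitlines():
        parts = line.split()
        # Format: [GNUPG:] VALIDSIG <fingerprint> <date> <timestamp> ...
        if len(parts) >= 3 and parts[0] == "[GNUPG:]" and parts[1] == "VALIDSIG":
            found.add(parts[2].upper())
    return found


def tag_is_trusted(gpg_status: str, trusted: set) -> bool:
    """True if at least one valid signature was made by a trusted key."""
    wanted = {fingerprint.upper() for fingerprint in trusted}
    return bool(valid_fingerprints(gpg_status) & wanted)


def verify_tag(tag: str, trusted: set, repo: str = ".") -> bool:
    """Run `git verify-tag --raw` on a tag and check who signed it."""
    result = subprocess.run(
        ["git", "-C", repo, "verify-tag", "--raw", tag],
        capture_output=True,
        text=True,
    )
    # With --raw, git writes the gpg status lines to standard error.
    return result.returncode == 0 and tag_is_trusted(result.stderr, trusted)
```

A pull wrapper could call `verify_tag("v1.0", trusted)` after fetching and refuse to check out anything that returns False; the tag name here is only a placeholder.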

KeyNote for advanced trust infrastructure management

KeyNote, aka RFC2704, allows access control rules to be digitally signed. Integration of KeyNote into git would provide a formal language for pulling git repositories from people that you trust - or, specifically, from groups of people that are trusted.

GPG signatures go onto git tags in an RFC922-compliant fashion: there can thus be multiple such signatures: the initial person who created the patch; the lieutenant who signed it off; Linus himself; the Distribution maintainer and finally the package maintainer. At each stage, the use of a KeyNote formally-specified "gateway", written into a file that itself is digitally signed, is an automated double-check on where the source code, the wiki content, the bug and the binaries will end up being pulled or pushed, across the cloud.

The alternative is to have shell-scripts, as git hooks, that hard-code the people who must GPG sign a tag before it can be distributed: that just gets messy, and it should be clear that KeyNote is a much better tool for the job.
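To make the contrast with hard-coded hook scripts concrete, here is a small Python sketch in which the signing requirements are data rather than code. It is deliberately not KeyNote syntax: the `Stage` class, the stage names and the key identifiers are all hypothetical stand-ins, and a real deployment would express the same rules in a signed KeyNote assertion instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    keys: frozenset  # key identifiers allowed to sign off at this stage


# Hypothetical policy: every stage needs one signature from an allowed key.
POLICY = [
    Stage("author", frozenset({"KEY-ALICE"})),
    Stage("lieutenant", frozenset({"KEY-BOB", "KEY-CAROL"})),
    Stage("maintainer", frozenset({"KEY-DAVE"})),
]


def missing_stages(policy, signers):
    """Return the names of stages that no allowed key has signed."""
    signers = set(signers)
    return [stage.name for stage in policy if not (stage.keys & signers)]


def may_distribute(policy, signers):
    """A tag may be distributed only when every stage is satisfied."""
    return not missing_stages(policy, signers)
```

Changing who may sign then means editing (and re-signing) the policy, not rewriting a shell script on every mirror.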

Distributed Wiki

IkiWiki is a Wiki where the original wiki content is stored in a repository, and, in the case of git, hooks can be executed to turn the wiki pages into HTML. That's all very well-known.

What happens when GitTorrent is thrown into the mix is very exciting: Wiki-based documentation becomes decentralised. Imagine if Wikipedia could be mirrored locally, run on a local mirror, where content was pushed and pulled, GPG-Digitally-signed; content shared via peer-to-peer instead of overloading the Wikipedia servers.

Distributed Bugtracker

dist-bugs is a project to design a worldwide globally-useable format (strictly: microformat) for bug tracking. The underlying transport is not part of the specification, as the microformat is generic enough to be transferred over anything.

Imagine dist-bugs being stored in a GitTorrent-backed distributed wiki or other web server. In this way, the bug database could be used for offline work as well as online work. And, thanks to the combination of dist-bugs and GitTorrent, bugs would be world-wide globally unique, GPG-Digitally-signed, version-trackable (one distro has the bug listed as fixed and another independent linux distro has it as still open) - it's just an incredibly powerful combination.
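The dist-bugs format itself is not reproduced here, so the following Python sketch uses an invented record layout purely to illustrate the two properties mentioned above: a globally unique identifier, and a per-distribution status that git can version. The field names (`id`, `title`, `status`) are assumptions, not the dist-bugs microformat.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class Bug:
    title: str
    # A random UUID stands in for a world-wide unique bug identifier.
    bug_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Maps a distribution name to that distribution's view of the bug.
    status: dict = field(default_factory=dict)

    def set_status(self, distro, state):
        self.status[distro] = state

    def to_text(self):
        """Serialise with sorted keys, one key per line, so git diffs stay small."""
        record = {"id": self.bug_id, "title": self.title, "status": self.status}
        return json.dumps(record, indent=2, sort_keys=True) + "\n"
```

Writing `to_text()` to a file named after `bug_id` and committing it would let one distribution mark the bug fixed while another still lists it as open, with the history of both visible in the repository.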

Distributed Linux Distribution

vcs-pkg has as its goal:

The aim of the vcs-pkg project is to investigate the use of version control for distro package maintenance. We bring together people interested in taking the next step in distro package maintenance: the proper integration of version control into the package maintenance workflow.

An earlier advogato article, Distributed Debian Distribution Development (DDDD), discusses how Debian's packages can be distributed peer-to-peer, and vcs-pkg is a generalisation of the issues involved.

It goes without saying that the binary distribution is not the only part that needs to be distributed, but it is a big part of the picture.

Whilst DDDD advocated that projects such as debtorrent and apt-p2p would help with debian binary package distribution, vcs-pkg with GitTorrent as the underlying transport would be much more powerful, as it would allow anyone to create their own Linux - or FreeBSD - or other software - distribution, based on top of existing packages.

Branching a distribution: git checkout -b ubuntu-8.1-custom

Suddenly, creating a major overnight runaway successful distribution no longer needs the resources of a corporate-backed RedHat or even the charity-backed Debian: anyone could start a distribution themselves, and it would automatically be peer-to-peer replicated.

If the GitTorrent-backed Debian Distribution concept had existed at the time, Ubuntu would not have needed to fork and copy the entire Debian codebase / repository. Debian users who wanted to try out Ubuntu could have done so with a single command such as "git checkout -b ubuntu".

Root-mounted Git Filesystem

GitFS is a FUSE (File System in User Space) plugin that allows a Git repository to be accessed as a mounted filesystem. Although it is read-only at present, that is more than enough for the required purpose.

Imagine running an entire Linux (or FreeBSD) distribution off of a GPG-digitally signed GitTorrent peer-to-peer distributed binary repository. That's a long sentence with a hell of a lot of buzzwords. The implications are that there would no longer need to be binary mirrors, and, as long as one person in the swarm still has an application that's needed locally, everyone else can automatically get it, too.

Distributed automated Backups

Many developers check their home directory into a git repository, using it as a backup mechanism. Imagine what then happens when GitTorrent is added to the mix: a group of developers could set up a peering arrangement where they make automated distributed backups of each others' computers.

An entirely old business model becomes new and easy: providing backups for linux n00bs and linux gurus alike becomes a matter of doing regular git pulls onto Amazon EC2 cloud machines...

Political and Free Software Freedom implications

It's worth explicitly spelling out the significance of the use of GitTorrent for Free Software development, as outlined above.

  • Freedom from political interference. A government or an organisation decides that it doesn't want free software to be used, as it undermines their ability to exert "control". By going fully distributed, the only way for a government or an organisation - covert or otherwise - to exert any influence or "control" is, just like anyone else, through the GPG-digitally-signed web of trust (such as the Debian one). In this way, the only influences that can be exerted are publicly accountable influences. Democracy with mathematically backed teeth.
  • Freedom from project maintainer manipulation. If a project maintainer becomes manipulative or is manipulated, to exert a negative influence on a project, users can simply shunt them aside, by setting up a new list of GPG keys from whom they will trust to receive patches and updates. Even the web site content can be forked. The only thing that can't be forked is the web site domain name - but, as has been mentioned previously, distributed peer to peer dns takes care even of that. There's even an implementation of a distributed dns system.
  • Freedom for Governments to fork entire distributions. Many governments - especially those in emerging markets and the third world - find it difficult to adopt a particular linux distribution, on the basis that they find the corporate sponsors (Redhat, Novell) distasteful and untrustworthy. Whilst some free software developers may find this to be upsetting, being upset about it doesn't make the problem go away. However, allowing a country to fork an entire distribution does make the problem go away, as it allows that country - that government - to issue their own GPG keys for their own distribution. The other nice thing is that their contributions should (unless special effort is made to ensure that they don't) automatically find their way back into the GitTorrent cloud, digitally-signed and easily identifiable, just like everyone else's.
  • Freedom from resource limitations. As the entire free software development process - documentation, ideas, source, bugs and binaries - is distributed, suddenly the only limitation on the distribution and deployment of a useful idea is ... well... it's hard to think of one. Even network bandwidth should not be a problem, as entire repositories could be git-cloned onto CDs, DVDs or memory sticks and communicated by postal service to a location with better bandwidth.
  • Freedom from SPAM. With the entire infrastructure GPG signed, the possibility of individuals posting GPG-signed SPAM becomes... somewhat moot. If it were to happen (to an otherwise trusted user), it would indicate that a user's computer had been compromised, and that they were stupid enough to not keep their GPG private key physically separated (USB key) from their computer. Slapped wrists all round, but nothing remotely like the present situation.

Flies in the ointment

Here is a list of technical challenges that need to be overcome - to get from here to there:

  • GitTorrent needs attention - and time and money are the ways to do that!
  • Git tagging in a single git repository is a "global" operation. For GitTorrent to work as a binary distribution mechanism, as things stand at present there are a couple of options.
    • One is to split packages down into separate git repositories - one GitTorrent repository per package, and then have a "top level" git which lists the seeds/trackers from which individual packages can be found. In "debian / apt" terminology, there would be one git repository containing Releases.gz and Packages.gz etc.
    • Another option is for git itself to be enhanced so that it can "tag" portions of a tree, not the entire tree.
  • dist-bugs is a protocol with (as best can be determined) no released implementation, as of yet. However, there are some perfectly good distributed bug tracking systems - all of which can use git:
  • GitTorrent - and BitTorrent - do not have "search" mechanisms, unlike eMule and mlDonkey. The eMule protocol provides a DHT-based search mechanism as part of the peer-to-peer distribution; BitTorrent completely lacks such mechanisms, and so searching through packages or source code would require downloading the entire codebase or package base (somewhere). This has to be properly addressed, by augmenting GitTorrent. Versions of BitTorrent from bittorrent.com have been enhanced, as can be seen in the full source code, to include a DHT search algorithm.
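The first tagging option in the list above (one repository per package, plus a "top level" git listing where each can be found) can be sketched as a simple index file. The line format below, `package repo tag tracker[,tracker...]`, is invented for this example, as are the `PackageEntry` fields and the example.org host names; the article only says that such an index would play the role of Packages.gz.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageEntry:
    repo: str        # name of the per-package git repository
    tag: str         # signed release tag to check out
    trackers: tuple  # tracker or seed addresses serving that repository


def parse_index(text):
    """Parse index lines of the form: package repo tag tracker[,tracker...]"""
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        package, repo, tag, trackers = line.split()
        index[package] = PackageEntry(repo, tag, tuple(trackers.split(",")))
    return index


EXAMPLE_INDEX = """\
# package  repo       tag   trackers
hello      hello.git  v1.0  tracker.example.org:6881,seed.example.org:6881
world      world.git  v2.1  tracker.example.org:6881
"""
```

Because each package lives in its own repository, a release tag on `hello.git` says nothing about `world.git`, which sidesteps the "global" tagging problem without changing git itself.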

Conclusion

From a simple, simple project that is suffering from an inexplicable near complete lack of attention from the free software community comes a revolutionary change in the way that free software is developed and distributed. A previous article made clear the scale of the issues that just Debian on its own faces, and if Linux ever takes off from its current market share to mainstream, much of the infrastructure that's currently taken for granted is going to collapse.

GitTorrent isn't a complete panacea; it isn't a completely utopian idealistic piece of non-existent airhead software, either, because it's real. It's been developed "because it can be". It's just that the implications of its deployment really haven't been fully uncovered. Those that have been discussed here are pretty monumental.

Piece by piece, free software is inexorably getting its act together. GitTorrent is just another bit of the puzzle...

(Note: this article continues the Tech Fusion series.)


Ticgit, posted 4 Dec 2008 at 20:12 UTC by lkcl » (Master)

ticgit - thank you to the slashdot reader who found this additional example of a git-distributing bug-tracking system.

Git over Freenet fproxy, posted 4 Dec 2008 at 20:26 UTC by lkcl » (Master)

Git over fproxy

a link to another interesting idea: that of using freenet to host the project. exactly how this would work (or what features it would provide) isn't clear to me, but i'm adding it here for completeness, in case anyone's interested.

Git over Freenet fproxy: explanation, posted 4 Dec 2008 at 20:37 UTC by lkcl » (Master)

slashdot comment explaining a bit more about git over fproxy, and its limitations.

Wonderful idea, posted 13 Dec 2008 at 15:30 UTC by DeepNorth » (Journeyer)

This is one of those things that is simple and powerful and good on its face. I love it.

I think GitTorrent is a neat idea, posted 29 Dec 2008 at 00:30 UTC by Omnifarious » (Journeyer)

But I really wish people would stop acting like git was the only DVCS. I don't like it. I don't want to use it. I avoid projects that do use it. It is not an enjoyable or fun DVCS to use.

sledgehammer, posted 30 Dec 2008 at 15:17 UTC by lkcl » (Master)

yeahhhh omnifarious i know what you mean - it's taken me about three to four months to gain confidence with git, but i persisted, despite finding cvs and svn trivial to get my head round, because of its design and its features.

cvs and svn, as they are server-based, actually risk you the loss or destruction of work, due to not allowing you to commit work before performing an update! when you perform an update, you end up corrupting your source code by having merge-conflicts _overwrite_ your uncommitted, un-backed-up work!

bzr, mercurial and the others - they're just... *hand-waving*... why??

the other significant advantages of git are its extensibility - the fact that the main "git" command is just an exec() runner - and the fact that it's backed by the linux kernel developers is of exxtreeeeme importance, to guarantee its reliability, features and success.

overall, then, "fun" or "enjoyability" doesn't really come into it: amazing amounts of respect for its insane ambitious design and its heritage count far more in my book than whether it's "easy to use".
