Advogato: Let your cycles do the talking

Posted 17 Jul 2000 at 20:57 UTC by graydon

Recent startups Popular Power and the Process Tree Network are hoping to cash in on the wide-area distributed processing model pioneered by volunteer networks (notably distributed.net and seti@home). The new ventures offer "cash for cycles", and in doing so they raise difficult moral issues for potential cycle donors.

The problem introduced by money, and by centralized organizations doling out the work, is that cycle donors (clients) no longer choose which effort to support solely by moral conviction. A cash reward can dilute or even eliminate the warm fuzzies one might get from supporting scientific or engineering computation. The situation comes to resemble popular democracy, in which competing parties attempt to "buy votes" from people who would otherwise find their platforms bland-bordering-on-offensive. Popular Power and Process Tree Network both claim that volunteer projects will be made available, but give no indication that they will actually be promoting their use. Indeed, it appears not to be in their interest to promote non-profits, as a non-profit will most likely not be a revenue source. Furthermore, as both client and server software appear to be strictly proprietary, being told that you're running influenza vaccine optimization is no guarantee that you aren't really running a rendering job for Toy Story 7 or a molecular simulation of the new ultra-foamy Gillette shaving cream.

In the same way that bought votes and broken political promises give rise to political apathy and low voter turnout, commercial distributed processing may eliminate the sense of accomplishment people get from their donated cycles, if they begin to suspect all their CPU time is going to support issues they don't care about. People may simply stop participating or, worse, stop caring at all.

Nobody voted The Process Tree or Popular Power as the decision-making gods of CPU time. Just as local governments continue to require local jurisdiction, specialized projects continue to require control over security, code quality, optimization, anonymity, payment (or lack thereof), and the ability to make decisions about who's running what, for whom. The necessary analogue of local government is a free-as-in-speech toolkit for independent groups to quickly and simply construct wide-area distributed job processing services, in the same way the Beowulf Linux cluster concept has been packaged and sold as a supercomputer toolkit.

We haven't come this far with free software only to have a CPU monopolist centralize the political voice of independent internet users with fast CPUs. Our cycles can make a serious difference to projects with real-world goals, and we owe it to ourselves to build the infrastructure to allow anyone to run such projects, on their own terms.

I don't see what's wrong with these two startups, or any similar venture, offering cash in exchange for idle cpu cycles on somebody's super fast game machine. I may have misunderstood part of your argument, in which case I apologize. But these startups are not trying to bully distributed.net and seti@home out of the market. If you want to be able to lend your cpu cycles to a purely scientific cause, you can still set your distributed.net client to only work on the OGR project. Even the RC5 project offers some sort of reward if you find the winning key, doesn't it?

If someone spent a couple of thousand dollars on a machine for playing the latest computer game and wants to earn back a few of those dollars while he/she sleeps, then how does that make Popular Power the decision making God of CPU time? And if one of these companies does gain a major share of the CPU-cycle-donating population, then those users are the ones who have "voted" for the company to be in charge of their CPU-cycles.

I do, however, think that a generic idle-cycle sharing framework would be pretty cool. I could definitely see applications in universities or any organization with a lot of computers letting their idle cycles go idle.

I've thought about this problem off and on for nearly a year. I've reached an impasse, however, at several basic needs for a general open source distributed processing system (something that does not currently exist):

Result Authenticity - How do you verify that the client indeed ran the algorithm? The current technique appears to be binary-only clients with secret authentication schemes. Even these protections can be overcome, as was shown with the fake block fiasco at Distributed.net. So far, the only possible solution I can imagine is redundant computation: send each block out more than once and compare results. It slows down your system and relies on the assumption that most of the clients are legit.
Client Code Safety - I imagine a system where a generic client joins a project by downloading a code module from the result server. How does the client know that the module does not improperly use the client hardware? We don't want people to set up "projects" that are really covers for DDoS attack clients.
Client Code Authenticity - This issue was not something I had considered until I read this editorial. How does the client know that the code module it is running actually does what its author claims? They say it computes weather patterns, but how do you know that?

The first still doesn't have a good answer, but the last two issues seem to be solved by a system where the server holds modules written in some language (preferably optimized for mathematical computation). The client downloads the source and compiles it on the fly. The language is restricted so that memory usage is limited and access to other resources is denied. (This is starting to sound like a cross between Matlab, Perl, and Java.) Then the source code can be examined and the program has no way to exploit the system.
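To make the restricted-language idea concrete, here is a minimal sketch, not a vetted sandbox. It assumes the downloaded "module" is a single arithmetic expression and uses Python's standard `ast` module to reject everything outside a small whitelist. The names `safe_eval` and `MAX_NODES` are hypothetical, and the node-count cap is only a crude stand-in for a real memory limit.

```python
import ast
import operator

# Whitelisted arithmetic operators; any other syntax is rejected.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARYOPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

MAX_NODES = 1000  # hypothetical cap on the size of a downloaded module


def safe_eval(source, variables):
    """Evaluate an arithmetic expression using only whitelisted syntax."""
    tree = ast.parse(source, mode="eval")
    if sum(1 for _ in ast.walk(tree)) > MAX_NODES:
        raise ValueError("expression too large")
    return _eval(tree.body, variables)


def _eval(node, variables):
    # Plain numeric literals only (booleans, strings, etc. are refused).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    # Names must come from the inputs the client chose to supply.
    if isinstance(node, ast.Name) and node.id in variables:
        return variables[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        left = _eval(node.left, variables)
        right = _eval(node.right, variables)
        return _BINOPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARYOPS:
        return _UNARYOPS[type(node.op)](_eval(node.operand, variables))
    # Calls, attribute access, imports and everything else end up here.
    raise ValueError(f"disallowed syntax: {type(node).__name__}")
```

With this sketch, `safe_eval("2 * x + 1", {"x": 3})` evaluates the expression, while something like `safe_eval("__import__('os')", {})` raises `ValueError` because function calls are not on the whitelist. A real system would need loops, arrays, and enforced time and memory budgets, which this sketch does not attempt.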

Of course, Mr. Meanie can now access your client code, "optimize" it so it does nothing but send random blocks of data back, and boast to his friends on IRC about how his Celeron 366 destroys everyone's Athlon 800s. Even worse, some well-meaning person could try to optimize the client code, inadvertently break it, and send bad results back.

Does anyone have any idea on how to verify that a particular algorithm was used to generate a block?

One idea that might work is to send the same "code block" to multiple clients and check that the results agree (optionally with some sort of voting scheme). This would take processing power, but given enough clients...
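A minimal sketch of such a voting scheme, under stated assumptions: each client's answer for a block is a hashable value (for example a string or a digest), and the server accepts a value only when it has both a strict majority and a minimum number of matching answers. The function name `accept_result` and the `quorum` parameter are hypothetical.

```python
from collections import Counter


def accept_result(results, quorum=2):
    """Return the agreed result for one work block, or None if there is no agreement.

    results: hashable values returned by different clients for the same block.
    quorum:  hypothetical minimum number of matching answers required.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # Require a minimum number of matching answers and a strict majority.
    if count >= quorum and count * 2 > len(results):
        return value
    return None
```

For example, `accept_result(["a", "a", "b"])` accepts `"a"`, while a two-way split such as `["a", "b"]` returns `None`, signalling that the block should be sent out again. As noted earlier in the thread, this only helps if most clients are honest.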

There is another moral and legal problem with these processing models in which you trade cycles for money: what if you do not own the computer(s) on which you installed the client software? If the computer belongs to your school or to your employer, they could probably sue you for drawing illegal revenue from their equipment.

Most schools and employers do not mind if you run a seti@home client on the machines that are available to you (as long as the client is not competing with any other background processes that might be run by other users). In fact, some of them are even supporting this actively. They tolerate or encourage these distributed efforts because they are for a "good cause" and because you will not make any money out of this (and if you ever get a reward, you will share it, right?).

But if money is involved, then things look very different. There are probably many clauses in your contract or in your country's laws that prevent you from making money out of equipment that was given to you by your school or employer for a different purpose. So in this case, it is absolutely necessary to get prior (written) approval from your employer, allowing you to do this. I doubt that any school would allow that. Maybe some companies would allow it if they were desperately looking for money (that would be ridiculous!) and asked for 90% of what you would gain.

Cosm is an open application/protocol framework for building distributed computing applications. It's the brainchild of Adam Beberg, founder of the successful distributed.net effort. The recently-released Client-Server SDK is supposedly functional enough to begin building real distributed applications, like the Stanford Folding@home project.

As I understand it, Cosm has mainly been hampered by licensing issues. The goal is to use a dual licensing structure that permits academic/research applications and commercial ventures to use the code. Unfortunately, it's not quite ready yet.

Hi there, I'm the CTO of Popular Power. I've been an Advogato member for a while now, and have written a fair amount of open source software (Hive, Swarm, and html-helper-mode, a popular emacs mode for HTML).

Thanks for the thoughtful comments; we've had many similar discussions here at our company. I'd like to answer a few of the points made here.

An important part of Popular Power is that you have choice in what your computer does. The current preferences pane has a slider between non-profit and profit work, and we're working on ways you can more specifically allocate time to particular projects. We are committed to supporting non-profit work. We think it's important to donate some of the spare resources of the net to the public good. It also makes good business sense for us: people like to help the world, and our non-profit work is one of the main reasons people choose to run our software. Our current project (written by one of our own employees) is software modeling to understand how to develop better flu vaccines, research that could save millions of lives.

Graydon expresses some concern about the idea of having a centralized company building this kind of system, but there are advantages to having a company manage it. First off, a system like this is best when it's large: the more participants, the more powerful. (Of course, the whole capacity of the system doesn't have to be running just one job.) And while the basic software is fairly simple, the operational costs for a high quality service like this are pretty high. I think there's an analogy to SourceForge; for many projects, it's much easier to let someone else handle the server so you can focus on doing your work, not running the service. We're building an open service that anyone can use. BTW, we're not a strictly proprietary client; we are doing an open source release of the client early next year. Many of our employees and investors (such as Brian Behlendorf) have open source backgrounds.

If you're interested in distributed computing, I'd encourage you to download our preview release from http://www.popularpower.com/, try it out, and give us feedback. We have Linux and Windows software out there now; the Linux release is fairly new, so it doesn't have a fancy GUI to configure it yet. We've been up and running for several months now and have already donated a huge amount of time to flu vaccine research.

Not Just Cycles, posted 23 Jul 2000 at 19:36 UTC by linas

It's not just cycles that count. A distributed file system is critical to the endeavour as well. Adam Beberg of Cosm does understand that part. The two commercial endeavours don't, and this will hamper their long-term viability. (Curiously, the folks who work on Eternity-like projects (Freenet, Publius, Napster, Gnutella) haven't yet figured out that they also need a Cosm/Process Tree/Popular Power type interface.)

(I've put together a screed on this issue at http://www.linas.org/theory/eternity.html)

The original issue? Is it moral to pay for cycles? Ah, it's like paying for banner advertising. Mostly harmless, sometimes irritating. Far, far more important is finding a way to open up and offer these types of services to everyone, to anyone who might want some spare computing cycles, not just to the customers of Process Tree and Popular Power. It is the populist, decentralized aspect of the WWW that made it wildly successful. It would not have happened if Netscape had said 'to publish a web page, you must work through us'. That's what AOL was doing back in 1993, and it only worked for big customers with lots of money.

Another issue, posted 25 Jul 2000 at 14:59 UTC by Schemer

A year or so ago a friend and I came up with the idea of using distributed computing to drive a company (like many people here, I suppose), but we decided that it wouldn't work for a reason that I haven't seen mentioned here: preventing reverse engineering of the customer's code.

We figured that anyone interested in using our system to perform some large computation would want a way to protect their code from being disassembled by the clients, and we couldn't figure out any way to provide protection from that. It's the old security-through-obscurity thing: if anyone can run the code, then a determined and skilled person could come along and figure out as much as they want about the project being worked on. Even if you use some kind of virtual machine, it is still possible to get at it with a disassembler somehow, especially on Linux.

Maybe we were being too paranoid, but I think that this is a major concern.

What's so bad about cash for cycles?, posted 17 Jul 2000 at 23:00 UTC by prash (Apprentice)

A General Parallel Processing Framework, posted 18 Jul 2000 at 01:43 UTC by volsung (Journeyer)

Client result verification, posted 18 Jul 2000 at 08:02 UTC by ingvar (Master)

What if you do not own the CPU?, posted 18 Jul 2000 at 08:33 UTC by Raphael (Master)

Cosm might be a solution, posted 19 Jul 2000 at 18:16 UTC by ian (Apprentice)

Some thoughts from the Popular Power CTO, posted 19 Jul 2000 at 19:18 UTC by nelsonminar (Master)

Not Just Cycles, posted 23 Jul 2000 at 19:36 UTC by linas (Master)

Another issue, posted 25 Jul 2000 at 14:59 UTC by Schemer (Apprentice)