Let your cycles do the talking
Posted 17 Jul 2000 at 20:57 UTC by graydon
Recent startups Popular Power
and The Process Tree Network
are hoping to cash in on the wide-area distributed processing model
pioneered by volunteer networks (notably distributed.net and seti@home). By offering "cash for cycles", however,
the new ventures introduce difficult moral issues for potential cycle
donors.
The problem introduced by money, and by centralized organizations doling
out the work, is that cycle donors (clients) no longer choose which
effort to support solely by moral conviction. A cash reward can dilute
or even eliminate the warm fuzzies one might get from supporting
scientific or engineering computation. The situation comes to
resemble popular democracy, in which competing parties attempt to "buy
votes"
from people who would otherwise find their platforms
bland-bordering-on-offensive.
Popular Power and Process Tree Network both claim that volunteer
projects will be made available, but give no indication that they will
actually promote them. Indeed, it does not appear to be in their
interest to promote non-profits, as a non-profit will most likely not be
a revenue source. Furthermore, as both client and server software appear
to be strictly proprietary, you have no way to verify that
when you are told you're running influenza vaccine
optimization, you aren't really running a
rendering job for Toy Story 7 or a molecular simulation of the
new ultra-foamy Gillette shaving cream.
In the same way that bought votes and broken political promises give
rise to political apathy and low voter turnout, commercial distributed
processing may eliminate the sense of accomplishment people get from
their donated cycles, if they begin to suspect all their CPU time is
going to support issues they don't care about. People may simply stop
participating or, worse, stop caring at all.
Nobody elected The Process Tree or Popular
Power the decision-making gods of CPU time. Just as local communities
continue to require local jurisdiction, specialized projects
continue to require control over security, code quality,
optimization, anonymity, payment (or lack thereof), and the ability to
make decisions about who's running what, for whom. The necessary
analogue of local
government is a
free-as-in-speech toolkit for independent groups to quickly and simply
construct wide-area distributed job processing services, in the same way
the Beowulf Linux cluster concept has been packaged and sold as a
supercomputer toolkit.
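As a rough illustration of how small the core of such a toolkit could be, here is a minimal sketch of a work-unit server that hands out units and re-issues any unit whose lease expires (for example, because a volunteer switched the machine off). Everything here is hypothetical: the class and method names are invented for illustration, and a real service would still need networking, authentication and persistence, none of which is shown.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkServer:
    """Hypothetical minimal work-unit server with lease-based re-issue."""
    lease_seconds: float = 3600.0
    pending: deque = field(default_factory=deque)  # (unit_id, payload) waiting to be sent
    leased: dict = field(default_factory=dict)     # unit_id -> (payload, deadline)
    results: dict = field(default_factory=dict)    # unit_id -> reported result

    def submit(self, unit_id, payload):
        """Queue a new unit of work."""
        self.pending.append((unit_id, payload))

    def _reclaim_expired(self, now):
        # Any unit whose lease has run out goes back to the queue.
        for unit_id, (payload, deadline) in list(self.leased.items()):
            if deadline <= now:
                del self.leased[unit_id]
                self.pending.append((unit_id, payload))

    def request_work(self, now=None):
        """Lease the next unit to a client, or return None if nothing is available."""
        now = time.time() if now is None else now
        self._reclaim_expired(now)
        if not self.pending:
            return None
        unit_id, payload = self.pending.popleft()
        self.leased[unit_id] = (payload, now + self.lease_seconds)
        return unit_id, payload

    def report_result(self, unit_id, result):
        """Accept a result only for a unit that is currently leased."""
        if unit_id not in self.leased:
            return False  # unknown unit, or the lease expired and was not renewed
        del self.leased[unit_id]
        self.results[unit_id] = result
        return True
```

The lease is the design choice that matters for a volunteer network: clients disappear without warning, so the server never waits on any single machine. Note that this sketch does not record which client holds a lease, so a late result from the original holder of a re-issued unit would still be accepted.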
We haven't come this far with free software only to have a CPU
monopolist centralize the political voice of independent internet
users with fast CPUs. Our cycles can make a serious difference to
projects with real-world goals, and we owe it to ourselves to build the
infrastructure to allow anyone to run such projects, on their
own terms.
I don't see what's wrong with these two startups, or any similar
venture, offering cash in exchange for idle cpu cycles on somebody's
super fast game machine. I may have misunderstood part of your argument,
in which case I apologize. But these startups are not trying to bully
distributed.net and seti@home out of the market. If you want to be able
to lend your cpu cycles to a purely scientific cause, you can still set
your distributed.net client to only work on the OGR project. Even the
RC5 project offers some sort of reward if you find the winning key,
doesn't it?
If someone spent a couple of thousand dollars on a machine for playing
the latest computer game and wants to earn back a few of those dollars
while he/she sleeps, then how does that make Popular Power the decision
making God of CPU time? And if one of these companies does gain a major
share of the CPU-cycle-donating population, then those users are the
ones who have "voted" for the company to be in charge of their
CPU-cycles.
I do, however, think that a generic idle-cycle sharing framework would
be pretty cool. I could definitely see applications in universities or
any organization with a lot of computers whose idle cycles currently go
to waste.
I've thought about this problem off and on for nearly a year. I've
reached an impasse, however, at several basic needs for a general
open source (something not currently done) distributed
processing system:
- Result Authenticity - How do you verify that the client actually ran
the algorithm? The current technique appears to be binary-only clients
with secret authentication schemes. Even these protections can be
overcome, as was shown with the fake block fiasco at Distributed.net.
So far, the only possible solution I can imagine is redundant
computation. Send each block out more than once and compare results.
It slows down your system and relies on the assumption that most of the
clients are legit.
- Client Code Safety - I imagine a system where a generic client
joins a project by downloading a code module from the result server.
How does the client know that the module does not improperly use the
client hardware? We don't want people to set up "projects" that are
really covers for DDoS attack clients.
- Client Code Authenticity - This issue was not something I had
considered until I read this editorial. How does the client know that
the code module it is running actually does what its author claims?
They say it computes weather patterns, but how do you know that?
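The cost and the weak spot of the redundant-computation idea in the first point can be put in numbers. This is a sketch under stated assumptions, not a model of any real project: each copy of a block goes to an independently chosen client, a fraction f of all clients cheat and collude on the same wrong answer (the worst case), and a block is accepted when a strict majority of its k copies agree.

```python
from math import comb


def p_wrong_majority(k: int, f: float) -> float:
    """Probability that colluding cheaters form a strict majority of k copies.

    Assumes each copy is sent to an independently chosen client and that a
    fraction f of clients are cheaters who all return the same wrong answer.
    """
    # Sum the binomial probabilities of more than k/2 copies landing on cheaters.
    return sum(comb(k, i) * f**i * (1 - f) ** (k - i) for i in range(k // 2 + 1, k + 1))
```

With f = 0.1, one copy per block leaves a 10% chance of accepting a bad result; three copies bring that to about 2.8% and five copies to about 0.86%, at three and five times the computation. With f = 0.5 the result is 50% for every odd k, which is exactly the "most of the clients are legit" assumption: redundancy only helps while cheaters are a minority.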
The first still doesn't have a good answer, but the last two issues
seem to be solved by a system where the server holds modules written in
some language (preferably one optimized for mathematical computation).
The client downloads the source and compiles it on the fly. The
language is restricted so that memory usage is bounded and
access to other resources is denied. (This is starting to sound like a
cross between Matlab, Perl, and Java.) Then the source code can be
examined, and the program has no way to exploit the system.
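The two properties being asked for, bounded resources and no way to reach outside the sandbox, can be illustrated with a toy interpreter. This is a hypothetical stack machine, not the Matlab/Perl/Java-like language the comment imagines: its instruction set simply contains nothing for files, sockets or processes, and every run is capped in steps, stack depth and integer size.

```python
class ResourceLimitExceeded(Exception):
    """Raised when a module exceeds its step, stack, or value budget."""


# Hypothetical instruction set: arithmetic and control flow only.
# There is no instruction that can open a file, a socket, or a process.
ALLOWED_OPS = {"push", "add", "sub", "mul", "dup", "swap", "pop", "jnz"}
VALUE_LIMIT = 2**63  # cap on integer magnitude so values cannot grow without bound


def run(program, max_steps=10_000, max_stack=256):
    """Interpret a list of (op, *args) tuples under fixed resource limits.

    Malformed programs (e.g. popping an empty stack) raise IndexError.
    """
    stack = []
    pc = 0
    steps = 0
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise ResourceLimitExceeded("step budget exhausted")
        op, *args = program[pc]
        if op not in ALLOWED_OPS:
            raise ValueError(f"forbidden instruction: {op!r}")
        if op == "push":
            value = args[0]
            if not isinstance(value, int) or abs(value) >= VALUE_LIMIT:
                raise ValueError("push expects a bounded integer")
            stack.append(value)
        elif op in ("add", "sub", "mul"):
            b, a = stack.pop(), stack.pop()  # b was on top of a
            result = a + b if op == "add" else a - b if op == "sub" else a * b
            if abs(result) >= VALUE_LIMIT:
                raise ResourceLimitExceeded("value out of range")
            stack.append(result)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "pop":
            stack.pop()
        elif op == "jnz":
            # Pop the top of the stack; jump to instruction args[0] if it is nonzero.
            target = args[0]
            if not 0 <= target <= len(program):
                raise ValueError("jump out of range")
            if stack.pop() != 0:
                pc = target
                continue
        if len(stack) > max_stack:
            raise ResourceLimitExceeded("stack limit exceeded")
        pc += 1
    return stack
```

For example, `run([("push", 3), ("push", -1), ("add",), ("dup",), ("jnz", 1)])` counts down from 3 and halts with `[0]`, while `run([("push", 1), ("jnz", 0)])` loops forever and is stopped by the step budget. As the next paragraph points out, readable source also means anyone can tamper with it, so this addresses code safety but not result authenticity.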
Of course, Mr. Meanie can now access your client code, "optimize" it
so it does nothing but send random blocks of data back, and boast to
his friends on IRC about how his Celeron 366 destroys everyone's Athlon
800s. Even worse, some well-meaning person could try to optimize the
client code, inadvertently break it, and send bad results back.
Does anyone have any idea on how to verify that a particular algorithm
was used to generate a block?
One idea that might work is to send the same "code block" to multiple clients and check that the results agree (optionally with some
sort of voting scheme). This would take extra processing power, but given enough clients...
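A voting step for that scheme might look like the sketch below. The function names and the acceptance rule (a result needs at least `min_agree` matching reports and a strict majority of all reports) are assumptions chosen for illustration; a real project would pick its own thresholds.

```python
from collections import Counter


def vote(results, min_agree=2):
    """Decide one work unit from redundant reports.

    results: dict mapping client id -> the (hashable) result it reported.
    Returns the accepted result, or None if no result has both at least
    `min_agree` reports and a strict majority of all reports.
    """
    if not results:
        return None
    value, count = Counter(results.values()).most_common(1)[0]
    if count >= min_agree and 2 * count > len(results):
        return value
    return None


def dissenters(results, accepted):
    """Client ids whose report disagreed with the accepted result."""
    return sorted(client for client, r in results.items() if r != accepted)
```

With reports `{"alice": 42, "bob": 42, "meanie": 7}`, the unit is accepted as 42 and `dissenters` flags `"meanie"`, which gives the server a way to spot the client whose Celeron "finishes" blocks suspiciously fast. A tie such as 2 against 2 is rejected and the block can be sent out again. Because agreement is checked with exact equality, floating-point work would need results rounded or compared within a tolerance first.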
There is another moral and legal problem with these processing models
in which you trade cycles for money: what if you do not own the
computer(s) on which you installed the client software? If the computer
belongs to your school or to your employer, they could probably sue you
for drawing illegal revenue from their equipment.
Most schools and employers do not mind if you run a seti@home client
on the machines that are available to you (as long as the client is not
competing with any other background processes that might be run by other
users). In fact, some of them are even supporting this actively. They
tolerate or encourage these distributed efforts because they are for a
"good cause" and because you will not make any money out of this (and if
you ever get a reward, you will share it, right?).
But if money is involved, then things look very different. There are
probably many clauses in your contract or in your country's laws that
prevent you from making money out of some equipment that was given to
you by your school or employer for a different purpose. So in this
case, it is absolutely necessary to get prior written approval from
your employer. I doubt that any school would
allow that. Maybe some companies would, if
they were desperately looking for money (which would be ridiculous!) and
took 90% of what you earned.
Cosm is an open application/protocol framework for building distributed computing applications. It's the brainchild of
Adam Beberg, founder of the successful
distributed.net effort. The recently-released Client-Server SDK is supposedly functional enough to begin building real distributed applications, like the Stanford
Folding@home project.
As I understand it, Cosm has mainly been hampered by licensing issues. The goal is to use a dual licensing structure that permits academic/research applications and commercial ventures to use the code. Unfortunately, it's not quite ready yet.
Hi there, I'm the CTO of
Popular Power. I've been an
Advogato member
for a while now, and have written a fair amount of open source software
(Hive, Swarm, and html-helper-mode, a popular Emacs mode for HTML).
Thanks for the thoughtful comments, we've had many similar
discussions
here at our company. I'd like to answer a few of the points made
here.
An important part of Popular Power is that you have choice in what
your computer does. The current preferences pane has a slider between
non-profit and profit work and we're working on ways you can more
specifically allocate time to particular projects. We are committed to
supporting non-profit work. We think it's important to donate some of
the spare resources of the net to the public good. It also makes good
business sense for us: people like to help the world, our non-profit
work is one of the main reasons people choose to run our software. Our
current project (written by one of our own employees) is software
modeling to understand how to develop better flu vaccines, research
that could save millions of lives.
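How the profit/non-profit slider is turned into scheduling decisions is not described here, so the following is only a hypothetical sketch of one way to do it: treat the slider position as the target share of CPU time for non-profit work, and before each job pick whichever pool is currently below its target.

```python
def next_pool(p_nonprofit, seconds_done):
    """Choose the pool ('nonprofit' or 'profit') for the next job.

    p_nonprofit: hypothetical slider position in [0, 1], read as the target
        share of donated CPU time that should go to non-profit work.
    seconds_done: dict with CPU seconds already spent in each pool.
    """
    if not 0.0 <= p_nonprofit <= 1.0:
        raise ValueError("slider position must be in [0, 1]")
    if p_nonprofit == 0.0:
        return "profit"
    if p_nonprofit == 1.0:
        return "nonprofit"
    total = seconds_done["nonprofit"] + seconds_done["profit"]
    if total == 0:
        return "nonprofit" if p_nonprofit >= 0.5 else "profit"
    # Serve whichever pool is currently below its target share.
    share = seconds_done["nonprofit"] / total
    return "nonprofit" if share < p_nonprofit else "profit"
```

Because the decision depends only on time already spent, the long-run split converges to the slider setting even when individual jobs differ in length: with the slider at 0.25 and 100 one-second jobs, this ends at 25 non-profit and 75 profit jobs. Per-project allocation could extend the same idea to more than two pools.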
Graydon expresses some concern about the idea of having a centralized
company building this kind of system, but there are advantages to
having a company manage it. First off, a system like this is best when
it's large: the more participants, the more powerful. (Of course, the
whole capacity of the system doesn't have to be running just one job.)
And while the basic software is fairly simple, the operational costs
for a high quality service like this are pretty high. I think there's
an analogy to SourceForge; for many projects, it's much easier to let
someone else handle the server so you can focus on doing your work,
not running the service. We're building an open service that anyone
can use. BTW, we're not a strictly proprietary client; we are doing an
open source release of the client early next year. Many of our
employees and investors (such as Brian Behlendorf) have open source
backgrounds.
If you're interested in distributed computing, I'd encourage you to
download our preview release from http://www.popularpower.com/, try it
out, and give us feedback. We have Linux and Windows software out
there now; the Linux release is fairly new, so doesn't have a fancy
GUI to configure it yet. We've been up and running for several months
now and have already donated a huge amount of time to flu vaccine
research.
Not Just Cycles, posted 23 Jul 2000 at 19:36 UTC by linas (Master)
It's not just cycles that count. A distributed file system is critical
to the endeavour as well. Adam Beberg of Cosm does understand that part.
The two commercial endeavours don't, and this will hamper their long-term
viability. (Curiously, the folks who work on Eternity-like projects
(Freenet, Publius, Napster, Gnutella) haven't yet figured out that
they also need a Cosm/ProcessTree/Popular Power-type interface.)
(I've put together a screed on this issue at
http://www.linas.org/theory/eternity.html)
The original issue? Is it moral to pay for cycles? Ah, it's like paying
for banner advertising. Mostly harmless, sometimes irritating.
Far, far more important is finding a way to open up and offer these
types of services to everyone, to anyone who might want some spare
computing cycles, not just to the customers of
process-tree/popular-power. It is the populist, decentralized aspect
of the WWW that made it wildly successful. It would not have happened
if netscape said 'to publish a web page, you must work through us'.
That's what AOL was doing back in 1993, and it only worked for big
customers with lots of money.
Another issue, posted 25 Jul 2000 at 14:59 UTC by Schemer (Apprentice)
A year or so ago a friend and I came up with the idea of using
distributed computing to drive a company (like many people here, I
suppose), but we decided that it wouldn't work for a reason that I haven't
seen mentioned here: preventing reverse engineering of the
customer's code.
We figured that anyone interested in using our system to perform some
large computation would want a way to protect their code from being
disassembled by the clients, and we couldn't figure out any way to
provide protection from that. It's the old security-through-obscurity
problem: if anyone can run the code, then a determined and skilled person
could come along and figure out as much as they want about the project being
worked on. Even if you use some kind of virtual machine, it is still
possible to get at the code with a disassembler somehow, especially on Linux.
Maybe we were being too paranoid, but I think that this is a major
concern.