When you have a list of n elements and want to pick m, you want each element to have a probablilty m/n of being picked. Walking the list and picking elements where each element's probability of being picked is m/n doesn't work, because you're not guaranteed to end up with m elements.
However, if you pick the first element (with probability m/n), you've changed the problem to picking m-1 elements out of the remaining list of size n. And if you don't pick the first element, you've changed the problem to picking m elements out of the remaining list of size n-1. Pseudocode, recursive:
pickFromList(list L,
int nWanted, /* number of elements you want to pick */
int nItems) /* size of the list */
{
if(nWanted == 0){
return;
}else if(nWanted == nItems){
pick every element in L;
}else{
if(drand48() < nWanted/nItems){ /* drand48 gives Uniform(0.0, 1.0) */
pick(L->data);
pickFromList(nWanted-1, nItems-1, L->next);
}else{
pickFromList(nWanted, nItems-1, L->next);
}
}
}
Pseudocode, iterative:
pickFromList(list L, int nWanted, int nItems)
{
int nPicked, nSeen;
nPicked=0;
for(nSeen=0, p = L; p != NULL; p = p->next, nSeen){
if(drand48() < (nWanted-nPicked)/(nItems-nSeen))
pick(p);
nPicked++;
}
}
}
(probably a few off by ones in there)
Handwaving to show that the above works:
pick 2 out of list of 5 elements: a, b, c, d, e
ac = the event that a is chosen
bc = the event that b is chosen
etc.
A = probability that a is chosen
B = probability that b is chosen
etc.
A = 2/5
B = 1/4 if ac = 1/4 * 2/5 + 2/4 * 3/5 = 2/20 + 6/20 = 2/5
2/4 if !ac
P(a chosen & b chosen) = P(bc if ac) * A = 1/4 * 2/5 = 2/20
P(a chosen & ! b chosen) = P(!bc if ac) * A = 3/4 * 2/5 = 6/20
P(! a chosen & b chosen) = P(bc if !ac) * (1-A) = 2/4 * 3/5 = 6/20
P(! a chosen & ! b chosen) = P(!bc if !ac) * (1-A) = 2/4 * 3/5 = 6/20
C = 0 if ac & bc = 0 = 2/5
1/3 if ac & !bc + 6/20 * 1/3
1/3 if !ac & bc + 6/20 * 1/3
2/3 if !ac & !bc + 6/20 * 2/3
etc...
Oh, if you're annoyed at netscape/mozilla/whatever text input things, lynx allows you to invoke an external editor for textareas. I know it has security holes, but I know of no other text browsers with that feature (or that allow turning off table and frame rendering. sure, it's fine for some sites, but others' table-layout gets mangled beyond recognition, and it would be nice to actually be able to read the content). Maybe someday I'll take lynx, links, or w3m and add/remove stuff until I get my ideal web browser..
On the Worldforge front, the Atlas protocol library people have apparently been doing "cool" stuff, so I'm holding back on client work until that stabilizies. In the meantime, I've written a nntp news reading cgi (haven't done threads yet. not exactly sure how threaading works, actually.) and did a uml diagram which I'm sure would be a source of amusement and/or confusion to someone who actually knows about uml than what can be learned from the tooltips in dia. I've also been working on a paper about character communities and implicit knowledge sharing.. might end up using mod_virgule to test out some of my ideas.
Ok, that's all. The machine I'm typing this on is promising to reboot in 5 minutes.
I agree with mbp, sometimes the diaries seem more like yelling into a huge room. Of course, you can't see into it, and the only reason to think anyone's listening is that occasionally, someone shouts back. Interesting how people use their diaries instead of the article comments to discuss articles, too.
Maybe if advogato had sections for articles, like "Technical", "Philosophy", or "Social", there would be less complaining and anti-certifying about off-topicness.
While on the topic of meta-osfs[1], I think it would be interesting to document how specific large open projects (more than a few dozen dedicated developers/contributors) work in terms of coordination and project management. This kind of thing seems to be reinvented by the individual projects. While, obviously, something that works for one project doesn't necessarily work for another, sharing insights and experiences might be helpful, and just interesting in its own right for the armchair anthropologists. I suppose everyone who knows is too busy working on their respective projects, though.
Does anyone actually read the TOS for Sourceforge? IANAL, but some of the stuff looks a little strange, like 6a, which covers "any Content" that is "harmful", "vulgar", "abusive", or "otherwise objectionable". Does stuff like "Hey, you suck", or "This code is a POS" count? And 6i, which states that you can't "cause a screen to 'scroll' faster than other users of the Service are able to type". When debugging stuff on IRC, I often paste a few screenfuls of logs or compiler output, definitely "faster than other users of the Service are able to type". Am I allowed to do that? Actually, I'd say any heated discussion in realtime with more than a few participants goes faster than I can type (in that I wouldn't be able to log the conversation by typing it. It doesn't seem terribly well defined. (does "faster than other users are able to type have a legal definition?)). How about 6h? I can't "transmit any material that contains ... files or programs designed to interrupt destroy or limit the functionality of any computer software...". When doing software devlopment, there are often legitimate reasons to do so, such as posting "Hey, I crashed $PROGRAM, here's how", or "I saw an exploit for $PROGRAM on bugtraq, we should fix it". Note that this isn't even restricted to public forums, so you technically can't even do this on, say, a private developer mailing list. I hope I'm just being unreasonably paranoid.
[1] Open Source and/or Free Software, because I don't want to make a distinction between them in contexts where there is none.
Web site's finally back up today. I actually had a few people email me about it over the weekend, which hadn't happened before (I suppose they just read the website..). I never knew people outside the project had actually heard about us. I suppose my indiscriminate meme scattering is working <Laugh Role="diabolical">
We've decided to use DocBook for our whitepapers and the Circe ruleset, so I've been trying to learn it, mostly from examples. I've pretty much got the basics down, so now I'm trying to figure out how to get the rest of the project to learn how to write it. Docbook: TDG seems a bit too deep for those unfamiliar with SGML, even hackers, and Get Going With DocBook: Notes for Hackers is great, but the sections containing only the word "dude" detract from it just a bit. I looked at the LDP's Using DocBook HOWTO, and didn't particularly like it either. I took a look at the OSWG's PID project, which looks like just what I need, but it's not done yet (and doesn't look like it's very active). There has got to be a good introduction to writing DocBook for the intelligent SGML-newbie. Hasn't there?
Advogato
Looked at the site in color for the first time (since lynx is just better when one actually wants to write something of any length. btw, anyone know if mozilla will allow spawning an external editor for textareas? I'd do it myself, but mozilla seems too... huge to get into for something so small). Nifty. One interesting thing I just noticed is that the comments and articles are mostly posted by Journeyers, wheras the diary entries are evenly split among Master, Journeyer, and Apprentice. I suppose the Apprentices are somewhat intimidated. Not sure if that's good, bad, or none of the above.
Enough rambling about that. But life sucks anyway, and I don't feel like elaborating on that in public.
FOAF updates: Trust rankings are now exported, making the data available to other users and websites. An external FOAF URI has been added, allowing users to link to an additional FOAF file.
Keep up with the latest Advogato features by reading the Advogato status blog.
If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!