Older blog entries for schoen (starting at number 144)

I posted a diary entry earlier today which contains a proof of the arithmetic mean/geometric mean inequality and some comments on solar thermal power plants and other stuff. You can look at that earlier diary entry, if you'd like (it's still there). I don't like editing older diary entries, if I can avoid it.

My friend Eric sent me some Moxie! I've got Moxie!

Pseudonym: a very interesting problem. Hmmmm....

By the way, is there such a thing as a "logarithmic mean"? I'd look at MathWorld, but it's subject to "downtime by law" right now.

ahosey, 8-bit bytes have 256 possibilities each. So n bytes represent 256^n possibilities. (I slipped up with exponents writing about that last week... thanks, Jamie.)

OK, so if you have 256^n possibilities and you want to write them in decimal digits, just remember that writing them in an alphabet of 10 symbols will require a total of log(256^n)/log(10)=n log(256)/log(10)=2.408n symbols; if you must use a whole number of symbols, and without loss of generality for any n, you'd better use 3n of them.

That number 2.408... is also equal to 8/log_2(10), which is the quantity that would give you the same result if you'd phrased the question as "how many digits do we need to use to write a number with n bits" and then multiplied by 8 to convert that answer from bit-units to byte-units.

The general formula is ceil(log_b(a)) or equivalently ceil(log(a)/log(b)) to give you the largest number of symbols in an alphabet of size b needed to code for a symbol from an alphabet of size a. So if you prefer, in the specific application, n*ceil(log_b(a)) is the largest number of digits possible in an n-digit base a number after it's been converted to base b. n*log_b(a) is the average number of digits, but we have to be careful about what kind of "average" that is, which reduces this to the previous conversation. :-)

OK, if you want a more rigorous proof, you need to show that place value representations always use each symbol with equal probability -- which can be phrased in other ways. The log-ratio thing is stolen from information theory, although there are many different ways to derive it.

Actually, it suffices to show that in base b, there are exactly b^n numbers, counting from zero, whose representations are not longer than n digits (alternatively, "shorter than the representation of b^n", since the representation of b^n in base b has exactly n+1 digits, and b^n is the smallest number whose representation has at least n+1 digits). This is reasonably easy to show, because there are b^n numbers from 0 to (b^n)-1, since (b^n)-1+1=(b^n)-1. See, I told you it was easy!

Given that there are b^n numbers, counting from zero, whose representations are not longer than n digits in base b, we are justified in using that logarithm rule above, even if we don't do any information theory or probability stuff.

We can also see from information theory that where log_b(a) is not an integer, there will sometimes be more than one possible choice in the coding, so that we can choose to store additional information if we want. For example, if you have 3 decimal digits but you know that you're only writing an 8-bit number, you can say that the most significant digit should be interpreted modulo 3; then we actually have access to a spare bit or more to play with in coding each time! (There is even more information than that available some of the time; I'll leave it as an exercise for someone to propose an encoding of 8-bit numbers in 3-digit numbers that yields at least 1 bit free all the time and an average of more than 2 bits free in general. Of course, it's not really a matter of "propose" but of "show how to make use of".)

Skip to the end of my diary entry, if you'd like.

Arithmetic means and geometric means

Pseudonym, thanks. With your hint, I got an inductive proof.

(All variables are always positive here. We can't necessarily take geometric means of negative numbers, at least if we want to stay in the reals and have a unique solution to everything and ensure that all magnitudes are comparable.)

Here we go: first, to show that, for any two numbers a and b, GM(a,b)<=AM(a,b).

This is equivalent to (ab)^(1/2)<=(a+b)/2, or ab<=((a+b)^2)/4, or 4ab<=a^2+2ab+b^2. Now, that's equivalent to 2ab<=a^2+b^2, which is equivalent to 0<=a^2-2ab+b^2, or 0<=(a-b)^2, which is clearly true. These steps can be reversed to show that GM(a,b)<=AM(a,b). (Note that GM(a,b)=AM(a,b) when, and only when, 0=(a-b)^2, that is, when 0=a-b, or b=a.)

Next, to show that if GM({s=any n numbers})<=AM({s}), then GM({t=any 2n numbers})<=AM({t}).

We observe that GM({s,s'})=GM(GM({s}),GM({s'}) where {s} and {s'} are sets of the same size, and AM({s},{s'})=AM(AM({s}),AM({s'}). This is very easy to see from the definitions of AM and GM. So, if GM({s})<=AM({s}) whenever {s} has exactly n elements, then we can break any set {t} of 2n elements into a pair of sets {s} and {s'} of n elements each (arbitrarily). Then GM({s})<=AM({s}) and also GM({s'})<=AM({s'}). Also, GM(a,b)<=AM(c,d) if a<=b and c<=d (since GM(c,d)<=AM(c,d) and making any element in a set smaller makes the mean, geometric or arithmetic, of that set smaller too). Therefore, GM({s},{s'})<=AM({s},{s'}).

Next, to show that if, for any set {s}, some number x<=AM({s},x), then x<=AM({s}). If x<=AM({s},x), then if {s} contains (n-1) elements, nx<=(n-1)AM({s})+x (since the sum of the elements of {s} equals (n-1)AM({s}). Then nx-x<=(n-1)AM({s}), or (n-1)x<=(n-1)AM({s}), or x<=AM({s}).

Finally, given that GM({t})<=AM({t}) for any set t of n elements, to show that GM({s})<=AM({s}) for any set s of n-1 elements. Let {s} be a given set of n-1 elements. Then take {t} as the set {s} with the addition of the new element GM({s}). Then we have {t} with n elements, so that GM({t})<=AM({t}), by hypothesis. Now GM({s},GM{s})<=AM({s},GM{s}). But clearly GM({s},GM{s})=GM({s}), since adding a new element to a set equal to the mean of the set will not change the mean. Therefore, GM({s})<=AM({s},GM{s}). But we have shown above that this implies that GM({s})<=AM({s}).

By induction, then, the geometric mean of any set is less than or equal to the arithmetic mean of that set.

I still wonder if my binomial thing would work out; there's so much factoring after the binomial expansion. It gets really messy, and it's hard to see whether or not it's going anywhere.

Solar thermal power generation

I spent a lot of time on Sunday wondering about and then reading and writing about solar thermal power plants, an alternative to photovoltaic cells for solar power. If you have lenses or mirrors to focus the sun's rays on a particular point, and you have a good absorber at that location, you should be able to achieve temperatures like the temperature inside the combustion chamber of an engine, but without burning any fuel. I heard a rumor about a power plant design on this principle a few years ago, but I never really looked into it until today.

Given that the sun provides around 1 kW/m^2 of energy at the Earth's surface (depending where and how you measure, and whether you're just talking about visible light or other things too), you'd think that you could get 1 MW/1000 m^2 (which is the area of a square whose sides are about 31 meters). But photovoltaic cells are notoriously inefficient, even though they have been getting a lot better.

It might be that if you used that 1 MW/1000 m^2 to heat some target, which then, for example, boiled water for a steam engine, you might be able to convert the power with relatively high efficiency -- especially since efficient commercial heat engines, including large external combustion engines, have been in production for many generations and are being refined all the time. Semiconductors, and especially light-sensitive semiconductors, are still extremely new. (Imagine what semiconductors would be like if they were as mature as heat engines!)

So, for example, why should we power electrical power plants by burning coal in a combustion chamber, when we could have an array of mirrors concentrating sunlight on the same chamber to raise it to a comparable temperature? Engines in general just rely on steep temperature gradients over long periods of time in order to generate useful power; those temperature gradients can be provided by the absorption of concentrated solar radiation, relative to the cooler outside world.

So, it turns out that there are a lot of people actively working on solar thermal power plants today, including a consortium in Israel which is trying to build a 250 kW prototype plant. The U.S. Department of Energy actually has a test facility at the Sandia National Laboratory in Albuquerque, NM, where you can bring a solar thermal engine for testing -- they rent it out to the public if you have something you want to try! So you put your engine up on a tower or something, and they have a field full of mirrors in the New Mexico desert that all reflect solar energy onto your engine, and you see what happens.

There are a lot of other projects like this, too. I'd like to get some information from some of the people who are working on these things, and maybe go and see one.

I wrote a comparison of solar thermal power plants to other kinds of electrical power plants, showing salient pros and cons of each. The solar thermal plants seem to do really well, but they need a fair amount of research in order to be made cheap enough to be appealing to large energy companies. But the "no fuel, no emissions" bit is a big plus, and solar thermal power beats out hydroelectric power for environmental impact.

I wonder whether a practical solar thermal generator can be built on a small scale, for example as a replacement for a Diesel-powered backup generator. I know that some people are already using solar power to provide all their home energy needs, but most of those are using photovoltaics, or solar thermal heating but not solar thermal electricity generation.

But I think this is really cool stuff.

In other news

I learned a bit more about HTML parsing in connection with some Peacefire discussions. I need to meet some of those folks face to face; I hope we'll all make it to Computers, Freedom, and Privacy 2001 in Cambridge, MA.

Lots of things are still coming up.

It's now a month after the party I went to, um, a month ago.

I was re-reading some Catullus in Latin and English. Still very powerful stuff. Of course, all I'm really familiar with is the Lesbia poems.

mwh, cool, and thanks! I didn't think of using logarithms at all (I was trying to put together an inductive proof using the binomial theorem).

Once you've taken the logarithm of both sides, the proof that the logarithm of the average is larger than the average of the logarithms seems pretty easy.

I think there is an interesting analogy there to a famous picture from nuclear physics, the curve of binding energy per nucleon in an atomic nucleus -- the graph that shows that fusion of nuclei below iron or fission of nuclei above iron can yield energy, and that iron nuclei are the most stable. ("Everything wants to be iron.") If you look at the region of that curve which shows where fusion occurs, you see that the average (arithmetic mean) energy is greater when you move to the right, in the increasing direction, toward iron. The curve is everywhere concave downward, so that in its increasing region, between hydrogen and iron, the average energy of the sum of nucleons (into a single nucleus) is always greater than the sum of the average energies of nucleons (in separate nuclei).

Where the curve is decreasing (but still concave downward), between iron and infinity, the average energy of the sum of nucleons is always less than the sum of their average energies in separate nuclei.

So, with the arithmetic and geometric mean inequality, after you've taken the logarithm of both sides, the problem is quite similar to explaining how the curve of binding energy shows which nuclear reactions will release energy as a result of changes in mass deficit.

Sorry if that wasn't completely clear; it's very difficult for me to talk about the curve of binding energy without having one handy to point at.

In other news

I had a great dinner with a bookseller couple in Berkeley.

On Friday, my friend Anirvan took me to see a pair of documentaries at a movie theater in Berkeley. Called "Showdown in Seattle" and "Breaking the Bank", they concerned two major U.S. demonstrations of the past year (the WTO protest in Seattle and the World Bank protest in Washington, DC). The two of these were probably the most distressing thing I've seen in years, and definitely "Showdown in Seattle" was the most violent movie I've seen in a long time.

I could probably write about these two documentaries for pages, and I probably will, but maybe in a letter rather than on Advogato.

After that, and once I could breathe again (I discovered that watching pictures of people getting teargassed does interesting things to my autonomic nervous system), we went to a post-election party at Sumana's place, and it was very geeky and a lot of fun. I got to cite Martin Gardner on voting paradoxes, and we played Hangman, and various other things.

Hi, Anirvan, Sumana, Michelle, Nathaniel. Did I miss anybody?

I stayed over in Berkeley and got up extremely early to make a 3.5-hour trip from Berkeley to Santa Rosa by way of San Francisco. I'm still yawning and falling asleep intermittently.

I was in Santa Rosa because I was invited to attend a technology planning meeting for Sonoma Academy, a new independent high school which will open next year. (Non-U.S. readers: an independent school is a private, i.e. non-tax-supported, competitive-admission, tuition-based school which isn't affiliated with a larger organization like a religious denomination.)

The meeting was pretty cool. We covered a lot of ground, and I met some nice people. It's amazing to be in a situation where people are talking about computer technology and they aren't even considering using free software -- in fact, they haven't even heard of free software! Most of the people at the meeting assumed that the school would be using Microsoft Windows for all its computing.

All the other computer people I've met in recent memory were free software deveopers, or free software advocates, or the like. I have pretty much entirely stopped interacting with technologists outside the free software world. (The notable exception was Dave Winer; lilo and jpick and I met him at the Moscone Center, and that was interesting.) So it's kind of a shock to have this reality check, like, not only do these people not assume that they will use Linux or another free PC Unix, but they assume instead that they will use Windows 2000 or another proprietary Microsoft operating system.

I guess it's like spending a couple of years in a somewhat insular religious community and then venturing outside momentarily.

If I have any further influence in the concrete decision-making for Sonoma Academy's IT policy, I'll try to get them to consider Linux for all their new servers and maybe for some lab machines, too. But I think I made some useful general points completely outside of the free software issue, on the subject of the role of information technology in high school education. So even if they go with Windows everywhere, I hope my general points will have had an effect.

Sonoma County is pretty beautiful, like much of California, but I was either asleep on a bus or engrossed in conversation about LBL each time I was passing through, so I didn't actually get to see much of it.

I think I had better become a teacher or something.

It's gotten cold recently, in San Francisco, Berkeley, and Santa Rosa alike. I guess I'll have to start wearing socks and a jacket or something.

Tired, headache, too much writing about CIDR, too little sleep. I hadn't realized to what extent netmasks were already a pervasive concept before CIDR; but there weren't netmasks in border routers (or at least not dynamic routing protocols), and there wasn't prefix notation, right?

I taught a Python class again and I dragged in digital roots for some reason -- I couldn't resist. "Hey, what's 37533094858041282253294 % 9? Hey, no paper, now!"

There are some nice memory tricks if you can do various running-total mod n calculations in your head. I saw Arthur Benjamin do one of those once when he performed at UC Berkeley; it was awesome.

Can anybody tell me a good way to get to Santa Rosa from San Francisco on Saturday?

deekayen, one time I told a Christian friend that I thought I'd experienced koinonia at a CABAL meeting, but she didn't really believe me.

In other news

It looks like the sugar substitute "Sweet'n'Low" is named after a line in a poem by Tennyson.

Sure enough, it is.

I've had distressingly little math around here recently. Hmmm... so I wondered why the arithmetic mean of two numbers must be larger than their geometric mean.

I proved it for two numbers; now I'm trying to prove it for n numbers.

apgarcia, one of several difficulties in the Wager is the account of the nature of belief that it presumes.

Can we really believe that belief is a matter of choice and that this choice can be taken upon factors unrelated to the content of the belief or to evidence for it?

Pascal suggests that we have to choose whether to believe in God, but his account of the nature of that choice is really suspect. He presents it as sort of a question of whether we want to live in one city or another city, knowing that the consequences might be extremely serious (perhaps more serious than the consequences of any other choice).

I don't see belief as formed that way!

In other news

Hooray, no force of law for administrative classification in the U.S. (no "official secrets act"). I'm actually totally amazed that Clinton vetoed that bill; I thought there was almost no chance of that.

Lots of turmoil and reports of old turmoil that I didn't know about in the anti-censorware community, currently centered around the Censorware Project and spilling over into other things. It's very sad.

Somehow I always think that only the bad guys have these things happen to them.

I spent a lot of Saturday with a friend wandering around, including an excursion to Green Apple Books in the Inner Richmond (6th and Clement; take the 1 bus to 6th and California from downtown). That was cool. Now I have Martin Gardner's most recent book (Did Adam & Eve Have Navels?).

I might go to my friend's place for Thanksgiving, which is fast approaching. I haven't been one for big family holidays in recent memory, but it should be cool to see how someone else's family does it.

stephane's party was pretty neat and seemed to have the TikiNature. I had a good time, and got home late. A lot of people with Advogato accounts were present, with large Eazel and Linuxcare contingents. I was going to post some quotations from the party here, but I've forgotten all of them!

OK, "we could get out some Cat 5 and start knitting" seems to have been one.

Stephane's cat Pixel is very cute.

So, happy birthday, Stephane, and happy Guy Fawkes Day, everybody.

ajv, I know about that chicken-and-egg problem; I gave a talk on IPv6 where I described it, and, in the year and a half since then, it doesn't seem to have gotten any better. Lots of people take IPv6 seriously, but few of them seem to be ISPs.

One incentive problem is that there are very few IPv6-only services, and, as a matter of policy and practice, people providing services over IPv6 normally don't want to make them IPv6-only. Consequently the people who want IPv6 are all kind of networking geeks; the transition mechanisms like 6over4 and 6to4 and all the rest have really reduced some of the urgency that people might feel!

I mean, it's now easier and less painful, but there isn't any big reason for people to worry about being "left behind"! That may be one reason they don't feel themselves in a hurry.

A world-wide multiprotocol network at the network layer is something that hasn't existed for some years; it's an interesting thing but also a sort of painful thing. In the best case, if your efforts are successful, that is the best we can look forward to for a while!

GJF, I don't think Richard Stallman would appreciate being called a "well known open source evangelist".

In other news

There's more Daniel Burton campaign coverage in the Daily Cal.

It's funny with what equanimity the reporter covered the story -- it seems she was probably given an assignment to track down all the candidates on the ballot for the office, interview them, and give them all equal space to tell their respective stories in their own words. Accordingly, Daniel gets just as much respect (and space) as Aroner, but there's no news analysis discussing the difference between them.

It seems that Linus Torvalds has written a book, an autobiography, called Just For Fun, forthcoming from HarperCollins.

It also seems that Wells Fargo Bank statements are bearing a notice that


I wonder why Wells Fargo did that, but it doesn't seem that kelly is around much to comment. It reminds me of some license issues that have come up and that I've been involved in. Of course, bank deposit policies are not exactly open source licenses.

I went to the BayFF meeting at the Moscone center. The meeting was a little scattered. There were three panelists (Scientology critic, reporter, law professor) talking about free speech and intellectual property on-line. I think I might have spotted the Church of Scientology representative. That issue is a bit less good-natured than "Spot the Fed" at DefCon.

I saw dmarti, Tabinda, crackmonkey, and shaleh there (the latter was accepting an award on behalf of Debian).

dmarti has had an excellent idea called antipackages (which perhaps shaleh will be implementing in Debian): a package with only conflicts, no dependencies, no provides... just conflicts!

Why would you want such a useless thing?

Well: task-no-cleartext-login-daemons, task-no-x11, task-no-gnome, task-no-kde, task-no-lzw, task-no-mp3, (once upon a time) task-no-unlicensed-rsa, task-no-strcpy (or task-no-non-bounds-checking-string-functions; this is problematic because there are safe ways to use these functions and maybe some very important software uses them safely), task-no-games (Don says that's useful for U.S. government procurements). In principle you could have task-no-nonfree (or task-no-semifree and task-no-proprietary), but that's already handled in a different way by Debian (by only shipping free software).

The cool thing is that you can upgrade antipackages to new versions, just like any other packages. It's just that their conflicts will be updated; nothing else will change.

This is a very convenient way to set a policy that a particular category of software should not be installed somewhere. And that policy won't be accidentally overridden by a dependency; instead, the system administrator will be specifically warned about the conflict, and will have to make a choice.

I also have a task-no-outstanding-security-flaws idea, as a way to remove automatically anything with a reported security flaw until its maintainer can fix it (and release a new version, which task-no-outstanding-security-flaws no longer conflicts with). I'm not sure this is an ideal solution, but it's an interesting application of package management.

I've got a rice cooker now. That is (at first impression) a terribly useful appliance. Mmmmm, carbohydrates!

My stepsister is engaged to be married. I hope I can make it to her wedding.

The MathWorld lawsuit makes me sad. Just in case anyone wanted to claim that copyright was about "protecting authors", we have further evidence that it's all about money (and of course a lot of the time that money is, well, given to authors).

(Protecting authors in the sense of getting people to abide by their wishes isn't the be-all and end-all either: there are various cases of successors in interest trying to withdraw or suppress old controversial works which are still under copyright. Or even authors themselves trying to do that. I would welcome news of more examples of this.)

The even worse thing is that I have actually bought a copy of the CRC Concise Encyclopedia of Mathematics -- and it's great, of course -- but now my habit of linking to MathWorld is ruined.

But they cannot help their neighbors,
That's not good, hackers, that's not good.

I wrote a long answer to a question about the calculation of pi. That was fun.

Quick Python hacks I wrote as part of my answer include a Monte Carlo calculation and a Riemann sum calculation. Yes, I know these are horribly inefficient ways to calculate pi, but they illustrate important theoretical concepts (two particular techniques in numerical methods for approximating definite integrals).

The Bailey/Borwein/Plouffe algorithm is totally mind-blowing. There are so many people I wish were alive to see that. Arbitrary digits of pi in any base calculated directly, without any intermediate digits!

Today is my first Python class at Linuxcare. This should be fun. Also, teaching a class is really the best possible way to learn about something; I submit that even Guido van Rossum himself has probably learned more Python by teaching other people than by writing the Python interpreter.

Gee, Al Gore is still capable of quoting the Clipper Chip announcement!

I'm no longer a Kaiser Permanente client, and I'm pretty pleased about that.

I've got on-line queries working in my CueCat book scanning stuff. So you can scan a book and have the LOC database queried immediately. In principle, I could easily do asynchronous (background) queries; in that case, there should be lockfile support so that scanning a particular book twice doesn't result in two simultaneous queries writing into the same file.

The main thing that's missing is the ability to start a new session if the LOC query session expires in the middle. Right now, I can't even detect that condition -- it was pretty much impossible for batch queries, but for real-time queries it could easily happen if you just leave the program running for a while and then come back and try to scan something else.

My friend in Teach for America in Mississippi writes

how can I expect these children to be able to analyze the effect of enzyme concentration on the rate of the reation it catalyzes if they don't even know how to calculate an average?

The phone line thing didn't work out that well. Say you're building an apartment complex in the South of Market in San Francisco in the late 1990s. On one side of you is Pac Bell; on another side of you is Best Internet; on another side is Pac Bell again. In a two-bedroom apartment, do you

(a) have two pairs of Cat 3 copper, encased in concrete, with no conduit (and a policy against exterior wiring), or

(b) have eight or ten pairs of Cat 5 in a conduit with easy access to pull more copper or fiber?

Guess which one my apartment complex chose?

So I got a pager instead of a phone line.

stephane, woohoo!

Waldo, congratulations.

paci and davidw, good luck.

In other news

I wish I were qualified to handle liquid nitrogen.

I looked at the SFPD Crime Summary For October 6 - October 12, 2000. It seems that the dramatic emergency response outside my window that weekend was due to an AGGRAVATED ASSAULT WITH A KNIFE.

I've got voicemail working at work, and now PacBell is coming on Wednesday to set up my phone. This is better telephone luck than I've had in a while.

Jim Warren noted ages ago that identities of those who request public records are themselves public records (as a general rule); this is interesting, and maybe it should serve as a precaution to people contemplating making CPRA or FOIA or similar requests.

I'm eating some Double Rainbow Soy Cream recently. "40% Less Fat than Ice Cream" doesn't help much if you eat twice as much of it!

I wonder if I could get Herrell's to introduce a non-dairy ice cream. I should remember to ask about that. Hey, if you're reading this, and you read a later entry indicating that I'm in Northampton, would you mind dropping me an e-mail message to remind me to ask about non-dairy ice cream?

I've announced a Python class at Linuxcare (Thursday noon). I've already got five or six people planning to attend.


135 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!