One of the most interesting debates on Less Wrong that seems like it should be definitively resolvable is the one between Eliezer Yudkowsky, Scott Aaronson, and others on The Weighted Majority Algorithm. I'll reprint the debate here in case anyone wants to comment further on it.
In that post, Eliezer argues that "noise hath no power" (read the post for details). Scott disagreed. He replied:
...Randomness provably never helps in average-case complexity (i.e., where you fix the probability distribution over inputs) -- since given any ensemble of strategies, by convexity there must be at least one deterministic strategy in the ensemble that does at least as well as the average.
On the other hand, if you care about the worst-case running time, then there are settings (such as query complexity) where randomness provably does help. For example, suppose you're given n bits, you're promised that either n/3 or 2n/3 of the bits are 1's, and your task is to decide which. Any deterministic strategy to solve this problem clearly requires looking at 2n/3 + 1 of the bits. On the other hand, a randomized sampling strategy only has to look at O(1) bits to succeed with high probability.
Whether randomness ever helps in worst-case polynomial-time computation is the P versus BPP question, which is in the same league as P versus NP. It's conjectured that P=BPP (i.e., randomness never saves more than a polynomial). This is known to be true if really good pseudorandom generators exist, and such PRG's can be constructed if certain problems that seem to require exponentially large circuits, really do require them (see this paper by Impagliazzo and Wigderson). But we don't seem close to proving P=BPP unconditionally.
Eliezer replied:
Scott, I don't dispute what you say. I just suggest that the confusing term "in the worst case" be replaced by the more accurate phrase "supposing that the environment is an adversarial superintelligence who can perfectly read all of your mind except bits designated 'random'".
Scott replied:
I often tell people that theoretical computer science is basically mathematicized paranoia, and that this is the reason why Israelis so dominate the field. You're absolutely right: we do typically assume the environment is an adversarial superintelligence. But that's not because we literally think it is one, it's because we don't presume to know which distribution over inputs the environment is going to throw at us. (That is, we lack the self-confidence to impose any particular prior on the inputs.) We do often assume that, if we generate random bits ourselves, then the environment isn't going to magically take those bits into account when deciding which input to throw at us. (Indeed, if we like, we can easily generate the random bits after seeing the input -- not that it should make a difference.)
Average-case analysis is also well-established and used a great deal. But in those cases where you can solve a problem without having to assume a particular distribution over inputs, why complicate things unnecessarily by making such an assumption? Who needs the risk?
And later added:
...Note that I also enthusiastically belong to a "derandomize things" crowd! The difference is, I think derandomizing is hard work (sometimes possible and sometimes not), since I'm unwilling to treat the randomness of the problems the world throws at me on the same footing as randomness I generate myself in the course of solving those problems. (For those watching at home tonight, I hope the differences are now reasonably clear...)
Eliezer replied:
I certainly don't say "it's not hard work", and the environmental probability distribution should not look like the probability distribution you have over your random numbers - it should contain correlations and structure. But once you know what your probability distribution is, then you should do your work relative to that, rather than assuming "worst case". Optimizing for the worst case in environments that aren't actually adversarial, makes even less sense than assuming the environment is as random and unstructured as thermal noise.
I would defend the following sort of statement: While often it's not worth the computing power to take advantage of all the believed-in regularity of your probability distribution over the environment, any environment that you can't get away with treating as effectively random, probably has enough structure to be worth exploiting instead of randomizing.
(This isn't based on career experience, it's how I would state my expectation given my prior theory.)
Scott replied:
> "once you know what your probability distribution is..."
I'd merely stress that that's an enormous "once." When you're writing a program (which, yes, I used to do), normally you have only the foggiest idea of what a typical input is going to be, yet you want the program to work anyway. This is not just a hypothetical worry, or something limited to cryptography: people have actually run into strange problems using pseudorandom generators for Monte Carlo simulations and hashing (see here for example, or Knuth vol 2).
Even so, intuition suggests it should be possible to design PRG's that defeat anything the world is likely to throw at them. I share that intuition; it's the basis for the (yet-unproved) P=BPP conjecture.
"Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin." --von Neumann
And that's where the debate drops off, at least between Eliezer and Scott, at least on that thread.
If you have an intelligent adversary, and you leave a single opening which maxent would choose epsilon% of the time, the adversary will choose it 100% of the time. Randomization allows you to leave lots of openings, each with small probability, so they can only choose the right one epsilon% of the time.
The comparison is not quite fair--sometimes it makes sense to think of the adversary having access to your RNG, in which case you need to leave no openings--but there are cases where this is useful. (This shows up as mixed strategies in game theory, for example.)
[Edit] As an illustration, let's work through the case with n=3. To simplify, suppose that you identify all the bits you want to probe, and then you get the results.
If you probe all 3 bits, then you get the right answer with certainty always, and the adversary can do nothing.
If you probe 2 bits, then you either see 11 or 10 or 00. In the first case, you can be certain there are two ones; in the last case, you can be certain there is only one one. In the middle case, you don't know which is true. If your algorithm is deterministic, you preselect which bits you'll probe--say, the first and second bit--and so the worst case is that you get 10, and if you're playing against an opponent, they can always choose that move. If your algorithm is random, then the opponent isn't sure which bits you'll probe, and so a third of the time you'll get a 11 or a 00. The adversary can reduce the algorithm's effectiveness from 2/3 to 1/2.
If you probe 1 bit, then you either see 1 or 0. In the first case, under maxent, there's a 2/3rds chance there are two ones; in the second case, there's a 2/3rds chance there is one one. If you employ that deterministic algorithm and always pick the first bit against an adversary, you will be wrong every time! The adversary can reduce the effectiveness from 2/3 to 0.
If you probe 0 bits, then you see nothing, and are right 1/2 the time.
What the randomness is doing at each step is counteracting the adversary. You can also see that the deterministic algorithm separates into two cases: 'no better than doing nothing' and '100% correct', which is typical.
[Edit2]What about the case where you sample a bit, then decide whether or not to keep sampling? The work is already done- just read the above in reverse. If 2/3rds probability is good enough, then the randomized algorithm can stop after one bit, but the deterministic algorithm needs all three. If you want certainty, the randomized algorithm uses, on average, only 2.67 sampled bits, because a third of the time they can stop after two.