gwern comments on Rationality Quotes May 2012 - Less Wrong

6 points | Post author: OpenThreadGuy | 01 May 2012 11:37PM

Comment author: RichardKennaway 07 May 2013 04:22:01PM 3 points

> Is there any practical difference between "assuming independent results" and "assuming zero probability for all models which do not generate independent results"?

No.

> If not then I think we've just been exposed to people using different terminology.

I think it's more than terminology. And if Mencius can be dismissed as someone who does not really get Bayesian inference, one can surely not say the same of Cosma Shalizi, who has made the same argument somewhere on his blog. (It was a few years ago and I can't easily find a link. It might have been in a technical report or a published paper instead.)

Suppose a Bayesian is trying to estimate the mean of a normal distribution from incoming data. He has a prior distribution of the mean, and each new observation updates that prior. But what if the data are not drawn from a single normal distribution, but from a mixture of two such distributions with well-separated peaks? The Bayesian (he says) can never discover that. Instead, his estimate of the position of the single peak that he is committed to will wander up and down between the two real peaks, like the Flying Dutchman cursed never to find a port, while the posterior probability of seeing the data that he has seen plummets (on the log-odds scale) towards minus infinity. But he cannot avoid this: no evidence can let him update towards anything his prior gives zero probability to.
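
(As a toy illustration of that scenario, with numbers of my own invention rather than anything from Shalizi: fit a single N(mu, 1) by conjugate updating to data that actually come from two well-separated peaks, and track both the posterior mean and the running one-step-ahead log predictive score. The posterior mean settles between the peaks, fitting neither, and the predictive score falls ever further behind that of the true two-peak distribution, yet nothing inside the updating rule ever flags the problem.)

```python
# Toy illustration (my own numbers): a Bayesian whose model class is a single
# N(mu, 1), with a conjugate normal prior on mu (mean 0, variance 100), fed
# data from an equal mixture of N(-5, 1) and N(+5, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000
component = rng.integers(0, 2, size=n)                      # which peak each draw comes from
data = rng.normal(np.where(component == 0, -5.0, 5.0), 1.0)

tau2, sigma2 = 100.0, 1.0                                   # prior variance, known data variance
post_prec, post_mean = 1.0 / tau2, 0.0
score_single = score_mixture = 0.0

for x in data:
    # one-step-ahead predictive of the single-Gaussian model: N(post_mean, sigma2 + 1/post_prec)
    pred_sd = np.sqrt(sigma2 + 1.0 / post_prec)
    score_single += stats.norm.logpdf(x, post_mean, pred_sd)
    score_mixture += np.log(0.5 * stats.norm.pdf(x, -5.0, 1.0) + 0.5 * stats.norm.pdf(x, 5.0, 1.0))
    # conjugate update of the posterior over mu
    new_prec = post_prec + 1.0 / sigma2
    post_mean = (post_prec * post_mean + x / sigma2) / new_prec
    post_prec = new_prec

print(f"posterior mean of mu: {post_mean:.2f} (between the peaks at -5 and +5, fitting neither)")
print(f"log predictive score, single Gaussian: {score_single:.0f}")
print(f"log predictive score, true mixture:    {score_mixture:.0f}")
```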

What (he says) can save the Bayesian from this fate? Model-checking. Look at the data and see if they are actually consistent with any model in the class you are trying to fit. If not, think of a better model and fit that.

Andrew Gelman says the same; there's a chapter of his book devoted to model checking. And here's a paper by both of them on Bayesian inference and philosophy of science, in which they explicitly describe model-checking as "non-Bayesian checking of Bayesian models". My impression (not being a statistician) is that their view is currently the standard one.
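
(Concretely, a minimal posterior predictive check for the toy two-peak example might look like the following. This is my own sketch of the general idea, not an example from their paper: pick a test statistic, simulate replicate data sets from the fitted model, and ask whether the observed statistic looks anything like the replicated ones.)

```python
# Posterior predictive check (toy sketch). The test statistic is the fraction of
# points within one unit of the sample mean: about 0.68 for anything the single
# N(mu, 1) model can generate, essentially zero for data with peaks at -5 and +5.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
data = np.concatenate([rng.normal(-5.0, 1.0, n // 2), rng.normal(5.0, 1.0, n // 2)])

def stat(x):
    return np.mean(np.abs(x - x.mean()) < 1.0)

# posterior over mu under the (wrong) N(mu, 1) model, using a flat prior for simplicity
post_mean, post_sd = data.mean(), 1.0 / np.sqrt(len(data))

observed = stat(data)
replicated = []
for _ in range(1000):
    mu = rng.normal(post_mean, post_sd)                       # draw mu from the posterior
    replicated.append(stat(rng.normal(mu, 1.0, len(data))))   # replicate a data set of the same size

p_value = np.mean(np.array(replicated) <= observed)
print(f"observed statistic: {observed:.3f}, posterior predictive p-value: {p_value:.3f}")
# A p-value of ~0 says data like these essentially never arise from the fitted
# model, so the model class itself needs rethinking; that is the "non-Bayesian" step.
```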

I believe the hard-line Bayesian response to that would be that model checking should itself be a Bayesian process. (I'm distancing myself from this claim, because as a non-statistician, I don't need to have any position on this. I just want to see the position stated here.) The single-peaked prior in Shalizi's story was merely a conditional one: supposing the true distribution to be in that family, the Bayesian estimate does indeed behave in that way. But all we have to do to save the Bayesian from a fate worse than frequentism is to widen the picture. That single-peaked prior was a subset worked with for computational convenience; in the true prior it accounted for only some fraction p<1 of the probability mass, the remaining 1-p being assigned to "something else". Then when the data fail to conform to any single Gaussian, the "something else" alternative will eventually overshadow the Gaussian model, and will need to be expanded into more detail.
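
(A toy version of that widened-prior story, again with made-up numbers: give the single-Gaussian model class prior mass p = 0.99 and let the remaining 0.01 go to a deliberately vague catch-all, here just a uniform density on [-20, 20]. On two-peaked data the catch-all's marginal likelihood wins decisively, which is the cue to replace it with a real model.)

```python
# Widened prior (toy sketch): single-Gaussian model class with prior probability
# 0.99 versus a vague "something else" (uniform on [-20, 20]) with prior 0.01.
# We accumulate the log posterior odds of the Gaussian class as data arrive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(rng.choice([-5.0, 5.0], size=200), 1.0)    # two well-separated peaks

log_odds = np.log(0.99 / 0.01)                 # prior log-odds in favour of the Gaussian class
tau2, sigma2 = 100.0, 1.0
post_prec, post_mean = 1.0 / tau2, 0.0

for x in data:
    pred_sd = np.sqrt(sigma2 + 1.0 / post_prec)
    log_odds += stats.norm.logpdf(x, post_mean, pred_sd)     # evidence for the Gaussian class
    log_odds -= np.log(1.0 / 40.0)                           # evidence for "something else"
    new_prec = post_prec + 1.0 / sigma2                      # conjugate update of mu's posterior
    post_mean = (post_prec * post_mean + x / sigma2) / new_prec
    post_prec = new_prec

print(f"final log posterior odds, Gaussian class vs 'something else': {log_odds:.0f}")
# Strongly negative: even a crude catch-all overwhelms the single-Gaussian class,
# flagging that the model class needs to be expanded.
```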

"But," the soft Bayesians might say, "how do you expand that 'something else' into new models by Bayesian means? You would need a universal prior, a prior whose support includes every possible hypothesis. Where do you get one of those? Solomonoff? Ha! And if what you actually do when your model doesn't fit looks the same as what we do, why pretend it's Bayesian inference?"

I suppose this would be Eliezer's answer to that last question.

I am not persuaded that the harder Bayesians have any more concrete answer. Solomonoff induction is uncomputable and seems to unnaturally favour short hypotheses involving Busy-Beaver-sized numbers. And any computable approximation to it looks to me like brute-forcing an NP-hard problem.

Comment author: gwern 07 May 2013 05:38:51PM 3 points

> And any computable approximation to it looks to me like brute-forcing an NP-hard problem.

I find this a curious thing to say. Isn't this an argument against every possible remotely optimal computable form of induction or decision-making? Of course a good computable approximation may wind up spending lots of resources solving a problem if that problem is important enough; that is not a black mark against it. Problems in the real world can be hard, so dealing with them may not be easy!

"Omega flies up to you and hands you a box containing the Secrets of Immortality; the box is opened by the solution to an NP problem inscribed on it." Is the optimal solution really to not even try the problem - because then you're trying "brute-forcing an NP-hard problem"! - even if it turns out to be one of the majority of easily-solved problems? "You start a business and discover one of your problems is NP-hard. You immediately declare bankruptcy because your optimal induction optimally infers that the problem cannot be solved and this most optimally limits your losses."

And why NP-hard, exactly? You know there are a ton of harder complexity classes in the complexity zoo, right?


The right answer is simply to point out that the worst case of the optimal algorithm is going to be as bad as the worst case of all possible problems presented to it, and this is exactly what we would expect, since there is no magic fairy dust which will collapse all problems to constant-time solutions.

Comment author: RichardKennaway 07 May 2013 06:20:48PM 1 point

> I find this a curious thing to say. Isn't this an argument against every possible remotely optimal computable form of induction or decision-making?

There might well be a theorem formalising that statement. There might also be one formalising the statement that every remotely optimal form of induction or decision-making is uncomputable. If that's the way it is, well, that's the way it is.

"Omega flies up to you

This is an argument of the form "Suppose X were true -- then X would be true! So couldn't X be true?"

"You start a business and discover one of your problems is NP-hard. You immediately declare bankruptcy because your optimal induction optimally infers that the problem cannot be solved and this most optimally limits your losses."

You try to find a method that solves enough examples of the NP-hard problem well enough to sell the solutions, such that your more bounded ambition puts you back in the realm of P. This is done all the time -- freight scheduling software, for example. Or airline ticket price searching. Part of designing optimising compilers is not attempting analyses that take insanely long.
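
(The usual trick, sketched with a toy example of my own rather than anything from those industries: replace the exact NP-hard optimisation with a cheap heuristic that is good enough on the instances you actually face. Greedy knapsack is the textbook case: it runs in O(n log n) and is usually close to optimal, but carries no guarantee.)

```python
# Greedy value-per-weight heuristic for 0/1 knapsack: polynomial time,
# usually good, not guaranteed optimal.
from typing import List, Tuple

def greedy_knapsack(items: List[Tuple[float, float]], capacity: float) -> Tuple[float, List[int]]:
    """items is a list of (value, weight) pairs; returns (total value, chosen indices)."""
    order = sorted(range(len(items)), key=lambda i: items[i][0] / items[i][1], reverse=True)
    total_value, remaining, chosen = 0.0, capacity, []
    for i in order:
        value, weight = items[i]
        if weight <= remaining:               # take the item whole if it still fits
            chosen.append(i)
            total_value += value
            remaining -= weight
    return total_value, chosen

items = [(60.0, 10.0), (100.0, 20.0), (120.0, 30.0), (40.0, 5.0)]
print(greedy_knapsack(items, capacity=50.0))  # gives 200.0 here; the true optimum is 220.0
```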

> And why NP-hard, exactly? You know there are a ton of harder complexity classes in the complexity zoo, right?

Harder classes are subsets of NP-hard, and everything in NP-hard is hard enough to make the point. Of course, there is the whole uncomputability zoo above all that, but computing the uncomputable is even more of a wild goose chase. "Omega flies up to you and hands you a box containing the Secrets of Immortality; for every digit of Chaitin's Omega you correctly type in, you get an extra year, and it stops working after the first wrong answer".

Comment author: gwern 07 May 2013 06:32:46PM 2 points

> This is an argument of the form "Suppose X were true -- then X would be true! So couldn't X be true?"

No, this is pointing out that if you provide an optimal outcome barricaded by a particular obstacle, then obtaining that outcome will trivially be at least as hard as overcoming that obstacle.

> You try to find a method that solves enough examples of the NP-hard problem well enough to sell the solutions, such that your more bounded ambition puts you back in the realm of P. This is done all the time -- freight scheduling software, for example. Or airline ticket price searching. Part of designing optimising compilers is not attempting analyses that take insanely long.

This is exactly the point made for computable approximations to AIXI. Thank you for agreeing.

> Harder classes are subsets of NP-hard, and everything in NP-hard is hard enough to make the point.

Are you sure you want to make that claim? That all harder classes are subsets of NP-hard?

> but computing the uncomputable is even more of a wild goose chase. "Omega flies up to you and hands you a box containing the Secrets of Immortality; for every digit of Chaitin's Omega you correctly type in, you get an extra year, and it stops working after the first wrong answer".

Fantastic! I claim my extra 43 years of life.

Comment author: RichardKennaway 08 May 2013 07:55:22AM 0 points

> > Harder classes are subsets of NP-hard, and everything in NP-hard is hard enough to make the point.

> Are you sure you want to make that claim? That all harder classes are subsets of NP-hard?

No, carelessness on my part. Doesn't affect my original point, that schemes for approximating Solomonoff or AIXI look like at least exponential brute force search.

> > You try to find a method that solves enough examples of the NP-hard problem well enough to sell the solutions, such that your more bounded ambition puts you back in the realm of P. This is done all the time -- freight scheduling software, for example. Or airline ticket price searching. Part of designing optimising compilers is not attempting analyses that take insanely long.

> This is exactly the point made for computable approximations to AIXI.

Since AIXI is, by construction, the best possible intelligent agent, all work on AGI can, in a rather useless sense, be described as an approximation to AIXI. To the extent that such an attempt works (i.e. gets substantially further than past attempts at AGI), it will be because of new ideas not discovered by brute force search, not because it approximates AIXI.

> Fantastic! I claim my extra 43 years of life.

43 years is a poor sort of immortality.

Comment author: gwern 08 May 2013 07:48:13PM 1 point

> schemes for approximating Solomonoff or AIXI look like at least exponential brute force search.

Well, yeah. Again - why would you expect anything else? Given that there exist problems which require that or worse for solution? How can a universal problem solver do any better?

> Since AIXI is, by construction, the best possible intelligent agent, all work on AGI can, in a rather useless sense, be described as an approximation to AIXI.

Yes.

> To the extent that such an attempt works (i.e. gets substantially further than past attempts at AGI), it will be because of new ideas not discovered by brute force search, not because it approximates AIXI.

No. Given how strangely and differently AIXI works, it can easily stimulate new ideas.

> 43 years is a poor sort of immortality.

It's more than I had before.

Comment author: RichardKennaway 09 May 2013 07:44:03AM 0 points

> No. Given how strangely and differently AIXI works, it can easily stimulate new ideas.

The spin-off argument. Here's a huge compendium of spinoffs of previous approaches to AGI. All very useful, but not AGI. I'm not expecting better from AIXI.

Comment author: gwern 09 May 2013 03:45:25PM 3 points

Hm, so let's see; you started off mocking AIXI and any computable version of it as impossible and infeasible:

> I am not persuaded that the harder Bayesians have any more concrete answer. Solomonoff induction is uncomputable and seems to unnaturally favour short hypotheses involving Busy-Beaver-sized numbers. And any computable approximation to it looks to me like brute-forcing an NP-hard problem.

Then you admitted that actually every working solution can be seen as a form of SI/AIXI:

> There might well be a theorem formalising that statement. There might also be one formalising the statement that every remotely optimal form of induction or decision-making is uncomputable. If that's the way it is, well, that's the way it is... Since AIXI is, by construction, the best possible intelligent agent, all work on AGI can, in a rather useless sense, be described as an approximation to AIXI

And now you're down to arguing that it'll be "very useful, but not AGI".

Well, I guess I can settle for that.

Comment author: RichardKennaway 10 May 2013 08:33:24AM -1 points

I stand by the first quote. Every working solution can in a useless sense be seen as a form of SI/AIXI. In the sense that a hot-air balloon can be seen as an approach to landing on the Moon.

> And now you're down to arguing that it'll be "very useful, but not AGI".

At the very most. Whether AIXI-like algorithms get into the next edition of Russell and Norvig, having proved of practical value, well, history will decide that, and I'm not interested in predicting it. I will predict that it won't prove to be a viable approach to AGI.

Comment author: gwern 10 May 2013 04:30:54PM 0 points

> In the sense that a hot-air balloon can be seen as an approach to landing on the Moon.

How can a hot air balloon even in theory be seen as that? Hot air has a specific limit, does it not - where its density equals the outside density?