Open Problems in FAI
Edit: Please note that this write-up is of poor quality, having the style of a hastily written personal note.
It has been mentioned that there should be a better write-up of open problems in FAI, and as I understand it there is an ongoing effort to explain such open problems. My feeling has been that the recent effort has tended to hold off on proposing solutions for too long. I prefer the approach of the Tiling Agents paper, which explained problems through example systems that fail in various respects. What follows is an outline of what I'd write if I spent significant time on this; I think it is enough to be of some use. This list very much reflects my personal interests and beliefs.
- Tarski's Undefinability Theorem
- We would like a system to be able to reason about itself (in a few critical ways), and Tarski's theorem is one of the important obstacles. Kripke provided the first hope of progress in this area by showing that we can embed a partial truth predicate in a language if we accept a "gap" (statements which cannot be assessed as true or false within the system's self-theory). Work over the decades has "reduced the gap" (capturing an increasing number of the self-judgements we want, while always leaving a gap). There are also "glut" theories (which must assess some things as both true and false), and these typically mirror gap theories. Paul Christiano provided a theory of probabilistic self-reference which intuitively reduces the "gap" to infinitesimal size: the system's knowledge about its own probabilities can be wrong, but only by an infinitesimal. (For example, if it believes X, it may fail to believe P(X)=1, but it will still believe P(X)>c for every c<1.) (This feels a bit like a "glut" theory, since the system solves the problem by saying too much rather than remaining silent.) A sketch of these statements in symbols follows below.
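As a sketch (my own rendering of the statements above, with ⌜·⌝ a Gödel numbering; Christiano's result is stated from memory and may differ in detail):

```latex
% Tarski: no formula True(x) in a consistent language interpreting
% arithmetic can satisfy, for every sentence phi of that language,
\[
  \mathrm{True}(\ulcorner \varphi \urcorner) \;\leftrightarrow\; \varphi .
\]
% Christiano's probabilistic reflection: a coherent P, over a language
% containing a symbol for P itself, such that for all sentences phi
% and rationals a < b,
\[
  a < P(\varphi) < b \;\Longrightarrow\; P\big(a < P(\ulcorner \varphi \urcorner) < b\big) = 1 .
\]
% The infinitesimal "gap": from P(phi) = 1 we get only, for each
% rational c < 1,
\[
  P\big(P(\ulcorner \varphi \urcorner) > c\big) = 1 ,
\]
% and not necessarily P( P(<phi>) = 1 ) = 1.
```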
- First Incompleteness Theorem
- Logical Uncertainty:
- First-Order ("easy"): assigning probabilities to eventual results of (halting) computations we don't have time to make. I claim this is mostly solved by the FOL prior: we prefer simpler hypotheses about the behavior of systems, treating computations as black boxes to be predicted by universal induction, while incorporating logical reasoning about a function's behavior via Bayesian updates. (A toy sketch of the simplicity-prior idea appears after this list.) It also solves somewhat more; I claim it will have better properties than Solomonoff induction if the environment contains objects like halting oracles. (Some deficiencies with respect to reasoning about halting will be mentioned in the next section, however.)
- Second-Order ("impossible"): If we want to assign probabilities to programs halting, facts of number theory, or facts of set theory, we're in serious trouble. Using the FOL prior admits nonstandard models. It's not yet clear what qualities such a probability distribution should have. It seems reasonable to want universal statements to approach probability 1 as the set of positive examples approaches all of the examples; this turns out to be as difficult to compute as all the bits in the arithmetic hierarchy. I thought it would be reasonable to restrict this requirement to universal statements about computable predicates, i.e., halting facts and their equivalents. Will Sawin proved that our beliefs about halting cannot approach arbitrarily close to the correct values without some false Sigma_2 statements also approaching arbitrarily close to 1. It remains an open problem to construct such a prior. My proposal (focusing on sentences in the regular forms Pi_n or Sigma_n) is to require that a quantified statement may only be introduced into a theory once a statement at the next-lowest level of the arithmetic hierarchy is already present; sentences at a given level must wait for a sentence one level down. This does not block any true sentences from being produced, but it causes halting facts to converge as we eliminate possible halting times. It is an open problem whether this proposed distribution converges. If this distribution exists, call it the bad arithmetic prior (BAP). (One way to write these desiderata in symbols appears after this list, following the toy sketch.)
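First, a minimal toy sketch of the first-order idea: a simplicity-weighted prior over cheap hypotheses about a black-box computation, conditioned on the instances we can afford to run. The function f, the hypothesis class, and the description lengths are all invented for the example; a real FOL prior would enumerate hypotheses in a formal language rather than from a hand-written list.

```python
# Toy sketch: logical uncertainty about a halting computation f, treated
# as a black box. Prior weight 2^-length over candidate hypotheses (the
# "lengths" stand in for program lengths); Bayesian update on the small
# inputs we can afford to evaluate; prediction at a large input without
# running f there.

def f(n):
    # Stand-in for a computation too expensive to run at large n.
    return n * n + 1

hypotheses = [
    ("n+1",                lambda n: n + 1,                          5),
    ("2n",                 lambda n: 2 * n,                          5),
    ("n^2",                lambda n: n * n,                          7),
    ("n^2+1",              lambda n: n * n + 1,                      9),
    ("n^2+1 until n=1000", lambda n: n * n + 1 if n <= 1000 else 0, 15),
]

weights = {name: 2.0 ** -length for name, _, length in hypotheses}

# Condition on the evaluations we had time to make.
for n in range(5):
    observed = f(n)
    for name, h, _ in hypotheses:
        if h(n) != observed:        # refuted by a computed instance
            weights[name] = 0.0

total = sum(weights.values())
posterior = {name: w / total for name, w in weights.items()}

# Belief that f(10^6) = 10^12 + 1, without computing f(10^6):
big = 10 ** 6
belief = sum(posterior[name] for name, h, _ in hypotheses
             if h(big) == big * big + 1)

print(posterior)   # mass concentrates on the two unrefuted hypotheses
print("P(f(10^6) = 10^12 + 1) =", round(belief, 4))
```

The simplicity weighting is what keeps the "weird" hypothesis (one agreeing on every observed instance but diverging later) from claiming much posterior mass.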
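Second, one way to write the second-order desiderata in symbols, writing P_n for the beliefs at stage n of an approximation process. This is my formalization of the prose above, not the form in which Sawin's result was proved:

```latex
% Convergence desideratum: for a computable predicate phi, positive
% instances should drive the universal statement toward probability 1,
\[
  \lim_{n \to \infty} P\big(\forall x\, \varphi(x) \;\big|\; \varphi(0) \wedge \dots \wedge \varphi(n)\big) = 1 .
\]
% Sawin's trade-off, as stated above: if beliefs about halting-style
% (Pi_1) facts converge to the truth,
\[
  \forall \sigma \in \Pi_1:\ \sigma \text{ true} \;\Longrightarrow\; P_n(\sigma) \to 1 ,
\]
% then some false Sigma_2 sentence is driven to 1 as well:
\[
  \exists \tau \in \Sigma_2:\ \tau \text{ false} \;\wedge\; P_n(\tau) \to 1 .
\]
```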
- Second Incompleteness Theorem
- Löbian Obstacle:
For a machine to plan its actions into the future, it needs to trust itself. The second incompleteness theorem (and, more generally, Löb's theorem) makes this difficult. (All of this is insufficiently formal, but the Tiling Agents paper gives a good explanation; a sketch in symbols follows below.) Several partial solutions have been proposed in a deterministic setting. It is an open problem whether one of Dan Willard's several self-verifying systems solves this problem. (He had multiple proposals...) In case there is no purely logical solution, it seems intuitively promising to look for probabilistic self-trust. The previous section already presents difficulties here. If the BAP converges, then we can show that it has self-knowledge of that convergence of the form Paul Christiano described! This makes the false Sigma_2 beliefs feel more acceptable, because they are a necessary feature of the system's self-reference. However, I think it is the case that BAP ends up converging to 1 for *all* Sigma_2 statements, which is really terrible.
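For reference, a sketch of the obstacle in symbols (the theorem is standard; the tiling framing is simplified from the paper):

```latex
% Löb's theorem: for a theory T extending basic arithmetic, with
% provability predicate Box = Prov_T,
\[
  T \vdash \Box\ulcorner \varphi \urcorner \rightarrow \varphi
  \quad\Longrightarrow\quad
  T \vdash \varphi .
\]
% The self-trust schema an agent seems to need in order to accept its
% own (or a successor's) future proofs,
\[
  T \vdash \Box\ulcorner A \urcorner \rightarrow A
  \quad \text{for all action-relevant } A ,
\]
% is therefore unavailable: if T proved this schema, Löb's theorem
% would make T prove each such A outright.
```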
- Anti-Löbian Obstacle:
In case the Löbian obstacle is solved, the anti-Löbian obstacle may be a concern.
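The note above doesn't spell this out; on my reading (an assumption on my part), the anti-Löbian obstacle is the mirror-image worry that too much self-trust is inconsistent:

```latex
% If a system asserted its own full soundness schema,
\[
  T \vdash \Box\ulcorner \varphi \urcorner \rightarrow \varphi
  \quad \text{for every sentence } \varphi ,
\]
% then by Löb's theorem T would prove every sentence, i.e. T would be
% inconsistent. Self-trust therefore has to stop short of the full
% schema.
```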
Open Thread, November 8 - 14, 2013
Open Thread, November 1 - 7, 2013
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Open Thread, October 20 - 26, 2013
Open Thread, October 13 - 19, 2013
Open Thread, October 7 - October 12, 2013
Open Thread, September 30 - October 6, 2013
Open Thread, September 23-29, 2013
Open thread, September 2-8, 2013
Open thread, August 19-25, 2013
Open thread, August 12-18, 2013
Open thread, August 5-11, 2013
Open thread, July 29-August 4, 2013
Of course, for "every Monday", the last one should have been dated July 22-28. *cough*
Open thread, July 23-29, 2013
Open thread, July 16-22, 2013
Open Thread, April 15-30, 2013
Open Thread, April 1-15, 2013
Open thread, March 17-31, 2013
Open Thread, March 1-15, 2013
Open thread, February 15-28, 2013
Open Thread, February 1-14, 2013
Open Thread, January 16-31, 2013
Open Thread, January 1-15, 2013
Open Thread, December 16-31, 2012
Open Thread, December 1-15, 2012
Open Thread, November 16–30, 2012
Open Thread, November 1-15, 2012
Open Thread, October 16-31, 2012
Open Thread, October 1-15, 2012
Open Thread, September 15-30, 2012
Open Thread, September 1-15, 2012
Open Thread, August 16-31, 2012
Open Thread, August 1-15, 2012
Open Thread, July 16-31, 2012
Open Thread, July 1-15, 2012
Open Thread, June 16-30, 2012
Open Thread, June 1-15, 2012
Open Thread, May 16-31, 2012
Open Thread, May 1-15, 2012
Stupid Questions Open Thread Round 2
From Costanza's original thread (entire text):
This is for anyone in the LessWrong community who has made at least some effort to read the sequences and follow along, but is still confused on some point, and is perhaps feeling a bit embarrassed. Here, newbies and not-so-newbies are free to ask very basic but still relevant questions with the understanding that the answers are probably somewhere in the sequences. Similarly, LessWrong tends to presume a rather high threshold for understanding science and technology. Relevant questions in those areas are welcome as well. Anyone who chooses to respond should respectfully guide the questioner to a helpful resource, and questioners should be appropriately grateful. Good faith should be presumed on both sides, unless and until it is shown to be absent. If a questioner is not sure whether a question is relevant, ask it, and also ask if it's relevant.
Meta:
- How often should these be made? I think one every three months is the correct frequency.
- Costanza made the original thread, but I am OpenThreadGuy. I am therefore not only entitled but required to post this in his stead. But I got his permission anyway.