Comment author: orthonormal 24 March 2012 08:40:11PM 0 points [-]

Note that our agent will quickly prove "if output = 'defect' then utility >= $1".

Your intuition that it gets deduced before any of the spurious claims like "if output = 'defect' then utility <= -$1" is taking advantage of an authoritative payoff matrix that X can't safely calculate xerself. I'm not sure that this tweaked version is any safer from exploitation...

Comment author: AlephNeil 24 March 2012 09:15:44PM *  0 points [-]

an authoritative payoff matrix that X can't safely calculate xerself.

Why not? Can't the payoff matrix be "read off" from the "world program" (assuming X isn't just 'given' the payoff matrix as an argument)?

Comment author: AlephNeil 24 March 2012 07:33:57PM *  1 point [-]
  1. Actually, this is an open problem so far as I know: show that if X is a Naive Decision Theory agent as above, with some analyzable inference module like a halting oracle, then there exists an agent Y written so that X cooperates against Y in a Prisoner's Dilemma while Y defects.

Let me just spell out to myself what would have to happen in this instance. For definiteness, let's take the payoffs in prisoner's dilemma to be $0 (CD), $1 (DD), $10 (CC) and $11 (DC).
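
(Laid out as a matrix of X's payoffs, rows being X's move and columns Y's, that is:)

```latex
\begin{array}{c|cc}
X \backslash Y & \text{C} & \text{D} \\ \hline
\text{C} & \$10 & \$0  \\
\text{D} & \$11 & \$1
\end{array}
```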

Now, if X is going to co-operate and Y is going to defect then X is going to prove "If I co-operate then I get $0". Therefore, in order to co-operate, X must also prove the spurious counterfactual "If I defect then I get $x" for some negative value of x.

But suppose I tweak the definition of the NDT agent so that whenever it can prove (1) "if output = a then utility >= u" and (2) "if output != a then utility <= u", it will immediately output a. (And if several statements of the forms (1) and (2) have been proved, then the agent searches for them in the order in which they were proved.) Note that our agent will quickly prove "if output = 'defect' then utility >= $1". So if it ever managed to prove "if output = 'co-operate' then utility = $0" it would defect right away.
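
Just to make that decision rule concrete, here's a minimal Python sketch (my illustration, not part of the original discussion); the statement stream is a hypothetical stand-in for whatever inference module the agent uses:

```python
def tweaked_ndt(proved_statements):
    """Minimal sketch of the tweaked rule described above.

    `proved_statements` stands in for the agent's inference module: a
    hypothetical iterator yielding, in the order they are proved, triples
    (form, action, u), where form 1 means "if output = action then utility >= u"
    and form 2 means "if output != action then utility <= u".
    """
    form1 = set()  # (action, u) pairs proved in form (1)
    form2 = set()  # (action, u) pairs proved in form (2)
    for form, action, u in proved_statements:
        (form1 if form == 1 else form2).add((action, u))
        # The tweak: as soon as both forms have been proved for the same
        # action and the same bound u, output that action immediately.
        if (action, u) in form1 and (action, u) in form2:
            return action
    return None  # proof search exhausted without reaching a decision
```

Passing the statements in as an iterator keeps the sketch agnostic about what the inference module actually is (halting oracle, bounded proof search, etc.).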

Since I have tweaked the definition, this doesn't address your 'open problem' (which I think is a very interesting one) but it does show that if we replace the NDT agent with something only slightly less naive, then the answer is that no such Y exists.

(We could replace Prisoner's Dilemma with an alternative game where each player has a third option called "nuclear holocaust", such that if either player opts for nuclear holocaust then both get (say) -$1, and ask the same question as in your note 2. Then even for the tweaked version of X it's not clear that no such Y exists.)

ETA: I'm afraid my idea doesn't work: The problem is that the agent will also quickly prove "if 'co-operate' then I receive at least $0." So if it can prove the spurious counterfactual "if 'defect' then receive -1" before proving the 'real' counterfactual "if 'co-operate' then receive 0" then it will co-operate.

We could patch this up with a rule that said "if we deduce a contradiction from the assumption 'output = a' then immediately output a" which, if I remember rightly, is Nesov's idea about "playing chicken with the inconsistency". Then on deducing the spurious counterfactual "if 'defect' then receive -1" the agent would immediately defect, which could only happen if the agent itself were inconsistent. So if the agent is consistent, it will never deduce this spurious counterfactual. But of course, this is getting even further away from the original "NDT".
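
In the same sketchy style (again my illustration, with contradiction-detection abstracted into the hypothetical statement stream), the patched rule would just add an early exit:

```python
def chicken_ndt(proved_statements):
    # Same hypothetical interface as tweaked_ndt above, except the inference
    # module may also yield ('absurd', action, None), meaning it has derived
    # a contradiction from the assumption "output = action".
    form1, form2 = set(), set()
    for form, action, u in proved_statements:
        if form == 'absurd':
            # "Playing chicken with the inconsistency": immediately take the
            # action the proof system claims we won't take.
            return action
        (form1 if form == 1 else form2).add((action, u))
        if (action, u) in form1 and (action, u) in form2:
            return action
    return None
```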

Comment author: AlephNeil 01 February 2012 02:11:08PM 15 points [-]

[general comment on sequence, not this specific post.]

You have such a strong intuition that no configuration of classical point particles and forces can ever amount to conscious awareness, yet you don't immediately generalize and say: 'no universe capable of exhaustive description by mathematically precise laws can ever contain conscious awareness'. Why not? Surely whatever weird and wonderful elaboration of quantum theory you dream up, someone can ask the same old question: "why does this bit that you've conveniently labelled 'consciousness' actually have consciousness?"

So you want to identify 'consciousness' with something ontologically basic and unified, with well-defined properties (or else, to you, it doesn't really exist at all). Yet these very things would convince me that you can't possibly have found consciousness given that, in reality, it has ragged, ill-defined edges in time, space, even introspective content.

Stepping back a little, it strikes me that the whole concept of subjective experience has been carefully refined so that it can't possibly be tracked down to anything 'out there' in the world. Kant and Wittgenstein (among others) saw this very clearly. There are many possible conclusions one might draw - Dennett despairs of philosophy and refuses to acknowledge 'subjective experience' at all - but I think people like Chalmers, Penrose and yourself are on a hopeless quest.

Comment author: cousin_it 18 January 2012 02:41:16PM *  2 points [-]

The comprehension axiom schema (or any other construction that can be used by a proof checker algorithm) isn't enough to prove all the statements people consider to be inescapable consequences of second-order logic. You can view the system you described as a many-sorted first-order theory with sets as one of the sorts, and notice that it cannot prove its own consistency (which can be rephrased as a statement about the integers, or about a certain Turing machine not halting) for the usual Goedelian reasons. But we humans can imagine that the integers exist as "hunks of Platoplasm" somewhere within math, so the consistency statement feels obviously true to us.

It's hard to say whether our intuitions are justified. But one thing we do know, provably, irrevocably and forever, is that being able to implement a proof checker algorithm precludes you from ever getting the claimed benefits of second-order logic, like having a unique model of the standard integers. Anyone who claims to transcend the reach of first-order logic in a way that's relevant to AI is either deluding themselves, or has made a big breakthrough that I'd love to know about.

Comment author: AlephNeil 25 January 2012 02:27:38PM 2 points [-]

The comprehension axiom schema (or any other construction that can be used by a proof checker algorithm) isn't enough to prove all the statements people consider to be inescapable consequences of second-order logic.

Indeed, since the second-order theory of the real numbers is categorical, and since it can express the continuum hypothesis, an oracle for second-order validity would tell us either that CH is 'valid' or that ¬CH is.

("Set theory in sheep's clothing".)

Comment author: AlephNeil 18 December 2011 06:13:26PM *  5 points [-]

But the bigger problem is that we can't say exactly what makes a "silly" counterfactual different from a "serious" one.

Would it be naive to hope for a criterion that roughly says: "A conditional P ⇒ Q is silly iff the 'most economical' way of proving it is to deduce it from ¬P or else from Q." Something like: "there exists a proof of ¬P or of Q which is strictly shorter than the shortest proof of P ⇒ Q"?
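
(Writing \ell(\varphi) for the length of the shortest proof of \varphi, taken to be \infty if there is none, that proposal would read:)

```latex
\mathrm{Silly}(P \Rightarrow Q) \;:\Longleftrightarrow\;
\min\bigl(\ell(\lnot P),\ \ell(Q)\bigr) \;<\; \ell(P \Rightarrow Q)
```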

A totally different approach starts with the fact that your 'lemma 1' could be proved without knowing anything about A. Perhaps this could be deemed a sufficient condition for a counterfactual to be serious. But I guess it's not a necessary condition?

Comment author: AlephNeil 09 December 2011 12:56:27AM *  1 point [-]

Suppose we had a model M that we thought described cannons and cannon balls. M consists of a set of mathematical assertions about cannons

In logic, the technical terms 'theory' and 'model' have rather precise meanings. If M is a collection of mathematical assertions then it's a theory rather than a model.

formally independent of the mathematical system A in the sense that the addition of some axiom A0 implies Q, while the addition of its negation, ~A0, implies ~Q.

Here you need to specify that adding A0 or ~A0 doesn't make the theory inconsistent, which is equivalent to just saying: "Neither Q nor ~Q can be deduced from A."

Note: if by M you had actually meant a model, in the sense of model theory, then for every well-formed sentence s, either M satisfies s or M satisfies ~s. But then models are abstract mathematical objects (like 'the integers'), and there's usually no way to know which sentences a model satisfies.

Comment author: paulfchristiano 08 December 2011 08:07:51PM *  8 points [-]

I'll write down an algorithm that solves SAT in time N^k if any algorithm does.

On SAT instances of size T, enumerate all algorithms of size up to log(T); for each one, consider all SAT instances of size up to log(T); run each algorithm for time log(T)^k, outputting 0 if it doesn't halt, and compare its output to the result of a brute force search.

Take the shortest algorithm which worked in each of these tests, and use it to answer your original SAT query of size T (aborting and outputting 0 if it takes more than T^k time).

This entire process takes time poly(T). Moreover, suppose there is an N^k time algorithm which solves SAT correctly, and let A be the shortest such algorithm. Then there is a finite constant T0 such that all algorithms shorter than A fail on some SAT instance of size at most T0. Then the algorithm I described works correctly for all SAT instances of size at least 2^(T0 + |A|).
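
A rough Python transcription of this construction, just to pin down the control flow (my sketch, not from the comment; the enumerators, the universal interpreter `run`, and `brute_force_sat` are hypothetical helpers passed in, and the brute-force fallback for when no candidate passes is my own addition):

```python
import math

def meta_sat(phi, T, k, programs_of_size_at_most, instances_of_size_at_most,
             run, brute_force_sat):
    """Control-flow sketch of the construction above (not a working solver).

    Hypothetical helpers, passed in rather than defined here:
      programs_of_size_at_most(n)  -- yields candidate programs, shortest first
      instances_of_size_at_most(n) -- yields all SAT instances of size <= n
      run(p, x, steps)             -- universal interpreter: p's output on x,
                                      or 0 if p doesn't halt within `steps` steps
      brute_force_sat(x)           -- exhaustive reference solver
    """
    n = max(1, int(math.log2(T)))
    test_budget = n ** k
    for p in programs_of_size_at_most(n):
        # Test p against brute force on every instance of size up to log(T).
        if all(run(p, x, test_budget) == brute_force_sat(x)
               for x in instances_of_size_at_most(n)):
            # p is the shortest candidate that passed every test; trust it on
            # phi, aborting (answering 0) after T^k steps.
            return run(p, phi, T ** k)
    # No candidate passed (only possible for small T): fall back to brute force.
    return brute_force_sat(phi)
```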

(Edit: this argument is silly; you can just run all programs of size log(T) and output a solution if any of them find one)

Edit: Note in particular that the algorithm which takes k = BB(10^10) and does a brute force search instead for T < BB(10^10) is guaranteed to solve SAT in poly time, provided that any algorithm does. Realistically, I think taking k = 10 and doing a brute force search for T < 10^10^10 is virtually guaranteed to solve SAT in poly time, if any algorithm does (and I can actually write down this latter algorithm).

Comment author: AlephNeil 08 December 2011 08:25:33PM *  1 point [-]

Perhaps a slightly simpler way would be to 'run all algorithms simultaneously' such that each one is slowed down by a constant factor. (E.g. at time t = (2x + 1) * 2^n, we do step x of algorithm n.) When algorithms terminate, we check (still within the same "process" and hence slowed down by a factor of 2^n) whether a solution to the problem has been generated. If so, we return it and halt.
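
For what it's worth, here is a tiny Python illustration (mine) of that schedule, recovering which algorithm n and which step x get executed at global time t from the factorisation t = (2x + 1) * 2^n:

```python
def dovetail_step(t):
    # Every integer t >= 1 factors uniquely as (2*x + 1) * 2**n, so at global
    # time t we execute step x of algorithm n.
    assert t >= 1
    n = 0
    while t % 2 == 0:
        t //= 2
        n += 1
    return n, (t - 1) // 2

# [dovetail_step(t) for t in range(1, 9)]
# -> [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (1, 1), (0, 3), (3, 0)]
```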

ETA: Ah, but the business of 'switching processes' is going to need more than constant time. So I guess it's not immediately clear that this works.

Comment author: Vladimir_Nesov 03 December 2011 07:33:55PM 1 point [-]

OK, the problem I was getting at is that adopting a definition usually has consequences that make some definitions better than others, so definitions are not exempt from criticism: the implicit claim that a given definition is useful can still be refuted.

Comment author: AlephNeil 03 December 2011 11:23:36PM *  0 points [-]

I agree that definitions (and expansions of the language) can be useful or counterproductive, and hence are not immune from criticism. But still, I don't think it makes sense to play the Bayesian game here and attach probabilities to different definitions/languages being correct. (Rather like how one can't apply Bayesian reasoning in order to decide between 'theory 1' and 'theory 2' in my branching vs probability post.) Therefore, I don't think it makes sense to calculate expected utilities by taking a weighted average over the possible stances one can take on the mind-body problem.

Comment author: Vladimir_Nesov 03 December 2011 06:33:55PM 0 points [-]

The metaphysical principles which either allow or deny the "intrinsic philosophical risk" mentioned in the OP are not like theorems or natural laws, which we might hope some day to corroborate or refute - they're more like definitions that a person either adopts or does not.

What do the definitions do?

Comment author: AlephNeil 03 December 2011 06:55:39PM *  0 points [-]

I don't understand the question, but perhaps I can clarify a little:

I'm trying to say that (e.g.) analytic functionalism and (e.g.) property dualism are not like inconsistent statements in the same language, one of which might be confirmed or refuted if only we knew a little more, but instead like different choices of language, which alter the set of propositions that might be true or false.

It might very well be that the expanded language of property dualism doesn't "do" anything, in the sense that it doesn't help us make decisions.

Comment author: gwern 03 December 2011 05:36:07PM 0 points [-]

Do you have any argument that all our previous observations where jarring physical discontinuities tend to be associated with jarring mental discontinuities (like, oh I don't know, death) are wrong? Or are you just trying to put the burden of proof on me and smugly use an argument from ignorance?

Comment author: AlephNeil 03 December 2011 05:56:42PM 1 point [-]

Of course, we haven't had any instances of jarring physical discontinuities not being accompanied by 'functional discontinuities' (hopefully it's clear what I mean).

But the deeper point is that the whole presumption that we have 'mental continuity' (in a way that transcends functional organization) is an intuition founded on nothing.

(To be fair, even if we accept that these intuitions are indefensible, it remains to be explained where they come from. I don't think it's all that "bizarre".)
