
Comment author: OrphanWilde 26 August 2015 07:41:39PM 2 points

And you have an amazingly good tool-user, that still doesn't innovate or surprise you.

That's not entirely true. It might surprise us by, say, showing us the precise way to use an endoscopic cauterizer to cut off blood flow to a tumor without any collateral damage. But it can't, by definition, invent a new tool entirely.

I'm not sure the solution to the AI friendliness problem is "Creating AI that is too narrow-minded to be dangerous". You throw out most of what is intended to be achieved by AI in the first place, and achieve little more than evolutionary algorithms are already capable of. (If you're capable of modeling the problem to that extent, you can just toss it, along with the toolset, into an evolutionary algorithm and get something pretty close to just as good.)

Comment author: warbo 31 August 2015 04:20:27PM 0 points

it can't, by definition, invent a new tool entirely.

Can humans "invent a new tool entirely", when all we have to work with are a handful of pre-defined quarks, leptons and bosons? AIXI is hard-coded to just use one tool, a Turing Machine; yet the open-endedness of that tool makes it infinitely inventive.

We can easily put a machine shop, or any other manufacturing capabilities, into the abstract room. We could ignore the tedious business of manufacturing and just include a Star-Trek-style replicator, which allows the AI to use anything for which it can provide blueprints.

Also, we can easily be surprised by actions taken in the room. For example, we might simulate the room according to known scientific laws, and have it automatically suspend if anything strays too far into uncertain territory. We can then either abort the simulation, if something dangerous or undesirable is happening within, or else perform an experiment to see what would happen in that situation, then feed the result back in and resume. That would be a good way to implement an artificial scientist. Similar ideas are explored in http://lambda-the-ultimate.org/node/4392

Comment author: philh 07 February 2015 05:18:57PM 0 points

I'm thinking of a dilemma that I thought was called the farmer's dilemma, but that redirects to the prisoner's dilemma on wikipedia, and google doesn't help me out either. Is there a standard name for this dilemma?

Two farmers have adjacent fields. If either of them irrigates (U-1), both get the benefits from it (U+5). If I cooperate, my opponent can get a payoff of 4 by cooperating or 5 by defecting, so he has an incentive to defect; my own payoff is a guaranteed 4. If I defect, my opponent can get a payoff of 4 by cooperating (and I get 5), or 0 by defecting (and I also get 0), so he has an incentive to cooperate. We both want at least one of us to cooperate, but as long as the other cooperates, we have a mild preference for defecting.
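The payoffs described can be written out as a quick sanity check (a toy sketch using the U-numbers above; the function name is just for illustration):

```python
# Payoff matrix for the two-farmer irrigation game described above:
# irrigating costs 1, and if either farmer irrigates both gain 5.
BENEFIT, COST = 5, 1

def payoff(i_irrigate: bool, they_irrigate: bool) -> int:
    """My payoff, given who chooses to irrigate."""
    benefit = BENEFIT if (i_irrigate or they_irrigate) else 0
    return benefit - (COST if i_irrigate else 0)

# If the other farmer irrigates, I prefer to free-ride (5 > 4);
# if they don't, I prefer to irrigate myself (4 > 0).
assert payoff(False, True) > payoff(True, True)    # 5 > 4
assert payoff(True, False) > payoff(False, False)  # 4 > 0
```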

Comment author: warbo 15 February 2015 06:42:37PM 0 points

This mostly reminds me of the "tragedy of the commons", where everyone benefits when an action is taken (like irrigating land, picking up litter, etc.), but it costs some small amount to one who takes the action, such that everyone agrees that action should be taken, but nobody wants to do it themselves.

There is also the related concept of "not in my back yard" (NIMBY), where everyone agrees that some 'necessary evil' be done, like creating a new landfill site or nuclear power plant, but nobody wants to take the sacrifice themselves (ie. have it "in their back yard").

Some real-life examples of this effect: http://www.dummies.com/how-to/content/ten-reallife-examples-of-the-tragedy-of-the-common.html

Comment author: TheAncientGeek 22 September 2014 11:51:10AM 0 points

By an off switch I mean a backup goal.

I know nobody mentioned it. The point is that Clippie has one main goal, and no backup goal, so off switches, in my sense, are being IMPLICITLY omitted.

Goals are standardly regarded as immune to self-modification, so an off switch, in my sense, would be too.

Comment author: warbo 22 September 2014 12:13:17PM 1 point

By an off switch I mean a backup goal. Goals are standardly regarded as immune to self-modification, so an off switch, in my sense, would be too.

This is quite a subtle issue.

If the "backup goal" is always in effect, then it is just another clause of the main goal: for example, "maximise paperclips" with a backup goal of "do what you are told" is the same as having the main goal "maximise paperclips while doing what you are told".

If the "backup goal" is a separate mode which we can switch an AI into, eg. "stop all external interaction", then it will necessarily conflict with the AI's main goal: it can't maximise paperclips if it stops all external interaction. Hence the primary goal induces a secondary goal: "in order to maximise paperclips, I should prevent anyone switching me to my backup goal". Secondary goals of this kind have been raised by Steve Omohundro.

Comment author: moridinamael 18 September 2014 01:42:00PM *  3 points

At the risk of looking even more like an idiot: Buying one $1 lottery ticket earns you a tiny chance - 1 in 175,000,000 for the Powerball - of becoming absurdly wealthy. The Powerball gets as high as $590,500,000 pretax. NOT buying that one ticket gives you a chance of zero. So buying one ticket is "infinitely" better than buying no tickets. Buying more than one ticket, by comparison, makes almost no difference.

I like to play with the following scenario. A LessWrong reader buys a lottery ticket. They almost certainly don't win. They have one dollar less to donate to MIRI and because they're not wealthy they may not have enough wealth to psychologically justify donating anything to MIRI anyway. However, in at least one worldline, somewhere, they win a half a billion dollars and maybe donate $100,000,000 to MIRI. So from a global humanity perspective, buying that lottery ticket made the difference between getting FAI built and not getting it built. The one dollar spent on the ticket, in comparison, would have had a totally negligible impact.

I fully realize that the number of universes (or whatever) where the LessWrong reader wins the lottery is so small that they would be "better off" keeping their dollar according to basic economics, but the marginal utility of one extra dollar is basically zero.

edit: Digging myself in even deeper, let me attempt to simplify the argument.

You want to buy a Widget. The difference in net utility, to you, between owning a Widget and not owning a Widget is 3^3^3^3 utilons. Widgets cost $100,000,000. You have no realistic means of getting $100,000,000 through your own efforts because you are stuck in a corporate drone job and you have lots of bills and a family relying on you. So the only way you have of ever getting a Widget is by spending negligible amounts of money buying "bad" investments like lottery tickets. It is trivial to show that buying a lottery ticket is rational in this scenario: (Tiny chance) x (Absurdly, unquantifiably vast utility) > (Certain chance) x ($1).

Replace Widget with FAI and the argument may feel more plausible.
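The arithmetic behind that final inequality can be sketched directly (a toy calculation; the Widget's utility is replaced by a merely-large stand-in, since 3^3^3^3 is far too big to represent):

```python
# Toy expected-utility check for the Widget scenario above.
# The odds are Powerball's (~1 in 175,000,000); the Widget's utility
# is a stand-in "vast" number rather than the un-representable 3^3^3^3.
P_WIN = 1 / 175_000_000
TICKET_COST_UTILONS = 1    # assumed marginal utility of $1
WIDGET_UTILONS = 10**100   # stand-in for "absurdly, unquantifiably vast"

expected_gain = P_WIN * WIDGET_UTILONS
expected_loss = 1 * TICKET_COST_UTILONS  # the $1 is lost with certainty

# On a naive expected-value calculation, any Widget utility above
# 175,000,000 utilons makes the ticket purchase come out ahead.
assert expected_gain > expected_loss
```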

Comment author: warbo 22 September 2014 11:57:14AM 2 points

Buying one $1 lottery ticket earns you a tiny chance - 1 in 175,000,000 for the Powerball - of becoming absurdly wealthy. NOT buying that one ticket gives you a chance of zero.

There are ways to win a lottery without buying a ticket. For example, someone may buy you a ticket as a present, without your knowledge, which then wins.

So buying one ticket is "infinitely" better than buying no tickets.

No, it is much more likely that you'll win the lottery by buying tickets than by not buying tickets (assuming you're unlikely to be gifted a ticket), but the cost of being gifted a ticket is zero, which makes not buying tickets an "infinitely" better return on investment.

Comment author: AlexMennen 09 June 2014 09:36:24PM 6 points

Intuitively, this limitation could be addressed by hooking up the AIXItl's output channel to its source code. Unfortunately, if you do that, the resulting formalism is no longer AIXItl.

It sounds to me like, although such an agent would very quickly self-modify into something other than AIXItl, it would be AIXItl at least on the first timestep (even though its assumption that its output does not change its source code is incorrect).

I expect that such an agent would perform very poorly, because it doesn't start with a good model of self-modification, so the successor it replaces itself with would, with very high probability, not do anything useful. This is a problem for all agents that do not start off with detailed information about the environment, not just AIXI variants. The advantage that the example precommitment game player you provided has over AIXItl is not non-Cartesianism, but the fact that it was designed by someone who knows how the game mechanics work. It seems to me that the only way an agent that does not start off with strong assumptions about which program the environment is running could win is if self-modification is difficult enough that it could not accidentally self-modify into something useless before learning enough about its environment to protect itself.

Comment author: warbo 19 June 2014 06:05:20PM *  3 points

My intuition is that the described AIXItl implementation fails because its implementation is too low-level. A higher-level AIXItl can succeed though, so it's not a limitation of AIXItl. Consider the following program:

P1) Send the current machine state* as input to a 'virtual' AIXItl.

P2) Read the output of this AIXItl step, which will be a new program.

P3) Write a backup of the current machine state*. This could be in a non-executing register, for example.

P4) Replace the machine's state (but not the backup!) with the program provided by AIXItl.

Now, as AlexMennen notes, this machine will no longer be AIXItl and in all probability it will 'brick' itself. However, we can rectify this. The AIXItl agent is 'virtual' (ie. not directly hooked up to the machine's IO), so we can interpret its output programs in a safe way:

  • We can use a total language, such that all outputted programs eventually halt.

  • We can prevent the language having (direct) access to the backup.

  • We can append a "then reload the backup" instruction to all programs.

This is still AIXItl, not a "variant". It's just running on a rather complex virtual machine. From AIXItl's Cartesian point of view:

A1) Take in an observation, which will be provided in the form of a robot's configuration.

A2) Output an action, in the form of a new robot configuration which will be run to completion.

A3) GOTO A1.

From an embodied viewpoint, we can see that the robot AIXItl thinks it is programming doesn't exactly correspond to the robot which actually exists (in particular, it doesn't know that the real robot is also running AIXItl!). Also, whereas AIXItl measures time in terms of IO cycles, we can see that an arbitrary amount of time may pass between steps A1 and A2 (where AIXItl is 'thinking') and between steps A2 and A3 (where the robot is executing the new program, and AIXItl only exists in the backup).
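The wrapper described in steps P1-P4 can be sketched as a simple interpreter loop (a toy Python sketch; `aixitl_step` is a stand-in for the actual AIXItl computation, and output "programs" are modelled as plain functions acting on a state dictionary):

```python
# Toy sketch of the backup-and-restore wrapper from steps P1-P4.
# `aixitl_step` stands in for the real AIXItl update, which is not
# practically computable; "programs" are modelled as Python functions.

def aixitl_step(state):
    """Stand-in: map the observed machine state to a new program."""
    def program(s):
        # A trivial "action": overwrite the code and bump a sensor value.
        return {"code": "overwritten", "sensors": s["sensors"] + 1}
    return program

def run_cycle(state):
    program = aixitl_step(state)  # P1/P2: feed state in, read a program out
    backup = dict(state)          # P3: back up the machine state
    s = program(state)            # P4: run the new program to completion
    s["code"] = backup["code"]    # the appended "reload the backup" step
    return s

state = {"code": "aixitl-vm", "sensors": 0}
state = run_cycle(state)
# AIXItl's code survives, while the program's sensor-visible effects remain.
assert state == {"code": "aixitl-vm", "sensors": 1}
```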

This setup doesn't solve all Cartesian problems, for example AIXItl doesn't understand that it might die, it has no control over the backup (which a particularly egregious Omega might place restrictions on**) and the backup-and-restore scheme (just like anything else) might be interfered with by the environment. However, this article's main thrust is that a machine running AIXItl is unable to rewrite its code, which is false.

* Note that this doesn't need to be complete; in particular, we can ignore the current state of execution. Only the "code" and sensor data need to be included.

** With more effort, we could have AIXItl's output programs contain the backup and restore procedures, eg. validated by strong types. This would allow a choice of different backup strategies, depending on the environment (eg. "Omega wants this register to be empty, so I'll write my backup to this hard drive instead, and be sure to restore it afterwards")

Comment author: Peterdjones 08 October 2012 08:09:35PM 3 points

One of my mistakes was believing in Bayesian decision theory, and in constructive logic at the same time. This is because traditional probability theory is inherently classical, because of the axiom that P(A + not-A) = 1.

Could you be so kind as to expand on that?

Comment author: warbo 02 October 2013 11:10:57AM 1 point

One of my mistakes was believing in Bayesian decision theory, and in constructive logic at the same time. This is because traditional probability theory is inherently classical, because of the axiom that P(A + not-A) = 1.

Could you be so kind as to expand on that?

Classical logics make the assumption that all statements are either exactly true or exactly false, with no other possibility allowed. Hence classical logic will take shortcuts like admitting not(not(X)) as a proof of X, under the assumptions of consistency (we've proved not(not(X)) so there is no proof of not(X)), completeness (if there is no proof of not(X) then there must be a proof of X) and proof-irrelevance (all proofs of X are interchangeable, so the existence of such a proof is acceptable as proof of X).

The flaw is, of course, the assumption of a complete and consistent system, which Goedel showed to be impossible for systems capable of modelling the Natural numbers.

Constructivist logics don't assume the law of the excluded middle. This restricts classical 'truth' to 'provably true', classical 'false' to 'provably false' and allows a third possibility: 'unproven'. An unproven statement might be provably true or provably false or it might be undecidable.
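The asymmetry between the two directions of double negation can be made concrete in a proof assistant (a sketch in Lean 4; `Classical.em` is the law of the excluded middle, and the theorem names are just for illustration):

```lean
-- Constructively provable: any p implies ¬¬p.
theorem intro_dn {p : Prop} (hp : p) : ¬¬p :=
  fun hnp => hnp hp

-- The converse, ¬¬p → p, needs the law of the excluded middle:
theorem elim_dn {p : Prop} (hnnp : ¬¬p) : p :=
  (Classical.em p).elim id (fun hnp => absurd hnp hnnp)
```

Removing `Classical.em` leaves `elim_dn` unprovable, which is exactly the constructivist refusal to accept not(not(X)) as a proof of X.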

From a probability perspective, constructivism says that we shouldn't assume that P(not(X)) = 1 - P(X), since doing so is assuming that we're using a complete and consistent system of reasoning, which is impossible.

Note that constructivist systems are compatible with classical ones. We can add the law of the excluded middle to a constructive logic and get a classical one; all of the theorems will still hold and we won't introduce any inconsistencies.

Another way of thinking about it is that the law of the excluded middle assumes that a halting oracle exists which allows us to take shortcuts in our proofs. The results will be consistent, since the oracle gives correct answers, but we can't tell which results used the oracle as a shortcut (and hence don't need it) and which would be impossible without the oracle's existence (and hence don't exist, since halting oracles don't exist).

The only way to work out which ones are shortcuts is to take 'the long way' and produce a separate proof which doesn't use an oracle; these are exactly the constructive proofs!

Comment author: Stuart_Armstrong 07 June 2013 07:28:32AM 0 points

Do you think this idea can be generalised to Ems?

Comment author: warbo 13 June 2013 04:38:49PM 1 point

We can generalise votes to carry different weights. Starting today, everyone who currently has one vote continues to have one vote. When someone makes a copy (electronic or flesh), their voting power is divided between themselves and the copy. The total amount of voting power is conserved and, assuming that copies default to the political opinion of their prototypes, the political landscape only moves when someone changes their mind.
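The weight-splitting scheme can be sketched in a few lines (a toy sketch; the even split and the class names are illustrative assumptions):

```python
from fractions import Fraction  # exact arithmetic, so weights never drift

class Voter:
    """A voter whose voting power is split whenever they copy themselves."""
    def __init__(self, weight=Fraction(1)):
        self.weight = weight

    def copy(self):
        """Making a copy divides the voting power evenly between the two."""
        self.weight /= 2
        return Voter(self.weight)

alice = Voter()
alice2 = alice.copy()
alice3 = alice2.copy()

# Total voting power is conserved no matter how many copies are made.
assert alice.weight + alice2.weight + alice3.weight == Fraction(1)
```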

Comment author: DanArmak 09 June 2013 07:45:21PM 4 points

If search is hard, perhaps there is another way.

Suppose an agent intends to cooperate (if the other agent cooperates, etc.). Then it will want the other agent to be able to prove that it will cooperate. If it knew of a proof about itself, it would advertise this proof, so the others wouldn't have to search. (In the original tournament setup, it could embed the proof in its source code.)

An agent can't spend much time searching for proofs about every agent it meets. Perhaps it meets new agents very often. Or perhaps there are a huge number of agents out there, and it can communicate with them all, and the more agents it cooperates with the more it benefits.

But an agent has incentive to spend a lot of time finding a proof about itself - once - and then it can give that proof to counterparties it wants to cooperate with.

(It's been pointed out that this could give an advantage to families of similar agents that can more easily prove things about one another by using existing knowledge and proofs).

Finally, somewhere there may be someone who created this agent - a programming intelligence. If so, that someone with their understanding of the agent's behavior may be able to create such a proof.

Do these considerations help in practice?

Comment author: warbo 11 June 2013 10:43:18AM 6 points

We only need to perform proof search when we're given some unknown blob of code. There's no need to do a search when we're the ones writing the code; we know it's correct, otherwise we wouldn't have written it that way.

Admittedly many languages allow us to be very sloppy; we may not have to justify our code to the compiler and the language may not be powerful enough to express the properties we want. However there are some languages which allow this (Idris, Agda, Epigram, ATS, etc.). In such languages we don't actually write a program at all; we write down a constructive proof that our properties are satisfied by some program, then the compiler derives that program from the proof. Such programs are known as "correct by construction".

http://en.wikipedia.org/wiki/Program_derivation
http://en.wikipedia.org/wiki/Proof-carrying_code

Comment author: Mestroyer 06 June 2013 12:04:12AM 1 point

I'm not that familiar with Scheme, but is there some way to see how much stack space you have left and stop before an overflow?

Comment author: warbo 06 June 2013 04:02:38PM 0 points

Scheme requires tail-call optimisation, so if you use tail-recursion then you'll never overflow.
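For contrast, here is what tail-recursion looks like, sketched in Python (which, unlike Scheme, does not guarantee tail-call optimisation; the loop version shows the constant-space code a Scheme compiler effectively produces):

```python
# Tail-recursive factorial: the recursive call is the very last thing
# the function does, so a Scheme implementation would reuse the frame.
def fact_tail(n, acc=1):
    if n == 0:
        return acc
    return fact_tail(n - 1, acc * n)  # call in tail position

# The loop a tail-call-optimising compiler effectively produces:
def fact_loop(n):
    acc = 1
    while n > 0:
        acc, n = acc * n, n - 1
    return acc

assert fact_tail(10) == fact_loop(10) == 3628800
# Python itself does NOT optimise tail calls, so fact_tail can still
# overflow the stack for large n; fact_loop never does.
```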

Comment author: Qiaochu_Yuan 06 June 2013 01:44:23AM *  8 points

(These were some comments I had on a slightly earlier draft than this, so the page numbers and such might be slightly off.)

Page 4, footnote 8: I don't think it's true that only stronger systems can prove weaker systems consistent. It can happen that system A can prove system B consistent and A and B are incomparable, with neither stronger than the other. For example, Gentzen's proof of the consistency of PA uses a system which is neither stronger nor weaker than PA.

Page 6: the hypotheses of the second incompleteness theorem are a little more restrictive than this (though not much, I think).

Page 11, problem c: I don't understand the sentence containing "highly regular and compact formula." Looks like there's a typo somewhere.

Comment author: warbo 06 June 2013 03:06:25PM *  1 point

Page 4 footnote 8 in the version you saw looks like footnote 9 in mine.

I don't see how 'proof-of-bottom -> bottom' makes a system inconsistent. This kind of formula appears all the time in Type Theory, and is interpreted as "not(proof-of-bottom)".

The 'principle of explosion' says 'forall A, bottom -> A'. We can instantiate A to get 'bottom -> not(proof-of-bottom)', then compose this with "proof-of-bottom -> bottom" to get "proof-of-bottom -> not(proof-of-bottom)". This is an inconsistency iff we can show proof-of-bottom. If our system is consistent, we can't construct a proof of bottom so it remains consistent. If our system is inconsistent then we can construct a proof of bottom and derive bottom, so our system remains inconsistent.
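The composition in the paragraph above can be checked formally (a sketch in Lean 4, with the hypothetical name `ProvableBottom` standing in for "proof-of-bottom" and `False` for bottom):

```lean
-- Explosion instantiated at A := ¬ProvableBottom, composed with the
-- footnote's formula `ProvableBottom → ⊥`, yields the claimed result:
example (ProvableBottom : Prop)
    (h : ProvableBottom → False) : ProvableBottom → ¬ProvableBottom :=
  fun hp _ => h hp
```

As the comment notes, this only becomes an outright contradiction if a proof of `ProvableBottom` can actually be exhibited.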

Have I misunderstood this footnote?

[EDIT: Ignore me for now; this is of course Lob's theorem for bottom. I haven't convinced myself of the existence of modal fixed points yet though]
