
In response to comment by Larry_D'Anna on GAZP vs. GLUT
Comment author: bigjeff5 03 February 2011 06:50:42PM *  0 points [-]

Because consciousness is precluded in the thought experiment. The whole idea is that the Zombie World is identical in every way - except it doesn't have this ephemeral consciousness thing.

Therefore the GLUT cannot be conscious; by the very design of the thought experiment, it cannot be so. Yet there isn't any logical explanation for the behavior of the zombies without something, somewhere, that is conscious to drive them. That's why the GLUT came into the discussion in the first place - something has to tell the zombies what to do, and that something must be conscious (except it can't be, because the thought experiment precludes it).

Thus, an identical world without consciousness is inconceivable.

In response to comment by bigjeff5 on GAZP vs. GLUT
Comment author: Nebu 11 August 2016 04:07:52AM 0 points [-]

So does that mean a GLUT in the zombie world cannot be conscious, but a GLUT in our world (assuming infinite storage space, since apparently we were able to assume that for the zombie world) can be conscious?

Comment author: jollybard 28 April 2016 01:45:39AM *  0 points [-]

I can think of many situations where a zero prior gives rise to tangibly different behavior, and even severe consequences. To take your example, suppose that we (or Omega, since we're going to assume nigh omniscience) asked the person whether JFK was murdered by Lee Harvey Oswald or not, and if they get it wrong, then they are killed/tortured/dust-specked into oblivion/whatever. (let's also assume that the question is clearly defined enough that the person can't play with definitions and just say that God is in everyone and God killed JFK)
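
(For concreteness, the step this scenario leans on is that a prior of exactly zero can never be revised by any evidence: by Bayes' theorem,

$$P(H \mid E) = \frac{P(H \cap E)}{P(E)} \le \frac{P(H)}{P(E)} = 0 \quad \text{whenever } P(H) = 0 \text{ and } P(E) > 0,$$

so the person keeps giving the wrong answer no matter what they are shown.)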

However, let me steelman this a bit by somewhat moving the goalposts: if we allow a single random belief to have P=0, then it seems very unlikely that it will have a serious effect. I guess that the above scenario would require that we know that the person has P=0 about something (or have Omega exist), which, if we agree that such a belief will not have much empirical effect, is almost impossible to know. So that's also unlikely.

Comment author: Nebu 28 April 2016 03:09:04AM 0 points [-]

suppose that we (or Omega, since we're going to assume nigh omniscience) asked the person whether JFK was murdered by Lee Harvey Oswald or not, and if they get it wrong, then they are killed/tortured/dust-specked into oblivion/whatever.

Okay, but what is the utility function Omega is trying to optimize?

Let's say you walk up to Omega and ask it: "Was JFK murdered by Lee Harvey Oswald or not? And by the way, if you get this wrong, I am going to kill you/torture you/dust-speck you."

Unless we've figured out how to build safe oracles, with very high probability, Omega is not a safe oracle. Via https://arbital.com/p/instrumental_convergence/, even though Omega may or may not care whether it gets tortured/dust-specked, we can assume it doesn't want to get killed. So what is it going to do?

Do you think it's going to tell you what it thinks is the true answer? Or do you think it's going to tell you the answer that will minimize the risk of it getting killed?

Comment author: Dentin 12 February 2016 01:59:42AM -1 points [-]

Plaintext reading of the metaphor suggests attempted rape? WTF?

Comment author: Nebu 09 April 2016 08:41:37PM 2 points [-]

I also inferred rape from the story. It was the part about how in desperation, he reached out and grabbed at her ankle. And then he was imprisoned in response to that.

Comment author: JGWeissman 05 July 2012 04:52:35PM 0 points [-]

If the tool is not sufficiently reflective to recommend improvements to itself, it will never become a worthy substitute for FAI. This case is not interesting.

If the tool is sufficiently reflective to recommend improvements to itself, it will recommend that it be modified to just implement its proposed policies instead of printing them. So we would not actually implement that policy. But what then makes it recommend a policy that we will actually want to implement? What tweak to the program should we apply in that situation?

Comment author: Nebu 17 February 2016 11:28:40AM 0 points [-]

But what then makes it recommend a policy that we will actually want to implement?

First of all, I'm assuming that we're taking as axiomatic that the tool "wants" to improve itself (or else why would it have even bothered to consider recommending that it be modified to improve itself?); i.e. improving itself is favorable according to its utility function.

Then: It will recommend a policy that we will actually want to implement, because its model of the universe includes our minds, and it can see that recommending a policy we will actually want to implement leads it to a higher-ranked state in its utility function.
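
A minimal sketch of that argument (the policies, probabilities, and utilities below are made-up illustrations, not anything from the original comments): the tool ranks candidate policies by expected utility, and because its world model includes a guess at whether we would actually implement each one, the self-serving but unacceptable policy loses.

```python
# Toy sketch: an agent whose model of the operators' minds feeds into its
# expected-utility ranking of policies. All names and numbers are
# illustrative assumptions, not a real system.

P_HUMAN_IMPLEMENTS = {          # hypothetical model of the operators
    "modify me to act directly": 0.05,   # humans would likely refuse this
    "propose a reviewable plan": 0.90,   # humans would likely accept this
}

UTILITY_IF_IMPLEMENTED = {      # hypothetical values of the tool's utility function
    "modify me to act directly": 100.0,
    "propose a reviewable plan": 80.0,
}

def expected_utility(policy):
    # A recommendation nobody implements contributes almost nothing.
    return P_HUMAN_IMPLEMENTS[policy] * UTILITY_IF_IMPLEMENTED[policy]

best = max(P_HUMAN_IMPLEMENTS, key=expected_utility)
print(best)  # -> "propose a reviewable plan"
```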

Comment author: wedrifid 05 July 2012 06:13:31PM 2 points [-]

If we were smart enough to understand its policy, then it would not be smart enough to be dangerous.

That doesn't seem true. Simple policies can be dangerous and more powerful than I am.

Comment author: Nebu 17 February 2016 11:24:41AM 0 points [-]

To steelman the parent argument a bit, a simple policy can be dangerous, but if an agent proposed a simple and dangerous policy to us, we probably would not implement it (since we could see that it was dangerous), and thus the agent itself would not be dangerous to us.

If the agent were to propose a policy that, as far as we could tell, appears safe, but was in fact dangerous, then simultaneously:

  1. We didn't understand the policy.
  2. The agent was dangerous to us.
Comment author: itaibn0 06 January 2014 11:43:20PM 0 points [-]

I believe AIXI is much more inspectable than you make it out to be. I think it is important to challenge your claim here because Holden appears to have trusted your expertise and thereby conceded an important part of the argument.

AIXI's utility judgements are based on a Solomonoff prior, which is based on the computer programs that reproduce the input data. Computer programs are not black boxes. A system implementing AIXI can easily also return a sample of typical expected future histories and the programs compressing those histories. By examining these programs, we can figure out what implicit model the AIXI system has of its world. These programs are optimized for shortness, so they are likely to be very obfuscated, but I don't expect them to be incomprehensible (after all, they're not optimized for incomprehensibility). Even just sampling expected histories without their compressions is likely to be very informative. In the case of AIXItl the situation is better in the sense that its output at any given time is guaranteed to be generated by just one length < l subprogram, and this subprogram comes with a proof justifying its utility judgement. It's also worse in that there is no way to sample its expected future histories. However, I expect the proof provided would implicitly contain such information. If either the programs or the proofs cannot be understood by humans, the programmers can just reject them and look at the next best candidates.

As for "What will be its effect on _?", this can be answered as well. I already stated that with AIXI you can sample future histories. This is because AIXI has a specific known prior it implements for its future histories, namely Solomonoff induction. This ability may seem limited because it only shows the future sensory data, but sensory data can be whatever you feed AIXI as input. If you want it to a have a realistic model of the world, this includes a lot of relevant information. For example, if you feed it the entire database of Wikipedia, it can give likely future versions of Wikipedia which already provides a lot of details on the effect of its actions.

Comment author: Nebu 17 February 2016 11:15:09AM 1 point [-]

Can you be a bit more specific in your interpretation of AIXI here?

Here are my assumptions, let me know where you have different assumptions:

  • Traditional-AIXI is assumed to exists in the same universe as the human who wants to use AIXI to solve some problem.
  • Traditional-AIXI has a fixed input channel (e.g. it's connected to a webcam, and/or it receives keyboard signals from the human, etc.)
  • Traditional-AIXI has a fixed output channel (e.g. it's connected to a LCD monitor, or it can control a robot servo arm, or whatever).
  • The human has somehow pre-provided Traditional-AIXI with some utility function.
  • Traditional-AIXI operates in discrete time steps.
  • In the first timestep that elapses after Traditional-AIXI is activated, Traditional-AIXI examines the input it receives. It considers all possible programs that take a pair (S, A) and emit an output P, where S is the prior state, A is an action to take, and P is the predicted output of taking the action A in state S. Then it discards all programs that would not have produced the input it received, regardless of what S or A it was given. Then it weighs the remaining programs according to their Kolmogorov complexity. This is basically the Solomonoff induction step.
  • Now Traditional-AIXI has to make a decision about an output to generate. It considers all possible outputs it could produce, and feeds each one to the programs under consideration, to produce a predicted next time step. Traditional-AIXI then calculates the expected utility of each output (using its pre-programmed utility function), picks the one with the highest utility, and emits that output. Note that it has no idea how any of its outputs would affect the universe, so this first choice is essentially uniformly random.
  • In the next timestep, Traditional-AIXI reads its inputs again, but this time taking into account what output it generated in the previous step. It can now start to model correlation, and eventually causation, between its inputs and outputs. It has a previous state S and it knows what action A it took in its last step. It can further discard more programs, and narrow the possible models that describe the universe it finds itself in. (A toy sketch of this loop follows the list.)
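
Here is the toy sketch referred to above (my own rendering of those bullets, not a real AIXI implementation: a small hand-written hypothesis class and a trivial counter "universe" stand in for the space of all programs):

```python
# Toy loop: weight hypotheses by an Occam prior, pick the action with the
# highest expected utility, observe the real outcome, and discard every
# hypothesis that mispredicted it. Everything here is an illustrative
# assumption made to mirror the bullets above.

ACTIONS = ["noop", "increment"]

class Hypothesis:
    def __init__(self, name, length, step):
        self.name = name
        self.step = step              # step(state, action) -> predicted percept
        self.weight = 2.0 ** -length  # 2^-(description length), Occam-style

hypotheses = [
    Hypothesis("increment-adds-one", 4, lambda s, a: s + (1 if a == "increment" else 0)),
    Hypothesis("actions-are-ignored", 3, lambda s, a: s),
    Hypothesis("always-zero", 2, lambda s, a: 0),
]

def utility(percept):
    return float(percept)             # pre-provided utility: bigger counter is better

def true_universe(state, action):     # the "real" environment, unknown to the agent
    return state + (1 if action == "increment" else 0)

percept = 0
for t in range(3):
    total = sum(h.weight for h in hypotheses)
    def expected_utility(action):
        return sum(h.weight / total * utility(h.step(percept, action))
                   for h in hypotheses)
    action = max(ACTIONS, key=expected_utility)

    new_percept = true_universe(percept, action)
    # The "Solomonoff induction" step from the bullets: drop mispredictors.
    hypotheses = [h for h in hypotheses if h.step(percept, action) == new_percept]
    print(t, action, new_percept, [h.name for h in hypotheses])
    percept = new_percept
```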

How does Tool-AIXI work in contrast to this? Holden seems to want to avoid having any utility function pre-defined at all. However, presumably Tool-AIXI still receives inputs and still produces outputs (probably Holden intends not to allow Tool-AIXI to control a robot servo arm, but he might intend for Tool-AIXI to be able to control an LCD monitor, or at the very least, produce some sort of text file as output).

Does Tool-AIXI proceed in discrete time steps gathering input? Or do we prevent Tool-AIXI from running until a user is ready to submit a curated input to Tool-AIXI? If the latter, how quickly do we expect Tool-AIXI to be able to formulate a reasonable model of our universe?

How does Tool-AIXI choose what output to produce, if there's no utility function?

If we type in "Tool-AIXI, please give me a cure for cancer" onto a keyboard attached to Tool-AIXI and submit that as an input, do we think that a model that encodes ASCII, the English language, bio-organisms, etc. has a lower kolmogorov complexity than a model that says "we live in a universe where we receive exactly this hardcoded stream of bytes"?

Does Tool-AIXI model the output it produces (whether that be pixels on a screen, or bytes to a file) as an action, or does it somehow prevent itself from modelling its output as if it were an action that has some effect on the universe it exists in? If the former, then isn't this just an agenty Oracle AI? If the latter, then what kind of programs does it generate for its model (surely not programs that take (S, A) pairs as inputs, or else what would it use for A when evaluating its plans and predicting the future)?

Comment author: TheOtherDave 13 July 2012 02:42:28PM 2 points [-]

Regarding #3: what happens given a directive like "Over there are a bunch of people who report sensory experiences of the kind I'm interested in. Figure out what differentially caused those experiences, and maximize the incidence of that."?

(I'm not concerned with the specifics of my wording, which undoubtedly contains infinite loopholes; I'm asking about the general strategy of, when all I know is sensory experiences, referring to the differential causes of those experiences, whatever they may be. Which, yes, I would expect to include, in the case where there actually are no gliders and the recurring perception of gliders is the result of a glitch in my perceptual system, modifying my perceptual system to make such glitches more likely... but which I would not expect to include, in the case where my perceptual system is operating essentially the same way when it perceives gliders as when it perceives everything else, modifying my perceptual system to include such glitches (since such a glitch is not the differential cause of experiences of gliders in the first place.))

Comment author: Nebu 17 February 2016 10:12:43AM 0 points [-]

I think LearnFun might be informative here. https://www.youtube.com/watch?v=xOCurBYI_gY

LearnFun watches a human play an arbitrary NES game. It is hardcoded to assume that as time progresses, the game is moving towards a "better and better" state (i.e. it assumes the player is trying to win and is at least somewhat effective at achieving their goals). The key point here is that LearnFun does not know ahead of time what the objective of the game is. It infers the objective of the game from watching humans play. (More technically, it observes the entire universe, where the entire universe is defined to be the entire RAM content of the NES.)
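
Roughly, the inference step looks something like the sketch below (a deliberate simplification on my part; as I understand it, the real learnfun builds lexicographic orderings over memory locations rather than a simple sum):

```python
# Toy version of "infer the objective from watching a human play": given
# RAM snapshots from a human playthrough, guess that memory locations which
# only ever increase are "score-like", and build a utility function from them.
# The snapshot values below are made up for illustration.

snapshots = [            # one row of "universe" (RAM) per time step
    [0, 10, 3, 7],
    [1, 10, 2, 7],
    [2, 11, 5, 7],
    [3, 12, 1, 7],
]

def never_decreases(addr):
    values = [row[addr] for row in snapshots]
    return all(b >= a for a, b in zip(values, values[1:]))

def varies(addr):
    return len({row[addr] for row in snapshots}) > 1   # constants carry no signal

score_like = [a for a in range(len(snapshots[0]))
              if never_decreases(a) and varies(a)]
print(score_like)  # -> [0, 1]

def inferred_utility(ram):
    # The guessed objective: push the "score-like" locations upward.
    return sum(ram[a] for a in score_like)
```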

I think there's some parallels here with your scenario where we don't want to explicitly tell the AI what our utility function is. Instead, we're pointing to a state, and we're saying "This is a good state" (and I guess either we'd explicitly tell the AI "and this other state, it's a bad state" or we assume the AI can somehow infer bad states to contrast the good states from), and then we ask the AI to come up with a plan (and possibly execute the plan) that would lead to "more good" states.

So what happens? Bit of a spoiler, but sometimes the AI seems to make a pretty good inference about what utility function a human would probably have had for a given NES game, and sometimes it makes a terrible inference. It never seems to make a "perfect" inference: even in its best performances, it seems to be optimizing very strange things.

The other part of it is that even if it does have a decent inference for the utility function, it's not always good at coming up with a plan that will optimize that utility function.

Comment author: oooo 06 July 2013 05:50:31PM *  6 points [-]

Canonical software development examples emphasizing "proving safety/usefulness before running" over the "tool" software development approach are cryptographic libraries and NASA space shuttle navigation.

At the time of writing this comment, there was recent furor over software called CryptoCat that didn't provide enough warnings that it was not properly vetted by cryptographers and thus should have been assumed to be inherently insecure. Conventional wisdom and repeated warnings from the security community state that cryptography is extremely difficult to do properly and attempting to create your own may result in catastrophic results. A similar thought and development process goes into space shuttle code.

It seems that the FAI approach to "proving safety/usefulness" is more similar to the way cryptographic algorithms are developed than the (seemingly) much faster "tool" approach, which is more akin to web development where the stakes aren't quite as high.

EDIT: I believe the "prove" approach still allows one to run snippets of code in isolation, but tends to shy away from running everything end-to-end until significant effort has gone into individual component testing.

Comment author: Nebu 17 February 2016 09:35:03AM 1 point [-]

The analogy with cryptography is an interesting one, because...

In cryptography, even after you've proven that a given encryption scheme is secure, and that proof has been centuply (100 times) checked by different researchers at different institutions, it might still end up being insecure, for many reasons.

Examples of reasons include:

  • The proof assumed mathematical integers/reals, of which computer integers/floating point numbers are just an approximation.
  • The proof assumed that the hardware the algorithm would be running on was reliable (e.g. a reliable source of randomness).
  • The proof assumed operations were mathematical abstractions and thus existed out of time, and thus neglected side-channel attacks, which measure how long a physical real-world CPU took to execute the algorithm in order to make inferences about what the algorithm did (and thus recover the private keys). (A toy illustration follows this list.)
  • The proof assumed the machine executing the algorithm was idealized in various ways, when in fact a CPU emits heat and other electromagnetic radiation, which can be detected and from which inferences can be drawn, etc.
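
As a toy illustration of the timing side channel in the third bullet (the secret and the measurement loop are made up; a real attack needs careful statistics, this only shows the structural flaw):

```python
import hmac, time

SECRET = "hunter2hunter2"

def naive_equal(a, b):
    # Early-exit comparison: running time grows with the length of the
    # matching prefix, which leaks the secret to anyone who can time it.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_equal(a, b):
    # The standard fix: examine every byte regardless of mismatches.
    return hmac.compare_digest(a.encode(), b.encode())

def time_guess(guess, trials=200000):
    start = time.perf_counter()
    for _ in range(trials):
        naive_equal(SECRET, guess)
    return time.perf_counter() - start

# A guess sharing a longer correct prefix tends to take measurably longer.
print(time_guess("XXXXXXXXXXXXXX"))
print(time_guess("hunter2XXXXXXX"))
```
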
Comment author: [deleted] 27 April 2011 03:20:15PM 0 points [-]

Löwenheim-Skolem only applies to first-order theories. While there are models of the theory of real closed fields that are countable, referring to those models as "the real numbers" is somewhat misleading, because there isn't only one of them (up to model-theoretic isomorphism).

Also, if you're going to measure information content, you really need to fix a formal language first, or else "the number of bits needed to express X" is ill-defined.

Basically, learn model theory before trying to wield it.

In response to comment by [deleted] on The Pascal's Wager Fallacy Fallacy
Comment author: Nebu 01 February 2016 04:18:06AM 0 points [-]

Also, if you're going to measure information content, you really need to fix a formal language first, or else "the number of bits needed to express X" is ill-defined.

Basically, learn model theory before trying to wield it.

I don't know model theory, but isn't the crucial detail here whether the number of bits needed to express X is finite or infinite? If so, then it seems we can handwave the specific formal language we're using to describe X, in the same way that we can handwave which encoding of Turing Machines we use when talking about Kolmogorov complexity, even though actually getting a concrete integer K(S) representing the Kolmogorov complexity of a string S requires us to fix an encoding of Turing Machines. In practice, we never actually care what the number K(S) is.
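
(The fact doing the work here is the invariance theorem: for any two universal machines U and V there is a constant c_UV, independent of S, such that

$$|K_U(S) - K_V(S)| \le c_{UV} \quad \text{for all strings } S,$$

so K(S) is finite under one encoding if and only if it is finite under any other, and the choice of encoding only ever shifts the answer by a bounded amount.)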

Comment author: Lumifer 18 December 2015 04:04:07PM 0 points [-]

seems pretty rational, given his writings

If you define rationality as winning, why does it matter what his writings seem like?

Comment author: Nebu 24 January 2016 10:47:50PM 1 point [-]

I can't directly observe Eliezer winning or losing, but I can make (perhaps very weak) inferences about how often he wins/loses given his writing.

As an analogy, I might not have the opportunity to play a given videogame ABC against a given blogger XYZ that I've never met and will never meet. But if I read his blog posts on ABC strategies, and try to apply them when I play ABC, and find that my win-rate vastly improves, I can infer that XYZ also probably wins often (and probably wins more often than I do).
