All of Anja's Comments + Replies

Super hard to say without further specification of the approximation method used for the physical implementation.

So I would only consider the formulation in terms of semimeasures to be satisfactory if the semimeasures are specific enough that the correct semimeasure plus the observation sequence is enough information to determine everything that's happening in the environment.

Can you give an example of a situation in which that would not be the case? I think the semimeasure AIXI and the deterministic-programs AIXI are pretty much equivalent; am I overlooking something here?

If we're going to allow infinite episodic utilities, we'll need some way of comparing how big

... (read more)
0AlexMennen
The nice thing about using programs is that a program not only determines what your observation will be, but also the entire program state at each time. That way you can care about, for instance, the head of the Turing machine printing 0s to a region of the tape that you can't see (making assumptions about how the UTM is implemented). I'm not sure how semimeasures are usually talked about in this context; if it's something like a deterministic program plus a noisy observation channel, then there's no problem, but if a semimeasure doesn't tell you what the program state history is, or doesn't even mention a program state, then a utility function defined over semimeasures doesn't give you a way to care about the program state history (aka events in the environment).

I don't understand. If all the series we care about converge, then why would we need to be able to compare convergent series?

That might end up being fairly limited. Academian points out that if you define a "weak preference" of X over Y as a preference such that X is preferred over Y but there exist outcomes Z and W such that for all probabilities p>0, pZ + (1-p)Y is preferred over pW + (1-p)X, and a "strong preference" as a preference that is not a weak preference, then strong preferences are Archimedean by construction, so by the VNM utility theorem, a real-valued utility function describes your strong preferences even if you omit the Archimedean axiom (i.e. u(X) > u(Y) means a strong preference for X over Y, and u(X) = u(Y) means either indifference or a weak preference one way or the other). Exact ties between utilities of different outcomes should be rare, and resolving them correctly is infinitely less important than resolving strong preferences correctly. The problem with this that I just thought of is that conceivably there could be no strong preferences (i.e. for any preference, there is some other preference that is infinitely stronger). I suspect that this would cause something to go horribly
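For readability, here is the definition from the last paragraph written out symbolically (the notation is mine, not Academian's):

X \succ_{\mathrm{weak}} Y \;:\iff\; X \succ Y \;\text{ and }\; \exists\, Z, W \;\forall\, p > 0:\;\; pZ + (1-p)Y \;\succ\; pW + (1-p)X

X \succ_{\mathrm{strong}} Y \;:\iff\; X \succ Y \;\text{ and not }\; X \succ_{\mathrm{weak}} Y

Strong preferences so defined satisfy the Archimedean axiom, which is why the VNM theorem still yields a real-valued u representing them.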

I think you are proposing to have some hypotheses privileged in the beginning of Solomonoff induction, but not too much, because the uncertainty helps fight wireheading by providing knowledge about the existence of an idealized, "true" utility function and world model. Is that a correct summary? (Just trying to test whether I understand what you mean.)

In particular they can make positive use of wire-heading to reprogram themselves even if the basic architecture M doesn't allow it

Can you explain this more?

0Squark
I made some improvements to the formalism; see http://lesswrong.com/lw/cze/reply_to_holden_on_tool_ai/8fjb There I consider a stochastic model M and here a non-deterministic one, but the same principle can be applied. Namely, we consider a Solomonoff process starting a time t0 before the formation of agent A, conditioned on observance of M's rules in the time before A's formation and on A's existence at the time of its formation. The expected utility is computed with respect to the resulting distribution.
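Compressed into symbols (my paraphrase; the linked comment is the authoritative statement, and t_A, h and Pr are my notation): writing t_A for the time of A's formation and h for a history generated by a Solomonoff process started at t_A - t_0,

\Pr(h) \;\propto\; \xi\bigl(h \;\big|\; M\text{'s rules hold on } [\,t_A - t_0,\, t_A\,] \text{ and } A \text{ exists at } t_A\bigr), \qquad \mathbb{E}[U] \;=\; \sum_{h} \Pr(h)\, U(h).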
1Squark
Yes, I think you got it more or less right. For p=0 we would just get a version of Legg-Hutter (AIXI) with limited computing resources (but the duality problem preserved). For p > 0, no hypothesis is completely ruled out and the agent should be able to find the correct hypothesis given sufficient evidence; in particular it should be able to correct its assumptions regarding how its own mind works. Of course this requires the correct hypothesis to be sufficiently aligned with M's architecture for the agent to work at all.

The utility function is actually built in from the start; however, if we like, we can choose it to be something like a sum of external input bits with decaying weights (in order to ensure convergence), which would be in the spirit of the Legg-Hutter "reinforcement learning" approach.

In particular the agent can discover that the true "physics" allows for reprogramming the agent, even though the initially assumed architecture M didn't allow it. In this case she can use it to reprogram herself for her own benefit. To draw a parallel, a human can perform brain surgery on herself because of her acquired knowledge about the physics of the universe and her brain, and in principle she can use it to change the functioning of her brain in ways that are incompatible with her "intuitive" initial assumptions about her own mind.
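For concreteness, one convergent weighting of the kind described here (an illustrative choice of mine, not necessarily the one Squark has in mind):

U(x_{1:\infty}) \;=\; \sum_{t=1}^{\infty} 2^{-t}\, x_t, \qquad x_t \in \{0, 1\},

which is bounded by 1, so the expected utility exists for every environment.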

They just do interpersonal comparisons; lots of their ideas generalize to intrapersonal comparisons though.

I recommend the book "Fair Division and Collective Welfare" by H. J. Moulin, it discusses some of these problems and several related others.

1AlexMennen
That looks like it only discusses interpersonal utility comparisons. I don't see anything about intrapersonal utility comparison in the book description.

you forgot to multiply by 2^-l(q)

I think then you would count that twice, wouldn't you? Because my original formula already contains the Solomonoff probability...

1AlexMennen
Oh right. But you still want the probability weighting to be inside the sum, so you would actually need

u_k(\dot{y}\dot{x}_{<k}y\underline{x}_{k:m_k}) = \frac{1}{\xi\left(\dot{y}\dot{x}_{<k}y\underline{x}_{k:m_k}\right)}\sum_{q:q(y_{1:m_k})=x_{1:m_k}} U(q,y_{1:m_k})\, 2^{-\ell\left(q\right)}
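As a toy illustration of this corrected formula (the tiny "program" table and all names are invented for the example; the real sum ranges over every program of the reference machine and is uncomputable):

# Toy version of u_k(history) = (1/xi(history)) * sum over consistent q of U(q, y_{1:m_k}) * 2^(-l(q)).
# The "programs" are just labelled tuples; in the real formula q ranges over all programs of a UTM.

history_observations = ["o1", "o2"]      # x_{1:m_k}, the observations actually received

programs = [
    # (name, length l(q) in bits, observations q outputs on the chosen actions, U(q, y_{1:m_k}))
    ("q_honest", 5, ["o1", "o2"], 1.0),
    ("q_mimic",  9, ["o1", "o2"], 0.0),  # same outputs, different internal behaviour, hence different U
    ("q_other",  4, ["o1", "oX"], 1.0),  # inconsistent with the observations, so it drops out
]

consistent = [(length, u) for _, length, obs, u in programs if obs == history_observations]
xi = sum(2.0 ** -length for length, _ in consistent)            # normalisation xi(history)
u_k = sum(u * 2.0 ** -length for length, u in consistent) / xi  # probability weighting inside the sum

print(f"xi = {xi}, u_k = {u_k}")  # the mimicking program drags u_k below 1 in proportion to its weight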

Let's stick with delusion boxes for now, because assuming that we can read off from the environment whether the agent has wireheaded breaks dualism. So even if we specify utility directly over environments, we still need to master the task of specifying which action/environment combinations contain delusion boxes to evaluate them correctly. It is still the same problem, just phrased differently.

0AlexMennen
If I understand you correctly, that sounds like a fairly straightforward problem for AIXI to solve. Some programs q_1 will mimic some other program q_2's communication with the agent while doing something else in the background, but AIXI considers the possibilities of both q_1 and q_2.

I think there is something off with the formulas that use policies: If you already choose the policy

p(x_{<k}) = y_{<k} y_k

then you cannot choose a y_k in the argmax.

Also for the Solomonoff prior you must sum over all programs

q such that q(y_{1:m_k}) = x_{1:m_k}.

Could you maybe expand on the proof of Lemma 1 a little bit? I am not sure I get what you mean yet.

0AlexMennen
The argmax comes before choosing a policy. In the formula you quoted, there is already a value for y_k before you consider all the policies such that p(x_{<k}) = y_{<k} y_k.

Didn't I do that?

Look at any finite observation sequence. There exists some action you could output in response to that sequence that would allow you to get arbitrarily close to the supremum expected utility with suitable responses to the other finite observation sequences (for instance, you could get within 1/2 of the supremum). Now look at another finite observation sequence. There exists some action you could output in response to that, without changing your response to the previous finite observation sequence, such that you can get arbitrarily close to the supremum (within 1/4). Look at a third finite observation sequence. There exists some action you could output in response to that, without changing your responses to the previous 2, that would allow you to get within 1/8 of the supremum. And keep going in some fashion that will eventually consider every finite observation sequence. At each step n, you will be able to specify a policy that gets you within 2^-n of the supremum, and these policies converge to the policy that the agent actually implements. I hope that helps. If you still don't know what I mean, could you describe where you're stuck?
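In symbols, my reading of this construction (the enumeration h_1, h_2, ... and the names V, V* are mine):

Let V^* = \sup_p V(p) and let h_1, h_2, \dots enumerate the finite observation sequences. At stage n, fix an action a_n for h_n, keeping the actions already fixed for h_1, \dots, h_{n-1}, such that some policy p_n extending these choices satisfies V(p_n) \ge V^* - 2^{-n}. The partial assignments are nested, so they determine a limit policy p with p(h_n) = a_n for all n, and V(p_n) \to V^*.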

I like how you specify utility directly over programs; it describes very neatly how someone who sat down and wrote a utility function

U(q, y_{1:m_k})

would do it: first determine how the observation could have been computed by the environment and then evaluate that situation. This is a special case of the framework I wrote down in the cited article; you can always set

u_k(\dot{y}\dot{x}_{<k}y\underline{x}_{k:m_k}) = \sum_{q:q(y_{1:m_k})=x_{1:m_k}} U(q,y_{1:m_k})

This solves wireheading only if we can specify which environments contain wireheaded (non-dualistic) agents, delusion boxes, etc.

1AlexMennen
True, the U(program, action sequence) framework can be implemented within the U(action/observation sequence) framework, although you forgot to multiply by 2^-l(q) when describing how. I also don't really like the finite look-ahead (until m_k) method, since it is dynamically inconsistent.

Not sure what you mean by that.

You are a wirehead if you consider your true utility function to be genetic fitness.

-1timtyler
Not according to most existing usage of the term.
4DanArmak
What makes a utility function "true"? If I choose to literally wirehead - implant electrodes - I can sign a statement saying I consider my "true" utility function to be optimized by wireheading. Does that mean I'm not wireheading in your sense?
-1brazil84
Well what else could it be? :)

To what extent does our response to Nozick's Experience Machine Argument typically reflect status quo bias rather than a desire to connect with ultimate reality?

I think the argument that people don't really want to stay in touch with reality but rather want to stay in touch with their past makes a lot of sense. After all we construct our model of reality from our past experiences. One could argue that this is another example of a substitute measure, used to save computational resources: Instead of caring about reality we care about our memories making sense and being meaningful.

On the other hand I assume I wasn't the only one mentally applauding Neo for swallowing the red pill.

What would happen if we set an algorithm inside the AGI assigning negative infinite utility to any action which modifies its own utility function and said algorithm itself?

There are several problems with this approach: First of all, how do you specify all actions that modify the utility function? How likely do you think it is that you can exhaustively specify all sequences of actions that lead to modification of the utility function in a practical implementation? Experience with cryptography has taught us that there is almost always some side channel at... (read more)

You might be right. I thought about this too, but it seemed people on LW had already categorized the experience machine as wireheading. If we rebrand, we should maybe say "self-delusion" instead of "pornography problem"; I really like the term "utility counterfeiting" though and the example about counterfeit money in your essay.

4davidpearce
"Utility counterfeiting" is a memorable term; but I wonder if we need a duller, less loaded expression to avoid prejudging the issue? After all, neuropathic pain isn't any less bad because it doesn't play any signalling role for the organism. Indeed, in some ways neuropathic pain is worse. We can't sensibly call it counterfeit or inauthentic. So why is bliss that doesn't serve any signalling function any less good or authentic? Provocatively expressed, evolution has been driven by the creation of ever more sophisticated counterfeit utilities that tend to promote the inclusive fitness of our genes. Thus e.g. wealth, power, status, maximum access to seemingly intrinsically sexy women of prime reproductive potential (etc) can seem inherently valuable to us. Therefore we want the real thing. This is an unsettling perspective because we like to think we value e.g. our friends for who they are rather than their capacity to trigger subjectively valuable endogenous opioid release in our CNS. But a mechanistic explanation might suggest otherwise.
3timtyler
Bill Hibbard apparently endorses using the wirehead terminology to refer to utility counterfeiting via sense data manipulation here. However, after looking at my proposal, I think it is fairly clear that the "wireheading" term should be reserved for the "simpleton gambit" of Ring and Orseau.

I don't think my proposal represented a "rebranding". I do think you really have to invoke pornography or masturbation to describe the issue.

I think "delusion" is the wrong word. A delusion is a belief held with conviction - despite evidence to the contrary. Masturbation or pornography do not require delusions.

The word "value" seems unnecessarily value-laden here.

Changed it to "number".

You are correct in pointing out that for human agents the evaluation procedure is not a deliberate calculation of expected utility, but some messy computation we have little access to. In many instances this can however be reasonably well translated into the framework of (partial) utility functions, especially if our preferences approximately satisfy transitivity, continuity and independence.

For noticing discrepancies between true and substitute utility it is not necessary to exactly know both functions; it suffices to have an icky feeling that tells you t... (read more)
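For reference, the three conditions mentioned above in their usual VNM form (standard statements, nothing specific to this thread):

Transitivity: A \succeq B \text{ and } B \succeq C \implies A \succeq C

Continuity: A \succeq B \succeq C \implies \exists\, p \in [0,1]: \; pA + (1-p)C \sim B

Independence: A \succeq B \iff pA + (1-p)C \succeq pB + (1-p)C \;\text{ for all } p \in (0,1] \text{ and all } C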

There is also a more detailed paper by Lattimore and Hutter (2011) on discounting and time consistency that is interesting in that context.

-1mytyde
This is a very interesting paper. Reminds me of HIGHLANDER for some reason... those guys lived for thousands of years and weren't even rich? They hadn't usurped control of vast econo-political empires? No hundred-generations-long family of bodyguards?

I am starting to see what you mean. Let's stick with utility functions over histories of length m_k (whole sequences) like you proposed and denote them with a capital U to distinguish them from the prefix utilities. I think your Agent 4 runs into the following problem: modeled_action(n,m) actually depends on the actions and observations yx_{k:m-1} and needs to be calculated for each combination, so y_m is actually

modeled_action(n, m, \dot{y}\dot{x}_{<k}y\underline{x}_{k:m-1})

which clutters up the notation so much that I don't want to write it down anymore.

We also get into trouble with taking the expectation, the ob... (read more)

2AlexMennen
Yes.

Oops, you are right. The sum should have been over x_{k:n}, not just over x_k.

Yes, that is a cleaner and actually correct version of what I was trying to describe. Thanks.

I second the general sentiment that it would be good for an agent to have these traits, but if I follow your equations I end up with Agent 2.

3AlexMennen
No, you don't. If you tried to represent Agent 2 in that notation, you would get

modeled_action(n, k) = \arg\max_{y_k}\sum_{x_k}\left[\sum_{t=k}^{n}u_t(\dot{y}\dot{x}_{<k}y\underline{x}_{k:t})\right]M(\dot{y}\dot{x}_{<k}y\underline{x}_{k:n})

You were using u_k to represent the utility of the last step of its input, so that total utility is the sum of the utilities of its prefixes, while I was using u_k to represent the utility of the whole sequence. If I adapt Agent 4 to your use of u_k, I get

modeled_action(n, k) = \arg\max_{y_k}\sum_{x_k}\left[\sum_{t=k}^{n}u_k(\dot{y}\dot{x}_{<k}y\underline{x}_{k:t})\right]M(\dot{y}\dot{x}_{<k}y\underline{x}_{k:n})

First, replace the action-perception sequence with an action-perception-utility sequence u1,y1,x1,u2,y2,x2,etc.

This seems unnecessary. The information u_i is already contained in x_i.

modeled_action(n, k) = argmax_{y_k} u_k(yx_{<k}, yx_{k:n}) * M(uyx_{<k}, uyx_{k:n})

This completely breaks the expectimax principle. I assume you actually mean something like

modeled_action(n, k) = \arg\max_{y_k}\sum_{x_k}u_k(\dot{y}\dot{x}_{<k}y\underline{x}_{k:n})\,M(\dot{y}\dot{x}_{<k}y\underline{x}_{k:n})

which is just Agent 2 in disguise.

0AlexMennen
Oops. Yes, that's what I meant. But it is not the same as Agent 2, because this (Agent 4?) uses its current utility function to evaluate the desirability of future observations and actions, even though it knows that it will use a different utility function to choose between them later. For example, Agent 4 will not take the Simpleton's Gambit because it cares about its current utility function getting satisfied in the future, not about its future utility function getting satisfied in the future. Agent 4 can be seen as a set of agents, one for each possible utility function, that are using game theory with each other.

This generalizes to the horizon problem: if at time k you only look ahead to time step m_k but have an unlimited life span, you will make infinitely large mistakes.
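A minimal numerical sketch of this failure mode (the toy environment and all numbers are mine, not from the post):

# Toy horizon problem: each step the agent may "consume" (reward 1 now) or
# "invest" (reward 0 now, reward 10 exactly H steps later). An agent whose
# look-ahead m_k is always shorter than H never sees the delayed payoff,
# so it consumes forever; over an unbounded lifespan its regret is unbounded.

H = 5       # delay before an investment pays off
T = 1000    # length of the simulated lifespan

myopic_total = T * 1                 # always consumes: reward 1 per step
farsighted_total = (T // H) * 10     # invests, collecting 10 every H steps

print(f"myopic:     {myopic_total}")
print(f"farsighted: {farsighted_total}")
print(f"regret after T={T} steps: {farsighted_total - myopic_total}")  # grows linearly in T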

I would assume that it is not smart enough to foresee its own future actions and is therefore dynamically inconsistent. The original AIXI does not allow for the agent to be part of the environment. If we tried to relax the dualism, then your question depends strongly on the approximation to AIXI we would use to make it computable. If this approximation can be scaled down in a way such that it is still a good estimator for the agent's future actions, then maybe an environment containing a scaled-down, more abstract AIXI model will, after a lot of observations, become one of the consistent programs with lowest complexity. Maybe. That is about the only way I can imagine right now that we would not run into this problem.

0Manfred
Thanks, that helps.

I am pretty sure that Agent 2 will wirehead on the Simpleton's Gambit, though this depends heavily on the number of time cycles to follow, the comparative advantage that can be gained from wireheading, and the negative utility the current utility function assigns to the change.

Agent 1 will have trouble modeling how its decision to change its utility function now will influence its own decisions later, as described in AIXI and existential despair. So basically the two futures look very similar to the agent except that for the part where the screen says something differe... (read more)

1Manfred
Ah, right, that abstraction thing. I'm still fairly confused by it. Maybe a simple game will help see what's going on. The simple game can be something like a two-step choice. At time T1, the agent can send either A or B. Then at time T2, the agent can send A or B again, but its utility function might have changed in between. For the original utility function, our payoff matrix looks like AA: 10, AB: -1, BA: 0, BB: 1. So if the utility function didn't change, the agent would just send A at time T1 and A at time T2, and get a reward of 10. But suppose in between T1 and T2, a program predictably changes the agent's payoff matrix, as stored in memory, to AA: -1, AB: 10, BA: 0, BB: 1. Now if the agent sent A at time T1, it will send B at time T2, to claim the new payoff for AB of 10 units. Even though AB is lowest on the preference ordering of the agent at T1. So if our agent is clever, it sends B at time T1 rather than A, knowing that the future program will also pick B, leading to an outcome (BB, for a reward of 1) that the agent at T1 prefers to AB. So, is our AIXI Agent 1 clever enough to do that?
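Spelling out the backward induction with these numbers (a small sketch that just replays the reasoning above; the dictionaries encode the two payoff matrices):

# Two-step game: the agent sends A or B at T1, then A or B at T2.
# Between T1 and T2 its stored payoff matrix is predictably overwritten.

payoff_T1 = {"AA": 10, "AB": -1, "BA": 0, "BB": 1}   # what the agent at T1 actually values
payoff_T2 = {"AA": -1, "AB": 10, "BA": 0, "BB": 1}   # what the agent at T2 will maximize

def value_at_T1(first_move):
    # The T2 agent picks its move to maximize the *new* payoffs...
    second_move = max("AB", key=lambda m: payoff_T2[first_move + m])
    # ...but the T1 agent scores the resulting outcome with its *current* payoffs.
    return payoff_T1[first_move + second_move], second_move

for first in "AB":
    val, second = value_at_T1(first)
    print(f"send {first} at T1 -> T2 plays {second}, outcome {first}{second}, value to T1: {val}")

# A at T1 leads to AB (value -1 to the T1 agent); B leads to BB (value 1),
# so an agent that correctly anticipates the change sends B at T1.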
1timtyler
Be warned that that post made practically no sense - and surely isn't a good reference.

I am quite sure that Pareto optimality is untouched by the proposed changes, but I haven't written down a proof yet.

Took the survey. Does the god question include simulators? I answered under the assumption that it did not.

5tgb
I, for one, answered assuming that it does include simulators. I do not know what ontologically basic mental events are and didn't bother to look it up.

I assumed the same, based on the definition of "god" as "supernatural" and the definition of "supernatural" as "involving ontologically basic mental entities."

(Oh, and for anyone who hasn't read the relevant post, the survey is quoting this.)

8gwern
I'm pretty sure it doesn't. At least, if it does I have no idea what the 'ontologically basic mental events' qualifiers were about...