AnnaSalamon

Comments

  • A man being deeply respected and lauded by his fellow men, in a clearly authentic and lasting way, seems to be a big female turn-on. Way way way bigger effect size than physique, best as I can tell.
    • …but the symmetric thing is not true! Women cheering on one of their own doesn't seem to make men want her more. (Maybe something else is analogous, the way female "weight lifting" is beautification?)

My guess at the analogous thing: women being kind/generous/loving seems to me like a thing many men have found attractive across times and cultures, and it seems far more viable if a woman is embedded in a group who recognize her, who tell her she is cared about and will be protected by a network of others, who in fact shield her from some kinds of conflict/exploitation, who offer empathy for her own daily cares and details to balance out the attention she gives to others', etc.  So the group plays a support role in a woman being able to have/display the quality.

Steven Byrnes writes:

 "For example, I expect that AGIs will be able to self-modify in ways that are difficult for humans (e.g. there’s no magic-bullet super-Adderall for humans), which impacts the likelihood of your (1a)."

My (1a) (and related (1b)), for reference:

(1a) “You” (the decision-maker process we are modeling) can choose anything you like, without risk of losing control of your hardware.  (Contrast case: if the ruler of a country chooses unpopular policies, they are sometimes ousted.  If a human chooses dieting/unrewarding problems/social risk, they sometimes lose control of themselves.)

(1b) There are no costs to maintaining control of your mind/hardware.  (Contrast case: if a company hires some brilliant young scientists to be creative on its behalf, it often has to pay a steep overhead if it additionally wants to make sure those scientists don’t disrupt its goals/beliefs/normal functioning.)

I'm happy to posit an AGI with powerful ability to self-modify.  But, even so, my (nonconfident) guess is that it won't have property (1a), at least not costlessly.

My admittedly handwavy reasoning:

  • Self-modification doesn't get you all powers: some depend on the nature of physics/mathematics.  E.g. it may still be that verifying a proof is easier than generating one, for our AGI (see the sketch just after this list).
  • Intelligence involves discovering new things, coming into contact with what we don't specifically expect (that's why we bother to spend compute on it).  Let's assume our powerful AGI is still coming into contact with novel-to-it mathematics/empirics/neat stuff.  The questions are: is it (possible at all / possible at costs worth paying) to anticipate enough about what it will uncover that it can prevent the new things from destabilizing its centralized goals/plans/["utility function" if it has one]?  I... am really not sure what the answers to these questions are, even for powerful AGI that has powerfully self-modified!  There are maybe alien-to-it AGIs out there, encoded in mathematics, waiting to boot up within it as it does its reasoning.
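To make the proof example a bit more concrete (my own back-of-envelope framing, not a claim about any particular AGI): checking that a candidate proof of length $n$ follows the inference rules takes time roughly polynomial in $n$, whereas naively searching over candidate proofs of length at most $n$ in an alphabet of size $c$ means considering on the order of

$$\sum_{k \le n} c^{k} \;\approx\; c^{n}$$

candidates. Whether some cleverer procedure closes that gap is a fact about mathematics (in the neighborhood of open questions like P vs. NP), not something self-modification on its own can change.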

I just paraphrased the OP for a friend who said he couldn't decipher it.  He said it helped, so I'm copy-pasting here in case it clarifies for others.

I'm trying to say:

A) There're a lot of "theorems" showing that a thing is what agents will converge on, or something, that involve approximations ("assume a frictionless plane") that aren't quite true.

B) The "VNM utility theorem" is one such theorem, and involves some approximations that aren't quite true.  So do e.g. Steve Omohundro's convergent instrumental drives, the "Gandhi folk theorems" showing that an agent will resist changes to its utility function, etc.

C) So I don't think the VNM utility theorem means that all minds will necessarily want to become VNM agents, nor to follow instrumental drives, nor to resist changes to their "utility functions" (if indeed they have a "utility function").

D) But "be a better VNM-agent", "follow the instrumental Omohundro drives", etc. might still be a self-fulfilling prophecy for some region, partially.  Like, humans or other entities who think it's rational to be VNM agents might become better VNM agents, who might become better VNM agents, for a while.

E) And there might be other [mathematically describable mind-patterns] that can serve as alternative self-propagating patterns, a la D, that're pretty different from "be a better VNM-agent."  E.g. "follow the god of nick land".

F) And I want to know what are all the [mathematically describable mind-patterns, that a mind might decide to emulate, and that might make a kinda-stable attractor for a while, where the mind and its successors keep emulating that mind-pattern for a while].  They'll probably each have a "theorem" attached that involves some sort of approximation (a la "assume a frictionless plane").
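For reference, here is a hedged one-paragraph gloss of the VNM statement I mean in (B) (informal; exact axiom presentations vary): if preferences $\succeq$ over lotteries satisfy completeness, transitivity, continuity, and independence, then there is a utility function $u$ such that

$$L \succeq M \;\iff\; \sum_i p_i\,u(x_i) \;\ge\; \sum_j q_j\,u(y_j),$$

where lottery $L$ gives outcome $x_i$ with probability $p_i$ and $M$ gives $y_j$ with probability $q_j$. The "frictionless plane" lives in the setup: a fixed space of outcomes, preferences that hold still while you choose, and a chooser whose act of choosing doesn't feed back into those preferences (which is the kind of thing my assumptions are trying to make explicit).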

 "There is a problem that, other things equal, agents that care about the state of the world in the distant future, to the exclusion of everything else, will outcompete agents that lack that property. This is self-evident, because we can operationalize “outcompete” as “have more effect on the state of the world in the distant future”."

I am not sure about that!

One way this argument could fail: maybe agents who care exclusively about the state of the world in the distant future end up, as part of their optimizing, creating other agents who care in different ways from that.

In that case, they would “have more effect on the state of the world in the distant future”, but they might not “outcompete” other agents (in the common-sensical way of understanding “outcompete”).

A person might think this implausible, because they might think that a smart agent who cares exclusively about X can best achieve X by having all minds they create also be [smart agents who care exclusively about X].

But, I’m not sure this is true, basically for reasons of not trusting assumptions (1), (2), (3), and (4) that I listed here.

(As one possible sketch: a mind whose only goal is to map branch B of mathematics might find it instrumentally useful to map a bunch of other branches of mathematics.  And, since supervision is not free, it might be more able to do this efficiently if it creates researchers who have an intrinsic interest in math-in-general, and who are not being fully supervised by exclusively-B-interested minds.)

 "or more centrally, long after I finish the course of action."

I don't understand why the more central thing is "long after I finish the course of action" as opposed to "in ways that are clearly 'external to' the process called 'me', that I used to take the actions."

I was trying to explain to Habryka why I thought (1), (3) and (4) are among the assumptions under which the VNM utility theorem is derived.

I think all of (1), (2), (3) and (4) are part of the context I've usually pictured in understanding VNM as having real-world application, at least.  And they're part of this context because I've been wanting to think of a mind as having persistence, and persistent preferences, and persistent (though rationally updated) beliefs about what lotteries of outcomes can be chosen via particular physical actions, and stuff.  (E.g., in Scott's example about the couple, one could say "they don't really violate independence; they just care also about process-fairness" or something, but, ... it seems more natural to attach words to real-world scenarios in such a way as to say the couple does violate independence.  And when I try to reason this way, I end up thinking that all of (1)-(4) are part of the most natural way to try to get the VNM utility theorem to apply to the world with sensible, non-Grue-like word-to-stuff mappings.)

I'm not sure why Habryka disagrees.  I feel like lots of us are talking past each other in this subthread, and am not sure how to do better.

I don't think I follow your (Mateusz's) remark yet.

I... don't think I'm taking the hidden order of the universe non-seriously.  If it matters, I've been obsessively rereading Christopher Alexander's "The Nature of Order" books, and trying to find ways to express some of what he's looking at in LW-friendly terms; this post is part of an attempt at that.  I have thousands and thousands of words of discarded drafts about it.

Re: why I think there might be room in the universe for multiple aspirational models of agency, each of which can be self-propagating for a time, in some contexts: Biology and culture often seem to me to have multiple kinda-stable equilibria.  Like, eyes are pretty great, but so is sonar, and so is a sense of smell, or having good memory and priors about one's surroundings, and each fulfills some of the same purposes.  Or diploidy and haplodiploidy are both locally-kinda-stable reproductive systems.

What makes you think I'm insufficiently respecting the hidden order of the universe?

I agree.  I love "Notes on the Synthesis of Form" by Christopher Alexander, as a math model of things near your vase example.

I agree with your claim that VNM is in some ways too lax.

 "vNM is ... too restrictive ... [because] vNM requires you to be risk-neutral. Risk aversion violates preferences being linear in probability ... Many people desperately want risk aversion, but that's not the vNM way."

Do many people desperately want to be risk averse about the probability a given outcome will be achieved?  I agree many people want to be loss averse about e.g. how many dollars they will have.  Scott Garrabrant provides an example in which a couple wishes to be fair to its members via compensating for other scenarios in which things would've been done the husband's way (even though those scenarios did not occur).
Scott's example is ... sort of an example of risk aversion about probabilities?  I'd be interested in other examples if you have them.
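To illustrate the distinction I'm gesturing at (a standard textbook example, not something from Scott's post): take a VNM agent with concave utility $u(x) = \sqrt{x}$ over dollars. For a 50/50 lottery between $0 and $100,

$$0.5\,u(0) + 0.5\,u(100) = 0.5 \cdot 0 + 0.5 \cdot 10 = 5 = u(25),$$

so the agent values the lottery like a certain $25, well below the $50 expected dollar value. That is risk aversion over dollars, and it is fully VNM-compatible, because the valuation is still linear in the probabilities. What VNM rules out is caring non-linearly about the probabilities themselves (e.g. paying a premium specifically for "a guaranteed 100%", as in Allais-type preferences), which is closer to what I mean by risk aversion about probabilities.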
