AnnaSalamon

I got to the suggestion by imagining: suppose you were about to quit the project and do nothing.  And now suppose that instead of that, you were about to take a small amount of relatively inexpensive-to-you actions, and then quit the project and do nothing.  What're the "relatively inexpensive-to-you actions" that would most help?

Publishing the whole list, without precise addresses or allegations, seems plausible to me.

I guess my hope is: maybe someone else (a news story, a set of friends, something) would help some of those on the list to take it seriously and take protective action, maybe after a while, after others on the list were killed or something.  And maybe it'd be more parsable to people if it had been hanging out on the internet for a long time, as a pre-declared list of what to worry about, with visibly no one being there to try to collect payouts or something.

Maybe some of those who received the messages were more alert to their surroundings after receiving them, even if they weren't sure it was real and didn't return the calls/emails/messages?

I admit this sounds like a terrible situation.

Gotcha.  No idea if this is a good or bad idea, but: what are your thoughts on dumping an edited version of it onto the internet, including names, photos and/or social media links, and city/country but not precise addresses or allegations?

Can you notify the intended victims?  Or at least the more findable intended victims?

  • A man being deeply respected and lauded by his fellow men, in a clearly authentic and lasting way, seems to be a big female turn-on. Way way way bigger effect size than physique, as best I can tell.
    • …but the symmetric thing is not true! Women cheering on one of their own doesn't seem to make men want her more. (Maybe something else is analogous, the way female "weight lifting" is beautification?)

My guess at the analogous thing: women being kind/generous/loving seems to me like a thing many men have found attractive across times and cultures, and it seems to me far more viable if a woman is embedded in a group who recognize her, tell her she is cared about and will be protected by a network of others, in fact shield her from some kinds of conflict/exploitation, and help there be empathy for her daily cares and details, to balance out the attention she gives to others, etc.  So the group plays a support role in a woman being able to have/display the quality.

Steven Byrnes writes:

 "For example, I expect that AGIs will be able to self-modify in ways that are difficult for humans (e.g. there’s no magic-bullet super-Adderall for humans), which impacts the likelihood of your (1a)."

My (1a) (and related (1b)), for reference:

(1a) “You” (the decision-maker process we are modeling) can choose anything you like, without risk of losing control of your hardware.  (Contrast case: if the ruler of a country chooses unpopular policies, they are sometimes ousted.  If a human chooses dieting/unrewarding problems/social risk, they sometimes lose control of themselves.)

(1b) There are no costs to maintaining control of your mind/hardware.  (Contrast case: if a company hires some brilliant young scientists to be creative on its behalf, it often has to pay a steep overhead if it additionally wants to make sure those scientists don’t disrupt its goals/beliefs/normal functioning.)

I'm happy to posit an AGI with powerful ability to self-modify.  But, even so, my (nonconfident) guess is that it won't have property (1a), at least not costlessly.

My admittedly handwavy reasoning:

  • Self-modification doesn't get you all powers: some depend on the nature of physics/mathematics.  E.g. it may still be that verifying a proof is easier than generating a proof, for our AGI (a rough statement of this asymmetry is below the list).
  • Intelligence involves discovering new things, coming into contact with what we don't specifically expect (that's why we bother to spend compute on it).  Let's assume our powerful AGI is still coming into contact with novel-to-it mathematics/empirics/neat stuff.  The questions are: is it (possible at all / possible at costs worth paying) to anticipate enough about what it will uncover that it can prevent the new things from destabilizing its centralized goals/plans/["utility function" if it has one]?  I... am really not sure what the answers to these questions are, even for a powerful AGI that has powerfully self-modified!  There are maybe alien-to-it AGIs out there, encoded in mathematics, waiting to boot up within it as it does its reasoning.
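
(A rough, hedged statement of the verify-vs-generate asymmetry from the first bullet, borrowed from standard complexity-theory folklore rather than anything AGI-specific; here $\textsf{Check}$ is a hypothetical proof-checking procedure, $\phi$ a statement, and $\pi$ a candidate proof:)

$$
\underbrace{\textsf{Check}(\phi,\pi)}_{\text{time } \mathrm{poly}(|\phi|+|\pi|)}
\qquad\text{vs.}\qquad
\underbrace{\text{find }\pi\text{ with }|\pi|\le n\text{ such that }\textsf{Check}(\phi,\pi)}_{\text{NP-hard in general; believed to need super-polynomial time if } \mathsf{P}\neq\mathsf{NP}}
$$

So even an agent with strong self-modification plausibly faces this gap, since it comes from the mathematics rather than from the agent's architecture.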

I just paraphrased the OP for a friend who said he couldn't decipher it.  He said it helped, so I'm copy-pasting here in case it clarifies for others.

I'm trying to say:

A) There're a lot of "theorems" showing that a thing is what agents will converge on, or something, that involve approximations ("assume a frictionless plane") that aren't quite true.

B) The "VNM utility theorem" is one such theorem, and involves some approximations that aren't quite true.  So do e.g. Steve Omohundro's convergent instrumental drives, the "Gandhi folk theorems" showing that an agent will resist changes to its utility function, etc.
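
(For concreteness, a minimal statement of the idealizations the VNM theorem leans on, i.e. its axioms over lotteries $L, M, N$ with a preference relation $\succeq$:)

$$
\begin{aligned}
&\text{Completeness:} && L \succeq M \ \text{ or } \ M \succeq L\\
&\text{Transitivity:} && L \succeq M,\ M \succeq N \ \Rightarrow\ L \succeq N\\
&\text{Continuity:} && L \succeq M \succeq N \ \Rightarrow\ \exists\, p \in [0,1]:\ pL + (1-p)N \sim M\\
&\text{Independence:} && L \succeq M \ \Rightarrow\ pL + (1-p)N \succeq pM + (1-p)N \quad \forall p \in (0,1]\\
&\text{Conclusion:} && \exists\, u:\ L \succeq M \iff \mathbb{E}_{L}[u] \ge \mathbb{E}_{M}[u]
\end{aligned}
$$

Each axiom (completeness and independence especially) is arguably a "frictionless plane"-style approximation in the sense of A).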

C) So I don't think the VNM utility theorem means that all minds will necessarily want to become VNM agents, nor to follow instrumental drives, nor to resist changes to their "utility functions" (if indeed they have a "utility function").

D) But "be a better VNM-agent", "follow the instrumental Omohundro drives", etc. might still be a self-fulfilling prophecy for some region, partially.  Like, humans or other entities who think it's rational to be VNM agents might become better VNM agents, who might become better VNM agents, for a while.

E) And there might be other [mathematically describable mind-patterns] that can serve as alternative self-propagating patterns, a la D, that're pretty different from "be a better VNM-agent."  E.g. "follow the god of Nick Land".

F) And I want to know what are all the [mathematically describable mind-patterns, that a mind might decide to emulate, and that might make a kinda-stable attractor for a while, where the mind and its successors keep emulating that mind-pattern for a while].  They'll probably each have a "theorem" attached that involves some sort of approximation (a la "assume a frictionless plane").

"There is a problem that, other things equal, agents that care about the state of the world in the distant future, to the exclusion of everything else, will outcompete agents that lack that property. This is self-evident, because we can operationalize “outcompete” as “have more effect on the state of the world in the distant future”."

I am not sure about that!

One way this argument could fail: maybe agents who  care exclusively about the state of the world in the distant future end up, as part of their optimizing, creating other agents who care in different ways from that.

In that case, they would “have more effect on the state of the world in the distant future”, but they might not “outcompete” other agents (in the common-sensical way of understanding “outcompete”).

A person might think this implausible, because they might think that a smart agent who cares exclusively about X can best achieve X by having all minds they create also be [smart agents who care exclusively about X].

But, I’m not sure this is true, basically for reasons of not trusting assumptions (1), (2), (3), and (4) that I listed here.

(As one possible sketch: a mind whose only goal is to map branch B of mathematics might find it instrumentally useful to map a bunch of other branches of mathematics.  And, since supervision is not free, it might be more able to do this efficiently if it creates researchers who have an intrinsic interest in math-in-general, and who are not being fully supervised by exclusively-B-interested minds.)

"or more centrally, long after I finish the course of action."

I don't understand why the more central thing is "long after I finish the course of action" as opposed to "in ways that are clearly 'external to' the process called 'me', that I used to take the actions."
