Elliot_Olds — LessWrong


Replying to: Discussion with Eliezer Yudkowsky on AGI interventions

Steve Omohundro says:

"1) Nobody powerful wants to create unsafe AI but they do want to take advantage of AI capabilities.
2) None of the concrete well-specified valuable AI capabilities require unsafe behavior"

I think a lot of powerful people and organizations do want to take advantage of possibly unsafe AI capabilities, such as ones that would allow them to be the emperors of the universe for all time, especially if not doing so means that their rivals have a higher chance of becoming the emperors of the universe.

Replying to: "Should Blackmail Be Legal" Hanson/Zvi Debate (Sun July 26th, 3pm PDT)

Elliot_Olds · 5y

Hi Ben. Are you still planning to post a transcript sometime?

Replying to: Why isn't the following decision theory optimal?

Elliot_Olds · 11y

Thanks. I had one question about your "Toward Idealized Decision Theory" paper.

I can't say I fully understand UDT, but the "updateless" part does seem very similar to the core idea of NDT: "act as if you had precommitted to any action that you'd have wanted to precommit to." It's not clear to me that a super-powerful UDT agent would make the wrong decision in the game where two players pick numbers from 0 to 10 and get payouts based on their pick and the total sum.

Wouldn't the UDT agent reason as follows? "If my algorithm were such that I wouldn't just pick 1 when the human player forced me into it by picking 9...

Replying to: Why isn't the following decision theory optimal?

Elliot_Olds · 11y

An AI should certainly cooperate if it discovered that by chance its opposing AI had identical source code.

I read your paper and the two posts in your short sequence. Thanks for the links. I still think it's very unlikely that one of the AIs in your original hypothetical (when they don't examine each other's source code) would do better by defecting.

I accept that if an opposing AI had a model of you that was decent but not great, then there is some amount of logical connection there. What I haven't seen is any argument about the shape of the graph of logical connection strength versus similarity of entities. I hypothesize that for any two humans who exist today, if you put them in a one-shot PD, the logical connection is negligible.

Has anyone written specifically on how exactly to give weights to logical connections between similar but non-identical entities?

Replying to: Why isn't the following decision theory optimal?

Elliot_Olds · 11y

I think defect is the right answer in your AI problem and therefore that NDT gets it right, but I'm aware lots of LWers think otherwise. I haven't researched this enough to want to argue it, but is there a discussion you'd recommend I read that spells out the reasoning? Otherwise I'll just look through LW posts on prisoner's dilemmas.

Secondly, I'd like to try to somehow incorporate logical effects into NDT. I agree they're important. Any suggestions for where I could find lots of examples of decision problems where logical effects matter, to help me think about the general case?

Replying to: Why isn't the following decision theory optimal?

Elliot_Olds · 11y

In the retro blackmail, CDT does not precommit to refusing even if it's given the opportunity to do so before the researcher gets its source code.

To clarify: you mean that CDT doesn't precommit at time t=1 even if the researcher hasn't gotten the code representing CDT's state at time t=0 yet. The CDT doesn't think precommitting will help because it knows the code the researcher will get will be from before its precommitment. I agree that this is true, and a CDT won't want to precommit.

I guess my definition, even after my clarification, is ambiguous, as it's not clear that what a CDT wishes it could have precommitted to at an...

Replying to: Why isn't the following decision theory optimal?

Elliot_Olds · 11y

For example, a decision algorithm based on precommitment is unable to hold selfish preferences (valuing a cookie for me more than a cookie for a copy of me) in anthropic situations

I disagree that it makes sense to talk about one of the future copies of you being "you" whereas the other isn't. They're both you to the same degree (if they're exact copies).

Replying to: Why isn't the following decision theory optimal?

Elliot_Olds · 11y

Eliezer talked about this in his TDT paper. It is possible to hypothesize scenarios where agents get punished or rewarded for arbitrary reasons. For instance an AI could punish agents who made decisions based on the idea of their choices determining the results of abstract computations (as in TDT). This wouldn't show that TDT is a bad decision theory or even that it's no better than any other theory.

If we restrict ourselves to action-determined and decision-determined problems (see Eliezer's TDT paper) we can say that TDT is better than CDT, because it gets everything right that CDT gets right, plus it gets right some things that CDT gets wrong.

Can you think of any way that a situation could be set up that punishes an NDT agent, that doesn't reduce to an AI just not liking NDT agents and arbitrarily trying to hurt them?

Replying to: Why isn't the following decision theory optimal?

Elliot_Olds · 11y

I think my definition of NDT above was worded badly. The problematic part is "if he had previously known he'd be in his currently situation." Consider this definition:

You should always make the decision that a CDT-agent would have wished he had precommitted to, if he previously considered the possibility of his current situation and had the opportunity to costlessly precommit to a decision.

The key is that the NDT agent isn't behaving as if he knew for sure that he'd end up blackmailed when he made his precommitment (since his precommitment affects the probability of his being blackmailed), but rather he's acting "as if" he precommitted to some behavior based on reasonable estimates of the likelihood of his being kidnapped in various cases.

Replying to: Why isn't the following decision theory optimal?

Elliot_Olds · 11y

I believe that NDT gets this problem right.

The paper you link to shows that a pure CDT agent would not self-modify into an NDT agent, because a CDT agent wouldn't really have the concept of "logical" connections between agents. The understanding that both logical and causal connections are real is what would compel an agent to self-modify into NDT.

However, if there was some path by which an agent started out as pure CDT and then became NDT, the NDT agent would still choose correctly on Retro Blackmail even if the researcher had its original CDT source code. The NDT agent's decision procedure explicitly tells it to behave as if it had precommitted before the researcher got its source code.

So even if the CDT --> NDT transition is impossible, since I don't think any of us here are pure CDT agents, we can still adopt NDT and profit.

Why isn't the following decision theory optimal?

Elliot_Olds · 11y

I've recently read the decision theory FAQ, as well as Eliezer's TDT paper. While reading the TDT paper, a simple decision procedure occurred to me which, as far as I can tell, gets the correct answer to every tricky decision problem I've seen. As discussed in the FAQ above, evidential decision theory gets the chewing gum problem wrong, causal decision theory gets Newcomb's problem wrong, and TDT gets counterfactual mugging wrong.

In the TDT paper, Eliezer postulates an agent named Gloria (page 29), who is defined as an agent who maximizes decision-determined problems. He describes how a CDT-agent named Reena would want to transform herself into Gloria. Eliezer writes:

By Gloria’s nature, she always...

Help create an instrumental rationality "stack ranking"?

Elliot_Olds · 14y

I recently heard about SIAI's Rationality Minicamp and thought it sounded cool, but for logistical/expense reasons I won't be going to one.

There are probably lots of people who are interested in improving their instrumental rationality, know about and like LessWrong, but haven't read the vast majority of content because there is just so much material, and the practical payoff is uncertain.

It would be cool if it were much easier for people to find the highest-ROI material on LessWrong.

My rough idea for how this new instrumental rationality tool might work:

It starts off as a simple wiki focused on instrumental rationality. People only add things to the wiki (often just links to existing...