All of Elliot_Olds's Comments + Replies

Steve Omohundro says:

"1) Nobody powerful wants to create unsafe AI but they do want to take advantage of AI capabilities.
2) None of the concrete well-specified valuable AI capabilities require unsafe behavior"

I think a lot of powerful people / organizations do want to take advantage of possibly unsafe AI capabilities, such as ones that would allow them to be the emperors of the universe for all time. Especially if not doing so means that their rivals have a higher chance of becoming the emperors of the universe.

Hi Ben. Are you still planning to post a transcript sometime?

4Ben Pace
Yes, I published it that week! Here's the highlights post, which links to the transcript.

Thanks. I had one question about your Toward Idealized Decision Theory paper.

I can't say I fully understand UDT, but the 'updateless' part does seem very similar to the "act as if you had precommitted to any action that you'd have wanted to precommit to" core idea of NDT. It's not clear to me that the super-powerful UDT agent would make the wrong decision in the game where two players each pick a number between 0 and 10 and get payouts based on their own pick and the total sum.

Wouldn't the UDT reason as follows? "If my algorithm were such that I wouldn't just ... (read more)

1So8res
Yep, that's a common intuition pump people use in order to understand the "updateless" part of UDT. A proof-based UDT agent would make the wrong decision in that game -- this follows from the definition of proof-based UDT. Intuitively, we surely want a decision theory that reasons as you said, but the question is, can you write down a decision algorithm that actually reasons like that? Most people agree with you on the philosophy of how an idealized decision theory should act, but the hard part is formalizing a decision theory that actually does the right things. The difficult part isn't in the philosophy, the difficult part is turning the philosophy into math :-)

An AI should certainly cooperate if it discovered that by chance its opposing AI had identical source code.

I read your paper and the two posts in your short sequence. Thanks for the links. I still think it's very unlikely that one of the AIs in your original hypothetical (when they don't examine each other's source code) would do better by defecting.

I accept that if an opposing AI had a model of you that was just decent but not great, then there is some amount of logical connection there. What I haven't seen is any argument about the shape of the graph o... (read more)

3So8res
Nope! That's the open part of the problem :-) We don't know how to build a decision network with logical nodes, and we don't know how to propagate a "logical update" between nodes. (That is, we don't have a good formalism of how changing one algorithm logically affects a related but non-identical algorithm.) If we had the latter thing, we wouldn't even need the "logical decision network", because we could just ask "if I change the agent, how does that logically affect the universe?" (as both are algorithms); this idea is the basis of proof-based UDT (which tries to answer the problem by searching for proofs under the assumption "Agent()=a" for various actions). Proof based UDT has lots of problems of its own, though, and thinking about logical updates in logical graphs is a fine angle of approach.
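To make the proof-based idea a bit more concrete, here is a minimal schematic sketch (not from the paper): the "provability" check is stubbed out with a hardcoded table standing in for a bounded proof search, the actions and utilities are made-up placeholders, and the spurious-proof subtleties that make real proof-based UDT hard are deliberately ignored.

```python
# Toy "provability" oracle: a hardcoded set of implications standing in for a
# bounded proof search in a formal theory. Everything here is hypothetical.
PROVABLE = {
    "Agent()=cooperate -> Universe()=2",
    "Agent()=defect -> Universe()=1",
}

def provable(statement: str) -> bool:
    return statement in PROVABLE

ACTIONS = ["cooperate", "defect"]      # hypothetical action set
OUTCOMES = [3, 2, 1, 0]                # hypothetical utilities, checked best-first

def agent():
    # For each action a, look for the best outcome u such that
    # "Agent()=a -> Universe()=u" is provable, then take the action whose
    # best provable outcome is highest.
    best_action, best_utility = None, float("-inf")
    for a in ACTIONS:
        for u in OUTCOMES:
            if provable(f"Agent()={a} -> Universe()={u}"):
                if u > best_utility:
                    best_action, best_utility = a, u
                break  # best provable outcome for this action found
    return best_action

print(agent())  # "cooperate" with the toy oracle above
```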

I think defect is the right answer in your AI problem and therefore that NDT gets it right, but I'm aware lots of LWers think otherwise. I haven't researched this enough to want to argue it, but is there a discussion you'd recommend I read that spells out the reasoning? Otherwise I'll just look through LW posts on prisoner's dilemmas.

Secondly, I'd like to try to somehow incorporate logical effects into NDT. I agree they're important. Any suggestions for where I could find lots of examples of decision problems where logical effects matter, to help me think about the general case?

1So8res
That's surprising to me. Imagine that the situation is "prisoner's dilemma with shared source code", and that the AIs inspect each other's source code and verify that (by some logical but non-causal miracle) they have exactly identical source code. Do you still think they do better to defect? I wouldn't want to build an agent that defects in that situation :-p The paper that jessicat linked in the parent post is a decent introduction to the notion of logical counterfactuals. See also the "Idealized Decision Theory" section of this annotated bibliography, and perhaps also this short sequence I wrote a while back.
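As a toy illustration of the identical-source-code case, here is a sketch (not from the paper) of an agent that reads its opponent's source and cooperates exactly when it matches its own:

```python
import inspect

def clique_agent(opponent_source: str) -> str:
    """Cooperate iff the opponent's source code is identical to our own."""
    my_source = inspect.getsource(clique_agent)
    return "C" if opponent_source == my_source else "D"

# Two exact copies of this agent each see the other's (identical) source and
# both cooperate; against any other program, it defects.
print(clique_agent(inspect.getsource(clique_agent)))   # "C"
print(clique_agent("def other_agent(): return 'D'"))   # "D"
```

The exact-match test is what makes this case easy; the open problem described in these replies is handling opponents whose code is merely logically correlated with yours rather than bit-for-bit identical.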

In the retro blackmail, CDT does not precommit to refusing even if it's given the opportunity to do so before the researcher gets its source code.

To clarify: you mean that CDT doesn't precommit at time t=1 even if the researcher hasn't yet gotten the code representing CDT's state at time t=0. CDT doesn't think precommitting will help, because it knows that the code the researcher will get is from before its precommitment. I agree that this is true, and that a CDT agent won't want to precommit.

I guess my definition even after my clarification is ambiguous, as i... (read more)

4So8res
The universe begins, and then almost immediately, two different alien species make AIs while spacelike separated. The AIs start optimizing their light cones and meet in the middle, and must play a Prisoner's Dilemma. There is absolutely no causal relationship between them before the PD, so it doesn't matter what precommitments they would have made at the beginning of time :-) To be clear, this sort of thought experiment is meant to demonstrate why your NDT is not optimal; it's not meant to be a feasible example. The reason we're trying to formalize "logical effect" is not specifically so that our AIs can cooperate with independently developed alien AIs or something (although that would be a fine perk). Rather, this extreme example is intended to demonstrate why idealized counterfactual reasoning needs to take logical effects into account. Other thought experiments can be used to show that reasoning about logical effects matters in more realistic scenarios, but first it's important to realize that they matter at all :-)

For example, a decision algorithm based on precommitment is unable to hold selfish preferences (valuing a cookie for me more than a cookie for a copy of me) in anthropic situations

I disagree that it makes sense to talk about one of the future copies of you being "you" whereas the other isn't. They're both you to the same degree (if they're exact copies).

0Manfred
I agree with you there - what I mean by selfish preferences is that after the copies are made, each copy will value a cookie for itself more than a cookie for the other copy - it's possible that they wouldn't buy their copy a cookie for $1, but would buy themselves a cookie for $1. This is the indexically-selfish case of the sort of preferences people have that cause them to buy themselves a $1 cookie rather than giving that $1 to GiveDirectly (which is what they'd do if they made their precommitments behind a Rawlsian veil of ignorance).

Eliezer talked about this in his TDT paper. It is possible to hypothesize scenarios where agents get punished or rewarded for arbitrary reasons. For instance an AI could punish agents who made decisions based on the idea of their choices determining the results of abstract computations (as in TDT). This wouldn't show that TDT is a bad decision theory or even that it's no better than any other theory.

If we restrict ourselves to action-determined and decision-determined problems (see Eliezer's TDT paper) we can say that TDT is better than CDT, because it get... (read more)

2BlindIdiotPoster
This sounds a lot like the objections CDT people were giving to Newcomb's problem.

I think my definition of NDT above was worded badly. The problematic part is "if he had previously known he'd be in his current situation." Consider this definition:

You should always make the decision that a CDT-agent would have wished he had precommitted to, if he previously considered the possibility of his current situation and had the opportunity to costlessly precommit to a decision.

The key is that the NDT agent isn't behaving as if he knew for sure that he'd end up blackmailed when he made his precommitment (since his precommitment affec... (read more)
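One way to read the revised definition is as a policy search. Here is a hedged sketch, assuming some `prior_expected_utility` function that scores an entire precommitment from the before-the-fact standpoint; specifying how that evaluation should account for logical effects is, as the replies in this thread point out, the hard part.

```python
from itertools import product

def ndt_decision(current_situation, situations, actions, prior_expected_utility):
    """Pick the action that the best costless precommitment would prescribe now.

    prior_expected_utility(policy) scores a full policy (a dict mapping each
    possible situation to an action) from the standpoint of an agent that has
    not yet learned which situation it is in.
    """
    best_policy, best_value = None, float("-inf")
    # Enumerate every possible precommitment: one action per situation.
    for assignment in product(actions, repeat=len(situations)):
        policy = dict(zip(situations, assignment))
        value = prior_expected_utility(policy)
        if value > best_value:
            best_policy, best_value = policy, value
    # Carry out what the best precommitment says to do in the current situation.
    return best_policy[current_situation]
```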

I believe that NDT gets this problem right.

The paper you link to shows that a pure CDT agent would not self-modify into an NDT agent, because a CDT agent wouldn't really have the concept of "logical" connections between agents. The understanding that both logical and causal connections are real things is what would compel an agent to self-modify to NDT.

However, if there was some path by which an agent started out as pure CDT and then became NDT, the NDT agent would still choose correctly on Retro Blackmail even if the researcher had its original... (read more)

5So8res
In the retro blackmail, CDT does not precommit to refusing even if it's given the opportunity to do so before the researcher gets its source code. This is because CDT believes that the researcher is predicting according to a causally disconnected copy of itself, and therefore it does not believe that its actions can affect the copy. (That is, if CDT knows it is going to be retro blackmailed, and considers this before the researcher gets access to its source code, then it still doesn't precommit.)

The failure here is that CDT only reasons according to what it can causally affect, but in the real world decision algorithms also need to worry about what they can logically affect. (For example, two agents created while spacelike separated should be able to cooperate on a Prisoner's Dilemma.)

Your attempted patch (pretend you made your precommitments earlier in time) only works when the neglected logical relationships stem from a causal event earlier in time. This is often but not always the case. For instance, if CDT thinks that its clone was causally copied from its own source code, then you can get the right answer by acting as CDT would have precommitted to act before the copying occurred. But two agents written in spacelike separation from each other might have decision algorithms that are logically correlated, despite there being no causal connection no matter how far back you go.

In order to get the right precommitments in those sorts of scenarios, you need to formalize some sort of notion of "things the decision algorithm's choice logically affects," and formalizing "logical effects" is basically the part of the problem that remains difficult :-)
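Here is a toy rendering (not from the paper) of the retro blackmail structure described above, assuming, as in the usual presentation, that the researcher blackmails only if a simulation of the agent's old code predicts it will pay. The payoff numbers in the comments are hypothetical, and the toy uses the same code for the "old copy" and the current agent, so the timing subtleties discussed in this thread are deliberately left out.

```python
def predicts_pay(old_source) -> bool:
    # The researcher's prediction: simulate the (old) source code on the
    # blackmail situation and check whether it pays.
    return old_source("blackmailed") == "pay"

def researcher(old_source) -> str:
    # Blackmail only if the old copy of the agent's code is predicted to pay.
    return "blackmail" if predicts_pay(old_source) else "no blackmail"

def cdt_agent(situation) -> str:
    # CDT reasons only about what its action causally affects: once it is
    # blackmailed, paying (say, losing $100) looks better than refusing (say,
    # losing $1000), and the prediction seems fixed, so it pays.
    return "pay" if situation == "blackmailed" else "carry on"

def refusing_agent(situation) -> str:
    # An agent that accounts for the logical link between its current policy
    # and the old copy the researcher simulates refuses, and so is never
    # blackmailed in the first place.
    return "refuse" if situation == "blackmailed" else "carry on"

print(researcher(cdt_agent))       # "blackmail"    -> CDT ends up paying
print(researcher(refusing_agent))  # "no blackmail" -> never gets blackmailed
```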

You're right, I think poker chips is too messy. Since you already have pens right there, it'd be better for people to just write a little mark / star / [their name] on any card that they were interested in.

Pretty much. The fact that NDT is so obvious is why I'm puzzled as to why TDT needed to be created, and why Eliezer didn't end his paper shortly after the discussion of Gloria. NDT seems to get all the tricky decision problems right, even at least one that TDT gets wrong, so what am I missing?

Idea related to the clipboard, but combined with poker chips:

There is a stack of blank note cards on the table, and several pens/markers. If there's an existing discussion and you want to talk about an unrelated topic, you grab a notecard, write down the topic, and place it face up on the table. At any time, there may be several note cards on the table representing topics people want to talk about. Each person also has a poker chip (or a few) that they may place near a particular card, expressing their interest in talking about that topic. Poker chips are basically upvotes.

3evand
I like the index cards approach. I worry that the poker chips start making things distracting, which will discourage their use or reduce their effectiveness.

I stutter and have done a lot of research on stuttering. It's rare that adult stutterers ever completely stop stuttering, but these two ebooks are the best resources I know of for dealing with it:

http://www.stutteringhelp.org/Portals/English/Book_0012_tenth_ed.pdf
http://www.scribd.com/doc/23283047/Easy-Stuttering-Avoidance-Reduction-Therapy

The short version is that the less you try to suppress or conceal your stuttering, the less severe it will become in the long run.

"the plan that lets you save money in the US is a life-engulfing minefield of time-consuming bargin-hunting, self-denial, and tax evasion."

I work as a software developer in the US, have never made a 'budget' for myself or tried to analyze my finances before now, I pay taxes normally, eat out often, and have no trouble saving lots of money. I'm going to substitute my expenses and pretend I only make 100k and see how much I'd still be able to save (living in Seattle).

Rent: 16.8k instead of 23.2k
Utilities: 2k instead of 7k (how can you spend 7k on u... (read more)
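A minimal sketch of the comparison being set up above: only the rent and utilities figures come from the list, while the take-home estimate and the remaining categories are hypothetical placeholders, since the rest of the list is cut off here.

```python
take_home = 72_000          # hypothetical post-tax income on ~$100k
expenses = {
    "rent":      16_800,    # from the list above: 16.8k instead of 23.2k
    "utilities":  2_000,    # from the list above: 2k instead of 7k
    "food":       6_000,    # hypothetical
    "transport":  3_000,    # hypothetical
    "misc":       5_000,    # hypothetical
}

savings = take_home - sum(expenses.values())
print(f"Yearly savings: ${savings:,}")  # $39,200 with these placeholder numbers
```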

0taryneast
I spent $5800 on utilities last year... it happens when you live in an area that simultaneously gets below freezing point (and thus you need to spend on heating) and also gets above comfortable living point (and thus you need to spend on fans or air-con). I'm pretty reasonably frugal on both... I don't set the aircon super low, I don't set the heating on high... but utilities are pricey. I also count "internet" as a utility. When I lived in a warmer climate I spent $2800.

"Misc house expenses" include things like fixing a broken toilet... or other general repairs. If you're renting you may not have to pay that. Or maybe you do if your landlord is dodgy.

I spent around $8K on "transport" - which includes car payments (I bought a new but small hatchback 3 years ago = $22k), fuel, insurance, repairs, servicing and parking costs. I can well imagine that a family with more than one person (and thus more than one car) easily pays twice as much as me.