thomblake comments on Thoughts on the Singularity Institute (SI) - Less Wrong

Post author: HoldenKarnofsky 11 May 2012 04:31AM (256 points)

Comment author: thomblake 18 May 2012 01:00:48PM 5 points

The point is that there are unknowns you're not taking into account, and "bounded" doesn't mean "has bounds that a human would think of as 'reasonable'".

An AI doesn't strictly need "theory of mind" to manipulate humans. Any optimizer can see that some states of affairs lead to other states of affairs, or it's not an optimizer. And it doesn't necessarily have to label some of those states of affairs as "lying" or "manipulating humans" to be successful.

There are already ridiculously simple ways to hack human behavior that we know about. For example, anchoring: mention a high number at an opportune moment and you increase humans' numerical estimates and willingness to spend. Now imagine all the simple manipulations we don't even know about yet, ones that might be more transparent to an agent not using a "theory of mind".
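To make that concrete, here is a minimal sketch of the kind of optimizer being described: it ranks interventions purely by their predicted effect, and nothing in it corresponds to "lying" or "manipulating humans". The action names and effect sizes are invented for illustration, not measured results:

    # A toy outcome-driven optimizer (Python). It has no mental-state
    # vocabulary; it only ranks interventions by predicted effect.
    # All action names and effect sizes are invented for illustration.
    outcome_model = {
        "send_plain_reminder": 0.02,         # predicted +2% spending
        "mention_high_anchor_number": 0.15,  # predicted +15% spending
        "do_nothing": 0.00,
    }

    def choose_action(model):
        """Return the action whose predicted effect is largest."""
        return max(model, key=model.get)

    print(choose_action(outcome_model))  # -> mention_high_anchor_number

Such an optimizer "manipulates" in exactly the sense that matters, without any category for manipulation appearing anywhere in its model.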

Comment author: TheOtherDave 18 May 2012 02:44:48PM 0 points

It becomes increasingly clear to me that I have no idea what the phrase "theory of mind" refers to in this discussion. It seems moderately clear to me that any observer capable of predicting the behavior of a class of minds has something I'm willing to consider a theory of mind, but that doesn't seem to be consistent with your usage here. Can you expand on what you understand a theory of mind to be, in this context?

Comment author: thomblake 18 May 2012 02:47:53PM 1 point

I'm understanding it in the typical way; the first paragraph of the Wikipedia article should make it clear:

Theory of mind is the ability to attribute mental states—beliefs, intents, desires, pretending, knowledge, etc.—to oneself and others and to understand that others have beliefs, desires and intentions that are different from one's own.

An agent can model the effects of interventions on human populations (or even particular humans) without modeling their "mental states" at all.

Comment author: TheOtherDave 18 May 2012 03:04:46PM 0 points

Well, right, I read that article too.

But in this context I don't get it.

That is, we're talking about a hypothetical system that is capable of predicting that if it does certain things, I will subsequently act in certain ways, assert certain propositions as true, and so on. Suppose we were faced with such a system, and you and I both agreed that it can make all of those predictions. Further suppose that you asserted that the system had a theory of mind, and I asserted that it didn't.

It is not in the least bit clear to me what we would actually be disagreeing about, how our anticipated experiences would differ, etc.

What is it that we would actually be disagreeing about, other than what English phrase to use to describe the system's underlying model(s)?

Comment author: thomblake 18 May 2012 03:20:07PM 2 points

"What is it that we would actually be disagreeing about, other than what English phrase to use to describe the system's underlying model(s)?"

We would be disagreeing about the form of the system's underlying models.

Two different strategies to consider:

  1. I know that Steve believes that red blinking lights before 9 AM are a message from God that he has not been doing enough charity, so I can predict that he will give more money to charity if I show him a blinking light before 9 AM.

  2. Steve seeing a red blinking light before 9 AM has historically resulted in a 20% increase of charitable donation for that day, so I can predict that he will give more money to charity if I show him a blinking light before 9 AM.

You can model humans with or without referring to their mental states. Both kinds of models are useful, depending on circumstance.
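A minimal sketch of the two strategies side by side, assuming nothing beyond the example above ("Steve" and the 20% figure come from it; the data structures and function names are invented):

    # Strategy 1: explicitly model Steve's mental state (a theory of mind).
    steve_beliefs = {
        ("red_blinking_light", "before_9am"): "God wants more charity from me",
    }

    def predict_via_mental_states(stimulus, time):
        belief = steve_beliefs.get((stimulus, time))
        return "donates more today" if belief else "baseline donation"

    # Strategy 2: no mental states, just an observed statistical regularity.
    observed_multipliers = {
        ("red_blinking_light", "before_9am"): 1.20,  # historical 20% increase
    }

    def predict_via_statistics(stimulus, time):
        multiplier = observed_multipliers.get((stimulus, time), 1.0)
        return "donates more today" if multiplier > 1.0 else "baseline donation"

    # Same stimulus, same prediction, different internals.
    key = ("red_blinking_light", "before_9am")
    assert predict_via_mental_states(*key) == predict_via_statistics(*key)

On this input the two models are behaviorally indistinguishable; what they differ in is the internal representation, which is exactly the point at issue.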

Comment author: TheOtherDave 18 May 2012 03:32:59PM 1 point

And the assertion here is that, with strategy #2, I could also predict that if I asked Steve why he did that, he would say "because I saw a red blinking light this morning, which was a message from God that I haven't been doing enough charity," but that my underlying model would nevertheless not include anything corresponding to Steve's belief that red blinking lights are messages from God, merely an algorithm that happens to make those predictions in other ways.

Yes?

Comment author: thomblake 18 May 2012 04:41:57PM 2 points

Yes, that's possible. But you could still get a lot done with strategy #2 even without being able to make that particular prediction.

I agree that if two systems have the same inputs and outputs, their internals don't matter much here.

Comment author: TheOtherDave 18 May 2012 05:25:31PM 0 points

So, when we posit in this discussion a system that lacks a theory of mind in a sense that matters, are we positing a system that cannot make predictions like this one? I assume so, given what you just said, but I want to confirm.

Comment author: thomblake 18 May 2012 06:05:44PM 1 point

Yes, I'd say so. It isn't helpful here to say that a system lacks a theory of mind if it has a mechanism that allows it to make predictions about reported beliefs, intentions, etc.

Comment author: TheOtherDave 18 May 2012 06:12:33PM 0 points

Cool! This was precisely my concern. It sounded an awful lot like y'all were talking about a system that could make such predictions but somehow lacked a theory of mind. Thanks for clarifying.

Comment author: XiXiDu 18 May 2012 03:41:02PM 0 points

"theory of mind"

For me it denotes the ability to simulate other agents at various levels of granularity: possessing a mental model of another agent.