Comment author: cousin_it 09 March 2016 12:56:20PM *  12 points [-]

When I started hearing about the latest wave of results from neural networks, I thought to myself that Eliezer was probably wrong to bet against them. Should MIRI rethink its approach to friendliness?

Comment author: Wei_Dai 10 March 2016 10:37:06AM 14 points [-]

Compared to its competition in the AGI race, MIRI was always going to be disadvantaged by both lack of resources and the need to choose an AI design that can predictably be made Friendly as opposed to optimizing mainly for capability. For this reason, I was against MIRI (or rather the Singularity Institute as it was known back then) going into AI research at all, as opposed to pursuing some other way of pushing for a positive Singularity.

In any case, what other approaches to Friendliness would you like MIRI to consider? The only other somewhat-developed approach I'm aware of is Paul Christiano's current one (see for example https://medium.com/ai-control/alba-an-explicit-proposal-for-aligned-ai-17a55f60bbcf), which I understand is meant to be largely agnostic about the underlying AI technology. Personally I'm pretty skeptical, but then I may be overly skeptical about everything. What are your thoughts? I don't recall seeing you comment on them much.

Are you aware of any other ideas that MIRI should be considering?

Comment author: paulfchristiano 07 February 2016 04:45:03AM 2 points [-]

What's your estimate? And what do you think the first such choices will be?

I think that we are facing some issues all of the time (e.g. some of these questions probably bear on "how much should we prioritize fast technological development?" or "how concerned should we be with physics disasters?" and so on), but that it will be a long time before we face really big expected costs from getting these wrong. My best guess is that we will get to do many-centuries-of-current-humanity worth of thinking before we really need to get any of these questions right.

I don't have a clear sense of what the first choices will be. My view is largely coming from not seeing any serious candidates for critical choices.

Anything to do with expansion into space looks like it will be very far away in subjective time (though perhaps not far in calendar time). Maybe there is some stuff with simulations, or value drift, but neither of those looks very big in expectation for now. Maybe all of these issues together make a 5% difference in expectation over the next few hundred subjective years? (Though this is a pretty unstable estimate.)

Comment author: Wei_Dai 07 February 2016 11:06:08PM 3 points [-]

How did you arrive at the conclusion that we're not facing big expected costs with these questions? It seems to me that, for example, the construction of large nuclear arsenals and the lack of sufficient safeguards against nuclear war have already caused a large expected cost, and may have been based on one or more incorrect philosophical understandings (e.g., an incorrect answer to the question of what the right amount of concern for distant strangers and future people is). Similarly with "how much should we prioritize fast technological development?" But this is just from intuition, since I don't really know how to compute expected costs when the uncertainties involved have a large moral or normative component.

My best guess is that we will get to do many-centuries-of-current-humanity worth of thinking before we really need to get any of these questions right.

Do you expect technological development to have plateaued by then (i.e., AIs will have invented essentially all technologies feasible in this universe)? If so, do you think there won't be any technologies among them that would let some group of people/AIs unilaterally alter the future of the universe according to their understanding of what is normative? (For example, intentionally or accidentally destroy civilization, or win a decisive war against the rest of the world.) Or do you think something like a world government will have been created to control the use of such technologies?

Comment author: paulfchristiano 05 February 2016 11:10:21PM 0 points [-]

Suppose act-based designs are as successful as you expect them to be.

It's not so much that I have confidence in these approaches, but that I think (1) they are the most natural to explore at the moment, and (2) issues that seem like they can be cleanly avoided for these approaches seem less likely to be fundamental obstructions in general.

We still need to understand issues like the one described in Eliezer's post (or solve the meta-problem of understanding philosophical reasoning) at some point, right? When do you think that will be?

Whenever such issues bear directly on our decision-making in such a way that making errors would be really bad. For example, when we encounter a situation where we face a small probability of a very large payoff, then it matters how well we understand the particular tradeoff at hand. The goal / best case is that the development of AI doesn't depend on sorting out these kinds of considerations for its own sake, only insofar as the AI has to actually make critical choices that depend on these considerations.

The dependence on humans and lack of full autonomy in act-based agents seem likely to cause a significant weakness in at least one crucial area of this competition...

I wrote a little bit about efficiency here. I don't see why an approval-directed agent would be at a serious disadvantage compared to an RL agent (though I do see why an imitation learner would be at a disadvantage by default, and why an approval-directed agent may be unsatisfying from a safety perspective for non-philosophical reasons).

Ideally you would synthesize data in advance in order to operate without access to counterfactual human feedback at runtime; it's not clear whether this is possible, but it seems at least plausible. But it's also not clear to me that it is necessary, as long as we can tolerate very modest (<1%) overhead from oversight.

Of course if such a period goes on long enough then it will be a problem, but that is a slow-burning problem that a superintelligent civilization can address at its leisure. In terms of technical solutions, anything we can think of now will easily be thought of in this future scenario. It seems like the only thing we really lose is the option of technological relinquishment or serious slow-down, which don't look very attractive/feasible at the moment.

Comment author: Wei_Dai 06 February 2016 01:27:09PM 0 points [-]

The goal / best case is that the development of AI doesn't depend on sorting out these kinds of considerations for its own sake, only insofar as the AI has to actually make critical choices that depend on these considerations.

Isn't a crucial consideration here how soon after the development of AI such choices will have to be faced? If the answer is "soon", then it seems that we should try to solve the problems ahead of time or try to delay AI. What's your estimate? And what do you think the first such choices will be?

Comment author: [deleted] 28 January 2016 11:27:52PM *  0 points [-]

In response to comment by [deleted] on AALWA: Ask any LessWronger anything
Comment author: Wei_Dai 05 February 2016 09:12:50PM 1 point [-]

Does your link to the first thread imply that you believe securing one's bitcoin (and realizing its unique benefits) is ultimately a futile venture, especially in the presence of an adversary of advanced intelligence?

Yes, that looks likely to be the case.

To the second link, I guess you mean to imply the monetary policy of Bitcoin is ultimately flawed due to its deflationary nature?

That's part of it. If decentralized cryptocurrency is ultimately good for the world, then Bitcoin may be bad because its flawed monetary policy prevents or delays widespread adoption of cryptocurrency. But another part is that cryptocurrency and other cypherpunk/cryptoanarchist ideas may ultimately be harmful even if they are successful in their goals. For example, they tend to make it harder for governments to regulate economic activity, but we may need such regulation to reduce existential risk from AI, nanotech, and other future technologies.

If one wants to push the future in a positive direction, it seems to me that there are better things to work on than Bitcoin.

Comment author: paulfchristiano 31 January 2016 08:09:14PM *  0 points [-]

In my view, we could make act-based agents without answering this or any similar questions. So I'm much less interested in answering them than I used to be. (There are possible approaches that do have to answer all of these questions, but at this point they seem very much less promising to me.)

We've briefly discussed this issue in the abstract, but I'm curious to get your take in a concrete case. Does that seem right to you? Do you think that we need to understand issues like this one, and have confidence in that understanding, prior to building powerful AI systems?

Comment author: Wei_Dai 01 February 2016 10:38:17PM *  0 points [-]

FAI designs that require high confidence solutions to many philosophical problems also do not seem very promising to me at this point. I endorse looking for alternative approaches.

I agree that act-based agents seem to require fewer high confidence solutions to philosophical problems. My main concern with act-based agents is that these designs will be in competition with fully autonomous AGIs (either alternative designs, or act-based agents that evolve into full autonomy due to inadequate care on the part of their owners/users) to colonize the universe. The dependence on humans and lack of full autonomy in act-based agents seem likely to cause a significant weakness in at least one crucial area of this competition, such as general speed/efficiency/creativity, warfare (conventional, cyber, psychological, biological, nano, etc.), cooperation/coordination, self-improvement, and space travel. So even if these agents turn out to be "safe", I'm not optimistic that we "win" in the long run.

My own idea is to aim for FAI designs that can correct their philosophical errors autonomously, the same way that we humans can. Ideally, we'd fully understand how humans reason about philosophical problems and how philosophy normatively ought to be done before programming or teaching that to an AI. But realistically, due to time pressure, we might have to settle for something suboptimal, like teaching through examples of human philosophical reasoning. Of course there are lots of ways for this kind of AI to go wrong as well, so I also consider it to be a long shot.

Do you think that we need to understand issues like this one, and have confidence in that understanding, prior to building powerful AI systems?

Let me ask you a related question. Suppose act-based designs are as successful as you expect them to be. We still need to understand issues like the one described in Eliezer's post (or solve the meta-problem of understanding philosophical reasoning) at some point, right? When do you think that will be? In other words, how much time do you think successfully creating act-based agents buys us?

Comment author: IlyaShpitser 12 January 2016 07:06:54PM 2 points [-]

I think my point wasn't about what computer security precisely does, but about the mindset of people who do it (security people cultivate an adversarial point of view about systems).

My secondary point is that computer security is a very solid field, and doesn't look wishy-washy or science-fictiony. It has serious conferences, research centers, industry labs, intellectual firepower, etc.

Comment author: Wei_Dai 15 January 2016 11:03:07PM 3 points [-]

I'm not sure how much there is to learn from the field of computer security with regard to the OP's question. It's relatively easy to cultivate an adversarial mindset and get funding for conferences, research centers, labs, intellectual firepower, etc., when adversaries exist at the present time and are causing billions of dollars of damage each year. How do we do that when the analogous adversaries are not expected to exist for a decade or more, and we expect it will be too late to get started once they do exist?

Comment author: [deleted] 08 January 2016 03:12:42AM *  0 points [-]

In response to comment by [deleted] on AALWA: Ask any LessWronger anything
Comment author: Wei_Dai 15 January 2016 10:38:24PM 1 point [-]

I don't follow Bitcoin development very closely; I basically just read about it if a story shows up in the New York Times or Wired. If you're curious as to why, see this post and this thread.

Comment author: [deleted] 01 November 2015 08:52:40PM *  1 point [-]

In response to comment by [deleted] on AALWA: Ask any LessWronger anything
Comment author: Wei_Dai 02 November 2015 07:29:13AM 1 point [-]

If the identity of the individual were confirmed, it would perhaps, at a minimum, elevate their engineer/thinker status, such that other ideas and pieces of work attributed to them might receive more attention (and maybe help) from many others who would perhaps not otherwise have happened upon them.

This is interesting and something I hadn't thought about. Now I'm more curious who Satoshi is and why he or she or they have decided to remain anonymous. Thanks! You might want to post your idea somewhere else too, like the Bitcoin reddit or forum, since probably not many people will get to read it here.

Comment author: Wei_Dai 02 November 2015 07:12:25AM *  7 points [-]

Few people, when learning their values in childhood, ended up considering examples such as this one and explicitly learning that they were wrong. Yet the persuasive power of that example comes from the fact that most people instantly reject the desirability of the dopamine drip scenario when it's suggested to them.

I for one don't "instantly reject" the desirability of this scenario. I think it's a difficult philosophy problem whether the dopamine drip is desirable or not. My worry is that either the AI will not be as uncertain as I am about it, or it will not handle or resolve the normative uncertainty in the same way as I would or should.

Today's machine learning algorithms tend to be unreasonably certain (and wrong) about inputs very different from their training data, but that is perhaps just due to machine learning researchers currently focusing mostly on commercial settings where inputs are rarely very different from training data, and there aren't terrible consequences for getting things wrong. So maybe we can expect this to improve in the future as researchers start to focus more on safety.
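
A minimal illustrative sketch of that overconfidence, using a toy logistic-regression classifier (the model, data, and numbers below are hypothetical, chosen only to demonstrate the point, not anything from this discussion):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Two well-separated training clusters near the origin.
    X_train = np.vstack([rng.normal(-1.0, 0.3, size=(100, 2)),
                         rng.normal(+1.0, 0.3, size=(100, 2))])
    y_train = np.array([0] * 100 + [1] * 100)

    clf = LogisticRegression().fit(X_train, y_train)

    # An input wildly outside the training distribution.
    x_ood = np.array([[50.0, 50.0]])
    print(clf.predict_proba(x_ood))
    # -> roughly [[0. 1.]]: near-total certainty about a point unlike anything
    #    the model has seen, because the learned logits extrapolate without
    #    bound far from the training data.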

But even if we manage to build an AI that is properly uncertain about whether something like the dopamine drip scenario is good or bad, how do we get it to resolve its uncertainty in the right way, especially if its creators/owners are also uncertain or possibly wrong, so it can't just ask? Resolving the uncertainty incorrectly and getting the uncertainty permanently frozen into its utility function seem to be two big risks here. So I worry just as much about the reverse maverick nanny scenario, where we eventually, after centuries of philosophical progress, figure out that we actually do want to be put on dopamine drips, but the AI says "Sorry, I can't let you do that."

Comment author: So8res 26 October 2015 07:23:49PM 1 point [-]

I mostly agree here, though I'm probably less perturbed by the six year time gap. It seems to me like most of the effort in this space has been going towards figuring out how to handle logical uncertainty and logical counterfactuals (with some reason to believe that answers will bear on the question of how to generate priors), with comparatively little work going into things like naturalized induction that attack the problem of priors more directly.

Can you say any more about alternatives you've been considering? I can easily imagine a case where we look back and say "actually the entire problem was about generating a prior-like-thingy", but I have a harder time visualizing different tacks altogether (ones that don't eventually have some step that reads "then treat observations like Bayesian evidence").

Comment author: Wei_Dai 27 October 2015 02:59:24AM 1 point [-]

Can you say any more about alternatives you've been considering?

Not much to say, unfortunately. I tried looking at some frequentist ideas for inspiration, but didn't find anything that seemed to have much bearing on the kind of philosophical problems we're trying to solve here.
