Comment author: WhySpace 23 August 2016 06:26:08PM *  2 points [-]

(1) Given: AI risk comes primarily from AI optimizing for things besides human values.

(2) Given: Humans are already optimizing for things besides human values (or, at least, besides our Coherent Extrapolated Volition).

(3) Given: Our world is okay.^[CITATION NEEDED!]

(4) Therefore, imperfect value loading can still result in an okay outcome.

This is, of course, not necessarily the case for any given imperfect value loading. However, our world serves as a single counterexample to the rule that all imperfect optimization will be disastrous.

(5) Given: A maxipok strategy is optimal. ("Maximize the probability of an okay outcome.")

(6) Given: Partial optimization for human values is easier than total optimization. (Where "partial optimization" is at least close enough to achieve an okay outcome.)

(7) ∴ MIRI should focus on imperfect value loading.

Note that I'm not convinced of several of the givens, so I'm not certain of the conclusion. However, the argument itself looks convincing to me. I've also chosen to leave assumptions like "imperfect value loading results in partial optimization" unstated, treating them as part of the definitions of those two terms. That said, I'll try to add details on any specific point, if questioned.

Comment author: Wei_Dai 04 September 2016 11:41:14AM 2 points [-]

However, our world serves as a single counterexample to the rule that all imperfect optimization will be disastrous.

Except that the proposed rule is more like: given an imperfect objective function, the outcome is likely to turn from okay to disastrous at some point as optimization power is increased. See the Context Disaster and Edge Instantiation articles at Arbital.

The idea of context disasters applies to humans and humanity as a whole as well as to AIs, since, as you mentioned, we are already optimizing for something that is not exactly our true values. Even without the possibility of AI, we have a race between technological progress (which increases our optimization power) and progress in coordination and in understanding our values (which improves our objective function), a race which we can easily lose.

Comment author: Wei_Dai 22 July 2016 03:54:22PM 2 points [-]

Anyone else worried about Peter Thiel's support for Donald Trump discrediting Thiel in a lot of people's eyes, and MIRI and AI safety/risk research in general by association?

Comment author: Manfred 21 July 2016 09:48:54PM 12 points [-]

Oh my gosh, the negative utilitarians are getting into AI safety. Everyone play it cool and try not to look like you're suffering.

Comment author: Wei_Dai 22 July 2016 03:49:22PM 7 points [-]

That's funny. :) But these people actually sound remarkably sane. See here and here for example.

Comment author: Gunnar_Zarncke 27 June 2016 08:05:36PM 3 points [-]

Yes. But I guess that there is a large class of interesting property protocols that can't be implemented with bitcoins. For example, the variants of 'two-phase commit' that are part of many step-by-step property transactions ('concurrent' in financial jargon). I wonder whether there is a non-Turing-complete set of primitives that suffices for most current legal transactions. Probably a lot can be learned by asking a notary with programming experience...
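
Roughly, by 'two-phase commit' I mean a pattern like the following minimal Python sketch, under my own assumptions (the Party objects and the prepare/commit methods are hypothetical placeholders; a real protocol would exchange signed messages on a shared ledger rather than call methods in memory):

    # Illustrative sketch of a two-phase-commit style property transfer.
    class Party:
        def __init__(self, name):
            self.name = name

        def prepare(self, transfer):
            # Phase 1: each party checks its own conditions (funds available,
            # title clear, etc.) and votes yes or no without being bound yet.
            return True

        def commit(self, transfer):
            # Phase 2: executed only after every party has voted yes.
            print(self.name, "commits to", transfer)

    def two_phase_transfer(parties, transfer):
        # Either every party commits or nobody does.
        if all(p.prepare(transfer) for p in parties):
            for p in parties:
                p.commit(transfer)
            return True
        return False

    two_phase_transfer([Party("buyer"), Party("seller"), Party("notary")],
                       "title to the house against payment")

The property that matters is atomicity: either every leg of the transaction goes through or none does.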

Comment author: Wei_Dai 28 June 2016 02:03:19AM 2 points [-]

But I guess that there is a large class of interesting property protocols that can't be implemented with bitcoins.

Good point; on second thought, this may apply to the type of transaction I described in the opening comment, since it requires some way to transfer funds from all of the contracting parties into the escrow account at the same time. Otherwise one of the parties could hold back from making the escrow deposit and then blackmail the other parties that did make deposits (since they require his cooperation to get the money back from escrow, but he has nothing to lose). I'm not sure whether this simultaneous transfer of funds is possible to implement in the current version of Bitcoin.

Comment author: Gunnar_Zarncke 25 June 2016 08:45:15PM 1 point [-]

With smart contracts you can implement such escrow accounts. Which is a step in the right direction, I think.

Comment author: Wei_Dai 27 June 2016 09:36:06AM 3 points [-]

You don't need Ethereum-style smart contracts that can do general computation to implement escrow accounts. Multi-signature addresses, which Bitcoin already supports, are enough.
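
For concreteness, the spending rule behind an m-of-n multi-signature address is, schematically, just this (a Python sketch of the logic only; verify_signature is a hypothetical stand-in for real signature verification, not Bitcoin's actual script machinery):

    # Schematic m-of-n rule: funds move only if at least m of the n
    # authorized keys have signed the spending transaction.
    def can_spend(signatures, authorized_pubkeys, m, tx_hash, verify_signature):
        signers = {pk for pk in authorized_pubkeys
                   if any(verify_signature(pk, tx_hash, sig) for sig in signatures)}
        return len(signers) >= m

A 2-of-3 address shared by buyer, seller, and arbitrator, for example, lets any two of the three release the funds, which is all an arbitrated escrow needs.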

Comment author: Wei_Dai 25 June 2016 06:13:33AM 12 points [-]

In b-money I had envisioned that a common type of contract would be one where all the participants, including a third-party arbitrator, deposit funds into an escrow account at the start; the funds can only be released at the end of the contract with the unanimous agreement of all the contracting parties. So the arbitrator would make judgments on performance and damages, and be incentivized to be fair in order to protect their reputation and not lose their own deposit, and the other parties would be incentivized to accept the arbitrator's judgments, since that's the only way (short of direct account adjustment by everyone, aka forking) to get their escrow funds out.
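
Concretely, the release rule I had in mind looks roughly like this (an illustrative Python sketch; the Escrow class and its bookkeeping are hypothetical stand-ins, not anything the b-money proposal literally specifies):

    # Every contracting party, including the arbitrator, deposits up front.
    # Nothing is released unless everyone endorses the same split of the funds.
    class Escrow:
        def __init__(self, parties):
            self.parties = set(parties)   # includes the arbitrator
            self.deposits = {}            # party -> amount deposited
            self.approvals = {}           # party -> split of funds they endorse

        def deposit(self, party, amount):
            assert party in self.parties
            self.deposits[party] = self.deposits.get(party, 0) + amount

        def approve(self, party, proposed_split):
            # proposed_split maps each party to the amount they would receive.
            assert party in self.parties
            self.approvals[party] = proposed_split

        def release(self):
            splits = list(self.approvals.values())
            unanimous = (set(self.approvals) == self.parties
                         and all(s == splits[0] for s in splits))
            # Pay out only a unanimously endorsed split that exactly
            # exhausts the deposited funds; otherwise everything stays locked.
            if unanimous and sum(splits[0].values()) == sum(self.deposits.values()):
                return splits[0]
            return None

The arbitrator's own deposit is subject to the same unanimity rule, which is what gives them something to lose by judging unfairly.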

Not exactly the kind of "the code is the contract" smart contracts that some people are so excited about, and I have to say I don't quite understand the excitement. Without an AI that can live on the blockchain and replace human judgments, smart contracts are restricted to applications where such judgments are not required, and there don't seem to be many of these. Even when contracting for the production and delivery of digital goods, we still need human-like judgments when disputes arise regarding whether the goods delivered are the ones contracted for (except in rare cases where we can mathematically define what we want, like the prime factors of some integer).

Comment author: HungryHobo 24 June 2016 09:48:54AM 5 points [-]

AI is complex. Complexity means bugs. Bugs in smart contracts are exactly what you need to avoid.

What is needed most is mathematically proving code correct.

For certain contract types you're going to need some way of confirming that, say, physical goods have been delivered, but you gain nothing by adding AI to the mix.

Without AI you have a switch someone has to toggle, or some other signal that someone might hack. With AI you just have some other input stream that someone might tamper with. Either way you need to accept information into the system somehow, and it may not be accurate. AI does not solve the problem. It just adds complexity, which makes mistakes more likely.

When all you have is a hammer, everything looks like a nail; when all you have is AI theories, everything looks like a problem to throw AI at.

Comment author: Wei_Dai 25 June 2016 05:49:26AM 4 points [-]

AI is complex. Complexity means bugs. Bugs in smart contracts are exactly what you need to avoid.

Security is one problem with smart contracts, but lack of applications is another one. AI may make the security problem worse, but it's needed for many potential applications of smart contracts. For example, suppose I want to pay someone to build a website for me that is standards-conforming, informative, and aesthetically pleasing. Without an AI that can make human-like judgments, to create a smart contract where "the code is the contract", I'd have to mathematically define each of those adjectives, which would be impossibly difficult or many orders of magnitude more costly than just building the website.

With AI you just have some other input stream that someone might tamper with.

The solution to this would be to have each of the contracting parties provide evidence to the AI, which could include digitally signed (authenticated) data from third parties (security camera operators, shipping companies, etc.), and have the AI make judgments about them the same way a human judge would.
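
Schematically, the mechanical part of that is straightforward; it's the final ruling that needs human-like AI (a Python sketch under my assumptions, with verify_signature and trusted_keys as hypothetical placeholders):

    # The AI judge admits only evidence whose digital signature checks out
    # against a key it already trusts (shipping company, camera operator, ...).
    def admissible_evidence(submissions, trusted_keys, verify_signature):
        admissible = []
        for item in submissions:
            signer = item["claimed_signer"]
            if signer in trusted_keys and verify_signature(
                    trusted_keys[signer], item["data"], item["signature"]):
                admissible.append(item)
        return admissible

    # The hard, currently unsolved step is the ruling itself: something like
    # judge(admissible_evidence(...)) making a human-like judgment on the dispute.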

Comment author: paulfchristiano 11 March 2016 10:31:56PM *  1 point [-]

Re 1:

For a working scheme, I would expect it to be usable by a significant fraction of humans (say, comparable to the fraction that can learn to write a compiler).

That said, I would expect almost no one to actually play the role of the overseer, even if a scheme like this one ended up being used widely. An existing analogy would be the human trainers who drive Facebook's M (at least in theory; I don't know how that actually plays out). The trainers are responsible for getting M to do what the trainers want, and the user trusts the trainers to do what the user wants. From the user's perspective, this is no different from delegating to the trainers directly, and allowing them to use whatever tools they like.

I don't yet see why "defer to human judgments and handle uncertainty in a way that they would endorse" requires evaluating complex philosophical arguments or having a correct understanding of metaphilosophy. If the case is unclear, you can punt it to the actual humans.

If I imagine an employee who sucks at philosophy but thinks 100x faster than me, I don't feel like they are going to fail to understand how to defer to me on philosophical questions. I might run into trouble because now it is comparatively much harder to answer philosophical questions, so to save costs I will often have to do things based on rough guesses about my philosophical views. But the damage from using such guesses depends on the importance of having answers to philosophical questions in the short-term.

It really feels to me like there are two distinct issues:

  1. Philosophical understanding may help us make good decisions in the short term, for example about how to trade off extinction risk vs faster development, or how to prioritize the suffering of non-human animals. So having better philosophical understanding (and machines that can help us build more understanding) is good.
  2. Handing off control of civilization to AI systems might permanently distort society's values. Understanding how to avoid this problem is good.

These seem like separate issues to me. I am convinced that #2 is very important, since it seems like the largest existential risk by a fair margin and also relatively tractable. I think that #1 does add some value, but am not at all convinced that it is a maximally important problem to work on. As I see it, the value of #1 depends on the importance of the ethical questions we face in the short term (and on how long-lasting are the effects of differential technological progress that accelerates our philosophical ability).

Moreover, it seems like we should evaluate solutions to these two problems separately. You seem to be making an implicit argument that they are linked, such that a solution to #2 should only be considered satisfactory if it also substantially addresses #1. But from my perspective, that seems like a relatively minor consideration when evaluating the goodness of a solution to #2. In my view, solving both problems at once would be at most 2x as good as solving the more important of the two problems. (Neither of them is necessarily a crisp problem rather than an axis along which to measure differential technological development.)

I can see several ways in which #1 and #2 are linked, but none of them seem very compelling to me. Do you have something in particular in mind? Does my position seem somehow more fundamentally mistaken to you?

(This comment was in response to point 1, but it feels like the same underlying disagreement is central to points 2 and 3. Point 4 seems like a different concern, about how the availability of AI would itself change philosophical deliberation. I don't really see much reason to think that the availability of powerful AI would make the endpoint of deliberation worse rather than better, but probably this is a separate discussion.)

Comment author: Wei_Dai 13 March 2016 12:10:04AM *  6 points [-]

The trainers are responsible for getting M to do what the trainers want, and the user trusts the trainers to do what the user wants.

In that case, there would be severe principal-agent problems, given the disparity in power/intelligence between the trainer/AI systems and the users. If I was someone who couldn't directly control an AI using your scheme, I'd be very concerned about getting uneven trades or having my property expropriated outright by individual AIs or AI conspiracies, or just being ignored and left behind in the race to capture the cosmic commons. I would be really tempted to try another AI design that does purport to have the AI serve my interests directly, even if that scheme is not as "safe".

If I imagine an employee who sucks at philosophy but thinks 100x faster than me, I don't feel like they are going to fail to understand how to defer to me on philosophical questions.

If an employee sucks at philosophy, how does he even recognize philosophical problems as problems that he needs to consult you for? Most people have little idea that they should feel confused and uncertain about things like epistemology, decision theory, and ethics. I suppose it might be relatively easy to teach an AI to recognize the specific problems that we currently consider to be philosophical, but what about new problems that we don't yet recognize as problems today?

Aside from that, a bigger concern for me is that if I was supervising your AI, I would be constantly bombarded with philosophical questions that I'd have to answer under time pressure, and afraid that one wrong move would cause me to lose control, or lock in some wrong idea.

Consider this scenario. Your AI prompts you for guidance because it has received a message from a trading partner with a proposal to merge your AI systems and share resources for greater efficiency and economy of scale. The proposal contains a new AI design and control scheme and arguments that the new design is safer, more efficient, and divides control of the joint AI fairly between the human owners according to your current bargaining power. The message also claims that every second you take to consider the issue has large costs to you because your AI is falling behind the state of the art in both technology and scale, becoming uncompetitive, so your bargaining power for joining the merger is dropping (slowly in the AI's time-frame, but quickly in yours). Your AI says it can't find any obvious flaws in the proposal, but it's not sure that you'd consider the proposal to really be fair under reflective equilibrium or that the new design would preserve your real values in the long run. There are several arguments in the proposal that it doesn't know how to evaluate, hence the request for guidance. But it also reminds you not to read those arguments directly since they were written by a superintelligent AI and you risk getting mind-hacked if you do.

What do you do? This story ignores the recursive structure in ALBA. I think that would only make the problem even harder, but I could be wrong. If you don't think it would go like this, let me know how you think this kind of scenario would go.

In terms of your #1, I would divide the decisions requiring philosophical understanding into two main categories. One is decisions involved in designing/improving AI systems, like in the scenario above. The other, which I talked about in an earlier comment, is ethical disasters directly caused by people who are not uncertain, but just wrong. You didn't reply to that comment, so I'm not sure why you're unconcerned about this category either.

Comment author: paulfchristiano 10 March 2016 06:57:00PM 4 points [-]

Do you have a concise explanation of skepticism about the overall approach, e.g. a statement of the difficulty or difficulties you think will be hardest to overcome by this route?

Or is your view more like "most things don't work, and there isn't much reason to think this would work"?

In discussion you most often push on the difficulty of doing reflection / philosophy. Would you say this is your main concern?

My take has been that we just need to meet the lower bar of "wants to defer to human views about philosophy, and has a rough understanding of how humans want to reflect and want to manage their uncertainty in the interim."

Regarding philosophy/metaphilosophy, is it fair to describe your concern as one of:

  1. The approach I am pursuing can't realistically meet even my lower bar,
  2. Meeting my lower bar won't suffice for converging to correct philosophical views,
  3. Our lack of philosophical understanding will cause problems soon in subjective time (we seem to have some disagreement here, but I don't feel like adopting your view would change my outlook substantially), or
  4. AI systems will be much better at helping humans solve technical than philosophical problems, driving a potentially long-lasting (in subjective time) wedge between our technical and philosophical capability, even if ultimately we would end up at the right place?

My hope is that thinking and talking more about bootstrapping procedures would go a long way to resolving the disagreements between us (either leaving you more optimistic or me more pessimistic). I think this is most plausible if #1 is the main disagreement. If our disagreement is somewhere else, it may be worth also spending some time focusing somewhere else. Or it may be necessary to better define my lower bar in order to tell where the disagreement is.

Comment author: Wei_Dai 10 March 2016 10:59:09PM 7 points [-]

It seems to be a combination of all of these.

  1. Training an AI to defer to one's eventual philosophical judgments and interim method of managing uncertainty (and not falling prey to marketing worlds and incorrect but persuasive philosophical arguments, etc.) seems really hard, and it is made harder by the recursive structure in ALBA and the fact that the first-level AI is sub-human in capacity yet has to handle being bootstrapped and training the next-level AI. What percent of humans can accomplish this task, do you think? (I'd argue that the answer is likely zero, but certainly very small.) How do the rest use your AI?
  2. Assuming that deferring to humans on philosophy and managing uncertainty is feasible but costly, how many people could resist dropping this feature and the associated cost, in favor of adopting some sort of straightforward utility maximization framework with a fixed utility function that they think captures most or all of their values, if that came as a suggestion from the AI with an apparently persuasive argument? If most people do this and only a few don't (and those few are also disadvantaged in the competition to capture the cosmic commons due to deciding to carry these costs), that doesn't seem like much of a win.
  3. This is tied in with 1 and 2, in that correct meta-philosophical understanding is needed to accomplish 1, and unreasonable philosophical certainty would cause people to fail step 2.
  4. Even if the AIs keep deferring to their human users and don't end up short-circuiting their philosophical judgments, if the AI/human systems become very powerful while still having incorrect and strongly held philosophical views, that seems likely to cause disaster. We also don't have much reason to think that, if we put people in such positions of power (for example, being able to act as a god in some simulation or domain of their choosing), most will eventually realize their philosophical errors and converge to correct views, or that the power itself wouldn't further distort their already error-prone reasoning processes.

Comment author: cousin_it 10 March 2016 12:10:52PM *  3 points [-]

As far as I can tell, Paul's current proposal might still suffer from blackmail, like his earlier proposal, which I commented on. I vaguely remember discussing the problem with you as well.

One big lesson for me is that AI research seems to be more incremental and predictable than we thought, and garage FOOM probably isn't the main danger. It might be helpful to study the strengths and weaknesses of modern neural networks and get a feel for their generalization performance. Then we could try to predict which areas will see big gains from neural networks in the next few years, and which parts of Friendliness become easy or hard as a result. Is anyone at MIRI working on that?

Comment author: Wei_Dai 10 March 2016 09:27:40PM 6 points [-]

Then we could try to predict which areas will see big gains from neural networks in the next few years, and which parts of Friendliness become easy or hard as a result. Is anyone at MIRI working on that?

If they did that, then what? Try to convince NN researchers to attack the parts of Friendliness that look hard? That seems difficult for MIRI to do given where they've invested in building their reputation (i.e., among decision theorists and mathematicians instead of in the ML community). (It would really depend on people trusting their experience and judgment since it's hard to see how much one could offer in the form of either mathematical proof or clearly relevant empirical evidence.) You'd have a better chance if the work was carried out by some other organization. But even if that organization got NN researchers to take its results seriously, what incentives do they have to attack parts of Friendliness that seem especially hard, instead of doing what they've been doing, i.e., racing as fast as they can for the next milestone in capability?

Or is the idea to bet on the off chance that building an FAI with NN turns out to be easy enough that MIRI and like-minded researchers can solve the associated Friendliness problems themselves and then hand the solutions to whoever ends up leading the AGI race, and they can just plug the solutions in at little cost to their winning the race?

Or you're suggesting aiming/hoping for some feasible combination of both, I guess. It seems pretty similar to what Paul Christiano is doing, except he has "generic AI technology" in place of "NN" above. To me, the chance of success of this approach seems low enough that it's not obviously superior to what MIRI is doing (namely, in my view, betting on the off chance that the contrarian AI approach they're taking ends up being much easier/better than the mainstream approach, which is looking increasingly unlikely but still not impossible).
