Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Matthew_Opitz comments on Black box knowledge - Less Wrong Discussion

2 Post author: Elo 03 March 2016 10:40PM


Comments (7)


Comment author: Matthew_Opitz 05 March 2016 12:08:55AM 2 points

Some of your black box examples seem unproblematic. I agree that all you need to trust that a toaster will toast bread is an induction from repeated observation that bread goes in and toast comes out.

(Although, if the toaster is truly a black box about which we know absolutely NOTHING, then how can we induce that the toaster will not suddenly start shooting out popsicles or little green leprechauns when the year 2017 arrives? In reality, a toaster is nothing close to a black box. It is more like a gray box. Even if you think you know nothing about how a toaster works, you really do know quite a bit about how a toaster works by virtue of being a reasonably intelligent adult who understands a little bit about general physics--enough to know that a toaster is never going to start shooting out leprechauns. In fact, I would wager that there are very few true "black boxes" in the world--but rather, many gray boxes of varying shades of gray).

However, the tax accountant and the car mechanic seem more problematic as examples of black boxes because there is intelligent agency behind them--agency that can analyze YOUR source code, determine the extent to which you think those things are a black box, and adjust its output accordingly. For example, how do you know that your car will be fixed if you bring it to the mechanic? If the mechanic knows that you consider automotive repair to be a complete black box, the mechanic could have an incentive to purposefully screw up the alignment or the transmission or something else that would necessitate more repairs in the future, and you would have no way of telling where those problems came from. Or the car mechanic could just lie about how much the repairs would cost, and how would you know any better? Ditto with the tax accountant.

The tax accountant and the car mechanic are a bit like AIs...except AIs would presumably be much more capable at scanning our source code and taking advantage of our ignorance of its black-box nature.

Here's another metaphor: in my mind, the problem of humanity confronting AI is a bit like the problem that a mentally-retarded billionaire would face.

Imagine that you are a mentally-retarded person with the mind of a two-year-old who has suddenly come into possession of a billion dollars in a society where there is no state or higher authority to regulate or enforce any sort of morality or make sure that things are "fair." How are you going to ensure that your money will be managed in your interest? How can you keep your money from being outright stolen from you?

I would assert that there would be, in fact, no way at all for you to have your money employed in your interest. Consider:

* Do you hire a money manager (a financial advisor, a bank, a CEO...any sort of money manager)? What would keep this money manager from taking all of your money and running away with it? (Remember, there is no higher authority to punish this money manager in this scenario). If you were as smart as or smarter than the money manager, you could probably track down this money manager and take your money back. But you are not as smart as the money manager. You are a mentally-retarded person with the mind of a toddler. And in the case where you did happen to be as smart as the money manager, the money manager would be redundant in the first place. You would just manage your own money.

* Do you try to manage your money on your own? Remember, you have the mind of a two-year-old. The best you can do is stumble around on the floor and say "Goo-goo-gah-gah." What are you going to be able to do with a billion dollars?

Neither solution in this metaphor is satisfactory.

In this metaphor:

* The two-year-old billionaire is humanity.

* The lack of a higher authority symbolizes the absence of a God to punish an AI.

* The money manager is like AI.

If an AI is a black box, then you are screwed. If an AI is not a black box, then what do you need the AI for?

Humans only work as black boxes (or rather, gray boxes) because we have an instinctual desire to be altruistic to other humans. We don't take advantage of each other. (And this does not apply equally to all people. Sociopaths and tribalistic people would happily take advantage of strangers. And I would allege that a world civilization made up entirely of these types of people would be deeply dysfunctional).

So, here's how we might keep an AI from becoming a total black-box, while still allowing it to do useful work:

* Let it run for a minute in a room unconnected to the Internet.

* Afterwards, hire a hundred million programmers to trace out exactly what the AI was doing in that minute by looking at a readout of the most base-level code of the AI.

To any one of these programmers, the rest of the AI that does not happen to be that programmer's special area of expertise will seem like a black box. But, through communication, humanity could pool these specialized investigations into each part of the AI's running code and sketch out an overall picture of whether its computations were on a friendly trajectory or not.

Comment author: Elo 05 March 2016 11:11:49AM * 1 point

> trust that a toaster will toast bread

Yes, this is a retrospective example. Once I already know what happens, I can say that a toaster makes bread into toast. If you start to make predictive examples, things get more complicated, as you have mentioned.

It still helps to have an understanding of what you don't know. And in the case of AI, an understanding of what you are deciding not to know (for now) can help you consider the risk involved in playing with AI of unclear potential.

I.e., a chain like "AI with defined CEV -> what happens next -> humans are fine" seems like a bad thing to expect a good outcome from. Now maybe we can work on a better process for defining CEV.