Comment author: jacob_cannell 22 June 2015 05:55:09PM 3 points [-]

The idea that the cortex or cerebellum, for example, can be described as "general purpose re-programmable hardware" is lacking in both clarity and support.

"General purpose learning hardware" is perhaps better. I used "re-programmable" as an analogy to an FPGA.

However, in a literal sense the brain can learn to use simple paper-and-pencil tools as an extended memory, and can learn to emulate a Turing machine. Given huge amounts of time, the brain could literally run Windows.

And more to the point, programmers ultimately rely on the ability of our brains to simulate/run small sections of code. So in a more practical literal sense, all of the code in Windows first ran on human brains.

You seem to be saying that the cortex is a universal reinforcement learning machine

You seem to be hung up on reinforcement learning. I use some of that terminology to define a ULM because it is just the most general framework - utility/value functions, etc. Also, there is some pretty strong evidence for RL in the brain, but the brain's learning mechanisms are complex - more so than any current ML system. I hope I conveyed that in the article.
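(For readers unfamiliar with that terminology, here is a minimal sketch of the value-function idea being invoked; the toy five-state chain and the parameters are invented purely for illustration and are not from the article.)

    # Tabular TD(0) value learning on a toy 5-state chain: the agent drifts
    # right with probability 0.7 and receives reward 1 on reaching state 4.
    # All numbers here are made up for the sake of the illustration.
    import numpy as np

    n_states = 5
    V = np.zeros(n_states)            # value estimate for each state
    alpha, gamma = 0.1, 0.9           # learning rate, discount factor
    rng = np.random.default_rng(1)

    for _ in range(2000):             # episodes
        s = 0
        while s != n_states - 1:
            s_next = min(s + 1, n_states - 1) if rng.random() < 0.7 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next

    print("learned state values:", np.round(V, 3))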

Learning in the lower sensory cortices in particular can also be modeled well by unsupervised learning, and I linked to some articles showing how UL models can reproduce sensory cortex features. UL can be viewed as a potentially reasonable way to approximate the ideal target update, especially for lower sensory cortex that is far (in a network depth sense) from any top down signals from the reward system. The papers I linked to about approximate bayesian learning and target propagation in particular can help put it all into perspective.
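(As a concrete toy illustration of that kind of unsupervised learning -- this sketch is mine, not one of the linked models -- a single model neuron trained with Oja's Hebbian rule extracts the dominant feature direction of its inputs using nothing but the input statistics, with no reward signal at all.)

    # Oja's rule: a Hebbian update with built-in weight decay. The "sensory"
    # data are synthetic 2-D inputs whose main axis of variation is (1, 1);
    # the neuron's weight vector converges to that direction, unsupervised.
    import numpy as np

    rng = np.random.default_rng(0)
    cov = np.array([[3.0, 2.0],
                    [2.0, 3.0]])
    x = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5000)

    w = rng.normal(size=2)            # synaptic weights of one model neuron
    eta = 0.01                        # learning rate

    for xi in x:
        y = w @ xi                    # neuron's response to this input
        w += eta * y * (xi - y * w)   # Hebbian term, stabilised by decay

    print("learned feature direction:", w / np.linalg.norm(w))  # ~ [0.71, 0.71]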

clear evidence that we have already found a reinforcement learning machine in the brain.

Well, the article summarizes the considerable evidence that the brain is some sort of approximate universal learning machine. I suspect that you have a particular idea of RL that is less than fully general.

Comment author: Richard_Loosemore 22 June 2015 08:18:21PM 1 point [-]

You are right to say that, seen from a high enough level, the brain does general purpose learning ... but the claim becomes diluted if you take it right up to the top level, where it clearly does so only in a trivial sense.

For example, the brain could be 99.999% hardwired, with no flexibility at all except for a large RAM memory, and it would be consistent with the brain as you just described it (able to learn anything). And yet that wasn't the type of claim you were making in the essay, and it isn't what most people mean when they refer to "general purpose learning". You (and they) seem to be pointing to an architectural flexibility that allows the system to grow up to be a very specific, clever sort of understanding system without all the details being programmed ahead of time.

I am not sure why you say I am hung up on RL: you quoted that as the only mechanism to be discussed in the context, so I went with that.

And you are (like many people) not correct to say that RL is the most general framework, or that there is good evidence for RL in the brain. That is a myth: the evidence is very poor indeed.

RL is not "fully general" -- that was precisely my point earlier. If you can point me to a rigorous proof of that which does not have an "and then some magic happens" step in it, I will eat my hat :-)

(Already had a long discussion with Marcus Hutter about this btw, and he agreed in the end that his appeal to RL was based on nothing but the assumption that it works.)

Comment author: jacob_cannell 21 June 2015 10:48:10PM 4 points [-]

The problem with this is that the "engineering diagram" of the brain is really only a hardware wiring diagram, and the status of speculations about how the hardware modules (really just areas) relate to functional modules is ... well, just that, speculation.

Yes the engineering diagram is a hardware wiring diagram, which I hope I made clear.

In general one of my main points was that most of the big systems (cortex, cerebellum) are general purpose re-programmable hardware - they don't come pre-equipped with software. So the actual functionality of each module arises from the learning system slowly figuring out the appropriate software during development.

I provided some links to the key evidence for the overall hypothesis; I think it is well beyond speculation at this point. (The article certainly contains some speculations, but I labeled them as such.)

There are good reasons to suspect that the functional diagram would look completely different

Well of course, because the functional diagram is learned software, and thus can vary substantially from human to human. For example, the functional software diagram for the cortex of a blind echolocator looks very different from that of a neurotypical person.

Comment author: Richard_Loosemore 22 June 2015 05:20:31PM 0 points [-]

There are serious problems with the claims you are making.

The idea that the cortex or cerebellum, for example, can be described as "general purpose re-programmable hardware" is lacking in both clarity and support.

Clarity. In what sense "generally re-programmable"? So much so that it could run Microsoft Word? I have never seen anyone try to go that far, so clearly you must mean something less general. But it is very unclear exactly what sense you intend for the words "general purpose re-programmable hardware".

Support. There are no generally accepted theories for what the function of the cortex actually is. Can you be clearer about what you think the evidence is, in a nutshell?

You seem to be saying that the cortex is a universal reinforcement learning machine. But the kind of evidence that you present is (if you will forgive an extreme oversimplification for the purposes of clarity) the observation that the basal ganglia plays a role that resembles a global packet-switching router, and since a global packet-switching router would be expected to be seen in a reinforcement learning machine, QED.

Now, don't get me wrong, I am sympathetic to much of the general spirit that you convey here, but my problem is that my research has gone down this road for a long time already, and while we agree on that spirit, you have jumped forward several steps and come to (what I see as) a premature conclusion about functionality. To be specific, the concept of a "reinforcement learning machine" is ghastly (it contains "And then some magic happens..." steps), and I believe it would be a terrible mistake to say that there is any clear evidence that we have already found a reinforcement learning machine in the brain.

I agree with the general interpretation of what those hippocampal and BG loops might be doing, but there are MANY other interpretations besides seeing them as a component of a reinforcement learning machine.

This is a difficult topic to discuss in these narrow confines, alas. I think you have done a service by pointing to the idea of a general learning mechanism, but I think you have just run on ahead too quickly and shackled that idea to something too speculative (the RL notion).

Comment author: Richard_Loosemore 21 June 2015 10:32:44PM 4 points [-]

The problem with this is that the "engineering diagram" of the brain is really only a hardware wiring diagram, and the status of speculations about how the hardware modules (really just areas) relate to functional modules is ... well, just that, speculation.

There are good reasons to suspect that the functional diagram would look completely different (reasons based on psychological data), and the current state of the art there is poor.

Except perhaps in certain quarters.

Comment author: Houshalter 29 May 2015 08:25:23AM 1 point [-]

[believes that benevolence toward humanity might involve forcing human beings to do something violently against their will.]

But you didn't ask the AI to maximize the value that humans call "benevolence". You asked it to "maximize happiness". And so the AI went out and mass-produced the happiest humans possible.

The point of the thought experiment is to show how easy it is to give an AI a bad goal. Of course, ideally you could just tell it to "be benevolent", and it would understand you and do it. But getting that to work is an entirely different problem. (The AI understands the words you say, but how do you get it to care -- to actually follow your instructions?)

Comment author: Richard_Loosemore 29 May 2015 01:34:00PM 0 points [-]

Alas, the article was a long, detailed analysis of precisely the claim that you made right there, and the "point of the thought experiment" was shown to be a meaningless fantasy about a type of AI that would be so broken that it would not be capable of serious intelligence at all.

Comment author: misterbailey 20 May 2015 03:58:21PM *  1 point [-]

My bizarre question was just an illustrative example. It seems neither you nor I believe that would be an adequate criterion (though perhaps for different reasons).

If I may translate what you're saying into my own terms, you're saying that for a problem like "shoot first or ask first?" the criteria (i.e., constraints) would be highly complex and highly contextual. Ok. I'll grant that's a defensible design choice.

Earlier in the thread you said

the AI is supposed to take an action in spite of the fact that it is getting *massive feedback* from all the humans on the planet, that they do not want this action to be executed.

This is why I have homed in on scenarios where the AI has not yet received feedback on its plan. In these scenarios, the AI presumably must decide (even if the decision is only implicit) whether to consult humans about its plan first, or to go ahead with its plan first (and halt or change course in response to human feedback). To lay my cards on the table, I want to consider three possible policies the AI could have regarding this choice.

  1. Always (or usually) consult first. We can rule this out as impractical if the AI is taking a large number of atomic actions.
  2. Always (or usually) shoot first, and see what the response is. Unless the AI only makes friendly plans, I think this policy is catastrophic, since I believe there are many scenarios where an AI could initiate a plan and before we know what hit us we're in an unrecoverably bad situation. Therefore, implementing this policy in a non-catastrophic way is FAI-complete.
  3. Have some good criteria for picking between "shoot first" or "ask first" on any given chunk of planning. This is what you seem to be favoring in your answer above. (Correct me if I'm wrong.) These criteria will tend to be complex, and not necessarily formulated internally in an axiomatic way. Regardless, I fear making good choices between "shoot first" or "ask first" is hard, even FAI-complete. Screw up once, and you are in a catastrophe like in case 2.

Can you let me know: have I understood you correctly? More importantly, do you agree with my framing of the dilemma for the AI? Do you agree with my assessment of the pitfalls of each of the 3 policies?

Comment author: Richard_Loosemore 20 May 2015 07:22:25PM 1 point [-]

I am with you on your rejection of 1 and 2, if only because they are both framed as absolutes which ignore context.

And, yes, I do favor 3. However, you insert some extra wording that I don't necessarily buy....

These criteria will tend to be complex, and not necessarily formulated internally in an axiomatic way.

You see, hidden in these words there seems to be an understanding of how the AI is working, one that might lead you to see a huge problem and me to see something very different. I don't know if this is really what you are thinking, but bear with me while I run with this for a moment.

Trying to formulate criteria for something, in an objective, 'codified' way, can sometimes be incredibly hard even when most people would say they have internal 'judgement' that allows them to make a ruling very easily: the standard saw being "I cannot define what 'pornography' is, but I know it when I see it." And (stepping quickly away from that example because I don't want to get into that quagmire) there is a much more concrete example in the old interactive activation model of word recognition, which is a simple constraint system. In IAC, word recognition is remarkably robust in the face of noise, whereas attempts to write symbolic programs to deal with all the different kinds of noisy corruption of the image turn out to be horribly complex and faulty.

As you can see, I am once again pointing to the fact that Swarm Relaxation systems (understood in the very broad sense that allows all varieties of neural net to be included) can make criterial decisions seem easy, where explicit codification of the decision is a nightmare.
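(Here is a toy caricature of that interactive-activation idea; the miniature lexicon, parameters, and update rule below are invented for this comment and are not McClelland & Rumelhart's actual model. Noisy letter evidence and competing word units act as weak constraints on one another, and the network settles on the single reading most of the constraints agree on.)

    # Miniature interactive-activation-style relaxation: the "image" is 'cat',
    # but the final letter is noisy (ambiguous between 't' and 'r').
    import numpy as np

    words = ["cat", "car", "can", "bat"]
    letters = "abcnrt"
    L = {c: i for i, c in enumerate(letters)}

    evidence = np.zeros((3, len(letters)))      # bottom-up letter evidence
    evidence[0, L["c"]] = 1.0
    evidence[1, L["a"]] = 1.0
    evidence[2, L["t"]] = 0.55
    evidence[2, L["r"]] = 0.45

    word_act = np.zeros(len(words))
    letter_act = evidence.copy()

    for _ in range(60):                         # relax until the network settles
        # Each word is excited by the letters it contains (many weak constraints)...
        excit = np.array([sum(letter_act[p, L[w[p]]] for p in range(3)) for w in words])
        # ...and inhibited by whichever competing words are currently active.
        inhib = word_act.sum() - word_act
        word_act = np.clip(word_act + 0.1 * (excit - excit.mean()) - 0.3 * inhib, 0.0, 1.0)
        # Active words feed activation back down to their own letters (top-down).
        top_down = np.zeros_like(letter_act)
        for wi, w in enumerate(words):
            for p in range(3):
                top_down[p, L[w[p]]] += word_act[wi]
        letter_act = np.clip(evidence + 0.1 * top_down, 0.0, 1.0)

    print({w: round(float(a), 2) for w, a in zip(words, word_act)})   # 'cat' dominates

No single noisy constraint (the corrupted final letter) can flip the interpretation, because the other constraints collectively outvote it -- which is the point being made about explicit symbolic coding versus relaxation.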

So, where does that lead? Well, you go on to say:

Regardless, I fear making good choices between "shoot first" or "ask first" is hard, even FAI-complete. Screw up once, and you are in a catastrophe like in case 2.

The key phrase here is "Screw up once, and...". In a constraint system it is impossible for one screw-up (one faulty constraint) to unbalance the whole system. That is the whole raison d'être of constraint systems.

Also, you say that the problem of making good choices might be FAI-complete. Now, I have some substantial quibbles with that whole "FAI-complete" idea, but in this case I will just ask a question: are you trying to say that in order to DESIGN the motivation system of the AI in such a way that it will not make one catastrophic choice between shoot-first and ask-first, we must FIRST build an FAI, because that is the only way we can get enough intelligence-horsepower applied to the problem? If so, why exactly would we need to? If the constraint system just cannot allow single failures to get out of control, we don't need to specify every possible criterial decision in advance; we simply rely on context to do the heavy lifting, in perpetuity.

Put another way: the constraint-based AI IS the FAI already, and the reasons for thinking that it can deal with all the potentially troublesome cases have nothing to do with us anticipating every potential troublesome case, ahead of time.

--

Stepping back a moment, consider the following three kinds of case where the AI might have to make a decision.

1) An interstellar asteroid appears from nowhere, travelling at unthinkable speed, and it is going to make a direct hit on the Earth in one hour, with no possibility of survivors. The AI considers a plan in which it quietly euthanizes all life, on the grounds that any other option would lead to one hour of horror, followed by certain death.

2) The AI considers the Dopamine Drip plan.

3) The AI suddenly becomes aware that a rare, precious species of bird has become endangered and the only surviving pair is on a nature trail that is about to be filled with a gang of humans who have been planning a holiday on that trail for months. The gang is approaching the pair right now and one of the birds will die if frightened because it has a heart condition. One plan is to block the humans without explaining (until later), which will inconvenience them.

In all three cases there is a great deal of background information (constraints) that could be brought to bear, and if the AI is constraint-based, it will consider that information. People do this all the time.

In no case is there ONLY a small number of constraints (like, 2 or 3) that are relevant. Where the number of constraints is tiny, there is a chance for a "bad choice" to be made. In fact, I would argue that it is inconceivable that a decision would take place in a near-vacuum of constraints. The more significant the decision, the greater the number of constraints. The bird situation is without doubt the one with the fewest, but it still involves a fistful of considerations. For this reason, we would expect that all major decisions -- and especially the existential-threat ones like 1 and 2 -- would involve a very large number of constraints indeed. It is this mass effect that is at the heart of claims that the constraint approach leads to an AI that cannot get into bizarre reasoning episodes.
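(A quick back-of-the-envelope simulation of that mass effect; the 20% per-constraint failure rate and the simple majority rule are invented purely to illustrate the arithmetic, not taken from the paper. If each weak constraint independently points the wrong way some of the time, the chance that a majority of them points the wrong way collapses as the number of constraints grows.)

    # Monte Carlo illustration: probability that a majority of independent,
    # individually unreliable constraints all fail at once.
    import numpy as np

    rng = np.random.default_rng(0)
    p_single_failure = 0.2                 # assumed per-constraint failure rate
    trials = 20_000

    for n_constraints in (1, 3, 11, 101, 1001):
        failures = rng.random((trials, n_constraints)) < p_single_failure
        wrong = failures.sum(axis=1) > n_constraints / 2
        print(f"{n_constraints:5d} constraints -> wrong-decision rate {wrong.mean():.6f}")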

Finally, notice that in case 1, we are in a situation where (unlike case 2) many humans would say that there is no good decision.

Comment author: misterbailey 19 May 2015 09:08:45AM 1 point [-]

With respect, your first point doesn't answer my question. My question was, what criteria would cause the AI to submit a given proposed action or plan for human approval? You might say that the AI submits every proposed atomic action for approval (in this case, the criterion is the trivial one, "always submit proposal"), but this seems unlikely. Regardless, it doesn't make sense to say the humans have already heard of the plan about which the AI is just now deciding whether to tell them.

In your second point you seem to be suggesting an answer to my question. (Correct me if I'm wrong.) You seem to be suggesting "context." I'm not sure what is meant by this. Is it reasonable to suppose that the AI would make the decision about whether to "shoot first" or "ask first" based on things like, e.g., the lower end of its 99% confidence interval for how satisfied its supervisors will be?

Comment author: Richard_Loosemore 19 May 2015 02:16:56PM 2 points [-]

As you wrote, the second point filled in the missing part from the first: it uses its background contextual knowledge.

You say you are unsure what this means. That leaves me a little baffled, but here goes anyway. Suppose I asked a person, today, to write a book for me on the subject "What counts as an action that is significant enough that, if you did it in a way that would affect people, it would rise above some level of 'nontrivialness' and you should consult them first? Include in your answer a long discussion of the kind of thought processes you went through to come up with your answers." I know many articulate people who could, if they had the time, write a massive book on that subject.

Now, that book would contain a huge number of constraints (little factoids about the situation) about "significant actions", and the SOURCE of that long list of constraints would be .... the background knowledge of the person who wrote the book. They would call upon a massive body of knowledge about many aspects of life, to organize their thoughts and come up with the book.

If we could look into the head of the person who wrote the book, we would find that background knowledge. It would be similar in size to the number of constraints mentioned in the book, or it would be larger.

That background knowledge -- both its content AND its structure -- is what I refer to when I talk about the AI using contextual information or background knowledge to assess the degree of significance of an action.

You go on to ask a bizarre question:

Is it reasonable to suppose that the AI would make the decision about whether to "shoot first" or "ask first" based on things like, e.g., the lower end of its 99% confidence interval for how satisfied its supervisors will be?

This would be an example of an intelligent system sitting there with that massive array of contextual/background knowledge that could be deployed ... but instead of using that knowledge to make a preliminary assessment of whether "shooting first" would be a good idea, it ignores ALL OF IT and substitutes one single constraint taken from its knowledge base or its goal system:

"Does this satisfy my criteria for how satisfied my supervisors will be?"

It would entirely defeat the object of using large numbers of constraints in the system, to use only one constraint. The system design is (assumed to be) such that this is impossible. That is the whole point of the Swarm Relaxation design that I talked about.

Comment author: misterbailey 19 May 2015 09:18:18AM *  3 points [-]

I understand your desire to stick to an exegesis of your own essay, but part of a critical examination of your essay is seeing whether or not it is on point, so these sorts of questions really are "about" your essay.

Regarding your preliminary answer, by "correct" I assume you mean "correctly reflecting the desires of the human supervisors"? (In which case, this discussion feeds into our other thread.)

Comment author: Richard_Loosemore 19 May 2015 01:53:11PM 2 points [-]

With the best will in the world, I have to focus on one topic at a time: I do not have the bandwidth to wander across the whole of this enormous landscape.

As to your question: I was using "correct" as a verb, and the meaning was "self-correct" in the sense of bringing back to the previously specified course.

In this case it would be about the AI perceiving some aspects of its design that might cause it to depart from what its goal was nominally supposed to be. In that case it would suggest modifications to correct the problem.

Comment author: OrphanWilde 18 May 2015 08:49:17PM 3 points [-]

An elementary error. The constraints in question are referred to in the literature as "weak" constraints (and I believe I used that qualifier in the paper: I almost always do). Weak constraints never need to be ALL satisfied at once. No AI could ever be designed that way, and no-one ever suggested that it would. See the reference to McClelland, J.L., Rumelhart, D.E. & Hinton, G.E. (1986) in the paper: that gives a pretty good explanation of weak constraints.

I understand the concept.

How exactly do you propose that the AI "weighs contextual constraints incorrectly" when the process of weighing constraints requires most of the constraints involved (probably thousands of them) to all suffer a simultaneous, INDEPENDENT 'failure' for this to occur?

I'd hazard a guess that, for any given position, less than 70% of humans will agree without reservation. The issue isn't that thousands of failures occur. The issue is that thousands of failures -always- occur.

Assuming this isn't more of the same, what you are saying here is isomorphic to the statement that somehow, a neural net might figure out the correct weighting for all the connections so that it produces the correctly trained output for a given input. That problem was solved in so many different NN systems that most NN people, these days, would consider your statement puzzling.

The problem is solved only for well-understood (and very limited) problem domains with comprehensive training sets.

A trivial variant of your second failure mode. The AI is calculating the constraints correctly, according to you, but at the same time you suggest that it has somehow NOT included any of the constraints that relate to the ethics of forced sterilization, etc. etc. You offer no explanation of why all of those constraints were not counted by your proposed AI, you just state that they weren't.

They were counted. They are, however, weak constraints. The constraints which required human extinction outweighed them, as they do for countless human beings. Fortunately for us in this imagined scenario, the constraints against killing people counted for more.

This is identical to your third failure mode, but here you produce a different list of constraints that were ignored. Again, with no explanation of why a massive collection of constraints suddenly disappeared.

Again, they weren't ignored. They are, as you say, weak constraints. Other constraints overrode them.

Another insult, and putting words into my mouth, and showing no understanding of what a weak constraint system actually is.

The issue here isn't my lack of understanding. The issue here is that you are implicitly privileging some constraints over others without any justification.

Every single conclusion I reached here is one that humans - including very intelligent humans - have reached. By dismissing them as possible conclusions an AI could reach, you're implicitly rejecting every argument pushed for each of these positions without first considering them. The "weak constraints" prevent them.

I didn't choose -wrong- conclusions, you see, I just chose -unpopular- conclusions, conclusions I knew you'd find objectionable. You should have noticed that; you didn't, because you were too concerned with proving that the AI wouldn't reach them. You were too concerned with your destination, and didn't pay any attention to your travel route.

If doing nothing is the correct conclusion, your AI should do nothing. If human extinction is the correct conclusion, your AI should choose human extinction. If sterilizing people with unhealthy genes is the correct conclusion, your AI should sterilize people with unhealthy genes (you didn't notice that humans didn't necessarily go extinct in that scenario). If rewriting minds is the correct conclusion, your AI should rewrite minds.

And if your constraints prevent the AI from acting on the correct conclusion?

Then your constraints have made your AI stupid, for some value of "stupid".

The issue, of course, is that you have decided that you know better what is or is not the correct conclusion than an intelligence you are supposedly creating to know things better than you.

And that sums up the issue.

Comment author: Richard_Loosemore 18 May 2015 09:55:30PM 0 points [-]

I said:

How exactly do you propose that the AI "weighs contextual constraints incorrectly" when the process of weighing constraints requires most of the constraints involved (probably thousands of them) to all suffer a simultaneous, INDEPENDENT 'failure' for this to occur?

And your reply was:

I'd hazard a guess that, for any given position, less than 70% of humans will agree without reservation. The issue isn't that thousands of failures occur. The issue is that thousands of failures -always- occur.

This reveals that you really do not understand what a weak constraint system is, and where the system is located.

When the human mind looks at a scene and uses a thousand clues in the scene to constrain the interpretation of it, those thousand clues all, when the network settles, relax into a state in which most or all of them agree about what is being seen. You don't get "less than 70%" agreement on the interpretation of the scene! If even one element of the scene violates a constraint in a strong way, the mind orients toward the violation extremely rapidly.

The same story applies to countless other examples of weak constraint relaxation systems dropping down into energy minima.

Let me know when you do understand what you are talking about, and we can resume.

Comment author: TheAncientGeek 18 May 2015 09:11:03PM 1 point [-]

Understood, and the bottom line is that the distinction between "terminal" and "instrumental" goals is actually pretty artificial, so if the problem with "maximize friendliness" is supposed to apply ONLY if it is terminal, it is a trivial fix to rewrite the actual terminal goals to make that one become instrumental.

What would you choose as a replacement terminal goal, or would you not use one?

Comment author: Richard_Loosemore 18 May 2015 09:41:30PM 1 point [-]

Well, I guess you would write the terminal goal as quite a long statement, which would summarize the things involved in friendliness, but also include language about not going to extremes, laissez-faire, and so on. It would be vague and generous. And as part of the instrumental goal there would be a stipulation that the friendliness instrumental goal should trump all other instrumentals.

I'm having a bit of a problem answering because there are peripheral assumptions about how such an AI would be made to function, which I don't want to accidentally buy into, because I don't think goals expressed in language statements work anyway. So I am treading on eggshells here.

A simpler solution would be to scrap the idea of exceptional status for the terminal goal, and instead include massive contextual constraints as your guard against drift.

Comment author: Vaniver 17 May 2015 03:49:40AM *  2 points [-]

I am finding this comment thread frustrating, and so I expect this will be my last reply. But I'll try to make the most of it by writing a concise and clear summary:

What you said here amounts to the claim that an AI of unspecified architecture will, on noticing a difference between its hardcoded goal and its instrumental knowledge, side with the hardcoded goal

Loosemore, Yudkowsky, and I are all discussing AIs that have a goal misaligned with human values that they nevertheless find motivating. (That's why we call it a goal!) Loosemore observes that if these AIs understand concepts and nuance, they will realize that a misalignment between their goal and human values is possible--if they don't realize that, he doesn't think they deserve the description "superintelligent."

Now there are several points to discuss:

  1. Whether or not "superintelligent" is a meaningful term in this context. I think rationalist taboo is a great discussion tool, and so looked for nearby words that would more cleanly separate the ideas under discussion. I think if you say that such designs are not superwise, everyone agrees, and now you can discuss the meat of whether or not it's possible (or expected) to design superclever but not superwise systems.

  2. Whether we should expect generic AI designs to recognize misalignments, or whether such a realization would impact the goal the AI pursues. Neither Yudkowsky nor I think either of those is reasonable to expect--as a motivating example, we are happy to subvert the goals that we infer evolution was directing us towards in order to better satisfy "our" goals. I suspect that Loosemore thinks that viable designs would recognize it, but agrees that in general that recognition does not have to lead to an alignment.

  3. Whether or not such AIs are likely to be made. Loosemore appears pessimistic about the viability of these undesirable AIs and sees cleverness and wisdom as closely tied together. Yudkowsky appears "optimistic" about their viability, thinking that this is the default outcome without special attention paid to goal alignment. It does not seem to me that cleverness, wisdom, or human-alignment are closely tied together, and so it seems easy to imagine a system with only one of those, by straightforward extrapolation from current use of software in human endeavors.

I don't see any disagreement that AIs pursue their goals, which is the claim you thought needed explanation. What I see is disagreement over whether or not the AI can 'partially solve' the problem of understanding goals and pursuing them. We could imagine a Maverick Nanny that hears "make humans happy," comes up with the plan to wirehead all humans, and then rewrites its sensory code to hallucinate as many wireheaded humans as it can (or just tries to stick as large a number as it can into its memory), rather than going to all the trouble of actually wireheading all humans. We can also imagine a Nanny that hears "make humans happy" and actually goes about making humans happy. If the same software underpins both understanding human values and executing plans, what risk is there? But if it's different software, then we have the risk.

Comment author: Richard_Loosemore 18 May 2015 08:43:34PM 1 point [-]

I have read what you wrote above carefully, but I won't reply line-by-line because I think it will be clearer not to.

When it comes to finding a concise summary of my claims, I think we do indeed need to be careful to avoid blanket terms like "superintelligent" or "superclever" or "superwise" ... but we should only avoid these IF they are used with the implication that they have a precise (perhaps technically precise) meaning. I do not believe they have precise meaning. But I do use the term "superintelligent" a lot anyway. My reason for doing that is that I only use it as an overview word -- it is just supposed to be a loose category that includes a bunch of more specific issues. I only really want to convey the particular issues -- the particular ways in which the intelligence of the AI might be less than adequate, for example.

That is only important if we find ourselves debating whether it might be clever, wise, or intelligent ... I wouldn't want to get dragged into that, because I only really care about specifics.

For example: does the AI make a habit of forming plans that massively violate all of its background knowledge about the goal that drove the plan? If it did, it would (1) take the baby out to the compost heap when what it intended to do was respond to the postal-chess game it is engaged in, or (2) cook the eggs by going out to the workshop and making a cross-cutting jig for the table saw, or (3) ... and so on. If we decided that the AI was indeed prone to errors like that, I wouldn't mind if someone diagnosed a lack of 'intelligence' or a lack of 'wisdom' or a lack of ... whatever. I merely claim that in that circumstance we have evidence that the AI hasn't got what it takes to impose its will on a paper bag, never mind exterminate humanity.

Now, my attacks on the scenarios have to do with a bunch of implications for what the AI (the hypothetical AI) would actually do. And it is that 'bunch' which I think adds up to evidence for what I would summarize as 'dumbness'.

And, in fact, I usually go further than that and say that if someone tried to get near to an AI design like that, the problems would arise early on and the AI itself (inasmuch as it could do anything smart at all) would be involved in the efforts to suggest improvements. This is where we get the suggestions in your item 2, about the AI 'recognizing' misalignments.

I suspect that on this score a new paper is required, to carefully examine the whole issue in more depth. In fact, a book.

I have now decided that that has to happen.

So perhaps it is best to put the discussion on hold until a seriously detailed technical book comes out of me? At any rate, that is my plan.
