Comment author: [deleted] 07 April 2016 06:23:47PM *  1 point [-]

This sounds like you're assuming that I'm trying to argue in favor of Friendly AI as the best solution...

(Responding to the whole paragraph but don't want to quote it all) I would be interested to hear a definition of "AI risk" that does not reduce to "risk of unfriendly outcome" which itself is defined in terms of friendliness aka relation to human morality. If, like me, you reject the idea of consistent, discoverable morality in the first place, and therefore find friendliness to be an ill-formed, inconsistent idea, then it's hard to say anything concrete about AI risk either. If you have a better definition that does not reduce to alignment with human morality, please provide it.

Mapping the problem starts with defining what the problem is. What is AI risk, without reference to dubious notions of human morality?

I think that you've left out a LOT of things that must happen a certain way in order for your AI risk outcomes to come to pass. Would appreciate hearing more about these.

To start with, there are all the normal, benign things that happen in any large-scale software project and that require human intervention. Like, say, the AGI crashes. Or the database that holds its memories becomes inconsistent. Or it gets deadlocked on choosing actions due to a race condition. The humanity-threatening failure modes presume that the AGI, on its first attempt at break-out, doesn't suffer any normal engineering-defect failures -- or that if it does, the humans operating it just fix it and turn it back on. I'm not interested in any arguments that assume the latter, and the former is highly conjunctive.

Isn't that the standard way of figuring out the appropriate corrective actions? First figure out what would happen absent any intervention, then see which points seem most amenable to correction.

I may have misread your intent, and if so I apologize. The first sentence of your post here made it seem like you were countering a criticism, aka advocating for the original position. So I read your posts in that context and may have inferred too much.

In response to comment by [deleted] on [link] Disjunctive AI Risk Scenarios
Comment author: Kaj_Sotala 08 April 2016 12:41:21PM *  0 points [-]

If, like me, you reject the idea of consistent, discoverable morality in the first place, and therefore find friendliness to be an ill-formed, inconsistent idea, then it's hard to say anything concrete about AI risk either. If you have a better definition that does not reduce to alignment with human morality, please provide it.

Mapping the problem starts with defining what the problem is. What is AI risk, without reference to dubious notions of human morality?

I also reject the idea of a consistent, discoverable morality, at least to the extent that the morality is assumed to be unique. I think that moralities are not so much discovered as constructed: a morality is in a sense an adaptation to a specific environment, and it will continue to evolve as the environment changes (including the social environment, so a change in the morality will by itself cause further changes to the environment, which will trigger new changes to the morality, and so on). There is no reason to assume that this will produce a consistent moral system: there will be inconsistencies which will need to be resolved when they become relevant, and the order in which they are resolved seems likely to affect the final outcome.

But to answer your actual question: I don't have a rigorous answer for exactly what the criteria for "success" are. The intuitive answer is that there are some futures that I'd consider horrifying if they came true and some which I'd consider fantastic if they came true, and I want to further the fantastic ones and avoid the horrifying ones. (I presume this to also be the case for you, because why else would you care about the topic in the first place?)

Given that this is very much an "I can't give you a definition, but I know it when I see it" thing, it seems hard to make sure that we avoid the horrifying outcomes without grounding the AIs in human values somehow, and making sure that they share our reaction when they see (imagine) some particular future. (Either that, or trying to make sure that we evolve to be as powerful as the AIs, but this seems unlikely to me.)

Depending on your definitions, you could say that this still reduces to alignment with human morality, but with the note that my conception of human morality is that of a dynamic process, and that the AIs could be allowed to e.g. nudge the development of our values in a direction that made it easier to reconcile value differences between different cultures, even if there was no "objective" reason why that direction was any better or worse than any other one.

To start with, there are all the normal, benign things that happen in any large-scale software project and that require human intervention. Like, say, the AGI crashes. Or the database that holds its memories becomes inconsistent. Or it gets deadlocked on choosing actions due to a race condition. The humanity-threatening failure modes presume that the AGI, on its first attempt at break-out, doesn't suffer any normal engineering-defect failures -- or that if it does, the humans operating it just fix it and turn it back on. I'm not interested in any arguments that assume the latter, and the former is highly conjunctive.

Are you assuming that there will only ever be one AGI that might try to escape, that its creators never decide to release it, and that it can't end up effectively in control even if boxed?

Comment author: [deleted] 06 April 2016 09:02:06PM *  1 point [-]

To be honest I only did a brief read-through. The context of the debate itself is what I object to. I find the concept of "friendly" AI itself to be terrifying. It's my life's work to make sure that we don't end up in such a dystopian, tyrannical future. Debating whether what you call AI "risk" is likely or unlikely (disjunctive or conjunctive) is rather pointless when you are ambivalent towards that particular outcome.

Now I think that you've left out a LOT of things that must happen a certain way in order for your AI risk outcomes to come to pass. You've also left out ALL of the corrective actions that could be taken by any of the human actors in the picture. It reminds me of a martial arts demonstration where the attacker throws a punch and then stands there in frozen form, unreactive, while the teacher demonstrates the appropriate response at leisure. But if, like me, you don't see such a scenario as a bad thing in the first place, then it's an academic point. And I tire of debating things of no real-world significance.

In response to comment by [deleted] on [link] Disjunctive AI Risk Scenarios
Comment author: Kaj_Sotala 07 April 2016 03:12:47PM 0 points [-]

Hmm. There may have been a miscommunication here.

The context of the debate itself is what I object to. I find the concept of "friendly" AI itself to be terrifying.

This sounds like you're assuming that I'm trying to argue in favor of Friendly AI as the best solution. Now I admittedly do currently find FAI one of the most promising options for trying to navigate AI risk, but I'm not committed to that. I just want to find whatever solution works, regardless of whether it happens to be FAI or something else entirely. But in order to find out what the best solution is, one needs to have a comprehensive idea of what the problem is like and how it's going to manifest itself, and that's what I'm trying to do - map out the problem, so that we can figure out what the best solutions are.

I think that you've left out a LOT of things that must happen a certain way in order for your AI risk outcomes to come to pass.

Would appreciate hearing more about these.

You've also left out ALL of the corrective actions that could be taken by any of the human actors in the picture.

Isn't that the standard way of figuring out the appropriate corrective actions? First figure out what would happen absent any intervention, then see which points seem most amenable to correction.

Comment author: [deleted] 05 April 2016 03:02:14PM 2 points [-]

You should do a similar mapping of the disjunctive ways in which AI could go right and lead to world-bettering technological growth.

In response to comment by [deleted] on [link] Disjunctive AI Risk Scenarios
Comment author: Kaj_Sotala 06 April 2016 12:07:04PM 1 point [-]

I guess you could consider all of Responses to be such a disjunctive post, if you consider the disjunctive options to be "this proposed response to AGI succeeds". :)

I would be interested in hearing whether you had more extended critiques of these posts. I incorporated some of our earlier discussion into my post, and was hoping to develop those ideas further in part by having conversations with people who were more skeptical of the scenarios depicted.

Comment author: turchin 05 April 2016 08:42:49PM *  6 points [-]

I think that one of the main disjunctions is that neither self-improvement, nor high-level intelligence, nor control of the world is a necessary condition for human extinction caused by AI.

Imagine a computer which helps a terrorist to create biological viruses. It is not an AGI, is not self-improving, is not an agent, doesn't have values, and is local and confined. But it could still help to calculate and create a perfect virus, one capable of wiping out humanity.

Comment author: Kaj_Sotala 06 April 2016 12:04:53PM 2 points [-]

This is an excellent point! I'm intending to discuss non-superintelligence scenarios in a follow-up post.

Comment author: Lyyce 05 April 2016 04:10:10PM 1 point [-]

I'm not sure an intelligence explosion can happen without significant speed or computational power improvements.

I guess it boils down to what happens if you let a human-level intelligence self-modify without modifying the hardware (i.e. how far human intelligence is already optimised). So far the ratio of results to computational power used has been significantly in favor of humans compared to AI, but the latter is improving fast, and an AI doesn't need to be as versatile as a human. Is there any work on what the limits on optimising for intelligence are?

This may look like a nitpick, since hardware capacity is increasing steadily and will soon exceed that of the human brain, but it is a lot easier to prevent an intelligence explosion by putting a limit on computational power.

Comment author: Kaj_Sotala 06 April 2016 12:03:19PM *  0 points [-]

It's unclear, but in narrow AI we've seen software get smarter even in cases where the hardware is kept constant, or even made worse. For example, the top chess engine of 2014 beats a top engine from 2006, even when you give the 2014 engine 2% of the computing power of the 2006 engine. That would seem to suggest that an intelligence explosion without hardware improvements might be possible, at least in principle.

In practice I would expect an intelligence explosion to lead to hardware improvements as well, though. No reason for the AI to constrain itself just to the software side.

Comment author: Elo 30 March 2016 06:41:48AM 4 points [-]

this is a trade off that we make for partially completed survey data. On the one hand, the total number of questions was mentioned at the start (maybe it could have been highlighted more), and there is a progress bar at the top of each page. I agree that this is not ideal; does the trade off make more sense now?

In response to comment by Elo on Lesswrong 2016 Survey
Comment author: Kaj_Sotala 02 April 2016 07:58:23AM 1 point [-]

this is a trade off that we make for partially completed survey data.

Not sure what you mean by that?

But thanks for mentioning the progress bar, I didn't notice it at first. That helps somewhat.

Comment author: Kaj_Sotala 30 March 2016 06:18:19AM *  2 points [-]

I notice that the fact that I can't see all the questions on one page makes me feel more averse towards taking this survey. It makes me feel like there's a potentially infinite amount of content to be answered, lurking out of sight, whereas if it were all on one page I'd always be clear on how many questions were left.

This format also makes it hard to answer questions out of order, skipping a hard one until I'm done with all the easy ones.

Comment author: Gram_Stone 20 March 2016 04:43:21PM 1 point [-]

Here are the results of my ideological Turing test on your comment:

People usually use the word intuition to refer to vague impressions that are not amenable to the same sort of justification as deliberative judgments, so these are different from the example that you provided of quickly inventing a deliberative rule and making errors in the process. This makes the purported counterexample less persuasive to me than you seem to expect it to be. Evaluate this comment in the context that we both still anticipate the same experiences, so this is likely a disagreement over word usage, and not likely to be highly significant.

I think this is a very productive criticism. I feel that emphasis in italics makes it easier for me to write because it makes my writing more similar to the way that I speak, so please don't interpret the italics as aggressive. The way my mind goes down this path is as follows:

I have to make the qualification that I don't believe that intuitions are vague feelings that cannot be justified, but vague feelings that have not been justified. There is always some fact of the matter, in some sense, as to whether or not a given intuition is justified. Once again, this is probably something we would consider a disagreement about word usage, but I think it's an important boundary to draw. From Evans (2006):

If intuition means based on feelings without access to explicit reasoning, then that sounds like a type 1 process. But in some applications it seems to mean naïve judgement, which could be based on explicit rules or heuristics that occur to an untrained judge, in which case they would be type 2.

People often use the phrase 'intuition' to refer to confident beliefs retrieved from cached memory, and the idea is that when you go wrong, it's because intuitions are unreliable. I'm getting at the possibility that that's what people say, but it's not the whole picture.

Say that you're a judge on Pop Idol or something like that, you have no experience doing it, and you want to quickly come up with a rule. You retrieve the reliable intuition that pop idols are usually very physically attractive, and then invent a deliberative rule that uses your subjective rating of each candidate's physical attractiveness as a measure of their general Pop Idol factor. Suppose that physical attractiveness does not in fact correlate perfectly with the true general Pop Idol factor. Then you would have begun with a reliable intuition, put it into an unreliable deliberative process, and obtained an 'unreliable' result, in the sense that it does not optimize for the purported normative criterion of Pop Idol judgment panels, which is the selection of the best Pop Idol; you would have picked the most attractive candidate instead. And you would have made a mistake on a higher level than using an unreliable intuition: you would have combined reliable intuitions in a deliberative but unreliable way. This is closely related to the 'System 1 is fast, System 2 is slow' distinction. Reasoning that looks like fast, unreliable intuitive reasoning can really just be fast, unreliable deliberative reasoning. So the main point is not that there are a lot of counterexamples to 'intuitive' reasoning being System 1, but that if you want to do real work, the category 'intuitive' won't cut it, because it's still a leaky generalization, even if it isn't that leaky. Does that all make sense?

Comment author: Kaj_Sotala 30 March 2016 06:14:46AM 1 point [-]

I liked your rephrasing of my comment. :) I felt that it was an accurate summary of what I meant.

I believe that we're in agreement about everything.

Comment author: gjm 20 March 2016 02:43:31AM 1 point [-]

HN has a mechanism for giving an article your seal of approval: it's called upvoting. More than that is only necessary if you expect your approval specifically to weigh more highly than that of other users.

Comment author: Kaj_Sotala 20 March 2016 05:33:22PM *  4 points [-]

Seeing comments from (say) three people who explicitly say that they agree or think I've done good work feels much better than just seeing three upvotes on my comment / post. I know that there are other people who feel the same. Our minds aren't good at visualizing numbers.

I think that "if you are particularly happy about something, you can indicate this with an explicit comment in addition to the upvote" is a good norm to have. Giving people extra reward for doing particularly good work is good.

Comment author: Kaj_Sotala 20 March 2016 03:59:02PM *  1 point [-]

The fourth common confusion is that Type 1 processes involve 'intuitions' or 'naivety' and Type 2 processes involve thought about abstract concepts. You might describe a fast-and-loose rule that you made up as a 'heuristic' and naively think that it is thus a 'System 1 process', but it would still be the case that you invented that rule by deliberative means, and thus by means of a Type 2 process. When you applied the rule in the future it would be by means of a deliberative process that placed a demand on working memory, not by some behavior that is based on association or procedural memory, as if by habit.

I suspect that we're disagreeing on the definitions of words rather than having any substantial difference in expectations, but: I think the way "intuition" is commonly used refers to vague feelings that you can't quite justify explicitly, not to explicit heuristics that you've generated by deliberation. So your example doesn't really feel like a counterexample to the claim that intuitions are a Type 1 process.
