steven0461 comments on How can we ensure that a Friendly AI team will be sane enough? - Less Wrong

Post author: Wei_Dai 16 May 2012 09:24PM




Comment author: steven0461 16 May 2012 11:01:01PM * 16 points

This post highlights for me that we don't have a good understanding of what things like "more rational" and "more sane" mean, in terms of what dimensions human minds tend to vary along as a result of nature, ordinary nurture, and specialized nurture of the kind CFAR is trying to do. I think more understanding here would be highly valuable, and I mostly don't think we can get it from studies of the general population. (We can locally define "more sane" as referring to whatever properties are needed to get the right answer on this specific question, of course, but then it might not correspond to definitions of "more sane" that we're using in other contexts.)

Not that this answers your question, but there's a potential tension between the goal of picking people with a deep understanding of FAI issues, and the goal of picking people who are unlikely to do things like become attached to the idea of being an FAI researcher.

Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?

Comment author: John_Maxwell_IV 17 May 2012 12:52:54AM 1 point

If humans are irrational because of lots of bugs in our brain, it could be hard to measure how many bugs have been fixed or worked around, and how reliable these fixes or workarounds are.

Comment author: steven0461 17 May 2012 01:03:59AM 7 points

There's how strong one's rationality is at its peak, and how strong one's rationality is on average, and the rarity and badness of lapses in one's rationality, and how many contexts one's rationality works in, and to what degree one's mistakes and insights are independent of others' mistakes and insights (independence of mistakes means you can correct each other, independence of insights means you don't duplicate insights). All these measures could vary orthogonally.

Comment author: ghf 17 May 2012 09:03:58AM * 1 point

Good point. And, depending on your assessment of the risks involved, especially for AGI research, the severity of the lapses might matter more than the peak or even the average. A researcher who is perfectly rational (hand-waving for the moment about how we measure that) 99% of the time but has, say, fits of rage every so often might be even more dangerous than a colleague who is slightly less rational on average but nonetheless stable.

Comment author: khafra 17 May 2012 12:20:30PM 2 points

Or, proper mechanism design for the research team might be able to smooth out those troughs and let you use the highest-EV researchers without danger.

Comment author: John_Maxwell_IV 17 May 2012 02:03:37AM 0 points

You seem to be using rationality to refer to both bug fixes and general intelligence. I'm more concerned about bug fixes myself, for the situation Wei Dai describes. Status-related bugs seem potentially the worst.

Comment author: steven0461 17 May 2012 02:15:43AM * 1 point

I meant to refer to just bug fixes, I think. My comment wasn't really responsive to yours, just prompted by it, and I should probably have added a note to that effect. One can imagine a set of bugs that become more fixed or less fixed over time, varying together in a continuous manner, depending on e.g. what emotional state one is in. One might be more vulnerable to many bugs when sleepy, for example. One can then talk about averages and extreme values of such a "general rationality" factor in a typical decision context, and talk about whether there are important non-standard contexts where new bugs become important that one hasn't prepared for. I agree that bugs related to status (and to interpersonal conflict) seem particularly dangerous.

Comment author: Wei_Dai 18 May 2012 09:00:42AM 1 point

This post highlights for me that we don't have a good understanding of what things like "more rational" and "more sane" mean, in terms of what dimensions human minds tend to vary along as a result of nature, ordinary nurture, and specialized nurture of the kind CFAR is trying to do.

I mentioned some specific biases that seem especially likely to cause risk for an FAI team. Is that the kind of "understanding" you're talking about, or something else?

Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?

I think there would probably just be an FAI research team that is told to continually reevaluate feasibility/safety as it goes. I called it an "FAI feasibility team" to emphasize that at the start its most important aim would be to evaluate feasibility and safety. Having an actual separate feasibility team might buy some additional overall sanity (though how, beyond removing the worry about attachment to being FAI researchers, since its members won't continue as FAI researchers either way?). Still, it seems there would probably be better ways to spend the extra resources if we had them.

Comment author: steven0461 18 May 2012 08:48:15PM * 1 point

I mentioned some specific biases that seem especially likely to cause risk for an FAI team. Is that the kind of "understanding" you're talking about, or something else?

I think that falls under my parenthetical comment in the first paragraph. Understanding what rationality-type skills would make this specific thing go well is obviously useful, but it would also be great if we had a general understanding of what rationality-type skills naturally vary together, so that we can use phrases like "more rational" and have a better idea of what they refer to across different contexts.

It seems like there would probably be better ways to spend the extra resources if we had them though.

Maybe? Note that if people like Holden have concerns about whether FAI is too dangerous, that might make them more likely to provide resources toward a separate FAI feasibility team than toward, say, a better FAI team, so it's not necessarily a fixed heap of resources that we're distributing.

Comment author: Dr_Manhattan 17 May 2012 01:27:19AM 1 point

Your post suggests that an FAI feasibility team would be made of the same people who would then (depending on their findings) go ahead and form an actual FAI team, but does that need to be the case?

You're implying the possibility of an official separation between the two teams, which seems like a good idea. Between "finding/filtering more rational people" and "social mechanisms", I would vote for mechanisms.