Looking for advice with something it seems LW can help with.
I'm currently part of a program that trains highly intelligent people to be more effective, particularly with regard to scientific research and effecting change within large systems of people. I'm sorry to be vague, but I can't actually say more than that.
As part of our program, we organize seminars for ourselves on various interesting topics. The upcoming one is on self-improvement, and aims to explore the following questions: Who am I? What are my goals? How do I get there?
Naturally, I'm of the opinion that rationalist thought has a lot to offer on all of those questions. (I also have ulterior motives here, because I think it would be really cool to get some of these people on board with rationalism in general.) I'm having a hard time narrowing down this idea to a lesson plan I can submit to the organizers, so I thought I'd ask for suggestions.
The possible formats I have open for an activity are a lecture, a workshop/discussion in small groups, and some sort of guided introspection/reading activity (for example just giving people a sheet with questions to ponder on it, or a text to reflect on).
I've also come up with several possible topics: How to Actually Change Your Mind (ideas on how to go about condensing it are welcome), practical mind-hacking techniques and/or techniques for self-transparency, or just information on heuristics and biases because I think that's useful in general.
You can also assume the intended audience members already know each other pretty well, and are capable of considerably more analysis and actual math than average.
Ideas for topics or activities are welcome, particularly ones that include a strong affective experience, since those are generally better at getting people to think about this sort of thing for the first time.
I've been thinking about what seems to be the standard LW pitch on AI risk. It goes like this: "Consider an AI that is given a goal by humans. Since 'convert the planet into computronium' is a subgoal of most goals, it does this and kills humanity."
The problem, which various people have pointed out, is that this implies an intelligence capable of taking over the world, yet not capable of working out that when a human says to pursue a certain goal, they would not want that goal pursued in a way that leads to the destruction of the world.
Worse, the argument can then be made that this idea of an AI interpreting goals so literally, without modelling a human mind, amounts to an "autistic AI", and that only autistic people would assume an AI would be similarly autistic. I do not endorse this argument in any way, but I guess it's still better to avoid arguments that signal low social skills, all other things being equal.
Is there any consensus on what the best 'elevator pitch' argument for AI risk is? Instead of focusing on any one failure mode, I would go with something like this:
"Most philosophers agree that there is no reason why superintelligence is not possible. Anything which is possible will eventually be achieved, and so will superintelligence, perhaps in the far future, perhaps in the next few decades. At some point, superintelligences will be as far above humans as we are above ants. I do not know what will happen at this point, but the only reference case we have is humans and ants, and if superintelligences decide that humans are an infestation, we will be exterminated."
Incidentally, this is the sort of thing I mean by painting LW style ideas as autistic (via David Pierce)
Sometimes David Pierce seems very smart. And sometimes he seems to imply that the ability to think logically while on psychedelic drugs is as important as 'autistic intelligence'. I don't think he believes that autistic people are zombies who lack subjective experience, but that also does seem implied.
I think the basic problem here is an undissolved question: what is 'intelligence'? Humans, being human, tend to imagine a superintelligence as a highly augmented human intelligence, so the natural assumption is that regardless of the 'level' of intelligence, skills will cluster roughly the way they do in human minds, i.e. having the ability to take over the world implies a high posterior probability of having the ability to understand human goals.
The problem with this assumption is that mind-design space is large (<--understatement), and the prior probability of a superintelligence randomly ending up with ability clusters analogous to human ability clusters is infinitesimal. Granted, the probability of this happening given a superintelligence designed by humans is significantly higher, but still not very high. (I don't actually have enough technical knowledge to estimate this precisely, but just eyeballing it, I'd put it under 5%.)
In fact, autistic people are an example of non-human-standard ability clusters, and even that deviation is tiny on the scale of mind-design space.
As for an elevator pitch of this concept, something like: "Just because evolution happened to design our brains to be really good at modelling human goal systems doesn't mean all intelligences are good at it, regardless of how good they might be at destroying the planet."