Suppose you buy the argument that humanity faces both the risk of AI-caused extinction and the opportunity to shape an AI-built utopia. What should we do about that? As Wei Dai asks, "In what direction should we nudge the future, to maximize the chances and impact of a positive intelligence explosion?"

This post serves as a table of contents and an introduction for an ongoing strategic analysis of AI risk and opportunity.

Contents:

  1. Introduction (this post)
  2. Humanity's Efforts So Far
  3. A Timeline of Early Ideas and Arguments
  4. Questions We Want Answered
  5. Strategic Analysis Via Probability Tree
  6. Intelligence Amplification and Friendly AI
  7. ...


Why discuss AI safety strategy?

The main reason to discuss AI safety strategy is, of course, to draw on a wide spectrum of human expertise and processing power to clarify our understanding of the factors at play and the expected value of particular interventions we could invest in: raising awareness of safety concerns, forming a Friendly AI team, differential technological development, investigating AGI confinement methods, and others.

Discussing AI safety strategy is also a challenging exercise in applied rationality. The relevant issues are complex and uncertain, but we need to take advantage of the fact that rationality is faster than science: we can't "try" a bunch of intelligence explosions and see which one works best. We'll have to predict in advance how the future will develop and what we can do about it.


Core readings

Before engaging with this series, I recommend you read at least the following articles:


Example questions

Which strategic questions would we like to answer? Muehlhauser (2011) elaborates on the following questions:

  • What methods can we use to predict technological development?
  • Which kinds of differential technological development should we encourage, and how?
  • Which open problems are safe to discuss, and which are potentially dangerous?
  • What can we do to reduce the risk of an AI arms race?
  • What can we do to raise the "sanity waterline," and how much will this help?
  • What can we do to attract more funding, support, and research to x-risk reduction and to specific sub-problems of successful Singularity navigation?
  • Which interventions should we prioritize?
  • How should x-risk reducers and AI safety researchers interact with governments and corporations?
  • How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?
  • How does AI risk compare to other existential risks?
  • Which problems do we need to solve, and which ones can we have an AI solve?
  • How can we develop microeconomic models of WBEs and self-improving systems?
  • How can we be sure a Friendly AI development team will be altruistic?

Salamon & Muehlhauser (2013) list several other questions gathered from the participants of a workshop following Singularity Summit 2011, including:

  • How hard is it to create Friendly AI?
  • What is the strength of feedback from neuroscience to AI rather than brain emulation?
  • Is there a safe way to do uploads, where they don't turn into neuromorphic AI?
  • How possible is it to do FAI research on a seastead?
  • How much must we spend on security when developing a Friendly AI team?
  • What's the best way to recruit talent toward working on AI risks?
  • How difficult is stabilizing the world so we can work on Friendly AI slowly?
  • How hard will a takeoff be?
  • What is the value of strategy vs. object-level progress toward a positive Singularity?
  • How feasible is Oracle AI?
  • Can we convert environmentalists into people concerned with existential risk?
  • Is there no such thing as bad publicity [for AI risk reduction] purposes?

These are the kinds of questions we will be tackling in this series of posts for Less Wrong Discussion, in order to improve our predictions about which direction we can nudge the future to maximize the chances of a positive intelligence explosion.

AI Risk and Opportunity: A Strategic Analysis

Selective opinion and answers (for longer discussions, respond to specific points and I'll furnish more details):

Which kinds of differential technological development should we encourage, and how?

I recommend pushing for whole brain emulations, with scanning-first and emphasis on fully uploading actual humans. Also, military development of AI should be prioritised over commercial and academic development, if possible.

Which open problems are safe to discuss, and which are potentially dangerous?

Seeing what has already been published, I see little adva...

2Wei Dai
What are your most important disagreements with other FHI/SIAI people? How do you account for these disagreements? You say: but also: which makes me a bit confused. Are you saying we should push them simultaneously, or what? Also, what path do you see from a successful Oracle AI to a positive Singularity? For example, use Oracle AI to develop WBE technology, then use WBEs to create FAI? Or something else?
6Stuart_Armstrong
Main disagreement with FHI people is that I'm more worried about AI than they are (I'm probably up with the SIAI folks on this). I suspect an anchoring effect here - I was drawn to the FHI's work through AI risk, others were drawn in through other angles (also I spend much more time on Less Wrong, making AI risks very salient). Not sure what this means for accuracy, so my considered opinion is that AI is less risky than I individually believe. My main disagreement with SIAI is that I think FAI is unlikely to be implementable on time. So I want to explore alternative avenues, several ones ideally. Oracle to FAI would be one route; Oracle to people taking AI seriously to FAI might be another. WBE opens up many other avenues (including "no AI"), so is also worth looking into. I haven't bothered to try and close the gap between me and SIAI on this, because even if they are correct, I think it's valuable for the group to have someone looking into non-FAI avenues.
4Wei Dai
Thanks for the answers. The main problem I have with Oracle AI is that it seems a short step from OAI to UFAI, but a long path to FAI (since you still need to solve ethics and it's hard to see how OAI helps with that), so it seems dangerous to push for it, unless you do it in secret and can keep it secret. Do you agree? If so, I'm not sure how "Oracle to people taking AI seriously to FAI" is supposed to work.
2Stuart_Armstrong
My main "pressure point" is pushing UFAI development towards OAI. ie I don't advocate building OAI, but making sure that the first AGIs will be OAIs. And I'm using far too many acronyms.
8Wei Dai
What does it matter that the first AGIs will be OAIs, if UFAIs follow immediately after? I mean, once knowledge of how to build OAIs starts to spread, how are you going to make sure that nobody fails to properly contain their Oracles, or intentionally modifies them into AGIs that act on their own initiative? (This recent post of mine might better explain where I'm coming from, if you haven't already read it.)
3cousin_it
We can already think productively about how to win if oracle AIs come first. Paul Christiano is working on this right now, see the "formal instructions" posts on his blog. Things are still vague but I think we have a viable attack here.
2Stuart_Armstrong
Wot cousin_it said. Of course the model "OAIs are extremely dangerous if not properly contained; let's let everyone have one!" isn't going to work. But there are many things we can try with an OAI (building a FAI, for instance), and most importantly, some of these things will be experimental (the FAI approach relies on getting the theory right, with no opportunity to test it). And there is a window that doesn't exist with a genie - a window where people realise superintelligence is possible and where we might be able to get them to take safety seriously (and they're not all dead). We might also be able to get exotica like a limited impact AI or something like that, if we can find safe ways of experimenting with OAIs. And there seems no drawback to pushing an UFAI project into becoming an OAI project.
4Wei Dai
Cousin_it's link is interesting, but it doesn't seem to have anything to do with OAI, and instead looks like a possible method of directly building an FAI. Hmm, maybe I'm underestimating the amount of time it would take for OAI knowledge to spread, especially if the first OAI project is a military one (on the other hand, the military and their contractors don't seem to be having better luck with network security than anyone else). How long do you expect the window of opportunity (i.e., the time from the first successful OAI to the first UFAI, assuming no FAI gets built in the mean time) to be? I'd like to have FAI researchers determine what kind of experiments they want to do (if any, after doing appropriate benefit/risk analysis), which probably depends on the specific FAI approach they intend to use, and then build limited AIs (or non-AI constructs) to do the experiments. Building general Oracles that can answer arbitrary (or a wide range of) questions seems unnecessarily dangerous for this purpose, and may not help anyway depending on the FAI approach. There may be, if the right thing to do is to instead push them to not build an AGI at all.
0Stuart_Armstrong
One important fact I haven't been mentioning: OAIs help tremendously with medium-speed takeoffs (fast takeoffs are dangerous for the usual reasons, slow takeoffs mean that we will have moved beyond OAIs by the time the intelligence level becomes dangerous), because we can then use them to experiment. I'm interacting with AGI people at the moment (organising a jointish conference), and will have a clearer idea of how they react to these ideas at a later stage.
0Vladimir_Nesov
Moved where/how? Slow takeoff means we have more time, but I don't see how it changes the nature of the problem. Low time to WBE makes (not particularly plausible) slow takeoff similar to the (moderately likely) failure to develop AGI before WBE.
2Vladimir_Nesov
Together with Wei's point that OAI doesn't seem to help much, there is the downside that existence of OAI safety guidelines might make it harder to argue against pushing AGI in general. So on net it's plausible that this might be a bad idea, which argues for weighing this tradeoff more carefully.
2Stuart_Armstrong
Possibly. But in my experience even getting the AGI people to admit that there might be safety issues is over 90% of the battle.
0Vladimir_Nesov
It's useful for AGI researchers to notice that there are safety issues, but not useful for them to notice that there are "safety issues" which can be dealt with by following OAI guidelines. The latter kind of understanding might be worse than none at all, as it seemingly resolves the problem. So it's not clear to me that getting people to "admit that there might be safety issues" is in itself a worthwhile milestone.
2Vladimir_Nesov
Why do you say this is a disagreement? Who at SIAI thinks FAI is likely to be implementable on time (and why)? Right, assuming we can find any alternative avenues of comparable probability of success. I think it's unlikely for FAI to be implementable both "on time" (i.e. by humans in current society), and via alternative avenues (of which fast WBE humans seem the most plausible one, which argues for late WBE that's not hardware-limited, not pushing it now). This makes current research as valuable as alternative routes despite improbability of current research's success.
2Stuart_Armstrong
Let me rephrase: I think the expected gain from pursuing FAI is less than that from pursuing other methods. Other methods are less likely to work, but more likely to be implementable. I think SIAI disagrees with this assessment.
2Vladimir_Nesov
I assume that by "implementable" you mean that it's an actionable project, that might fail to "work", i.e. deliver the intended result. I don't see how "implementability" is a relevant characteristic. What matters is whether something works, i.e. succeeds. If you think that other methods are less likely to work, how are they of greater expected value? I probably parsed some of your terms incorrectly.
2Stuart_Armstrong
Whether the project reached the desired goal, versus whether that goal will actually work. If Nick and Eliezer both agreed about some design that "this is how you build a FAI", then I expect it will work. However, I don't think it's likely that would happen. It's more likely they will say "this is how you build a proper Oracle AI", but less likely the Oracle will end up being safe.
0Vladimir_Nesov
Okay, but I still don't understand how a project with lower probability of "actually working" can be of higher expected value. I'm referring to this statement: The argument you seem to be giving in support of higher expected value of other methods is that they are "more likely to be implementable" (a project reaching its stated goal, even if that goal turns out to be no good), but I don't see how that is an interesting property.
0[anonymous]
He didn't say other architectures would be no good, he said they're less likely to be safe. He thinks the distribution P(Outcome | do(complete Oracle AI project)) isn't as highly peaked at Weirdtopia as P(outcome | do(complete FAI)); Oracle AI puts more weight on regions like "Lifeless universe", "Eternal Torture", "Rainbows and Slow Death", and "Failed Utopia". However, "Complete FAI" isn't an actionable procedure, so he examines the chance of completion conditional on different actions he can take. "Not worth pursuing because non-implementable" means that available FAI supporting actions don't have a reasonable chance of producing friendly AI, which discounts the peak in the conditional outcome distribution at valuable futures relative to do(complete FAI). And supposedly he has some other available oracle AI supporting strategy which fares better. Eating a sandwich isn't as cool as building an interstellar society with wormholes for transportation, but I'm still going to make a sandwich for lunch, because it's going to work and maybe be okay-ish.
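The sandwich argument above can be put into numbers. What follows is a minimal sketch, not anyone's actual estimate: every probability and utility is a hypothetical placeholder, and `expected_value` is a helper name invented for this illustration. It shows only that a project with a less valuable finished product can still have higher expected value if it is much more likely to be completed.

```python
def expected_value(p_complete, value_if_complete, value_if_incomplete=0.0):
    """Expected value of pursuing a project that may never be completed."""
    return p_complete * value_if_complete + (1 - p_complete) * value_if_incomplete


# Hypothetical numbers for illustration only; none come from the thread.
# value_if_complete is already an average over the good and bad outcomes
# that a *finished* project of that kind might produce.
fai = expected_value(p_complete=0.05, value_if_complete=100.0)
oracle = expected_value(p_complete=0.40, value_if_complete=30.0)

print(f"FAI: {fai:.1f}, Oracle AI: {oracle:.1f}")
```

With these made-up inputs the Oracle route comes out ahead, but the ordering flips as soon as the completion probabilities or outcome values change, which is exactly what the disagreement in this thread is about.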
0[anonymous]
What do you mean to be the distinction between these?
2Wei Dai
Where can we read FHI's analysis of AI risk? Why are they not as worried as you and SIAI people? Has there ever been a debate between FHI and SIAI on this? What threats are they most worried about? What technologies do they want to push or slow down?
3Stuart_Armstrong
AI is high on the list - one of the top risks, even if their objective assessment is lower than SIAI's. Nuclear war, synthetic biology, nanotech, pandemics, social collapse: these are the other ones we're looking at.
3Stuart_Armstrong
Basically they don't buy the "AI inevitably goes foom and inevitably takes over". They see definite probabilities of these happening, but their estimates are closer to 50% than to 100%.
5TheOtherDave
They estimate it at 50%??? And there are other things they are more concerned about? What are those other things?
3Stuart_Armstrong
They estimate a variety of conditional statements ("AI possible this century", "if AI then FOOM", "if FOOM then DOOM", etc...) with magnitudes between 20% and 80% (I had the figures somewhere, but can't find them). I think when it was all multiplied out it was in the 10-20% range. And I didn't say they thought other things were more worrying; just that AI wasn't the single overwhelming risk/reward factor that SIAI (and I) believe it to be.
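The "multiplied out" step can be sketched in a few lines. The three probabilities below are placeholders picked from inside the quoted 20%-80% range, not FHI's actual figures (which the comment says could not be found), and the chain assumes each estimate is conditional on the one before it.

```python
from math import prod

# Hypothetical placeholder estimates, NOT FHI's actual figures.
# Each value is conditional on the previous step having happened.
chain = {
    "AI possible this century": 0.6,
    "FOOM given AI": 0.5,
    "DOOM given FOOM": 0.5,
}


def joint_probability(conditionals):
    """Multiply a chain of conditional probabilities into one joint estimate."""
    return prod(conditionals.values())


overall = joint_probability(chain)
print(f"Joint probability: {overall:.2f}")
```

Three mid-range estimates already land in the 10-20% band mentioned above; adding further conditional steps can only push the product lower.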
0XiXiDu
A wild guess. FHI believes that the best that can reasonably be done about existential risks at this point in time is to do research into existential risks, including possible unknown unknowns, and into strategies to reduce current existential risks. This somewhat agrees with their FAQ: In other words, FHI seems to focus on meta issues, existential risks in general, rather than associated specifics.

I suggest adding some more meta questions to the list.

  • What improvements can we make to the way we go about answering strategy questions? For example, should we differentiate between "strategic insights" (such as Carl Shulman's insight that WBE-based Singletons may be feasible) and "keeping track of the big picture" (forming the overall strategy and updating it based on new insights and evidence), and aim to have people specialize in each, so that people deciding strategy won't be tempted to overweigh their own insights? Another exampl
...

"In what direction should we nudge the future, to maximize the chances and impact of a positive Singularity?"

Friendly AI is incredibly hard to get right and a friendly AI that is not quite friendly could create a living hell for the rest of time, increasing negative utility dramatically.

I vote for antinatalism. It should be seriously considered to create a true paperclip maximizer that transforms the universe into an inanimate state devoid of suffering. Friendly AI is simply too risky.

I think that humans are not psychologically equal. Not only a...

a friendly AI that is not quite friendly could create a living hell for the rest of time, increasing negative utility dramatically

"Ladies and gentlemen, I believe this machine could create a living hell for the rest of time..."

(audience yawns, people look at their watches)

"...increasing negative utility dramatically!"

(shocked gasps, audience riots)

-2XiXiDu
Do you actually disagree with anything or are you just trying to ridicule it? Do you think that the possibility that FAI research might increase negative utility is not to be taken seriously? Do you think that world states where faulty FAI designs are implemented have on average higher utility than world states where nobody is alive? If so, what research could I possibly do to come to the same conclusion? What arguments do I miss? Do I just have to think about it longer? Consider the way Eliezer Yudkowsky argues in favor of FAI research: or Is his style of argumentation any different from mine except that he promises lots of positive utility?

I was just amused by the anticlimacticness of the quoted sentence (or maybe by how it would be anticlimactic anywhere else but here), the way it explains why a living hell for the rest of time is a bad thing by associating it with something so abstract as a dramatic increase in negative utility. That's all I meant by that.

It should be seriously considered to create a true paperclip maximizer that transforms the universe into an inanimate state devoid of suffering.

Have you considered the many ways something like that could go wrong?

  • The paperclip maximizer (PM) encounters an alien civilization and causes lots of suffering warring with it
  • PM decides there's a chance that it's in a simulation run by a sadistic being who will punish it (prevent it from making paperclips) unless it creates trillions of conscious beings and tortures them
  • PM is itself capable of suffering
  • PM decides to create lots of descendent AIs in order to maximize paperclip production and they happen to be capable of suffering. (Our genes made us to maximize copies of them and we happen to be capable of suffering.)
  • somebody steals PM's source code before it's launched, and makes a sadistic AI

From your perspective, wouldn't it be better to just build a really big bomb and blow up Earth? Or alternatively, if you want to minimize suffering throughout the universe and maybe throughout the multiverse (e.g., by acausal negotiation with superintelligences in other universes), instead of just our corner of the world, you'd have to solve a lot of the same problems as FAI.

1XiXiDu
I don't think that it is likely that it will encounter anything that has equal resources and if it does that suffering would occur (see below). That seems like one of the problems that have to be solved in order to build an AI that transforms the universe into an inanimate state. But I think it is much easier to make an AI not simulate any other agents than to create a friendly AI. Much more can go wrong by creating a friendly AI, including the possibility that it tortures trillions of beings. In the case of a transformer you just have to make sure that it values a universe that is as close as possible to a state where no computation takes place and that it does not engage in any kind of trade, acausal or otherwise. I believe that any sort of morally significant suffering is an effect of (natural) evolution, and may in fact be dependent on that. I think that the kind of maximizer that SI has in mind is more akin to a transformation process that isn't conscious, does not have emotions and cannot suffer. If those qualities were necessary requirements then I don't think that we will build an artificial general intelligence any time soon and that if we do it will happen slowly and not be able to undergo dangerous recursive self-improvement. I think that this is more likely to be the case with friendly AI research because it takes longer.
0XiXiDu
The reason why I think that working towards FAI might be a bad idea is that it increases the chance of something going horribly wrong. If I were to accept the framework of beliefs held by SI then I would assign a low probability to the possibility that the default scenario in which an AI undergoes recursive self-improvement will include a lot of blackmailing that leads to a lot of suffering. Where the default is that nobody tries to make AI friendly. I believe that any failed attempt at friendly AI is much more likely to 1) engage in blackmailing 2) keep humans alive 3) fail in horrible ways: I think that working towards friendly AI will in most cases lead to negative utility scenarios that vastly outweigh the negative utility of an attempt at creating a simple transformer that turns the universe into an inanimate state. ETA Not sure why the graph looks so messed up. Does anyone know of a better graphing tool?
9Wei Dai
I think it's too early to decide this. There are many questions whose answers will become clearer before we have to make a choice one way or another. If eventually it becomes clear that building an antinatalist AI is the right thing to do, I think the best way to accomplish it would be through an organization that's like SIAI but isn't too attached to the idea of FAI and just wants to do whatever is best. Now you can either try to build an organization like that from scratch, or try to push SIAI in that direction (i.e., make it more strategic and less attached to a specific plan). Of course, being lazy, I'm more tempted to do the latter, but your mileage may vary. :)
4lukeprog
Yes. I, for one, am ultimately concerned with doing whatever's best. I'm not wedded to doing FAI, and am certainly not wedded to doing 9-researchers-in-a-basement FAI.
6XiXiDu
Well, that's great. Still, there are quite a few problems. How do I know

  • ... that SI does not increase existential risk by solving problems that can be used to build AGI earlier?
  • ... that you won't launch a half-baked friendly AI that will turn the world into a hell?
  • ... that you don't implement some strategies that will do really bad things to some people, e.g. myself?

Every time I see a video of one of you people I think, "Wow, those seem like really nice people. I am probably wrong. They are going to do the right thing." But seriously, is that enough? Can I trust a few people with the power to shape the whole universe? Can I trust them enough to actually give them money? Can I trust them enough with my life until the end of the universe? You can't even tell me what "best" or "right" or "winning" stands for. How do I know that it can be or will be defined in a way that those labels will apply to me as well? I have no idea what your plans are for the day when time runs out. I just hope that you are not going to hope for the best and run some not quite friendly AI that does really crappy things. I hope you consider the possibility of rather blowing everything up than risking even worse outcomes.
3lukeprog
Hell no. This is an open problem. See "How can we be sure a Friendly AI development team will be altruistic?" on my list of open problems.
2timtyler
Blowing everything up would be pretty bad. Bad enough to not encourage the possibility.
-1Vladimir_Nesov
"Would you murder a child, if it's the right thing to do?" If FAI is by definition a machine that does whatever is best, this distinction doesn't seem meaningful.
4Wei Dai
Ok, let me rephrase that to be clearer.
1Vladimir_Nesov
Do you think SingInst is too attached to a specific kind of FAI design? This isn't my impression. (Also, at this point, it might be useful to unpack "SingInst" into particular people constituting it.)
7Wei Dai
XiXiDu seems to think so. I guess I'm less certain but I didn't want to question that particular premise in my response to him. It does confuse me that Eliezer set his focus so early on CEV. I think "it's too early to decide this" applies to CEV just as well as XiXiDu's anti-natalist AI. Why not explore and keep all the plausible options open until the many strategically important questions become clearer? Why did it fall to someone outside SIAI (me, in particular) to write about the normative and meta-philosophical approaches to FAI? (Note that the former covers XiXiDu's idea as a special case.) Also concerning is that many criticisms have been directed at CEV but Eliezer seems to ignore most of them. I'd be surprised if there weren't people within SingInst who disagree with the focus on CEV, but if so, they seem reluctant to disagree in public so it's hard to tell who exactly, or how much say they have in what SingInst actually does. I guess this could all be due to PR considerations. Maybe Eliezer just wanted to focus public attention on CEV because it's the politically least objectionable FAI approach, and isn't really terribly attached to the idea when it comes to actually building an FAI. But you can see how an outsider might get that impression...
9Jayson_Virissimo
I always thought CEV was half-baked as a technical solution, but as a PR tactic it is...genius.
6Will_Newsome
Yeah, I thought it was explicitly intended more as a political manifesto than a philosophical treatise. I have no idea why so many smart people, like lukeprog, seem to be interpreting it not only as a philosophical basis but as outlining a technical solution.
5amcknight
Why do you think an unknown maximizer would be worse than a not quite friendly AI? Failed Utopia #4-2 sounds much better than a bunch of paperclips. Orgasmium sounds at least as good as paper clips.
4timtyler
Graphs make your case more convincing - even when they are drawn wrong and don't make sense! ...but seriously: where are you getting the figures in the first graph from? Are you one of these "negative utilitarians" - who think that any form of suffering is terrible?
2timtyler
You sound a bit fixated on doom :-( What do you make of the idea that the world has been consistently getting better for most of the last 3 billion years (give or take the occasional asteroid strike) - and that the progress is likely to continue?
-6SingularityUtopia

Currently you suspect that there are people, such as yourself, who have some chance of correctly judging whether arguments such as yours are correct, and of attempting to implement the implications if those arguments are correct, and of not implementing the implications if those arguments are not correct.

Do you think it would be possible to design an intelligence which could do this more reliably?

8steven0461
I don't get it. Design a Friendly AI that can better judge whether it's worth the risk of botching the design of a Friendly AI? ETA: I suppose your point applies to some of XiXiDu's concerns but not others?
4Vladimir_Nesov
A lens that sees its flaws.
3steven0461
I don't understand. Is the claim here that you can build a "decide whether the risk of botched Friendly AI is worth taking machine", and the risk of botching such a machine is much less than the risk of botching a Friendly AI?
9Vladimir_Nesov
A FAI that includes such "Should I run?" heuristic could pose a lesser risk than a FAI without such heuristic. If this heuristic works better than human judgment about running a FAI, it should be used instead of human judgment. This is the same principle as for AI's decisions themselves, where we don't ask AI's designers for object-level moral judgments, or encode specific object-level moral judgments into AI. Not running an AI would then be equivalent to hardcoding the decision "Should the AI run?" resolved by designers to "No." into the AI, instead of coding the question and letting the AI itself answer it (assuming we can expect it to answer the question more reliably than the programmers can).
5steven0461
If we botched the FAI, wouldn't we also probably have botched its ability to decide whether it should run?
2Vladimir_Nesov
Yes, and if it tosses a coin, it has 50% chance of being right. The question is calibration, how much trust should such measures buy compared to their absence, given what is known about given design.
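One way to make the calibration question concrete is a Bayes' rule update on the built-in check's verdict. This is only a sketch under stated assumptions: `posterior_sound` is a name invented here, and the prior and pass rates are hypothetical numbers, not estimates from anyone in the thread.

```python
def posterior_sound(prior, p_pass_if_sound, p_pass_if_botched):
    """P(design is sound | the built-in 'Should I run?' check says yes)."""
    p_pass = prior * p_pass_if_sound + (1 - prior) * p_pass_if_botched
    return prior * p_pass_if_sound / p_pass


prior = 0.2  # hypothetical prior that the design is sound

# A coin toss passes sound and botched designs equally often,
# so its verdict leaves the prior unchanged.
coin = posterior_sound(prior, p_pass_if_sound=0.5, p_pass_if_botched=0.5)

# A hypothetical check that is better than chance buys some trust.
informative = posterior_sound(prior, p_pass_if_sound=0.9, p_pass_if_botched=0.3)

print(f"coin toss: {coin:.2f}, informative check: {informative:.2f}")
```

The worry that a botched FAI probably has a botched check corresponds to `p_pass_if_botched` sitting close to `p_pass_if_sound`: the closer the two rates, the less a "yes" from the check is worth.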
0John_Maxwell
The machine Steve proposes might not bear as much risk of creating "living hell" by attempting to get the human utility function right, but missing in such a way that humans are still alive, just living very unpleasantly. To me, this seems by far the biggest of XiXiDu's concerns.
3XiXiDu
Here are a few premises:

  • Complex systems can fail in complex ways.
  • Destruction is easier than creation.
  • Expected utility maximization is rational and feasible.
  • We should be extremely conservative about not implementing a half-baked friendly AI.

If you believe that self-improving AI is inevitable and that creating friendly AI is more difficult than creating unfriendly AI then to launch an AI that simply destroys everything as quickly as possible has a higher expected utility than doing nothing or trying to implement an AI that is not completely friendly. The negative utility of an AI that transforms the universe into an inanimate state is simply the positive utility of a perfectly friendly AI. But most outcomes are expected to yield unfriendly AI, or not quite friendly AI, which will actively increase negative utility by possibly keeping alive living beings indefinitely given abhorrent circumstances. To prevent those scenarios one can try to solve friendly AI, which will most likely fail (or even increase the chances of a negative singularity), or try to launch a destructive singleton with simple goals to prevent further suffering and the evolution of life elsewhere in the universe. Creating a destructive self-improving AI that kills all life should be the easiest of all possibilities with a high probability of success.
3MichaelAnissimov
Assuming your argument is correct, wouldn't it make more sense to blow ourselves up with nukes rather than pollute the universe with UFAI? There may be other intelligent civilizations out there leading worthwhile lives that we threaten unfairly by unleashing UFAI. I'm skeptical that friendly AI is as difficult as all that because, to take an example, humans are generally considered pretty "wicked" by traditional writers and armchair philosophers, but lately we haven't been murdering each other or deliberately going out of our way to make each other's lives miserable very often. For instance, say I were invincible. I could theoretically stab everyone I meet without any consequences, but I doubt I would do that. And I'm just human. Goodness may seem mystical and amazingly complex from our current viewpoint, but is it really as complex as all that? There were a lot of things in history and science that seemed mystically complex but turned out to be formalizable in compressed ways, such as the mathematics of Darwinian population genetics. Who would have imagined that the "Secrets of Life and Creation" would be revealed like that? But they were. Could "sufficient goodness that we can be convinced the agent won't put us through hell" also have a compact description that was clearly tractable in retrospect?
3XiXiDu
There might be countless planets that are about to undergo an evolutionary arms race for the next few billion years resulting in a lot of suffering. It is very unlikely that there is a single source of life that is exactly on the right stage of evolution with exactly the right mind design to not only lead worthwhile lives but also get their AI technology exactly right to not turn everything into a living hell. In case you assign negative utility to suffering, which is likely to be universally accepted to have negative utility, then given that you are an expected utility maximizer it should be a serious consideration to end all life. Because 1) agents that are an effect of evolution have complex values 2) to satisfy complex values you need to meet complex circumstances 3) complex systems can fail in complex ways 4) any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways. To name just one example where things could go horribly wrong: humans are by their very nature interested in domination and sex. Our aversion against sexual exploitation is largely dependent on the memeplex of our cultural and societal circumstances. If you knew more, were smarter and could think faster you might very well realize that such an aversion is an unnecessary remnant that you can easily extinguish to open up new pathways to gain utility. That Gandhi would not agree to have his brain modified into a baby-eater is incredibly naive. Given the technology people will alter their preferences and personality. Many people actually perceive their moral reservations to be limiting. It only takes some amount of insight to just overcome such limitations. You simply can't be sure that the future won't hold vast amounts of negative utility. It is much easier for things to go horribly wrong than to be barely acceptable. Maybe not, but betting on the possibility that goodness can be easily achieved is like pulling a random AI from mind design space hoping that it t
4timtyler
Similarly, it is easier to make piles of rubble than skyscrapers. Yet - amazingly - there are plenty of skyscrapers out there. Obviously something funny is going on...
2timtyler
Hang on, though. That's still normally better than not existing at all! Hell has to be at least bad enough for the folk in it to want to commit suicide for utility to count as "below zero". Most plausible futures just aren't likely to be that bad for the creatures in them.
-1XiXiDu
The present is already bad enough. There is more evil than good. You are more often worried than optimistic. You are more often hurt than happy. That's the case for most people. We just tend to remember the good moments more than the rest of our life. It is generally easier to arrive at bad world states than good world states. Because to satisfy complex values you need to meet complex circumstances. And even given simple values and goals, the laws of physics are grim and remorseless. In the end you're going to lose the fight against the general decay. Any temporary success is just a statistical fluke.
1timtyler
No, I'm not! Yet most creatures would rather live than die - and they show that by choosing to live. Dying is an option - they choose not to take it. It sounds as though by now there should be nothing left but dust and decay! Evidently something is wrong with this reasoning. Evolution produces marvellous wonders - as well as entropy. Your existence is an enormous statistical fluke - but you still exist. There's no need to be "down" about it.
0katydee
For some people, this is a solved problem.
-6timtyler
9Wei Dai
Earlier, you wrote Surely building an anti-natalist AI that turns the universe into inert matter would be considered unacceptable by most people. So I'm confused. Do you intend to denounce SIAI if they do seriously consider this strategy, and also if they don't?
-2XiXiDu
Yet I am not secretive about it and I believe that it is one of the less horrible strategies. Given that SI is strongly attached to decision theoretic ideas, which I believe are not the default outcome due to practically intractable problems, I fear that their strategies might turn out to be much worse than the default case. I think that it is naive to simply trust SI because they seem like nice people. Although I don't doubt that they are nice people. But I think that any niceness is easily drowned by their eagerness to take rationality to its logical extreme without noticing that they have reached a point where the consequences constitute a reductio ad absurdum. If game and decision theoretic conjectures show that you can maximize expected utility by torturing lots of people, or by voluntarily walking into death camps, then that's the right thing to do. I don't think that they are psychopathic personalities per se though. Those people are simply held captive by their idea of rationality. And that is what makes them extremely dangerous. I would denounce myself if I would seriously consider that strategy. But I would also admire them for doing so because I believe that it is the right thing to do given their own framework of beliefs. What they are doing right now seems just hypocritical. Researching FAI will almost certainly lead to worse outcomes than researching how to create an anti-natalist AI as soon as possible. What I really believe is that there is not enough data to come to any definitive conclusion about the whole idea of a technological singularity and dangerous recursive self-improvement in particular and that it would be stupid to act on any conclusion that one could possibly come up with at this point. I believe that SI/lesswrong mainly produces science fiction and interesting, although practically completely useless, thought experiments. The only danger I see is that some people associated with SI/lesswrong might run rampant once someone demonstrates ce
3Wei Dai
I agree with the "not enough data to come to any definitive conclusion" part, but think we could prepare for the Singularity by building an organization that is not attached to any particular plan but is ready to act when there is enough data to come to definitive conclusions (and tries to gather more data in the meantime). Do you agree with this, or do you think we should literally do nothing? I guess I have a higher opinion of SIAI than that. Just a few months ago you were saying: What made you change your mind since then?
1XiXiDu
I did not change my mind. All I am saying is that I wouldn't suggest anyone to contribute money to SI who fully believes what they believe. Because that would be counterproductive. If I accepted all of their ideas then I would make the same suggestion as you, to build "an organization that is not attached to any particular plan". But I do not share all of their beliefs. Particularly I do not currently believe that there is a strong case that uncontrollable recursive self-improvement is possible. And if it is possible I do not think that it is feasible. And even if it is feasible I believe that it won't happen any time soon. And if it will happen soon I do not think that SI will have anything to do with it. I believe that SI is an important organisation that deserves money. Although if I would share their idea of rationality and their technological optimism then the risks would outweigh the benefit.

Why I believe SI deserves money:

  • It makes people think by confronting them with the logical consequences of state of the art ideas from the field of rationality.
  • It explores topics and fringe theories that are neglected or worthy of consideration.
  • It challenges the conventional foundations of charitable giving, causing organisations like GiveWell to reassess and possibly improve their position.
  • It creates a lot of exciting and fun content and discussions.

All in all I believe that SI will have a valuable influence. I believe that the world needs people and organisations that explore crazy ideas, that try to treat rare diseases in cute kittens and challenge conventional wisdom. And SI is such an organisation. Just like Roger Penrose and Stuart Hameroff. Just like all the creationists who caused evolutionary biologists to hone their arguments. SI will influence lots of fields and make people contemplate their beliefs. To fully understand why my criticism of SI and willingness to donate does not contradict, you also have to realize that I do not accept the us
-3XiXiDu
Before you throw more of what I wrote in the past at me:

  • I sometimes take different positions just to explore an argument, because it is fun to discuss and because I am curious what reactions I might provoke.
  • I don't have a firm opinion on many issues.
  • There are a lot of issues for which there are as many arguments that oppose a certain position as there are arguments that support it.
  • Most of what I write is not thought-out. I most often do not consciously contemplate what I write.
  • I find it very easy to argue for whatever position.
  • I don't really care too much about most issues but write as if I do, to evoke feedback. I just do it for fun.
  • I am sometimes not completely honest to exploit the karma system. Although I don't do that deliberately.
  • If I believe that SI/lesswrong could benefit from criticism I voice it if nobody else does.

The above is just some quick and dirty introspection that might hint at the reason for some seemingly contradictory statements. The real reasons are much more complex of course, but I haven't thought about that either :-) I just don't have the time right now to think hard about all the issues discussed here. I am still busy improving my education. At some point I will try to tackle the issues with due respect and in all seriousness.
6wedrifid
I have quoted everything XiXiDu said here so that it is not lost in any future edits. Many of XiXi's contributions consist of persuasive denunciations. As he points out in the parent (and quoted below), often these are based off little research, without much contemplation and are done to provoke reactions rather than because they are correct. Since XiXiDu is rather experienced at this mode of communication - and the arguments he uses have been able to be selected for persuasiveness through trial and error - there is a risk that he will be taken more seriously than is warranted. The parent should be used to keep things in perspective when XiXiDu is rabble rousing.

That said, I think his fear of culpability (for being potentially passively involved in an existential catastrophe) is very real. I suspect he is continually driven, at a level beneath what anyone's remonstrations could easily affect, to try anything that might somehow succeed in removing all the culpability from him. This would be a double negative form of "something to protect": "something to not be culpable for failure to protect".

If this is true, then if you try to make him feel culpability for his communication acts as usual, this will only make his fear stronger and make him more desperate to find a way out, and make him even more willing to break normal conversational rules.

I don't think he has full introspective access to his decision calculus for how he should let his drive affect his communication practices or the resulting level of discourse. So his above explanations for why he argues the way he does are probably partly confabulated, to match an underlying constraining intuition of "whatever I did, it was less indefensible than the alternative".

(I feel like there has to be some kind of third alternative I'm missing here, that would derail t...

1wedrifid
I certainly wouldn't try to make him feel culpability. Or, for that matter, "try to make him" anything at all. I don't believe I have the ability to influence XiXi significantly and I don't believe it would be useful to try (any more). It is for this reason that I rather explicitly spoke in the third person to any prospective future readers that it may be appropriate to refer here in the future. Pretending that I was actually talking to XiXiDu when I was clearly speaking to others would just be insulting to him. There are possible future cases (and plenty of past cases) where a reply to one of XiXiDu's fallacious denunciations that consists of simply a link here is more useful than ignoring the comment entirely and hoping that the damage done is minimal.
1XiXiDu
Show me just one.
-2XiXiDu
You could easily influence me with actual arguments.
0XiXiDu
What is your suggestion then? How do I get out? Delete all of my posts, comments and website like Roko? Seriously, if it wasn't for assholes like wedrifid I wouldn't even bother anymore and just quit. The grandparent was an attempt at honesty, trying to leave. Then that guy comes along claiming that most of my submissions consisted of "persuasive denunciations". Someone like him who does nothing else all the time. Someone who never argues for his case. ETA: Ah fuck it all. I'll take another attempt and log out now and not get involved anymore. Happy self-adulation.
0Alsadius
If a denunciation is accurate, does it really matter what the source is? Sometimes, putting pin to balloon is its own reward.
0wedrifid
The rhetorical implication appears to be a non sequitur. Again. Please read more carefully.
0Alsadius
You're suggesting that he might be making arguments that are taken more seriously than they warrant. Unless an argument is based on incorrect facts, it should be taken exactly as seriously as it warrants on its own merits. Why does the source matter?
0wedrifid
Even if the audience is assumed to be perfect at evaluating evidence on its merits then the source matters to the extent that the authority of the author and the authority of the presentation are considered evidence. Knowing how pieces of evidence were selected also gives information, so knowing about the can provide significant information. And the above assumption definitely doesn't hold - people are not perfect at evaluating evidence on its merits. Considerations about how arguments optimized through trial and error for persuasiveness become rather important when all recipients have known biases and you are actively trying to reduce the damage said biases cause. Finally, considerations about how active provocation may have an undesirable influence on the community are qualitatively different from considerations about whether a denunciation is accurate. Just because I evaluate XiXiDu's typical 'arguments' as terribly nonsensical thinking that does not mean I should be similarly dismissive of the potential damage that can be done by them, given the expressed intent and tactics. I can evaluate the threat that the quoted agenda has as significant even when I don't personally take the output of that agenda seriously at all.
-4XiXiDu
You might want to save this as well. Here is how I see it. I am just an uneducated below average IQ individual and don't spend more time on my submissions than it takes to write them. If people are swayed by my ramblings then how firm could their beliefs possibly be in the first place? I could have as easily argued in favor of SI. If I was to start now and put some extra effort into it I believe I could actually become more persuasiveness than SI itself. Do you believe that in a world where I did that you would tell people that my arguments are based on little research and that there is a risk that I am taken more seriously than is warranted?
8[anonymous]
Don't self-deprecate too much. Have you taken a (somewhat recent) IQ test, say an online matrix test or the Mensa one? (If so, personal prediction.) Even though LW over-estimates its own IQ, don't forget how stupid IQ 100 really is.
5David Althaus
Don't be ridiculous.
0XiXiDu
Yesterday I took an IQ test suggested by muflax and scored 78.
2David Althaus
Yeah, I took it too and scored 37 - because my eyes were closed. Do you really believe that you're dumber than 90% of all people? (~ IQ of 78; I suppose the SD was 15) Seriously, do you know just how stupid most humans are? I deny the data.
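As a rough check on the percentile claim above, the following sketch converts an IQ score into a population percentile. It assumes scores are normally distributed; the SD of 15 is the commenter's own supposition, while the mean of 100 and the helper name `iq_percentile` are assumptions made here for illustration.

```python
from statistics import NormalDist


def iq_percentile(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Fraction of the population expected to score below `score`,
    assuming a normal distribution (an assumption, not a measured fact)."""
    return NormalDist(mu=mean, sigma=sd).cdf(score)


below = iq_percentile(78)
print(f"Share scoring below 78: {below:.1%}")
print(f"Share scoring above 78: {1 - below:.1%}")
```

Under these assumptions a score of 78 falls at roughly the 7th percentile, so "dumber than 90% of all people" is, if anything, a slight understatement of what the score would imply.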
0wedrifid
For sure. XiXiDu uses grammar correctly! (Well, enough so that "become more persuasiveness" struck me as an editing error rather than typical.) If someone uses grammar correctly it is an overwhelmingly strong indicator that either they are significantly educated (self or otherwise) or have enough intelligence to compensate!
2MichaelAnissimov
Given all these facts, it's pretty hard to take what you say seriously...
8Kaj_Sotala
As pessimistic as this sounds, I'm not sure if I actually disagree with any of it.
5David_Gerard
Has anyone constructed even a vaguely plausible outline, let alone a definition, of what would constitute a "human-friendly intelligence", defined in terms other than effects you don't want it to have? As you note, humans aren't human-friendly intelligences, or we wouldn't have internal existential risk. The CEV proposal seems to attempt to move the hard bit to technological magic (a superintelligence scanning human brains and working out a solution to human desires that is possible, is coherent and won't destroy us all) - this is saying "then a miracle occurs" in more words.
2John_Maxwell
It's possible that particular humans might approximate human friendly intelligences.
0David_Gerard
Assuming it's not impossible, how would you know? What constitutes a human-friendly intelligence, in other than negative terms?
-4timtyler
Er, that's how it is defined - at least by Yudkowsky. You want to argue definitions? Without even offering one of your own? How will that help?
-3David_Gerard
No, I'm pointing out that a purely negative definition isn't actually a useful definition that describes the thing the label is supposed to be pointing at. How does one work toward a negative? We can say a few things it isn't - what is it?
0timtyler
Yudkowsky says: That isn't a "purely negative" definition in the first place. Even if it was - would you object to the definition of "hole" on similar grounds? What exactly is wrong with defining some things in terms of what they are not? If I say a "safe car" is one that doesn't kill or hurt people, that seems just fine to me.
2David_Gerard
The word "artificial" there makes it look like it means more than it does. And humans are just as made of atoms. Let's try it without that: It's only described in terms of its effects, and then only vaguely. We have no idea what it would actually be. The CEV plan doesn't include what it would actually be, it just includes a technological magic step where it's worked out. This may be better than nothing, but it's not enough to say it's talking about anything that's actually understood in even the vaguest terms. For an analogy, what would a gorilla-friendly human-level intelligence be like? How would you reasonably make sure it wasn't harmful to the future of gorillas? (Humans out of the box do pretty badly at this.) What steps would the human take to ascertain the CEV of gorillas, assuming tremendous technological resources?
0timtyler
We can't answer the "how can you do this?" questions today. If we could we would be done. It's true that CEV is an 8-year-old, moon-onna-stick wishlist - apparently created without much thought about how to implement it. C'est la vie.
1John_Maxwell
Interesting thoughts. It seems like an attempt at Oracle AI, which simply strives to answer all questions accurately while otherwise exerting as little influence on the world as possible, would be strictly better than a paperclip maximizer, no? At the very least you wouldn't see any of the risks of "almost friendly AI". You might see some humans getting power over other humans, but to be honest I don't think that would be worse than humans existing, period. Keep in mind that historically, the humans that were put in power over others were the ones who had the ruthlessness necessary to get to the top – they might not be representative. Can you name any female evil dictators?
0lukeprog
[ignore; was off-topic]
-1Will_Newsome
Do you think that it is possible to build an AI that does the moral thing even without being directly contingent on human preferences? Conditional on its possibility, do you think we should attempt to create such an AI? I share your trepidation about humans and their values, but I see that as implying that we have to be meta enough such that even if humans are wrong, our AI will still do what is right. It seems to me that this is still a real possibility. For an example of an FAI architecture that is more in this direction, check out CFAI.
3XiXiDu
No. I believe that it is practically impossible to systematically and consistently assign utility to world states. I believe that utility can not even be grounded and therefore defined. I don't think that there exists anything like "human preferences" and therefore human utility functions, apart from purely theoretical highly complex and therefore computationally intractable approximations. I don't think that there is anything like a "self" that can be used to define what constitutes a human being, not practically anyway. I don't believe that it is practically possible to decide what is morally right and wrong in the long term, not even for a superintelligence. I believe that stable goals are impossible and that any attempt at extrapolating the volition of people will alter it.

Besides I believe that we won't be able to figure out any of the following in time:

  • The nature of consciousness and its moral significance.
  • The relation and moral significance of suffering/pain/fun/happiness.

I further believe that the following problems are impossible to solve, respectively constitute a reductio ad absurdum of certain ideas:

  • Utility monsters
  • Pascal’s Mugging
  • The Lifespan Dilemma
2timtyler
Strange stuff. Surely "right" and "wrong" make the most sense in the context of a specified moral system. If you are using those terms outside such a context, it usually implies some kind of moral realism - in which case, one wonders what sort of moral realism you have in mind.

Link: In this ongoing thread, Wei Dai and I discuss the merits of pre-WBE vs. post-WBE decision theory/FAI research.

What could an FAI project look like? Louie points out that it might look like Princeton's Institute for Advanced Study:

Created as a haven for thinking, the Institute [for Advanced Study] remains for many the Shangri-la of academe: a playground for the scholarly superstars who become the Institute's permanent faculty. These positions carry no teaching duties, few administrative responsibilities, and high salaries, and so represent a pinnacle of academic advancement. The expectation is that given this freedom, the professors at the Institute will think the

...

But did the IAS actually succeed? Off-hand, the only things I can think of them for are hosting Einstein in his crankish years, hosting Kurt Gödel before he went crazy, and Von Neumann's work on a real computer (which they disliked and wanted to get rid of). Richard Hamming, who might know, said:

When you are famous it is hard to work on small problems. This is what did Shannon in. After information theory, what do you do for an encore? The great scientists often make this error. They fail to continue to plant the little acorns from which the mighty oak trees grow. They try to get the big thing right off. And that isn't the way things go. So that is another reason why you find that when you get early recognition it seems to sterilize you. In fact I will give you my favorite quotation of many years. The Institute for Advanced Study in Princeton, in my opinion, has ruined more good scientists than any institution has created, judged by what they did before they came and judged by what they did after. Not that they weren't good afterwards, but they were superb before they got there and were only good afterwards.

(My own thought is to wonder if this is kind of a regression to the mean, or perhaps regression due to aging.)

5Wei Dai
How do you maintain secrecy in such a setting? Or is there a new line of thought that says secrecy isn't necessary for an FAI project?
1lukeprog
The person/people working on FAI there could work exclusively on the relatively safe problems, e.g. CEV.

Ok, I thought when you said "FAI project" you meant a project to build FAI. But I've noticed two problems with trying to do some of the relatively safe FAI-related problems in public:

  1. It's hard to predetermine whether a problem is safe, and hard to stop or slow down once research momentum gets going. For example I've become concerned that decision theory research may be dangerous, but I'm having trouble getting even myself to stop.
  2. All the problems, safe and unsafe, are interrelated, and people working on the (seemingly) safe problems will naturally become interested in the (more obviously) unsafe ones as well and start thinking about them. (For example solving CEV seems to require understanding the nature of "preference", which leads to decision theory, and solving decision theory seems to require understanding the nature of logical uncertainty.) It seems very hard to prevent this or make all the researchers conscientious and security-conscious enough to not leak out or deliberately publish (e.g. to gain academic reputation) unsafe research results. Even if you pick the initial researchers to be especially conscientious and security-conscious, the problem will get worse as they publish results and other people become interested in their research areas.
4lukeprog
Yes, both Eliezer and I (and many others) agree with these points. Eliezer seems pretty set on only doing a basement-style FAI team, perhaps because he's thought about the situation longer and harder than I have. I'm still exploring to see whether there are strategic alternatives, or strategic tweaks. I'm hoping we can discuss this in more detail when my strategic analysis series gets there.
8Wei Dai
But it seems like SIAI has already deviated from the basement-style FAI plan, since it started supporting research associates who are allowed/encouraged to publish openly, and encouraging public FAI-related research in other ways (such as publishing a list of open problems). And if the "slippery slope" problems I described were already known, why didn't anyone bring them up during the discussions about whether to publish papers about UDT? (I myself only thought of them in the general explicit form yesterday.) If SIAI already knew about these problems but still thinks it's a good idea to promote public FAI-related research and publish papers about decision theory, then I'm even more confused than before. I hope your series "gets there" soon so I can see where the cause of the disagreement lies.
4lukeprog
What I'm saying is that there are costs and benefits to open FAI work. You listed some costs, but that doesn't mean there aren't also benefits. See, e.g. Vladimir's comment.
2Wei Dai
The benefits are only significant if there is a significant chance of successfully building FAI before some UFAI project takes off. Maybe our disagreement just boils down to different intuitions about that? But Nesov agrees this chance is "tiny" and still wants to push open research, so I'm still confused.
1Vladimir_Nesov
I want to make it bigger, as much as I can. It doesn't matter how small a chance of winning there is, as long as our actions improve it. Giving up doesn't seem like a strategy that leads to winning. The strategy of navigating the WBE transition (or some more speculative intelligence improvement tool) is a more complicated question, and I don't see in what way the background catastrophic risk matters for it. This also came up in a previous discussion about this we had: it's necessary to distinguish the risk within a given interval of years, and the eventual risk (i.e. the risk of never building a FAI). The same action can make immediate risk worse, but probability of eventually winning higher. I think encouraging an open effort for researching metaethics through decision theory is like that; also better acceptance of the problem might be leveraged to overturn the hypothetical increase in UFAI risk.
6Wei Dai
Yes, if we're talking about the overall chance of winning, but I was talking about the chance of winning through a specific scenario (directly building FAI). If the chance of that is tiny, why did your cost/benefit analysis of the proposed course of action (encouraging open FAI research) focus completely on it? Shouldn't we be thinking more about how the proposal affects other ways of winning? ETA: To spell it out, encouraging open FAI research decreases the probability that we win by winning the WBE race or through intelligence amplification, by increasing the probability that UFAI happens first. Nobody is saying "let's give up". If we don't encourage open FAI research, we can still push for a positive Singularity in other ways, some of which I've posted about recently in discussion. What do you mean? What aren't you seeing? Yes, of course. I am talking about the probability of eventually winning.
0Vladimir_Nesov
(Another thread of this conversation is here.) I see, I'm guessing you view the "second round" (post-WBE/human intelligence improvement) as not being similarly unlikely to eventually win. I agree that if the first round (working on FAI now, pre-WBE) has only a tiny chance of winning, while the second has a non-tiny chance (taking into account the probability of no catastrophe till the second round and it being dominated by a FAI project rather than random AGI), then it's better to sacrifice the first round to make the second round healthier. But I also only see a tiny chance of winning the second round, mostly because of the increasing UFAI risk and the difficulty of winning a race that grants you the advantages of the second round, rather than producing an UFAI really fast.
0Will_Newsome
Near/Far. Long-term effects aren't predictable and shouldn't be traded for more predictable short-term losses. In my experience it fails the Predictable Retrospective Stupidity test. Even when you try to factor in structural uncertainty, you still end up getting burned. And even if you still want to make such a tradeoff then you should halt all research until you've come to agreement or a natural stopping point with Wei Dai or others who have reservations. Stop, melt, catch fire, don't destroy the world. (Disclaimer: This comment is fueled by a strong emotional reaction due to contingent personal details that might or might not upon further reflection deserve to be treated as substantial evidence for the policy I recommend.)
5Vladimir_Nesov
Just to make clear what specific idea this is about: Wei points out that researching FAI might increase UFAI risk, and suggests that therefore FAI shouldn't be researched. My reply is to the effect that while FAI research might increase UFAI risk within any given number of years, it also decreases the risk of never solving FAI (which IIRC I put at something like 95% if we research it pre-WBE, and 97% if we don't).
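The comparison above can be made concrete with a few lines of arithmetic. This is only a sketch: the two probabilities are the figures recalled in the comment (95% and 97% risk of never solving FAI), and the variable names are hypothetical.

```python
# Risk of never solving FAI, as recalled in the comment above (illustrative figures).
p_never_with_research = 0.95     # FAI researched pre-WBE
p_never_without_research = 0.97  # FAI not researched pre-WBE

# The eventual chance of winning is the complement of the risk of never solving FAI.
p_win_with = 1 - p_never_with_research
p_win_without = 1 - p_never_without_research

absolute_gain = p_win_with - p_win_without
relative_gain = p_win_with / p_win_without

print(f"Chance of eventually winning with research:    {p_win_with:.0%}")
print(f"Chance of eventually winning without research: {p_win_without:.0%}")
print(f"Absolute gain: {absolute_gain:.0%}; relative gain: {relative_gain:.2f}x")
```

On these numbers the absolute difference is two percentage points, but the chance of eventually winning is about two-thirds higher with the research than without it. The sketch says nothing about the separate question of how the same research affects other routes to winning.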
2wedrifid
When I have analyzed this problem previously my reasoning matched that listed by Nesov here.
1lukeprog
Yeah, we'll come back to this in the strategy series. There are lots of details to consider.
4Vladimir_Nesov
There seems to be a tradeoff here. An open project has more chances to develop the necessary theory faster, but having such a project in the open looks like a clearly bad idea towards the endgame. So on one hand, an open project shouldn't be cultivated (and becomes harder to hinder) as we get closer to the endgame, but on the other, a closed project will probably not get off the ground, and fueling it by an initial open effort is one way to make it stronger. So there's probably some optimal point to stop encouraging open development, and given the current state of the theory (nil) I believe the time hasn't come yet. The open effort could help the subsequent closed project in two related ways: gauge the point where the understanding of what to actually do in the closed project is sufficiently clear (for some sense of "sufficiently"), and form enough background theory to be able to convince enough young Conways (with necessary training) to work on the problem at the closed stage.
4Wei Dai
Your argument seems premised on the assumption that there will be an endgame. If we assume some large probability that we end up deciding not to have an endgame at all (i.e., not to try to actually build FAI with unenhanced humans), then it's no longer clear "the time hasn't come yet". Even if we assume that with probability ~1 there will be an effort to directly build FAI, given the slippery slope effects we have to stop encouraging open research well before the closed project starts. The main deciding factors for "when" must be how large the open research community has gotten, how strong the slippery slope effects are, and how much "pull" SingInst has against those effects. The "current state of the theory" seems to have little to do with it. (Edit: No that's too strong. Let me amend it to "one consideration among many".)
4Vladimir_Nesov
This is something we'll know better further down the road, so as long as it's possible to defer this decision (i.e. while the downside is not too great, however that should be estimated), it's the right thing to do. I still can't rule out that there might be a preference definition procedure (that refers to humans) simple enough to be implemented pre-WBE, and decision theory seems to be an attack on this possibility (clarifying why this is naive, for example, in which case it'll also serve as an argument to the powerful in the WBE race). Well, maybe not specifically current, but what can be expected eventually, for the closed project to benefit from, which does seem to me like a major consideration in the possibility of its success.
3Will_Newsome
I'm confused as to what you have in mind when you're thinking of work on CEV. Do you mean things like getting a better model of the philosophy of reflective consistency, or studying mechanism design to find algorithms for relatively fair aggregation, or looking into neuroscience to see how beliefs and preferences are encoded, or...? Is there perhaps a post I missed or am forgetting?

Which open problems are safe to discuss, and which are potentially dangerous?

This one seems particularly interesting, especially as it seems to apply to itself. Due to the attention hazard problem, coming up with a "list of things you're not allowed to discuss" sounds like a bad idea. But what's the alternative? Yeuugh.