It strikes me that this is the wrong way to look at the issue.
The problem scenario is if someone, anywhere, develops a powerful AGI that isn't safe for humanity. How do you stop the invention and proliferation of an unsafe technology? Well, you can either try to prevent anybody from building an AI without authorization, or you can try to make your own powerful friendly AGI before anybody else gets unfriendly AGI. The latter has the advantage that you only have to be really good at technology; you don't have to enforce an unenforceable worldwide law.
Building an AI that doesn't want to get out of its box doesn't solve the problem that somewhere, somebody may build an AI that does want to get out of its box.
I think that (a) is just a special case of a narrow AI.
Like, GAI is dangerous because it can do anything, and would probably ruin this section of the universe for us if its goals were misaligned with ours.
I'm not sure if GAI is needed to do highly domain-specific tasks like (a).
It might be worth noting that I often phrase questions as "how would we design an FAI to think about that" not because I want to build an FAI, but because I want the answer to some philosophical question for myself, and phrasing it in terms of FAI seems to (1) be an extremely productive way of framing the problem, and (2) generate interest among those who have good philosophy skills and are already interested in FAI.
ETA: Even if we don't build an FAI, eventually humanity might have god-like powers, and we'd need to solve those problems to figure out what we want to do.
If you have figured out artificial general intelligence that is capable of explosive recursive self-improvement, and you know how to achieve goal stability and how to constrain it, then you ought to concentrate on taking over the universe, because of the multiple-discovery hypothesis and because you can't expect other humans to be friendly.
I'm not sure about the Riemann hypothesis, since there's a real chance that RH is undecidable in ZFC. But this might be safer if one adds a time limit by which one wants the answer.
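One way to unpack the time-limit point (my gloss, not a claim from the parent): bounding the search makes the task well-defined whether or not RH is provable in ZFC. A sketch of the bounded specification, with $N$ and $T$ as arbitrary illustrative bounds:

$$\textsf{Task}(N, T):\ \text{within } T \text{ steps, output a ZFC-proof of RH of length} \le N \text{ if one is found, else output ``no proof found''.}$$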
But simply in terms of specification I agree that formalizing "don't get out of your box" is probably easier than formalizing what all of humanity wants.
making AI friendly requires solving two problems
The goal is not to "make an AI friendly" (non-lethal), it's to make a Friendly AI. That is, not to make some powerful agent that doesn't kill you (and does something useful), but to make an agent that can be trusted with autonomously building the future. For example, a merely non-lethal AI won't help with preventing UFAI risks.
So it's possible that some kind of Oracle AI can be built, but so what? And the risk of unknown unknowns remains, so it's probably a bad idea even if it looks provably safe.
The primary task that EY and SIAI have in mind for Friendly AI is "take over the world". (By the way, I think this is utterly foolish, exactly the sort of appealing paradox (like "warring for peace") that can nerd-snipe the best of us.)
To some extent technology itself (lithography, for example) is actually Safe technology (or BelievedSafe technology). As part of the development of the technology, we also develop the safety procedures around it. The questions and problems about "how should you correctly draw up a contract with th...
The Riemann hypothesis seems like a special case, since it's a purely mathematical proposition. A real world problem is more likely to require Eliezer's brand of FAI.
Also, I believe solving FAI requires solving a problem not on your list, namely that of solving GAI. :-)
If you disagree that (a) looks easier than (b), congratulations, you've been successfully brainwashed by Eliezer :-)
This was supposed to be humour, right?
2) Invent a way to code an AI so that it's mathematically guaranteed not to change its goals after many cycles of self-improvement, negotiations etc. TDT is intended as a step toward that.
Only superficially. It would be possible to create an AI with said properties using CDT.
To put the question sharply, which of the following looks easier to formalize:
a) Please output a proof of the Riemann hypothesis, and please don't get out of your box along the way.
b) Please do whatever the CEV of humanity wants.
The difficulty levels seem to be on the same order of magnitude.
According to Eliezer, making AI safe requires solving two problems:
1) Formalize a utility function whose fulfillment would constitute "good" to us. CEV is intended as a step toward that.
2) Invent a way to code an AI so that it's mathematically guaranteed not to change its goals after many cycles of self-improvement, negotiations etc. TDT is intended as a step toward that.
It is obvious to me that (2) must be solved, but I'm not sure about (1). The problem in (1) is that we're asked to formalize a whole lot of things that don't look like they should be necessary. If the AI is tasked with building a faster and more efficient airplane, does it really need to understand that humans don't like to be bored?
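To make the division of labour concrete, here's a purely illustrative toy sketch (my own framing, not Eliezer's or SIAI's formalism; the two unimplemented functions are exactly the two problems):

```python
# Toy sketch only; the names and structure are illustrative assumptions,
# not anyone's actual proposal.

def utility(world_state):
    """Problem (1): a formal utility function whose fulfillment would
    constitute "good" to us. CEV is intended as a step toward this."""
    raise NotImplementedError

def provably_same_goals(agent, successor):
    """Problem (2): a mathematical guarantee that the successor still
    optimizes the same utility() after self-modification, negotiation,
    etc. TDT is intended as a step toward this."""
    raise NotImplementedError

def self_improvement_step(agent):
    # design_improved_version() is a hypothetical method standing in for
    # whatever the AI does when it rewrites itself.
    successor = agent.design_improved_version()
    # Without a solution to (2), nothing stops the successor from quietly
    # optimizing something other than utility().
    return successor if provably_same_goals(agent, successor) else agent
```

The question below is whether the utility() hole really needs anything as ambitious as CEV for a task like (a).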
To put the question sharply, which of the following looks easier to formalize:
a) Please output a proof of the Riemann hypothesis, and please don't get out of your box along the way.
b) Please do whatever the CEV of humanity wants.
Note that I'm not asking if (a) is easy in absolute terms, only if it's easier than (b). If you disagree that (a) looks easier than (b), why?
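For what it's worth, the mathematical half of (a) already has a standard one-sentence statement (the textbook form, not a full ZFC encoding):

$$\forall s \in \mathbb{C}:\ \bigl(\zeta(s) = 0\ \wedge\ 0 < \operatorname{Re}(s) < 1\bigr) \Longrightarrow \operatorname{Re}(s) = \tfrac{1}{2}.$$

Neither "don't get out of your box" nor "the CEV of humanity" has anything comparably compact yet.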