Series: How to Purchase AI Risk Reduction
A key part of SI's strategy for AI risk reduction is to build toward hosting a Friendly AI development team at the Singularity Institute.
I don't take it to be obvious that an SI-hosted FAI team is the correct path toward the endgame of humanity "winning." That is a matter for much strategic research and debate.
Either way, I think that building toward an FAI team is good for AI risk reduction, even if we decide (later) that an SI-hosted FAI team is not the best thing to do. Why is this so?
Building toward an SI-hosted FAI team means:
1. Growing SI into a tighter, larger, and more effective organization in general.
2. Attracting and creating people who are trustworthy, altruistic, hard-working, highly capable, extremely intelligent, and deeply concerned about AI risk. (We'll call these people "superhero mathematicians.")
Both (1) and (2) are useful for AI risk reduction even if an SI-hosted FAI team turns out not to be the best strategy.
This is because achieving part (1) would make SI more effective at whatever it is doing to reduce AI risk, and achieving part (2) would bring great human resources to the cause of AI risk reduction, which will be useful for a wide range of purposes (an FAI team or otherwise).
So, how do we accomplish both these things?
Growing SI into a better organization
Like many (most?) non-profits with less than $1m/yr in funding, SI has had difficulty attracting the top-level executive talent often required to build a highly efficient and effective organization. Luckily, we have made rapid progress on this front in the past 9 months. For example, we now have (1) a comprehensive donor database, (2) a strategic plan, (3) a team of remote contractors who help us complete large and varied projects requiring many different skill sets more efficiently, (4) an increasingly "best practices" implementation of central management, (5) an office we actually use to work together on projects, and many other improvements.
What else can SI do to become a tighter, larger, and more effective organization?
- Hire a professional bookkeeper, implement additional bookkeeping and accounting best practices. (Currently underway.)
- Create a more navigable and up-to-date website. (Currently underway.)
- Improve our fundraising strategy, e.g. by creating a deck of slides for major donors which explains what we're doing and what we can do with more funding. (Currently underway.)
- Create standard policy documents that lower our risk of being distracted by an IRS audit. (Currently underway.)
- Shift the Singularity Summit toward being more directly useful for AI risk reduction, and also toward greater profitability—so that we have at least one funding source that is not donations. (Currently underway.)
- Spin off the Center for Applied Rationality so that SI is more solely focused on AI safety. (Currently underway.)
- Build a fundraising/investment-focused Board of Trustees (à la IAS or SU) in addition to our Board of Directors and Board of Advisors.
- Create an endowment to ensure ongoing funding for core researchers.
- Consult with the most relevant university department heads and experienced principal investigators (e.g. at IAS and Santa Fe) about how to start and run an effective team for advanced technical research.
- Do the things recommended by these experts (that are relevant to SI's mission).
The key point, of course, is that all of these things cost money. They may be "boring," but they are incredibly important.
Attracting and creating superhero mathematicians
The kind of people we'd need for an FAI team are:
- Highly intelligent, and especially skilled in math, probably at the IMO medal-winning level. (FAI team members will need to create lots of new math during the course of the FAI research initiative.)
- Trustworthy. (Most FAI work is not "Friendliness theory" but instead AI architectures work that could be made more dangerous if released to a wider community that is less concerned with AI safety.)
- Altruistic. (Since the fate of humanity may be in their hands, they need to be robustly altruistic.)
- Hard-working, determined. (FAI is a very difficult research problem and will require lots of hard work and also an attitude of "shut up and do the impossible.")
- Deeply committed to AI risk reduction. (It would be risky to have people who could be pulled off the team—with all their potentially dangerous knowledge—by offers from hedge funds or Google.)
- Unusually rational. (To avoid philosophical confusions, to promote general effectiveness and group cohesion, and more.)
There are other criteria, too, but those are some of the biggest.
We can attract some of the people meeting these criteria by using the methods described in Reaching young math/compsci talent. The trouble is that the number of people on Earth who qualify may be very close to 0 (especially given the "committed to AI risk reduction" criterion).
Thus, we'll need to create some superhero mathematicians.
Math ability seems to be even more "fixed" than the other criteria, so a (very rough) strategy for creating superhero mathematicians might look like this:
- Find people with the required level of math ability.
- Train them on AI risk and rationality.
- Focus on the few who become deeply committed to AI risk reduction and rationality.
- Select from among those people the ones who are most altruistic, trustworthy, hard-working, and determined. (Some training may be possible for these features, too.)
- Try them out for 3 months and select the best few candidates for the FAI team.
All these steps, too, cost money.
I believe Vladimir is thinking in terms of a general theory which could, say, take an arbitrary computational state-machine, interpret it as a decision-theoretic agent, and deduce the "state-machine it would want to be", according to its "values", where the phrases in quotes represent imprecise or even misleading designations for rigorous concepts yet to be identified. This would be a form of the long-sought "reflective decision theory" that gets talked about.
From this perspective, the coherent extrapolation of human volition is a matter of reconstructing the human state machine through first-principles physical and computational analysis of the human brain, identifying what type of agent it is, and reflectively idealizing it according to its type and its traits. (An example of type-and-traits analysis would be (1) identifying an agent as an expected-utility maximizer, which is its "type," and (2) identifying its specific utility function, which is a "trait." But the cognitive architecture underlying human decision-making is expected to be a lot more complicated to specify.)
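To make the type-and-traits idea concrete, here is a minimal toy sketch. It assumes an agent whose "type" is expected-utility maximization and whose "trait" is a specific utility function; the outcomes, probabilities, and payoffs are all invented for illustration and stand in for the vastly more complicated analysis a real human brain would require.

```python
# Toy "type-and-traits" analysis. The agent's *type* is its decision rule
# (expected-utility maximization); its *trait* is the particular utility
# function it maximizes. All names and numbers here are made up.

def utility(outcome):
    """Hypothetical trait: how much this agent values each outcome."""
    return {"a": 1.0, "b": 4.0, "c": 2.0}[outcome]

# A toy stochastic environment: each action leads to outcomes with
# some probability.
env = {
    "act1": [("a", 0.5), ("b", 0.5)],
    "act2": [("c", 1.0)],
}

def expected_utility(action):
    """Score an action by the agent's trait, per its type's decision rule."""
    return sum(p * utility(o) for o, p in env[action])

def idealize(actions):
    """Given the agent's type (EU maximizer) and trait (utility),
    return the action an idealized version of it 'would want' to take."""
    return max(actions, key=expected_utility)

print(idealize(env.keys()))  # prints "act1" (EU 2.5 beats act2's EU 2.0)
```

The point of the sketch is only that once type and trait are pinned down, "what the agent would want" becomes a well-defined computation; the hard open problem is identifying the actual type and traits of human decision-making.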
So the paradigm really is one in which one hopes to skip over all the piecemeal ideas and empirical analysis that cognitive scientists have produced, by coming up with an analytical and extrapolative method of perfect rigor and great generality. In my opinion, people trying to develop this perfect a priori method can still derive inspiration and knowledge from science that has already been done. But the idea is not "we can neglect existing science because our team will be smarter"; the idea is that a universal method, in the spirit of Solomonoff induction but tractable, can be identified, which will then allow the problem to be solved with a minimum of prior knowledge.
From an outside view, such a plan seems unlikely to succeed. Science moves forward by data, and engineering moves forward by trying things out. This is just intuition, though; I would guess there is a reasonable amount of empirical evidence to be gained by looking at theoretical work and seeing how often it runs afoul of unexpected facts about the world. (I'm embarrassingly unsure of what the answer would be here; added to my list of things to try to figure out.)