GiveWell recently released notes from their interview with Jaan Tallinn, Skype co-founder and a major SIAI donor, about SIAI (link). Holden Karnofsky says:

[M]y key high-level takeaways are that

  1. I appreciated Jaan's thoughtfulness and willingness to engage in depth. It was an interesting exchange.
  2. I continue to disagree with the way that SIAI is thinking about the "Friendliness" problem. 
  3. It seems to me that all the ways in which Jaan and I disagree on this topic have more to do with philosophy (how to quantify uncertainty; how to deal with conjunctions; how to act in consideration of low probabilities) and with social science-type intuitions (how would people likely use a particular sort of AI) than with computer science or programming (what properties has software usually had historically; which of these properties become incoherent/hard to imagine when applied to AGI).

I continue to be impressed by Holden's thoughtfulness and rigor. If people want charity evaluators to start rating the effectiveness of x-risk-reducing organizations, then those organizations need to do a better job with (1) basic org effectiveness and transparency - publishing a strategic plan, whistleblower policy, and the stuff Charity Navigator expects - and with (2) making the case for the utility of x-risk reduction more clearly and thoroughly.

Luckily I am currently helping the Singularity Institute with both projects, and there are better reasons to do (1) and (2) than 'looking good to charity evaluators'. That is a side benefit, though.

Rain:

Rather than issues of philosophy or social science intuitions, I think the problems remain in the realm of concrete action... however, there are too many assumptions left unstated on both sides to unpack them in that debate, and the focus became too narrow.

Channeling XiXiDu, someone really needs to create a formalization of the Friendly AI problem, so that these sorts of debates don't continue along the same lines, with the two sides talking past each other so often.

This bit (from Karnofsky):

I feel like once we basically understand how the human predictive algorithm works, it may not be possible to improve on that algorithm (without massive and time-costly experimentation) no matter what the level of intelligence of the entity trying to improve on it. (The reason I gave: The human one has been developed by trial-and-error over millions of years in the real world, a method that won't be available to the GMAGI. So there's no guarantee that a greater intelligence could find a way to improve this algorithm without such extended trial-and-error.)

...is probably not right. Nobody really knows how tough this problem will prove to be once we stop being able to crib from the human solution - and it is possible that progress will get tougher. However, much of the progress on the problem so far has not obviously been based on reverse-engineering the human prediction algorithm in the first place. Also, machine prediction capabilities already far exceed human ones in some domains - e.g., chess and the weather.

Anyway, this issue makes little difference either way: machines don't have the human pelvis (and the limit it places on brain size) to contend with, and won't be limited to neurons firing at around 200 Hz.

These ideas might inform the exchange:

  • The point about hidden complexity of wishes applies fully to specifying the fact that needs to be predicted. Such a wish is still very hard to formalize.
  • If Oracle AI is used to construct complex designs, it needs to be more than a predictor: the space of possible designs is too big, and the designs need, for example, to be understandable by the people who read them. (This is much less of a problem if the predictor just outputs a probability.) If it's not just a predictor, it needs a clear enough specification of what parameters it's optimizing its output for.
  • What does the AI predict, for each possible answer? It predicts the consequences of having produced that particular answer, and then it argmaxes over possible answers. In other words, it's not a predictor at all; it's a full-blown consequentialist agent (see the sketch after this list).
  • A greedy/unethical person scenario is not relevant for two reasons: (1) it's not apparent that an AI can be built that gives significant power, for the hidden complexity of wishes reasons, and (2) if someone has taken over the world, the problem is still the same: what's next, and how to avoid destroying the world?
  • It's not clear in what way powerful humans/narrow AI teams "make SIAI's work moot". Controlling the world doesn't give insight about what to do with it, or guard against fatal mistakes.
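
To make the predictor-versus-agent distinction in the third bullet concrete, here is a minimal Python sketch of the two designs being contrasted; every name in it (predict, predict_consequences, score, candidate_answers) is a hypothetical illustration, not anything proposed in the exchange.

    # A system that merely reports a prediction vs. one that chooses its answer
    # by the predicted consequences of giving that answer. All names are
    # illustrative placeholders.

    def pure_predictor(question, predict):
        # Just reports its best estimate; the output is not selected for its effects.
        return predict(question)

    def consequentialist_oracle(question, candidate_answers, predict_consequences, score):
        # For each possible answer, predict the world that results from having
        # produced that answer, then pick the answer whose predicted consequences
        # score highest -- i.e., argmax over answers, which is agent-like behavior.
        return max(candidate_answers,
                   key=lambda answer: score(predict_consequences(question, answer)))

The second function is already optimizing the world through its choice of answer, which is the sense in which it stops being "just a predictor".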

It's not clear in what way powerful humans/narrow AI teams "make SIAI's work moot". Controlling the world doesn't give insight about what to do with it, or guard against fatal mistakes.

I think Holden is making the point that the work SIAI is trying to do (i.e. sort out all the issues of how to make FAI) might be so much easier to do in the future with the help of advanced narrow AI that it's not really worth investing a lot into trying to do it now.

Note: for anyone else who'd been wondering about Eliezer's position on Oracle AI, see here.

  • A greedy/unethical person scenario is not relevant for two reasons:

...

(1) it's not apparent that an AI can be built that gives significant power, for the hidden complexity of wishes reasons

A powerful machine couldn't give a human "significant power"?!? Wouldn't Page and Brin be counter-examples?

(2) if someone has taken over the world, the problem is still the same: what's next, and how to avoid destroying the world?

One problem with an unethical ruler is that they might trash some fraction of the world in the process of rising to power. For those who get trashed, what the ruler does afterwards may be a problem they are not around to worry about.

  • If Oracle AI is used to construct complex designs, it needs to be more than a predictor: the space of possible designs is too big, and the designs need, for example, to be understandable by the people who read them.

You mean you can't think of scenarios where an Oracle prints out complex human-readable designs? How about putting the Oracle into a virtual world where it observes a plan to steal those kinds of designs, and then asking it what it will observe next - as the stolen plans are about to be presented to it?

Holden seems to assume that GMAGI has access to predictive algorithms for all possible questions--this seems to me to be unlikely (say, 1% chance), compared to the possibility that it has the ability to write novel code for new problems. If it writes novel code and runs it, it must have some algorithm for how that code is written and what resources are used to implement it--limiting that seems like the domain of SIAI research.

Holden explicitly states:

All I know is that you think prediction_function() is where the risk is. I don't understand this position, because prediction_function() need not be a self-improving function or a "maximizing" function; it can just be an implementation of the human version with fewer distractions and better hardware.

i.e., that he believes all novel questions will already have algorithms implemented for them, which seems to me to be clearly his weakest assumption, if he is assuming that GMAGI is non-narrow.
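
A rough way to see the gap being pointed at, under the (hypothetical) assumption that a GMAGI either applies only already-implemented algorithms or must generate new ones: the two toy functions below are purely illustrative, and none of the names come from Holden's or SIAI's writing.

    # Contrast between a GMAGI that only applies an existing prediction_function
    # and one that writes and runs novel code for problems it has no algorithm for.
    # Everything here is a hypothetical sketch.

    def fixed_gmagi(question, prediction_function):
        # Holden's framing: whatever risk there is lives inside one fixed function.
        return prediction_function(question)

    def code_writing_gmagi(question, known_algorithms, synthesize_program, run_sandboxed):
        # The alternative raised above: for a novel question the system must generate
        # new code. How that code is written, and what resources it is allowed to use
        # when run, is the part that would need constraining.
        if question in known_algorithms:
            return known_algorithms[question](question)
        program = synthesize_program(question)
        return run_sandboxed(program, question)

If the second picture is closer to reality, then "prediction_function() is just an implementation of the human version" no longer bounds where the risk can live.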

I would bet that development teams would tap into budding AGI in whatever ways they could that were clearly "safe," such as asking it for ways to improve itself and considering them

I thought the whole danger of a black-box scenario was that humans may be unable to successfully screen out unsafe improvements?

These seem like serious weaknesses in his PoV to me.

However, his points that

  a) a narrow philosophy-AI could outdo SIAI in the arena of FAI (in which case identifying the problem of FAI is 99% of SIAI's value and is already accomplished),
  b) FAI research may not be used, for whatever reason, by whatever teams DO develop AGI, and
  c) Something Weird Happens

seem like very strong points diminishing the value of SIAI.

I can sympathize with his position that, as an advocate of efficient charity, he should focus on promoting actions by charitable actors that he is highly certain will be significantly more efficient, and that maintaining the mindset that highly reliable charities are preferable to highly valuable charities helps him fulfill the social role he is in. That is, he should be very averse to a scenario in which he recommends a charity that turns out to be less efficient than the charities he is recommending it over. The value of SIAI does not seem to me to be totally overdetermined.

In conclusion, I have some updating to do, but I don't know in which direction. And I absolutely love reading serious, well-thought-out conversations by intelligent people about important subjects.