So here are some more problems I have:
UFAI isn't necessarily about deception. You also have to worry that the AI will perform its assigned task in a way inimical to human values, slipping past the constraints intended to prevent this through sheer ingenuity. Suppose the AI is designed to do X, something that human beings want, but that humans also care about Y and Z. And suppose the AI isn't designed to intrinsically respect Y and Z. Instead there are constraints C that it knows about, the violation of which is also monitored by human beings, and these constraints are supposed to protect values Y and Z from violation. You have to worry that the AI will achieve X in a way which satisfies C but still violates Y and Z.
Auditing has the potential to slow down the AI: the AI may be paused regularly for forensic analysis, and/or it may go slowly in order to satisfy the safety constraints. Audited AI projects may be overtaken by others with a different methodology.
You want humans to "take us through the singularity". But we aren't through the singularity until superhuman intelligence exists. Is your plan, therefore, to suppress development of superhuman AI, until there are humans with superhumanly augmented intelligence? Do you plan to audit their development as well?
I am not opposed to the auditing concept, for AI or for augmented humans, but eventually one must directly answer the question: what is the design of a trustworthy superintelligence, in terms that make no reference to human supervision?
UFAI isn't necessarily about deception.
Oracle / tool AI is. The usual premise is that questions are asked to the superhuman AI, and responses are only implemented if they are comprehensible, sane, and morally acceptable. Your example of an AI that satisfies C but still violates Y and Z would be picked up by the human oversight (or the output is too complicated to be understood and is shelved). Blindly following the AI's directives is a failure mode the oracle AI path is meant to avoid. Further, search processes do not happen across solutions which are seemingly ok b...
Summary: I do not understand why MIRI hasn’t produced a non-technical (pamphlet/blog post/video) to persuade people that UFAI is a serious concern. Creating and distributing this document should be MIRI’s top priority.
If you want to make sure the first AGI is FAI, there are two ways to do so: 1) be the first to create an AI, and ensure it is FAI; or 2) persuade people, in large numbers, that UFAI is a legitimate concern. Ideally this would become a widespread concern, so that nobody falls into the Eliezer1999ish trap of “I’m going to build an AI and see how it works”.
1) is tough for an organisation of MIRI’s size. 2) is a realistic goal. It benefits from:
Funding: MIRI’s funding almost certainly goes up if more people are concerned with AI x-risk. Ditto FHI.
Scalability: If MIRI has a new math finding, that's one new theorem. If MIRI creates a convincing demonstration that we have to worry about AI, spreading this message to a million people is plausible.
Partial goal completion: a math breakthrough that reduces the time to AI might be counter-productive, whereas every additional person persuaded of the dangers of UFAI raises the sanity waterline.
Task difficulty: creating an AI is hard. Persuading people that “UFAI is a possible extinction risk. Take it seriously” is nothing like as difficult. (I was persuaded of this in about 20 minutes of conversation.)
One possible response is “it’s not possible to persuade people without math backgrounds, training in rationality, engineering degrees, etc”. To which I reply: what’s the data supporting that hypothesis? How much effort has MIRI expended in trying to explain to intelligent non-LW readers what they’re doing and why they’re doing it? And what were the results?
Another possible response is “We have done this, and it's available on our website. Read the Five Theses”. To which I reply: Is this in the ideal form to persuade a McKinsey consultant who’s never read Less Wrong? If an entrepreneur with a net worth of $20m but no math background wants to donate to the most efficient charity he can find, would he be convinced? What efforts has MIRI made to test the hypothesis that the Five Theses, or Evidence and Import, or any other document, has been tailored to optimise the chance of convincing others?
(Further, if MIRI _does_ think this is as persuasive as it can possibly be, why doesn't it shift focus to getting the Five Theses read by as many people as possible?)
Here’s one way to go about accomplishing this. Write up an explanation of the concerns MIRI has and how it is trying to allay them, and do so in clear English. (The Five Theses are available in Up-Goer Five form. Writing them in language readable by the average college graduate should be a cinch compared to that.) Send it out to a few members of the target market and find the points that could be expanded, clarified, or made more convincing. Maybe provide two versions and see which one gets the more positive response. Continue this process until the document has been through a series of iterations and stops improving. Then shift focus to getting that link read by as many people as possible. Ask all of MIRI’s donors, all LW readers, HPMOR subscribers, friends and family, etc., to forward that one document to their friends.