Brainstorming additional AI risk reduction ideas

John_Maxwell


14th Jun 2012


It looks as though lukeprog has finished his series on how to purchase AI risk reduction. But the ideas lukeprog shares are not the only available strategies. Can Less Wrong come up with more?

A summary of recommendations from Exploring the Idea Space Efficiently:

Deliberately avoid exposing yourself to existing lines of thought on how to solve a problem. (The idea here is to defeat anchoring and the availability heuristic.) So don't review lukeprog's series or read the comments on this thread before generating ideas.
Start by identifying broad categories where ideas might be found. If you're trying to think of calculus word problems, your broad categories might be "jobs, personal life, the natural world, engineering, other".
With these initial broad categories, try to include all the categories that might contain a solution and none that will not.
Then generate subcategories. Subcategories of "jobs" might include "agriculture, teaching, customer service, manufacturing, research, IT, other". You're also encouraged to generate subsubcategories and so on.
Spend more time on those categories that seem promising.
You may wish to map your categories and subcategories on a piece of paper.
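If paper isn't handy, the category map can also be kept as a small nested structure. The following is only a sketch: the category names are taken from the calculus-word-problem example above, while the dictionary layout and the `leaf_paths` helper are illustrative assumptions, not part of the original technique.

```python
# Hypothetical sketch: a category tree for structured brainstorming.
# An empty dict marks a category that has not been subdivided yet.
categories = {
    "jobs": {
        "agriculture": {},
        "teaching": {},
        "customer service": {},
        "manufacturing": {},
        "research": {},
        "IT": {},
        "other": {},
    },
    "personal life": {},
    "the natural world": {},
    "engineering": {},
    "other": {},
}


def leaf_paths(tree, prefix=()):
    """Return every root-to-leaf path, i.e. each narrowest category so far."""
    paths = []
    for name, subtree in tree.items():
        path = prefix + (name,)
        if subtree:
            paths.extend(leaf_paths(subtree, path))
        else:
            paths.append(path)
    return paths


# Walk the leaves one at a time and try to generate ideas under each.
for path in leaf_paths(categories):
    print(" > ".join(path))
```

Listing the leaves makes it easy to see which branches are still undivided and which deserve more time.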

If you don't like that approach, here's another that's more difficult to summarize. Of course, unstructured idea generation is fine too.

If you're strictly a lurker, you can send your best ideas to lukeprog anonymously using his feedback box. Or send them to me anonymously using my feedback box so I can post them here and get all your karma.

Thread Usage

Please reply here if you wish to comment on the idea of this thread.

You're encouraged to discuss the ideas of others in addition to coming up with your own ideas.

If you split your ideas into individual comments, they can be voted on individually and you will probably increase your karma haul.

AI Risk


[-]VincentYu14y120

Establish a scholarship to collect information on young talent

Create a merit scholarship for the type of young talent that SI wants to attract – this can reveal valuable information about this group of people, and can potentially be used as a targeted publicity tool if handled well.

Information that could be collected from applications

Basic personal details (age, location, contact methods, etc.)
Education (past and future)
Academic interests
Career goals
Awards and competition results
Third-party reviews (i.e., letters of recommendation)
Basic personality assessment (see previous LW discussion on correlates with Big Five personality traits: [1], [2], [3])
Ideas about and attitudes toward x-risks/FAI/SI/FHI (these could be responses to prompts – as a bonus, applicants are introduced to the content in the prompts)
... Pretty much anything else (personal anecdote: I've revealed things about myself in college and scholarship applications that I have never expressed to anyone else)

Uses of this information

Check whether SI is effectively reaching the right people with its current plans.
The compiled list of young talent could be directly used to advertise things like SPARC to the right people.
General survey tool.

Potential problems and difficulties

Its use as an information gathering tool could be seen negatively.
Legal issues?
Publicity. The scholarship has to be made known to the relevant people, and this has to be done in such a way that SI is seen as a reputable institute. However, a scholarship does open up new avenues for publicity.
Cost and manpower.

Is anyone else doing this?

As with many ideas, we ought to be cautious if we see no one else doing something similar. Indeed, I cannot think of any high school scholarship that is used primarily to collect information for the sponsoring organization (is this really the case?). However, there is good reason for this – no one else is interested in reaching the same group of high school students. SI is the only organization I know of who wants to reach high school students for their research group.

FHI had a competition that could be an attempt to collect information, but I'm not sure.

High school scholarships

It would be wise to consult current high school scholarships, and AoPS has a good list.

[-]John_Maxwell14y40

I like this. The fact that this sort of thing is done by so many organizations seems evidence that it's a good way to reach young people. It might be wise to emphasize getting young folks to think about existential risks through an essay contest over gathering data about them.

see previous LW discussion on correlates with Big Five personality traits

Here's what I found:

http://lesswrong.com/lw/82g/on_the_openness_personality_trait_rationality/

http://lesswrong.com/lw/9m6/the_personality_of_greatcreative_scientists_open/

I suspect this sort of competition is better run through FHI, if only due to their Oxford affiliation.

[-]VincentYu14y40

The fact that this sort of thing is done by so many organizations seems evidence that it's a good way to reach young people.

Hmm... I was thinking that no other organization does this, but I imagine that we mean different things by "this" – I suppose you are referring to the large number of organizations that sponsor a scholarship, as opposed to the lack of organizations that sponsor a scholarship with the specific goal of collecting information on high school students.

It might be wise to emphasize getting young folks to think about existential risks through an essay contest over gathering data about them.

Good point. The way this is publicized would be important – it might be worthwhile to consult professionals.

Here's what I found:

http://lesswrong.com/lw/82g/on_the_openness_personality_trait_rationality/

http://lesswrong.com/lw/9m6/the_personality_of_greatcreative_scientists_open/

Thanks! These are the posts I was thinking of. I've added them to the grandparent. One more tangentially related post (also by Gwern):
http://lesswrong.com/r/discussion/lw/ac4/online_education_and_conscientiousness/

I suspect this sort of competition is better run through FHI, if only due to their Oxford affiliation.

That's a good idea. The main obstacle I see is that FHI might not want to get involved in such an endeavor, especially since they are targeting only postdocs for their research positions, and have no programs for students.

[-]tenlier14y-20

"Indeed, I cannot think of any high school scholarship that is used primarily to collect information for the sponsoring organization (is this really the case?). However, there is good reason for this – no one else is interested in reaching the same group of high school students. SI is the only organization I know of who wants to reach high school students for their research group."

I find this place persistently surprising, which is nice. Try to imagine what you would think if a religious organization did this and how you would feel. It's alright to hold a scholarship to encourage kids to be interested in a topic; not so to garner information for your own purposes, unless that is incredibly clear upfront. Very Gwernian.

[-]John_Maxwell14y120

Thoughts on the idea of this thread go here.

[-]lukeprog14y120

I very much approve of this thread and its clearly organized execution. Thank you.

[-]John_Maxwell14y00

You're welcome!

[-]PECOS-914y50

Might I suggest some people try using the techniques I posted about for brainstorming?

[-]John_Maxwell14y00

Thanks, I linked to them in the post.

[-]Stuart_Armstrong14y100

Check for an AI breakout in a toy model

Without deliberately stacking the deck, set up a situation in which an AI has a clear role to play in a toy model. But make the toy model somewhat sloppy, and give the AI great computing power, in the hope that it will "escape" and achieve its goals in an unconventional way. If it doesn't, that's useful information; if it does, that's even more useful, and we can get some info by seeing how it did that.

Then instead of the usual "paperclip maximiser goes crazy", we could point to this example as a canonical model of misbehaviour. Not something that is loaded with human terms and seemingly vague sentiments about superintelligences, but more like "how do you prevent the types of behaviour that agent D-f55F showed in the factorising Fibonacci number in the FCS world? After all, interacting with humans and the outside world throws up far more vulnerabilities than the specific ones D-f55F took advantage of in that problem. What are you doing to formally rule out exploitation of these vulnerabilities?"

(If situations like this have happened before, then there is no need to recreate them, but they should be made more prominent.)

[-]private_messaging14y-10

The issue here is that almost no one other than SI sees material utilitarianism as the fundamental definition of intelligence (actually, there probably aren't even any proponents of material utilitarianism as something to strive for at all). We don't have a definition of what the number of paperclips is, such a definition seems very difficult to create, it is actually unnecessary for using computers to aid the creation of paperclips, and it is trivially obvious that material utilitarianism is dangerous; you don't need to go around raising awareness of that among AI researchers who aren't even working to implement material utilitarianism. If SI wants to be taken seriously, it ought to stop assigning idiosyncratic meanings to words and then confusing the mainstream meanings with its own.

Basically, SI seems to see a very dangerous way of structuring intelligence as the only way, as the very definition of intelligence; that, coupled with nobody else seeing it as the only way, doesn't make AI research dangerous, it makes SI dangerous.

It gets truly ridiculous when the oracle is discussed.

Reasonably, if I want to make a useful machine that answers questions, then when I ask it how to make a cake, it would determine what information I lack for making a cake, determine a communication protocol, and provide that information to me. Basically, it'd be an intelligent module which I can use. I would need that functionality as part of any other system that helps make a cake. I'm not asking to be convinced to make a cake. A system that tries to convince me to make a cake would clearly be annoying. I don't need to think science-fictional thoughts about how it would destroy the world; it suffices that it is clearly doing something unnecessary and annoying (and in addition it would need inside itself a subsystem that does what I want). When building stuff bottom-up, there is no danger of accidentally building an aircraft carrier when all you want is a fishing boat and when it's clear that an aircraft carrier makes for a very crappy fishing boat.

In SI's view, the oracle has to set cake existence as a goal for itself (material utilitarianism), and then there is the danger that the oracle is going to manipulate me into making the cake. Or it might set my physically having the information for making a cake inside my brain as a material goal for itself. Or something else material. Or, to quote this exact piece more directly, the predictor may want to manipulate the world (as it has the material goal of predicting). This is outright ridiculous, because to determine an action for influencing the world, the predictor needs a predictor within itself which would not seek alteration of the world but would evaluate the consequences of actions. And herein lies the other issue: SI's intelligence is a monolithic, ontologically basic concept, and so statements like these do not self-defeat via the argument of "okay, let's just not implement the unnecessary part of the AI that will clearly make it run amok and kill everyone, or at best make it less useful".

[-]John_Maxwell14y70

Grow the Cognitive Surplus-Powered AI Risks Research Community

Luke already identified funding Friendliness-related academic papers as a way to purchase AI risk reduction. But being an academic is not a necessary condition for producing useful research.

There already seems to be a community of amateurs who think about decision theory issues and other issues related to Friendliness in their spare time on Less Wrong. Some simple ideas that might grow/encourage this community: Offer cash prizes for useful insights, or feature useful insights on the Singularity Institute blog. Respond to the work of amateurs, as a way of recognizing its legitimacy. Improve recommended reading lists for understanding/contributing to FAI related topics and increase their visibility. Sponsor a "FAI programmer wannabe" mailing list/reading group.

[-]Manfred14y30

Offer cash prizes for useful insights, or feature useful insights on the Singularity Institute blog. Respond to the work of amateurs, as a way of recognizing its legitimacy. Improve recommended reading lists for understanding/contributing to FAI related topics and increase their visibility. Sponsor a "FAI programmer wannabe" mailing list/reading group.

I like your later suggestions much more than your first. We already have a supply of interested people - enabling people will probably have much more bang/buck than rewarding people. (And of course for those who haven't seen that video)

[-]John_Maxwell14y70

Publish AI Research Guidelines

The Singularity Institute has argued that the pace of friendliness research should outpace that of general purpose AI research if we want to have a positive singularity. But not all general purpose AI research is created equal. Some might be relevant to friendliness. Some might be useful in architecting a FAI. And some might only be useful for architecting a potential UFAI.

It seems possible that persuading AI researchers to change the course of their research is much easier than persuading them to quit altogether. By publishing a set of research recommendations, SI could potentially shape whatever AI research is done towards maximally Friendly ends.

Costs: Someone would have to understand all major AI research fronts in enough depth to evaluate their relevance for Friendliness, and have a good idea of the problems that need to be solved for Friendliness and what form a Friendly architecture might take. Psychological research on persuasion might additionally be beneficial. For example, is it best to explicitly identify lines of research that seem especially dangerous or just leave them out of the document altogether?

[-]lukeprog14y30

I believe FHI is working on this, and may have it ready once Nick's book is done. I think Nick told me he plans to involve SIAI in the creation of these guidelines.

[-]John_Maxwell14y50

Persuade Grantmaking Organizations that Certain Lines of Research are Dangerous

I've never been involved in academia, but my vague idea of how things work is that researchers apply for grants from organizations that sponsor their research. If these organizations could be persuaded to change the criteria they used to assign grants, it's possible the progress of AI research could be shaped.

[-]John_Maxwell14y40

More thoughts on this:

Assuming my model of how academia works is correct (can someone comment on this?), persuading grantmakers could be a better use of time than trying to persuade researchers directly for a few reasons:

There are probably many researchers for each grantmaker, so personal communication with individual grantmakers gives greater leverage.
Grantmakers probably have less personal investment in the research they judge, which would mean less motivated cognition to go through with the research.
Grantmakers are more likely to be interested in what will be beneficial for society as a whole, whereas individual researchers may be more motivated by gaining status or solving problems for their own sake.

[-]private_messaging14y00

For example, the research into a: how to make AI relate its computational structure to the substrate (AIXI does not, and fails to self-preserve), b: how to prevent wireheading for AI that does relate its computational structure to the substrate, and c: how to define real-world goals for AI to pursue (currently the AIs are just mathematics that makes some abstract variables satisfy abstract properties, which may be described in real-world terms in the annotations in the papers but implement no correspondence to the real world).

Such research is clearly dangerous, and also unnecessary for the creation of practically useful AIs (so it is not done at large; perhaps it is only done by SI, in which case persuading grantmaking organizations not to give any money to SI may do the trick).

[-]John_Maxwell14y40

Hire a High-Profile AI Researcher

SI says they've succeeded in convincing a few high-profile AI researchers that AGI research is dangerous. If one of these researchers could be hired as an SI staff member, they could lend their expertise to the development of Friendliness theories and also enhance SI's baseline credibility in the AI research community in general.

A related idea is to try to get these AI researchers to sign a petition making a statement about AI dangers.

Both of these ideas risk potentially unwanted publicity.

Note that both of these ideas are on SI's radar; I mention them here so folks can comment.

[-]ChristianKl14y10

Could you elaborate how those ideas could lead to unwanted publicity?

[-]John_Maxwell14y00

Having a high-profile AI researcher join SI, or a number of high-profile AI researchers express concern with AI safety, could make an interesting headline for a wide variety of audiences. It's not clear that encouraging commentary on AI safety from the general public is a good idea.

[-]Emile14y40

Make a "moral expert system" contest

Have a set of moral dilemmas, and

1) Through an online form, humans say what choice they would make in that situation

2) There's a contest to write a program that would choose like a human in those situations.

(Or alternatively, a program that given some of the choices that a human made, guesses which other choices he made in other situations)
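As a rough illustration of how such a contest might be scored, here is a minimal sketch. Everything in it is an assumption made for the example: the dilemma names, the sample responses, and the rule of scoring an entry against the majority human choice are all hypothetical, and a real contest could reasonably use a different metric.

```python
from collections import Counter

# Hypothetical data: each dilemma maps to the choices submitted by humans
# through the online form.
human_responses = {
    "trolley": ["pull lever", "pull lever", "do nothing"],
    "lifeboat": ["draw lots", "draw lots", "draw lots"],
    "white lie": ["tell truth", "lie", "lie"],
}


def majority_choice(choices):
    """Most common human answer for one dilemma (ties go to the first seen)."""
    return Counter(choices).most_common(1)[0][0]


def score_entry(entry, responses):
    """Fraction of dilemmas where the entry matches the human majority."""
    hits = sum(
        1
        for dilemma, choices in responses.items()
        if entry(dilemma) == majority_choice(choices)
    )
    return hits / len(responses)


def always_lie(dilemma):
    """A deliberately bad contest entry, used only to exercise the scorer."""
    return "lie"


print(score_entry(always_lie, human_responses))
```

The alternative version of the contest (guessing one person's remaining choices from some of their earlier ones) would need per-person response records rather than pooled ones.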

[-]John_Maxwell14y20

A contest like this could be a nice way to put the rhetorical onus on AI researchers to demonstrate that their approach to AI can be safe. Instead of the Singularity Institute having to prove that AGI can potentially be dangerous, really AGI researchers should have to prove the opposite.

It's also pretty digestible from a publicity standpoint. You don't have to know anything about the intelligence explosion to notice that robots are being used in warfare and worry about this.

(I suspect that if SI found the right way to communicate their core message, they could persuade average people that AI research is dangerous pretty easily without any technical jargon or reference to science fiction concepts.)

And contestants will probably make at least some progress on Friendliness in the course of participating.

On the other hand, if the contest is easy and fails to reflect real-world friendliness challenges then its effect could be negative.

[-]TheOtherDave14y00

(I suspect that if SI found the right way to communicate their core message, they could persuade average people that AI research is dangerous pretty easily without any technical jargon or reference to science fiction concepts.)

I have no doubt of this. It's not difficult to convince average people that a given technological innovation is dangerous. Whether doing so would cause more good than harm is a different question.

[-]private_messaging14y-10

Instead of the Singularity Institute having to prove that AGI can potentially be dangerous, really AGI researchers should have to prove the opposite.

How about we prove that teens texting cannot result in the emergence of a hivemind that would subsequently invent better hardware to run itself on, and get rid of everyone?

How about you take AIXI, and analyze it, and see that it doesn't relate itself to its computational substrate, and is consequently unable to understand self-preservation? There are other, much more relevant ways of being safe than "ohh it talks so moral".

[-]Emile14y20

Sponsor a "morality turing test" contest

From Prolegomena to any future artificial moral agent:

A Moral Turing Test (MTT) might similarly be proposed to bypass disagreements about ethical standards by restricting the standard Turing Test to conversations about morality. If human 'interrogators' cannot identify the machine at above chance accuracy, then the machine is, on this criterion, a moral agent. [...] To shift the focus from conversational ability to action, an alternative MTT could be structured in such a way that the 'interrogator' is given pairs of descriptions of actual, morally-significant actions of a human and an AMA, purged of all references that would identify the agents. If the interrogator correctly identifies the machine at a level above chance, then the machine has failed the test. A problem for this version of the MTT is that distinguishability is the wrong criterion because the machine might be recognizable for acting in ways that are consistently better than a human in the same situation. So instead, the interrogator might be asked to assess whether one agent is less moral than the other. If the machine is not reported as responding less morally than the human, it will have passed the test. This test is called the 'comparative MTT'.

The rules may have to be tweaked a bit more, but it sounds like something that might get various AI students or wannabe AI programmers interested in morality.
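One way the pass/fail rule of the comparative MTT could be made concrete is sketched below. The encoding of judgments and the exact threshold are my assumptions, not something the quoted paper specifies: each judgment names the agent the interrogator rated as less moral, and the machine passes if it is named no more often than the human.

```python
def comparative_mtt_passes(judgments):
    """Decide a comparative MTT under an assumed scoring rule.

    judgments: one entry per comparison, naming the agent judged LESS moral:
    "machine", "human", or "tie". The <= threshold is an illustrative choice.
    """
    machine_worse = judgments.count("machine")
    human_worse = judgments.count("human")
    return machine_worse <= human_worse


# Hypothetical judgments from four comparisons.
print(comparative_mtt_passes(["human", "machine", "tie", "human"]))
```

A real contest would probably want enough comparisons for a proper significance test rather than a raw count.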

[-]jacob_cannell14y20

This may have some value, but probably not towards actually making AI more moral/friendly on average. Conversing about morality can demonstrate knowledge of morality, but does little to demonstrate evidence of being moral/friendly. Example: a psychopath would not necessarily have any difficulty passing this Moral Turing Test.

[-]Viliam_Bur14y60

On the other hand, a machine could fail a morality test simply by saying something controversial, or just failing to signal properly. For example, atheism could be considered immoral by religious people; they could conclude that the machine is missing a part of the human utility function. Or if some nice and correct belief has bad consequences, but humans compartmentalize it away and the machine would point it out explicitly, that could be perceived as a moral failure.

If the machine is allowed to lie, passing this test could just mean the machine is a skilled psychopath. If the machine is not allowed to lie, failing this test could just mean humans confuse signalling with the real thing.

[-]Emile14y00

I agree, the goal is to get humans to think about programming some forms of moral reasoning, even if it's far from sufficient (and it's far from being the hardest part of FAI).

[-]Emile14y20

Lobby the Government for security regulations for AI

Just as there are security regulations on bioengineering, chemicals, nuclear power, weapons, etc., there could be regulations on AI, with official auditing of risks and so on. This would create more demand for officially recognized "AI Risk" experts and would force projects to pay more attention to those issues (even if it's only for coming up with rationalizations for why their project is safe).

This doesn't have to mean banning "unsafe" research; the existence of a "safe AI" certification means it might become a prerequisite for certain grants, or a marketing argument (even if the security standards for "safe AI" are not sufficient to actually guarantee safety).

[-]jacob_cannell14y10

Create a Direct Friendly AGI Investment Vehicle

Currently most AGI research is primarily sponsored by private angel investors and VCs towards private ends.

I have savings I would like to invest in friendly AGI, and I don't see a clear vehicle to enable that purchase at the moment. I've brought this up before at LW meetings and believe there is a growing market for such a vehicle.

The idea could take the form of a publicly accessible angel group or community-chest type of organization that allows investors to get exposure to a large pool of relatively safe AGI designs. The SIAI could take a lead role as a safety rating agency, connecting to John Maxwell's Safety Guideline ideas elsewhere in this thread.

AGI research is taking off regardless; the idea here is to differentially accelerate the safer designs.

[-]Manfred14y40

What business model using FAI will convince people to invest? Just adding "and it will be FAI" to typical AI pitches?

[-]jacob_cannell14y00

The specific business models will naturally vary from project to project, but so far the general form still appears to be the bootstrapping approach where a narrower AI application area is monetized first. For example, DeepMind seems to be first pursuing AI for social games, whereas Vicarious is going for vision.

The key figures both gave Singularity Summit talks, Peter Thiel's group invested in both, and Thiel is obviously aware of friendliness issues, so I assume that the pitches involved something more than "and it will be FAI", but beyond that I can only speculate.

[-]lukeprog14y20

Please email me so we can discuss this. luke [at] singularity.org

[-]Emile14y10

Make a "Friendly AI" programming challenge in a toy simulated society

(warning: this is pretty half-baked)

Step one (to prepare the contest) would be making interesting simulated societies, in which agents evolve social norms.

A simulated society would be a population of agents, and each turn some of them would be picked randomly to participate in a game, and as a result some agents may be killed, or modified (tagged "evil", "pregnant", "hungry", etc.), or new agents may be created, etc.

Agents would each have a set of "norms" ("in situation X, doing Y is good") that together would effectively amount to the agent's utility function. These norms would also work as the agents' genome, so when new agents are created, their norms are derived from those of their parent(s).

In effect, this would amount to evolving the most appropriate utility function for the simulation's specific rules. Several sets of rules could be devised to make different interesting societies.
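To make step one a little more concrete, here is a very small sketch of such a simulation. All specifics are placeholders I made up for illustration: the three situations, norms stored as weights in [-1, 1], and a "game" in which the agent that values the drawn action less is replaced by a mutated child of the other. Interesting societies would need much richer game rules.

```python
import random

# Hypothetical situations an agent can hold a norm about.
SITUATIONS = ["share food", "punish thief", "help stranger"]


def random_norms(rng):
    """A norm set: situation -> weight; positive means "doing it is good"."""
    return {s: rng.uniform(-1, 1) for s in SITUATIONS}


def child_norms(parent, rng, mutation=0.1):
    """Norms double as the genome: copy the parent's with small mutations."""
    return {
        s: max(-1.0, min(1.0, w + rng.gauss(0, mutation)))
        for s, w in parent.items()
    }


def play_turn(population, rng):
    """Pick two agents at random for one game (placeholder rules)."""
    a, b = rng.sample(range(len(population)), 2)
    situation = rng.choice(SITUATIONS)
    if population[a][situation] < population[b][situation]:
        loser, winner = a, b
    else:
        loser, winner = b, a
    # The loser is "killed" and a new agent derived from the winner is created.
    population[loser] = child_norms(population[winner], rng)


def simulate(n_agents=20, n_turns=500, seed=0):
    """Run the toy society and return the evolved norm sets."""
    rng = random.Random(seed)
    population = [random_norms(rng) for _ in range(n_agents)]
    for _ in range(n_turns):
        play_turn(population, rng)
    return population


evolved = simulate()
print(evolved[0])
```

The evolved norm sets returned by `simulate` are the kind of input that would later be handed to a contestant's program.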

Step two would be to have a contest where programmers have to write a "God" program that would be analogous to a superhuman AI in our world. I'm not quite sure of the best way to do that, but in all cases the program should be given some of the evolved norms as input.

One possibility would be to have the "God" program be a utility function, after which the whole simulated world is optimized according to that utility function.

Another would be to have the "God" program be an agent that participates in all games with many more choices available.

A twist could be that the "God" program is given only the norms on which some agents disagree; those on which everybody always agrees "go without saying" (such as "rocks don't count as sentient beings"). Another is that the same "God" program would be used in simulations with different rules.

(A big problem with this is that in the simulations, unlike in reality, agents are ontologically basic objects, so a lot of "dreams of friendly AI" would actually work in the simulation. Still, it should be possible to devise a simulation where the "God" program doesn't have access to the agents as ontologically basic objects)

A contest like that may allow people to realize that some of their ideas on value extrapolation etc. do not work.

[-]John_Maxwell14y00

Someone sent me this anonymous suggestion:

Well, let's consider AIXI-tl properly, mathematically, without the "what would I do in its shoes" idiocy and without the incompetent "let's just read the verbal summary". The AIXI-tl

1: looks for a way to make the button pressed.

2: actually, not even that; it does not relate itself to the representation of itself inside its representation of the world, and can't model the world going on without itself. It can't understand death. Its internal model is dualist.

It is an AI that won't stop you from shutting it down. If you try to resolve 2, then you hit another very hard problem, wireheading.

Those two problems naturally stand in the way of the creation of an AI that kills everyone, or an AI that wants to bring about heaven on earth, but they are entirely irrelevant to the creation of useful AI in general. Thus the alternative approach to AI risk reduction is to withdraw all funding from SI or any other organization working on philosophy of mind for AI, as those organizations create the risk of AGI that solves those two very hard problems which prevent arbitrary useful AI from killing us all.

[-]jacob_cannell14y00

Just a guess, but this sounds very much like.

[This comment is no longer endorsed by its author]
