Advice for AI makers

7 Post author: Stuart_Armstrong 14 January 2010 11:32AM

A friend of mine is about to launch himself heavily into the realm of AI programming. The details of his approach aren't important; probabilities dictate that he is unlikely to score a major success. He's asked me for advice, however, on how to design a safe(r) AI. I've been pointing him in the right directions and sending him links to useful posts on this blog and the SIAI.

Do people here have any recommendations they'd like me to pass on? Hopefully, these may form the basis of a condensed 'warning pack' for other AI makers.

Addendum: Advice along the lines of "don't do it" is vital and good, but unlikely to be followed. Coding will nearly certainly happen; is there any way of making it less genocidally risky?

Comments (196)

Comment author: Bugmaster 29 June 2014 02:39:57AM 0 points [-]

What does "AI programming" even mean? If he's trying to make some sort of an abstract generally-intelligent AI, then he'll be wasting his time, since the probability of him succeeding is somewhere around epsilon. If he's trying to make an AI for some specific purpose, then I'd advise him to employ lots of testing and especially cross-validation, to avoid overfitting. Of course, if his purpose is something like "make the smartest killer drone ever", then I'd prefer him to fail...
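To make the cross-validation advice concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than anything from the original project: the toy data (y = 2x plus noise), the memorising nearest-neighbour "model", and the helper names. It only shows why held-out error is the number to watch: a model that memorises its training set scores perfectly on that set and worse on data it has not seen.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and deal it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_neighbour_predict(train, x):
    """Memorising model (hypothetical): return the y of the closest training x."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(train, test):
    """Mean squared error of the memorising model on the test points."""
    return sum((nearest_neighbour_predict(train, x) - y) ** 2
               for x, y in test) / len(test)

def cross_validated_mse(data, k=5, seed=0):
    """Average held-out error over k folds."""
    scores = []
    for fold in k_fold_indices(len(data), k, seed):
        held = set(fold)
        train = [p for i, p in enumerate(data) if i not in held]
        test = [data[i] for i in fold]
        scores.append(mse(train, test))
    return sum(scores) / k

# Assumed toy data: a noisy straight line.
rng = random.Random(1)
data = [(x, 2 * x + rng.gauss(0, 1)) for x in range(50)]

train_error = mse(data, data)         # evaluated on the data it memorised
cv_error = cross_validated_mse(data)  # evaluated on held-out folds
print(train_error, cv_error)
```

The training error is zero because every point is its own nearest neighbour, while the cross-validated error is not. That gap is the overfitting that testing on the training data alone would hide.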

Comment author: JamesAndrix 19 January 2010 03:57:06PM 4 points [-]

Create a hardware device that would be fatal to the programmer. Allow it to be activated by a primitive action that the program could execute. Give the primitive a high apparent utility. Code the AI however he wants.

If he gets cold sweats every time he does a test run, the rest of us will probably be OK.

Comment author: zero_call 17 January 2010 07:52:13PM *  1 point [-]

I've read through the AI-Box experiment, and I can still say that I recommend the "sealed AI" tactic. The Box experiment isn't very convincing at all to me, which I could go into detail about, but that would require a whole post. But of course, I'll never develop the karma to do that because apparently the rate at which I ask questions of proper material exceeds the rate at which I post warm, fuzzy comments. Well, at least I have my own blog...

Comment author: orthonormal 17 January 2010 08:24:33PM *  2 points [-]

It looks like you're picking up karma relatively rapidly of late; it takes a while to learn the ways of speaking around here that don't detract from the content of one's comments, but once that happens, most people will accumulate karma reasonably quickly.

But since the AI-Box experiment has been discussed a bit here already, it might make sense to lay out your counterargument here or on the Open Thread for now. I know that's not as satisfying as making a post, but I think you'll still get quality discussion.

(Also, a top-level post on an old topic by a relative newcomer runs a risk of getting downvoted for redundancy if the argument recapitulates someone's old position— and post downvotes can kill your karma for a while. Caveat scriptor!)

P.S. Also on the topic, and quite interesting: That Alien Message.

Comment author: zero_call 17 January 2010 09:16:38PM 0 points [-]

I think God themselves just struck me with +20 karma somehow... thank ye almighty lords! Yeah, but indeed I will heed your advice and look into the issue more before posting.

Comment author: zero_call 16 January 2010 08:29:05PM 2 points [-]

For solving the Friendly AI problem, I suggest the following constraints for your initial hardware system:

1.) All outside input (and input libraries) are explicitly user selected. 2.) No means for the system to establish physical action (e.g., no robotic arms.) 3.) No means for the system to establish unexpected communication (e.g., no radio transmitters.)

Once this closed system has reached a suitable level of AI, then the problem of making it friendly can be worked on much easier and more practically, and without risk of the world ending.

To start out from the beginning to make a GAI friendly through some other means seems rather ambitious to me. Why not just work on AI now, make sure when you're getting close to the goal, that the AI is suitably restricted, and then finally use the AI itself as an experimental testbed for "personality certification".

(Can someone explain/link me to why this isn't currently espoused?)

Comment author: timtyler 17 January 2010 11:14:21AM *  1 point [-]

Didn't David Chalmers propose that here:

http://www.vimeo.com/7320820

...?

Test harnesses are a standard procedure - but they are not the only kind of test.

Basically, unless you are playing chess, or something, if you don't test in the real world, you won't really know if it works - and it can't do much to help you do important things - like raise funds to fuel development.

Comment author: orthonormal 16 January 2010 11:57:56PM *  2 points [-]

I don't understand why this comment was downvoted.

Yes, zero_call asks a question many of us feel has been adequately answered in the past; but they are asking politely, and it would have taken extensive archive-reading for them to have already known about the AI-Box experiment.

Think before you downvote, especially with new users!

EDIT: As AdeleneDawner points out, zero_call isn't that new. Even so, the downvotes (at -2 when I first made my comment) looked more like signaling disagreement than anything else.

Comment author: Vladimir_Nesov 17 January 2010 10:56:24AM 1 point [-]

I downvoted the comment not because of AI box unsafety (which I don't find convincing at the certainty level with which it's usually asserted -- disutility may well give weight to the worry, but not to the probability), but because it gives advice on the paint color for a spaceship in the time when Earth is still standing on a giant Turtle in the center of the world. It's not a sane kind of advice.

Comment author: orthonormal 17 January 2010 06:11:37PM 1 point [-]

If I'd never heard of the AI-Box Experiment, I'd think that zero_call's comment was a reasonable contribution to a conversation about AI and safety in particular. It's only when we realize that object-level methods of restraining a transhuman intelligence are probably doomed that we know we must focus so precisely on getting its goals right.

Comment author: blogospheroid 17 January 2010 07:47:06PM 1 point [-]

Vladimir and orthonormal,

Please point me to some more details about the AI box experiment, since I think what I suggested earlier as isolated virtual worlds is pretty much the same as what zero_call is suggesting here.

I feel that there are huge assumptions in the present AI Box experiment. The gatekeeper and the AI share a language, for one, by which the AI convinces the gatekeeper.

If AGI is your only criterion, without regard to friendliness, just make sure not to communicate with the AI. Turing tests are not the only proofs of intelligence. If the AGI can come up with unique solutions in the universe in which it is isolated, that is enough to show that the algorithm is creative.

Comment author: AdeleneDawner 17 January 2010 09:40:32PM 1 point [-]

This just evoked a possibly-useful thought:

If observing but not communicating with a boxed AI does a good enough job of patching the security holes (which I understand that it might not - that's for someone who better understands the issue to look at), perhaps putting an instance of a potential FAI in a contained virtual world would be useful as a test. It seems to me that a FAI that didn't have humans to start with would perhaps have to invent us, or something like us in some specific observable way(s), because of its values.

Comment author: AdeleneDawner 17 January 2010 12:04:39AM 0 points [-]

Good thought, but on further examination it turns out that zero isn't all that new - xe's been commenting since November; xyr karma is low because xe has been downvoted almost as often as upvoted.

Comment author: Technologos 16 January 2010 08:32:26PM 8 points [-]

This is essentially the AI box experiment. Check out the link to see how even an AI that can only communicate with its handler(s) might be lethal without guaranteed Friendliness.

Comment author: Alicorn 16 January 2010 08:35:56PM 7 points [-]

I don't think the publicly available details establish "how", merely "that".

Comment author: Technologos 16 January 2010 08:56:23PM 4 points [-]

Sure, though the mechanism I was referring to is "it can convince its handler(s) to let it out of the box through some transhuman method(s)."

Comment author: RobinZ 16 January 2010 09:03:28PM 1 point [-]

Wait, since when is Eliezer transhuman?

Comment author: Technologos 16 January 2010 09:23:54PM 6 points [-]

Who said he was? If Eliezer can convince somebody to let him out of the box--for a financial loss no less--then certainly a transhuman AI can, right?

Comment author: RobinZ 16 January 2010 10:19:19PM 2 points [-]

Certainly they can; what I am emphasizing is that "transhuman" is an overly strong criterion.

Comment author: Technologos 16 January 2010 10:21:25PM 3 points [-]

Definitely. Eliezer reflects perhaps a maximum lower bound on the amount of intelligence necessary to pull that off.

Comment deleted 15 January 2010 05:14:07PM *  [-]
Comment author: timtyler 15 January 2010 11:10:36PM 0 points [-]

I figure that would be slow, ineffectual and probably more dangerous than other paths in the unlikely case that it was successful.

Comment deleted 15 January 2010 11:20:20PM *  [-]
Comment author: timtyler 15 January 2010 11:35:10PM *  0 points [-]

I'm not sure that is a proper sentence.

I do think that we could build something more dangerous to civilization than the human race is at that time - but that seems like a rather obvious thing to think - and the fact that it is possible does not necessarily mean that it is likely.

Comment author: JamesAndrix 17 January 2010 12:21:19AM 0 points [-]

Key Noun phrase: the human race,..., trying to build an AI,

Then: {description of difficulty of said activity}

I'm not sure it's proper either, but I'm sure you misparsed it.

Comment author: timtyler 17 January 2010 01:45:51AM *  -2 points [-]

Yay, that really helped!

Roko and I don't see eye to eye on this issue. From my POV, we have had 50 years of unsuccessful attempts. That is not exactly "getting it right the first time".

Google was not the first search engine, Microsoft was not the first OS maker - and Diffie–Hellman didn't invent public key crypto.

Being first does not necessarily make players uncatchable - and there's a selection process at work in the mean time, that weeds out certain classes of failures.

From my perspective, this is mainly a SIAI confusion. Because their funding is all oriented around the prospect of them saving the world from imminent danger, the execution of their mission apparently involves exaggerating the risks associated with that - which has the effect of stimulating funding from those who they convince that DOOM is imminent - and that the SIAI can help with averting it.

Humans will most likely get the machines they want - because people will build them to sell them - and because people won't buy bad machines.

Comment deleted 17 January 2010 02:01:17AM *  [-]
Comment author: timtyler 17 January 2010 10:20:00AM *  2 points [-]

The other thing to say is that there's an important sense in which most modern creatures don't value anything - except for their genetic heritage - which all living things necessarily value.

Contrast with a gold-atom maximiser. That values collections of pure gold atoms. It cares about something besides the survival of its genes (which obviously it also cares about - no genes, no gold). It strives to leave something of value behind.

Most modern organisms don't leave anything behind - except for things that are inherited - genes and memes. Nothing that they expect to last for long, anyway. They keep dissipating energy gradients until everything is obliterated in high-entropy soup.

Those values are not very difficult to preserve - they are the default state.

If ecosystems cared about creating some sort of low-entropy state somewhere, then that property would take some effort to preserve (since it is vulnerable to invasion by creatures who use that low-entropy state as fuel). However, with the current situation, there aren't really any values to preserve - except for those of the replicators concerned.

The idea has been called variously: goal system zero, god's utility function, Shiva's values.

Even the individual replicators aren't really valued in themselves - except by themselves. There's a parliament of genes, and any gene is expendable, on a majority vote. Genes are only potentially immortal. Over time, the representation of the original genes drops. Modern refactoring techniques will mean it will drop faster. There is not really a floor to the process - eventually, all may go.

Comment author: timtyler 17 January 2010 09:56:32AM *  1 point [-]

I figure a fair amount of modern heritable information (such as morals) will not be lost. Civilization seems to be getting better at keeping and passing on records. You pretty-much have to hypothesize a breakdown of civilization for much of genuine value to be lost - an unprecedented and unlikely phenomenon.

However, I expect increasing amounts of it to be preserved mostly in history books and museums as time passes. Over time, that will probably include most DNA-based creatures - including humans.

Evolution is rather like a rope. Just as no strand in a rope goes from one end to the other, most genes don't tend to do that either. That doesn't mean the rope is weak, or that future creatures are not - partly - our descendants.

Comment author: Technologos 17 January 2010 09:16:23PM 1 point [-]

an unprecedented and unlikely phenomenon

Possible precedents: the Library of Alexandria and the Dark Ages.

Comment author: timtyler 17 January 2010 09:27:33PM *  1 point [-]

Reaching, though: the dark ages were confined to Western Europe - and something like the Library of Alexandria couldn't happen these days - there are too many libraries.

Comment deleted 17 January 2010 11:45:06AM [-]
Comment author: timtyler 17 January 2010 01:34:23PM -1 points [-]

Museums have some paperclips in them. You have to imagine future museums as dynamic things that recreate and help to visualise the past - as well as preserving artefacts.

Comment author: blogospheroid 15 January 2010 07:35:10AM 2 points [-]

Due to the lack of details, it is difficult to make a recommendation, but here are some thoughts.

Both as an AGI challenge and for general human safety, business intelligence data warehouses are probably a good bet. Any pattern undetected by humans but detected by an AI could mean good money, which could feed back into more resources for the AI. Also, the ability of corporations to harm others doesn't increase significantly with a better business intelligence tool.

Virtual worlds - If the AI is tested in an isolated virtual world, that will be better for us. Test it in a virtual world that is completely unlike ours, a gas giant simulation maybe. Even if it develops extremely capable technology to deal with the gas giant environment within the simulation, it would mean very little in the real world except as a demonstration of intelligence.

Comment author: JamesAndrix 17 January 2010 12:45:37AM 4 points [-]

Virtual Worlds doesn't buy you any safety, even if it can't break out of the simulator.

If you manage to make AI, you've got a Really Powerful Optimization Process. If it worked out simulated physics and has access to its own source, it's probably smart enough to 'foom', even within the simulation. At which point you have a REALLY powerful optimizer, and no idea how to prove anything about its goal system. An untrustable genie.

Also, spending all those cycles on that kind of simulated world would be hugely inefficient.

Comment author: blogospheroid 17 January 2010 07:22:59PM 2 points [-]

James, you can't blame me for responding to the question. Stuart has said that advice on giving up will not be accepted. The question is to minimise the fallout of a lucky stroke moving this guy's AI forward and fooming. Both of my suggestions were around that.

Comment author: JamesAndrix 17 January 2010 08:26:56PM 2 points [-]

You are quite right.

Comment author: ChristianKl 16 January 2010 06:38:32PM 0 points [-]

Additionally, the computer on which the virtual world runs shouldn't be directly connected to other computers, to prevent the AGI from escaping through some 0day.

Comment author: wedrifid 15 January 2010 08:33:08AM *  3 points [-]

Virtual worlds - If the AI is tested in an isolated virtual world, that will be better for us. Test it in a virtual world that is completely unlike ours, a gas giant simulation maybe. Even if it develops extremely capable technology to deal with the gas giant environment within the simulation, it would mean very little in the real world except as a demonstration of intelligence.

You are giving a budding superintelligence exposure to a simulation based on our physics? It would work out the physics of the isolated virtual world, deduce from the traces you leave in the design that it is in a simulation and have a good guess on what we believe to be the actual physics of our universe. Maybe even have a hunch about how we have physics wrong. I would not want to bet our existence on it being unable to get out of that box.

Comment author: blogospheroid 16 January 2010 10:01:15AM 1 point [-]

My point with the virtual worlds was to put the AI into a simulation sufficiently unlike our world that it wouldn't be a threat and sufficiently like our world that we would be able to recognise what it does as intelligence. Hence the Gas giant example.

If we were to release an AI into today's simulations like sims which are much less granular than the one I have proposed in my post, then it would figure out that it is in a simulation much faster.

If we put it into some other kind of universe with weird physics, a magical universe let's say, then we will need to send someone intelligent to do a considerable number of trials before we release the AI. This is to prove that whatever solutions the AI comes up with are genuinely intelligent and not something that is obvious.

I too agree that we wouldn't want to bet our existence on it being unable to get out of that box, but what evidence will we leave in the simulation which will point to it that it has to "Press Red for talking to simulator"? Or to put it in even simpler terms, where in our universe is OUR "Press Red to talk to simulator" button?

Comment author: Normal_Anomaly 27 June 2011 05:30:51PM 0 points [-]

My point with the virtual worlds was to put the AI into a simulation sufficiently unlike our world that it wouldn't be a threat and sufficiently like our world that we would be able to recognise what it does as intelligence. Hence the Gas giant example.

I'm not sure I follow. Gas giants run on the same physics as you and me. Do you mean a world with actual different simulated physics?

Comment author: wedrifid 16 January 2010 01:41:08PM 0 points [-]

I too agree that we wouldn't want to bet our existence on it being unable to get out of that box, but what evidence will we leave in the simulation which will point to it that it has to "Press Red for talking to simulator"?

I don't know. Who is going to be creating the simulation? How can I be comfortable that he will not either make a bug or design a simulation that a superintelligence cannot deduce that it is artificial? Proving that things way way smarter than me couldn't know stuff is hard. Possible sometimes but hard.

Or to put it in even simpler terms, where in our universe is OUR "Press Red to talk to simulator" button?

The presence or absence of such a button in our universe provides some evidence about whether we could reliably create a simulation that is undetectable. But not that much evidence.

Comment author: ChristianKl 16 January 2010 06:43:38PM 2 points [-]

How would you design such a button? Reciting a fixed verse and afterwards stating what you want from the simulator seems like a good technique. A majority of the people on this earth believe that such a button exists in the form of prayer ;)

Comment author: Eliezer_Yudkowsky 15 January 2010 03:43:34AM 11 points [-]

"And I heard a voice saying 'Give up! Give up!' And that really scared me 'cause it sounded like Ben Kenobi." (source)

Friendly AI is a humongous damn multi-genius-decade sized problem. The first step is to realize this, and the second step is to find some fellow geniuses and spend a decade or two solving it. If you're looking for a quick fix you're out of luck.

The same (albeit to a lesser degree) is fortunately also true of Artificial General Intelligence in general, which is why the hordes of would-be meddling dabblers haven't killed us all already.

Comment author: Wei_Dai 16 January 2010 04:35:13AM *  30 points [-]

This article (which I happened across today) written by Ben Goertzel should make interesting reading for a would-be AI maker. It details Ben's experience trying to build an AGI during the dot-com bubble. His startup company, Webmind, Inc., apparently had up to 130 (!) employees at its peak.

According to the article, the AGI was almost completed, and the main reason his effort failed was that the company ran out of money due to the bursting of the bubble. Together with the anthropic principle, this seems to imply that Ben is the person responsible for the stock market crash of 2000.

I was always puzzled why SIAI hired Ben Goertzel to be its research director, and this article only deepens the mystery. If Ben has done an Eliezer-style mind-change since writing that article, I think I've missed it.

ETA: Apparently Ben has recently been helping his friend Hugo de Garis build an AI at Xiamen University under a grant from the Chinese government. How do you convince someone to give up building an AGI when your own research director is essentially helping the Chinese government build one?

Comment author: timtyler 25 June 2011 12:06:47PM *  7 points [-]

I was always puzzled why SIAI hired Ben Goertzel to be its research director, and this article only deepens the mystery.

Ben has a PhD, can program, has written books on the subject and has some credibility. Those kinds of things can help a little if you are trying to get people to give you money in the hope of you building a superintelligent machine. For more see here:

It has similarly been a general rule with the Singularity Institute that, whatever it is we're supposed to do to be more credible, when we actually do it, nothing much changes. "Do you do any sort of code development? I'm not interested in supporting an organization that doesn't develop code" -> OpenCog -> nothing changes. "Eliezer Yudkowsky lacks academic credentials" -> Professor Ben Goertzel installed as Director of Research -> nothing changes. The one thing that actually has seemed to raise credibility, is famous people associating with the organization, like Peter Thiel funding us, or Ray Kurzweil on the Board.

Comment author: XiXiDu 24 June 2011 09:35:07AM 3 points [-]

According to the article, the AGI was almost completed, and the main reason his effort failed was that the company ran out of money due to the bursting of the bubble. Together with the anthropic principle, this seems to imply that Ben is the person responsible for the stock market crash of 2000.

Phew...I was almost going to call bullshit on this but that would be impolite.

Comment author: Wei_Dai 20 January 2010 04:15:32AM 5 points [-]

I just came across an old post of mine that asked a similar question:

BTW, I still remember the arguments between Eliezer and Ben about Friendliness and Novamente. As late as January 2005, Eliezer wrote:

And if Novamente should ever cross the finish line, we all die. That is what I believe or I would be working for Ben this instant.

I'm curious how that debate was resolved?

From the reluctance of anyone at SIAI to answer this question, I conclude that Ben Goertzel being the Director of Research probably represents the outcome of some internal power struggle/compromise at SIAI, whose terms of resolution included the details of the conflict being kept secret.

What is the right thing to do here? Should we try to force an answer out of SIAI, for example by publicly accusing it of not taking existential risk seriously? That would almost certainly hurt SIAI as a whole, but might strengthen "our" side of this conflict. Does anyone have other suggestions for how to push SIAI in a direction that we would prefer?

Comment author: Eliezer_Yudkowsky 20 January 2010 04:25:05AM 9 points [-]

The short answer is that Ben and I are both convinced the other is mostly harmless.

Comment author: Furcas 20 January 2010 04:48:32AM 2 points [-]

Can we know how you came to that conclusion?

Comment author: wedrifid 20 January 2010 04:39:07AM 3 points [-]

There is one 'mostly harmless' for people who you think will fail at AGI. There is an entirely different 'mostly harmless' for actually having a research director who tries to make AIs that could kill us all. Why would I not think the SIAI is itself an existential risk if the criteria for director recruitment are so lax? Being absolutely terrified of disaster is the kind of thing that helps ensure appropriate mechanisms to prevent defection are kept in place.

What is the right thing to do here? Should we try to force an answer out of SIAI, for example by publicly accusing it of not taking existential risk seriously?

Yes. The SIAI has to convince us that they are mostly harmless.

Comment author: Wei_Dai 20 January 2010 04:36:07AM 3 points [-]

Have you updated that in light of the fact that Ben just convinced the Chinese government to start funding AGI? (See my article link earlier in this thread.)

Comment author: Eliezer_Yudkowsky 20 January 2010 04:39:01AM 7 points [-]

Hugo de Garis is around two orders of magnitude more harmless than Ben.

Comment author: Kevin 24 June 2010 08:28:27PM 12 points [-]

Update for anyone that comes across this comment: Ben Goertzel recently tweeted that he will be taking over Hugo de Garis's lab, pending paperwork approval.

http://twitter.com/bengoertzel/status/16646922609

http://twitter.com/bengoertzel/status/16647034503

Comment author: Wei_Dai 20 January 2010 05:11:33AM 4 points [-]

Hugo de Garis is around two orders of magnitude more harmless than Ben.

What about all the other people Ben might help obtain funding for, partly due to his position at SIAI?

And what about the public relations/education aspect? It's harmless that SIAI appears to not consider AI to be a serious existential risk?

Comment author: wedrifid 20 January 2010 12:26:58PM 6 points [-]

And what about the public relations/education aspect? It's harmless that SIAI appears to not consider AI to be a serious existential risk?

This part was not answered. It may be a question to ask someone other than Eliezer. Or just ask really loudly. That sometimes works too.

Comment author: Eliezer_Yudkowsky 20 January 2010 06:37:00AM 4 points [-]

What about all the other people Ben might help obtain funding for, partly due to his position at SIAI?

The reverse seems far more likely.

Comment author: Wei_Dai 20 January 2010 12:16:06PM 1 point [-]

What about all the other people Ben might help obtain funding for, partly due to his position at SIAI?

The reverse seems far more likely.

I don't know how to parse that. What do you mean by "the reverse"?

Comment author: wedrifid 20 January 2010 12:23:09PM 1 point [-]

I don't know how to parse that. What do you mean by "the reverse"?

Ben's position at SIAI may reduce the expected amount of funding he obtains for other existentially risky persons.

Comment author: wedrifid 20 January 2010 04:47:41AM 2 points [-]

How much of this harmlessness is perceived impotence and how much is it an approximately sane way of thinking?

Comment author: XiXiDu 04 November 2010 06:53:23PM 3 points [-]

Do you believe the given answer? And if Ben is really that impotent, what do you think it reveals about the SIAI, or whoever put Ben into a position within the SIAI?

Comment author: wedrifid 04 November 2010 07:00:43PM 5 points [-]

Do you believe the given answer?

I don't know enough about his capabilities when it comes to contributing to unfriendly AI research to answer that. Being unable to think sanely about friendliness or risks may have little bearing on your capabilities with respect to AGI research. The modes of thinking have very little bearing on each other.

And if Ben is really that impotent, what do you think it reveals about the SIAI, or whoever put Ben into a position within the SIAI?

That they may be more rational and less idealistic than I may otherwise have guessed. There are many potential benefits the SIAI could gain from an affiliation with those inside the higher status AGI communities. Knowing who to know has many uses unrelated to knowing what to know.

Comment author: ata 04 November 2010 08:48:36PM *  6 points [-]

That they may be more rational and less idealistic than I may otherwise have guessed. There are many potential benefits the SIAI could gain from an affiliation with those inside the higher status AGI communities. Knowing who to know has many uses unrelated to knowing what to know.

Indeed. I read part of this post as implying that his position had at least a little bit to do with gaining status from affiliating with him ("It has similarly been a general rule with the Singularity Institute that, whatever it is we're supposed to do to be more credible, when we actually do it, nothing much changes. 'Do you do any sort of code development? I'm not interested in supporting an organization that doesn't develop code' -> OpenCog -> nothing changes. 'Eliezer Yudkowsky lacks academic credentials' -> Professor Ben Goertzel installed as Director of Research -> nothing changes.").

Comment author: XiXiDu 04 November 2010 07:41:15PM 7 points [-]

There are many potential benefits the SIAI could gain from an affiliation with those inside the higher status AGI communities. Knowing who to know has many uses unrelated to knowing what to know.

Does this suggest that founding a stealth AGI institute (to coordinate conferences, and communication between researchers) might be suited to oversee and influence potential undertakings that could lead to imminent high-risk situations?

By the way, I noticed from my server logs that the Institute for Defense Analyses seems to be reading LW. They visited my homepage, referred by my LW profile. So one should think about the consequences of discussing such matters in public, respectively not doing so.

Comment author: Eliezer_Yudkowsky 20 January 2010 05:01:11AM 6 points [-]

Wholly perceived impotence.

Comment author: outlawpoet 16 January 2010 10:20:05PM 1 point [-]

That is an excellent question.

Comment author: Stuart_Armstrong 15 January 2010 10:20:19AM 2 points [-]

That's the justification he gave me: he won't be able to make much of a difference to the subject, so he won't be generating much risk.

Since he's going to do it anyway, I was wondering whether there were safer ways of doing so.

Comment author: Psy-Kosh 15 January 2010 05:16:06AM 3 points [-]

And now for a truly horrible thought:

which is why the hordes of would-be meddling dabblers haven't killed us all already.

I wonder to what extent we've been "saved" so far by anthropics. Okay, that's probably not the dominant effect. I mean, yeah, it's quite clear that AI is, as you note, REALLY hard.

But still, I can't help but wonder just how little or much that's there.

Comment author: cousin_it 18 January 2010 01:48:04PM *  6 points [-]

If you think anthropics has saved us from AI many times, you ought to believe we will likely die soon, because anthropics doesn't constrain the future, only the past. Each passing year without catastrophe should weaken your faith in the anthropic explanation.

Comment author: satt 30 June 2014 09:06:20PM *  1 point [-]

The first sentence seems obviously true to me, the second probably false.

My reasoning: to make observations and update on them, I must continue to exist. Hence I expect to make the same observations & updates whether or not the anthropic explanation is true (because I won't exist to observe and update on AI extinction if it occurs), so observing a "passing year without catastrophe" actually has a likelihood ratio of one, and is not Bayesian evidence for or against the anthropic explanation.
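To make the disagreement concrete, here is a minimal sketch of the two ways of scoring a "passing year without catastrophe". All the numbers are made-up assumptions for illustration, and `posterior` is just a hypothetical helper implementing Bayes' rule for a binary hypothesis. Treating survival as an ordinary observation gives a likelihood ratio far from one; conditioning on there being an observer at all gives a likelihood ratio of exactly one, so the prior does not move.

```python
from fractions import Fraction


def posterior(prior_h, like_h, like_not_h):
    """Bayes' rule for a binary hypothesis: returns P(H | observation)."""
    numerator = prior_h * like_h
    return numerator / (numerator + (1 - prior_h) * like_not_h)


# Assumed, purely illustrative numbers (not from the thread).
prior_anthropic = Fraction(1, 2)            # P(the anthropic explanation is true)
p_survive_if_anthropic = Fraction(1, 10)    # chance a given year passes safely if it is true
p_survive_if_not = Fraction(99, 100)        # chance a given year passes safely if it is false

# Reading 1: survival is an ordinary observation, so a safe year is
# evidence against the anthropic explanation.
outside_view = posterior(prior_anthropic, p_survive_if_anthropic, p_survive_if_not)

# Reading 2: we can only ever observe survival, so the observation has
# probability 1 under both hypotheses and the likelihood ratio is one.
survivor_view = posterior(prior_anthropic, Fraction(1), Fraction(1))

print(outside_view)   # 10/109
print(survivor_view)  # 1/2
```

Which of the two likelihoods is the right one to use is exactly what is in dispute here; the code only shows that the choice decides the answer.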

Comment author: Houshalter 01 October 2013 11:40:15PM 1 point [-]

Wouldn't the anthropic argument apply just as much in the future as it does now? The world not being destroyed is the only observable result.

Comment author: [deleted] 02 October 2013 12:50:06AM -1 points [-]

The future hasn't happened yet.

Comment author: Houshalter 02 October 2013 02:32:47AM 0 points [-]

Right. My point was that in the future you are still going to say "wow, the world hasn't been destroyed yet" even if in 99% of alternate realities it was. cousin_it said:

Each passing year without catastrophe should weaken your faith in the anthropic explanation.

Which shouldn't be true at all.

If you cannot observe a catastrophe happen, then not observing a catastrophe is not evidence for any hypothesis.

Comment author: nshepperd 02 October 2013 04:45:22AM 1 point [-]

"Not observing a catastrophe" != "observing a non-catastrophe". If I'm playing russian roulette and I hear a click and survive, I see good reason to take that as extremely strong evidence that there was no bullet in the chamber.

Comment author: Houshalter 02 October 2013 06:19:51AM 1 point [-]

But doesn't the anthropic argument still apply? Worlds where you survive playing russian roulette are going to be ones where there wasn't a bullet in the chamber. You should expect to hear a click when you pull the trigger.

Comment author: nshepperd 02 October 2013 06:24:32AM *  0 points [-]

As it stands, I expect to die (p=1/6) if I play russian roulette. I don't hear a click if I'm dead.

Comment author: Houshalter 02 October 2013 10:18:18PM 1 point [-]

That's the point. You can't observe anything if you are dead, therefore any observations you make are conditional on you being alive.

Comment author: JamesAndrix 14 January 2010 07:53:36PM 1 point [-]

My current toy thinking along these lines is imagining a program that will write a program to solve the towers of hanoi, given only some description of the problem, and do nothing else, using only fixed computational resources for the whole thing.

I think that's safe, and would illustrate useful principles for FAI.

Comment author: Morendil 24 June 2011 11:10:46AM *  0 points [-]

An earlier comment of mine on the Towers of Hanoi. (ETA: I mean earlier relative to the point in time when this thread was resurrected.)

Are you familiar with Hofstadter's work in "microdomains", such as Copycat et al.?

Comment author: JenniferRM 25 June 2010 12:41:13AM 0 points [-]

So.... you want to independently re-invent a prolog compiler?

Comment author: SilasBarta 25 June 2010 01:16:46AM 0 points [-]

What Blueberry said. The page you linked just gives the standard program for solving Towers of Hanoi. What JamesAndrix was imagining was a program that comes up with that solution, given just the description of the problem -- i.e., what the human coder did.
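For reference, the "standard program" is roughly the following. This is a minimal Python sketch of the usual recursive solution, not the code from the linked page; the function name `hanoi` and the peg labels are my own choices. It is the thing the human coder normally writes by hand, i.e. the output that the imagined program would have to come up with on its own.

```python
def hanoi(n, source="left", target="right", spare="center"):
    """Return the list of (from_peg, to_peg) moves that transfers n disks."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then move
    # the n-1 disks back on top of it.
    return (
        hanoi(n - 1, source, spare, target)
        + [(source, target)]
        + hanoi(n - 1, spare, target, source)
    )


moves = hanoi(3)
print(len(moves))  # 7, i.e. 2**3 - 1
for src, dst in moves:
    print(f"move top disk from {src} to {dst}")
```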

Comment author: aletheilia 24 June 2011 10:49:06AM 0 points [-]

Well, this can actually be done (yes, in Prolog with a few metaprogramming tricks), and it's not really that hard - only very inefficient, i.e. feasible only for relatively small problems. See: Inductive logic programming.

Comment author: JamesAndrix 25 June 2011 08:00:18AM 0 points [-]

No, not learning. And the 'do nothing else' parts can't be left out.

This shouldn't be a general automatic programming method, just something that goes through the motions of solving this one problem. It should already 'know' whatever principles lead to that solution. The outcome should be obvious to the programmer, and I suspect realistically hand-traceable. My goal is a solid understanding of a toy program exactly one meta-level above hanoi.

This does seem like something Prolog could do well; if there is already a static program that does this, I'd love to see it.

Comment author: Blueberry 25 June 2010 01:12:07AM 1 point [-]

More like a program that takes

The object of this famous puzzle is to move N disks from the left peg to the right peg using the center peg as an auxiliary holding peg. At no time can a larger disk be placed upon a smaller disk.

as input and returns the Prolog code as output.

Comment author: JGWeissman 14 January 2010 08:14:06PM 0 points [-]

Until you specify the format of a description of the problem, and how the program figures out how to write a program to solve the problem, it is hard to tell if this would be safe.

And if you don't know that it is safe, it isn't. Using some barrier like "fixed computational resources" to contain a non-understood process is a red flag.

Comment author: JamesAndrix 14 January 2010 08:52:35PM 0 points [-]

The format of the description is something I'm struggling with, but I'm not clear how it impacts safety.

How the AI figures things out is up to the human programmer. Part of my intent in this exercise is to constrain the human to solutions they fully understand. In my mind my original description would have ruled out evolving neural nets, but now I see I definitely didn't make that clear.

By 'fixed computational resources' I mean that you've got to write the program such that if it discovers some flaw that gives it access to the internet, it will patch around that access because what it is trying to do is solve the puzzle of (solving the puzzle using only these instructions and these rules and this memory.)

What I'm looking for is a way to work on friendliness using goals that are much simpler than human morality, implemented by minds that are at least comprehensible in their operation, if not outright step-able.

Comment author: Psychohistorian 14 January 2010 06:36:06PM 2 points [-]

This seems rather relevant - and suggests the answer is go watch more TV. Or, at least, I felt it really needed to be linked here, and this gave me the perfect opportunity!

Comment author: arbimote 18 January 2010 11:23:47AM 0 points [-]

Someone actually made a top-level post on this the other day. Just sayin'.

Comment author: RobinZ 18 January 2010 04:30:10PM 1 point [-]

This comment and that post are actually within seventeen minutes of each other. I think Psychohistorian may be forgiven for not noticing dclayh.

Comment author: Psychohistorian 19 January 2010 05:38:36AM 1 point [-]

I think Psychohistorian may be forgiven for not noticing dclayh.

That is odd; I distinctly recall posting this before the top-level.

Comment author: RobinZ 19 January 2010 05:40:52AM *  0 points [-]

That would be an even better excuse.

Edit: It occurs to me that the datestamp may correspond to the writing of a draft, not the time of publication.

Comment author: Vladimir_Nesov 14 January 2010 04:55:34PM *  6 points [-]

For useful-tool AI, learn stuff from statistics and machine learning before making any further moves.

For self-improving AI, just don't do it as AI. FAI is not quite an AI problem, and anyway most techniques associated with "AI" don't work for FAI. Instead, learn fundamental math and computer science, to a good level -- that's my current best in-a-few-words advice for would-be FAI researchers.

Comment author: Wei_Dai 14 January 2010 07:09:28PM 1 point [-]

Isn't every AI potentially a self-improving AI? All it takes is for the AI to come upon the insight "hey, I can build an AI to do my job better." I guess it requires some minimum amount of intelligence for such an insight to become likely, but my point is that one doesn't necessarily have to set out to build a self-improving AI, to actually build a self-improving AI.

Comment author: Morendil 14 January 2010 07:55:03PM 7 points [-]

I'm very much out of touch with the AI scene, but I believe the key distinction is between Artificial General Intelligence, versus specialized approaches like chess-playing programs or systems that drive cars.

A chess program's goal structure is strictly restricted to playing chess, but any AI with the ability to formulate arbitrary sub-goals could potentially stumble on self-improvement as a sub-goal.

Comment author: Wei_Dai 14 January 2010 08:25:36PM *  3 points [-]

Today's specialized AIs have little chance of becoming self-improving, but as specialized AIs adopt more advanced techniques (like the ones Nesov suggested), the line between specialized AIs and AGIs won't be so clear. After all, chess-playing and car-driving programs can always be implemented as AGIs with very specific and limited super-goals, so I expect that as AGI techniques advance, people working on specialized AIs will also adopt them, but perhaps without giving as much thought to the AI-foom problem.

Comment author: ChristianKl 16 January 2010 06:34:09PM 2 points [-]

I would think that specialization reduces the variant trees that the AI has to consider, which makes it unlikely that implementing AGI techniques would help the chess-playing program.

Comment author: JGWeissman 14 January 2010 08:03:44PM 6 points [-]

Additionally, the actions that a chess AI can consider and take are limited to moving pieces on a virtual chess board, and the consequences of such actions that it considers are limited to the state of the chess game, with no model of how the outside world affects the opposing moves other than the abstract assumption that the opponent will make the best move available. The chess AI simply does not have any awareness of anything outside the chess game.

Comment author: wedrifid 15 January 2010 07:37:17AM 0 points [-]

with no model of how the outside world affects the opposing moves other than the abstract assumption that the opponent will make the best move available.

A good chess AI would not be so constrained. A history of all chess games played by the particular opponent would be quite useful. As would his psychology.

Additionally, the actions that a chess AI can consider and take are limited to moving pieces on a virtual chess board

Is it worth me examining the tree beyond this particular move further? How long will it take me (metacognitive awareness...) relative to my time limit?

The chess AI simply does not have any awareness of anything outside the chess game.

Unless someone gives them such awareness, which may be useful in some situations or may just seem useful to naive developers who get their hands on more GAI research than they can safely handle.

Comment author: ChristianKl 16 January 2010 06:30:42PM *  0 points [-]

A history of all chess games played by the particular opponent would be quite useful.

Such a history would also consist of a list of moves on a virtual chess board.

Unless someone gives them such awareness, which may be useful in some situations or may just seem useful to naive developers

If you are very naive it's unlikely that you understand the problem of AI well enough to solve it.

Comment author: thomblake 14 January 2010 04:02:59PM *  2 points [-]

There isn't really a general answer to "how to design a safe AI". It really depends what the AI is used for (and what they mean by AI).

For recursively self-improving AI, you've got your choice of "it's always bad", "You should only do it the SIAI way (and they haven't figured that out yet)", or "It's not a big deal, just use software best practices and iterate".

For robots, I've argued in the past that robots need to share our values in order to avoid squashing them, but I haven't seen anyone work this out rigorously. On a different tack altogether, Ron Arkin's Governing Lethal Behavior in Autonomous Robots is excellent and describes in detail how to make military robots that use lethality appropriately. In a household application, it's very difficult to see what sorts of actions might be problematic, but in a military application the main concern is making "aim the gun and fire" only happen when you really want it to.

For video games and the like, there's plenty of literature about the question, but not much to take seriously there.

Comment author: Peter_de_Blanc 14 January 2010 02:07:46PM *  1 point [-]

Try to build an AI that:

  1. Implements a timeless decision theory.
  2. Is able to value things that it does not directly perceive, and in particular cares about other universes.
  3. Has a utility function such that additional resources have diminishing marginal returns.

Such an AI is more likely to participate in trades across universes, possibly with a friendly AI that requests our survival.

[EDIT]: It now occurs to me that an AI that participates in inter-universal trade would also participate in inter-universal terrorism, so I'm no longer confident that my suggestions above are good ones.

Comment author: Blueberry 14 January 2010 05:54:30PM 0 points [-]

Can you please elaborate on "trades across universes"? Do you mean something like quantum civilization suicide, as in Nick Bostrom's paper on that topic?

Comment author: Wei_Dai 14 January 2010 07:05:46PM 1 point [-]

Here's Nesov's elaboration of his trading across possible worlds idea.

Personally, I think it's an interesting idea, but I'm skeptical that it can really work, except maybe in very limited circumstances such as when the trading partners are nearly identical.

Comment author: Blueberry 15 January 2010 05:33:43PM 0 points [-]

Cool, thanks!

Comment author: byrnema 14 January 2010 02:57:26PM *  1 point [-]

(Disclaimer: I don't know anything about AI.)

Is the marginal utility of resources something that you can input? It seems to me that since resources have instrumental value (pretty much, that's what a resource is by definition), their value would be something that has to be outputted by the utility function.

If you tried to input the value of resources, you'd run into difficulties with the meaning of resources. For example, would the AI distinguish "having resources" from "having access to resources" from "having access to the power of having access to resources"? Even if 'having resources' has negative utility for the AI, he might enjoy controlling resources in all kinds of ways in exchange for power to satisfy terminal values.

Even if you define power as a type of resource, and give that negative utility, then you will basically be telling the AI to enjoy not being able to satisfy his terminal values. (But yet, put that way, it does suggest some kind of friendly passive/pacifist philosophy.)

Comment author: Technologos 14 January 2010 03:18:17PM 0 points [-]

There is a difference between giving something negative utility and giving it decreasing marginal utility. It's sufficient to give the AI exponents strictly between zero and one for all terms in a positive polynomial utility function, for instance. That would be effectively "inputting" the marginal utility of resources, given any current state of the world.
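A minimal sketch of what that looks like, assuming a one-term utility function U(r) = r^0.5 over a single resource count (the functional form, the exponent 0.5, and the names `utility` and `marginal_utility` are all illustrative assumptions, not anything specified above). Every extra unit of the resource still has positive value, but each one is worth less than the one before.

```python
def utility(resources, exponent=0.5):
    """Concave utility: any exponent strictly between 0 and 1 would do."""
    return resources ** exponent


def marginal_utility(resources, exponent=0.5):
    """Utility gained from one additional unit at the current level."""
    return utility(resources + 1, exponent) - utility(resources, exponent)


levels = (1, 10, 100, 1000)
gains = [marginal_utility(r) for r in levels]

for r, g in zip(levels, gains):
    print(f"resources={r:>5}  gain from one more unit={g:.4f}")
```

Note that the marginal value never reaches zero or goes negative, which is the distinction from "giving something negative utility".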

Comment author: byrnema 14 January 2010 03:27:43PM *  1 point [-]

There is a difference between giving something negative utility and giving it decreasing marginal utility.

I was considering the least convenient argument, the one that I imagined would result in the least aggressive AI. (I should explain here that I considered that even a 0 terminal utility for the resource itself would not result in 0 utility for that resource, because that resource would have some instrumental value in achieving things of value.)

(Above edited because I don't think I was understood.)

But I think the problem in logic identified with inputting the value of an instrumental value remains either way.

Comment author: Peter_de_Blanc 14 January 2010 08:30:00PM *  0 points [-]

You pretty much have to guess about the marginal value of resources. But let's say the AI's utility function is "10^10th root of # of paperclips in universe." Then it probably satisfies the criterion.

EDIT: even better would be U = 1 if the universe contains at least one paperclip, otherwise 0.

Comment author: CronoDAS 14 January 2010 01:47:11PM 4 points [-]

If you think you have an AI that might improve itself and act on the real world, don't run it.

Comment author: JamesAndrix 14 January 2010 07:44:53PM 1 point [-]

This rules out FAI.

Comment author: ciphergoth 14 January 2010 11:02:58PM 7 points [-]

Sure, this is advice along the lines of "don't design your own cipher".

Only more so.

Comment author: Stuart_Armstrong 15 January 2010 10:23:24AM 0 points [-]

Entertainingly, he's entering the field from mathematical cryptography; so "don't design your own cipher" is precisely the wrong analogy to use here :-)

Comment author: ciphergoth 15 January 2010 10:44:05AM 0 points [-]

"mathematical cryptography"? What other sort of cryptography is there?

Comment author: Stuart_Armstrong 15 January 2010 01:10:38PM 0 points [-]

It used to be the domain of the linguists... But you're correct; nowadays, I'm using mathematical cryptography as a shorthand for "y'know, like, real cryptography, not just messing around with symbols to impress your friends".

Comment author: ciphergoth 15 January 2010 01:58:24PM 0 points [-]

Ah, OK!

It's possible in that case that I may actually know your friend, if they happened to touch on some of the same parts of the field as me.

Comment author: Stuart_Armstrong 15 January 2010 02:14:00PM 0 points [-]

No extra clues :-)

Comment author: JamesAndrix 15 January 2010 03:08:05AM 1 point [-]

In general wise, but in this case we need a cipher, don't have any, and will probably be handed a bad one in the future.

Our truisms need to be advice we would want everyone to follow.

Comment author: Nick_Tarleton 15 January 2010 03:30:05AM *  0 points [-]

We should encourage thinking about the intent (incoming) and expected effect (outgoing) of truisms, rather than their literal meaning. If either of the above injunctions actually doesn't apply to you, you'll know it.

Comment author: JamesAndrix 15 January 2010 07:20:42AM 0 points [-]

My concern is you'll also 'know' it doesn't apply to you when it does. People write ciphers all the time.

Comment author: Nick_Tarleton 15 January 2010 03:44:10PM *  0 points [-]

"Then they are fools and nothing can be done about it." In any case, this seems to be the opposite of the concern you were citing before.

Comment author: JamesAndrix 15 January 2010 04:36:03PM 0 points [-]

I reread the thread, leaning towards your position now.

Comment author: JamesAndrix 15 January 2010 04:30:46PM 1 point [-]

If we use truisms that everyone knows have to be ignored by someone, it becomes easier to think they can be ignored by oneself.

Comment author: ciphergoth 15 January 2010 08:34:07AM 3 points [-]

Yes, this is my concern too. However, anyone who posts to a newsgroup saying "I'm about to write my own cipher, any advice" should not do it. The post indicated someone who planned to actually start writing code; that's a definite sign that they shouldn't do it.

Comment author: Stuart_Armstrong 15 January 2010 10:31:30AM 0 points [-]

See the addendum above; "don't do it" isn't likely to work.

Comment author: ciphergoth 15 January 2010 10:45:31AM 0 points [-]

Even though it's unlikely to work, it is still the approach which minimizes risk; even a small reduction in their probability of going ahead will likely be a bigger effect than any other safety advice you can give, and any other advice will act against its efficacy.

Comment author: ciphergoth 14 January 2010 03:22:21PM 4 points [-]

Strike "and act on the real world" - all AIs act on the real world.

Comment author: CronoDAS 14 January 2010 09:59:41PM 2 points [-]

I mean, act on the real world in a way more significant than your typical chess-playing program.

Comment author: whpearson 14 January 2010 11:57:00AM 3 points [-]

If you want to design a complex malleable AI design and have some guarantees about what it will do (rather than just fail in some creative way), think of simple properties you can prove about your code, and then try to prove them using Coq or another theorem-proving system.

If you can't think of any properties that you want to hold for your system, think more.
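As a toy illustration of the kind of "simple property" meant here, below is a sketch in Lean 4 rather than Coq (my substitution, purely because it is short). The function `clamp` and the theorem `clamp_le` are hypothetical names invented for this example: `clamp` caps some numeric output at a limit, and the theorem states that the capped value never exceeds that limit. Real properties of a real system would be much harder to state, let alone prove.

```lean
-- Hypothetical toy function: cap a value at a fixed limit.
def clamp (limit x : Nat) : Nat :=
  if x ≤ limit then x else limit

-- The simple property: the result never exceeds the limit.
theorem clamp_le (limit x : Nat) : clamp limit x ≤ limit := by
  unfold clamp
  split
  · assumption                 -- case x ≤ limit: the result is x
  · exact Nat.le_refl limit    -- otherwise the result is limit itself
```

The point is less the proof than the exercise of writing down a statement precise enough to be proved at all.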