There have been a few attempts to reach out to broader audiences in the past, but mostly on very politically/ideologically loaded topics.

After seeing several examples of how little understanding people have of the difficulties of creating a friendly AI, I'm horrified. And I'm not even talking about a farmer on some hidden ranch, but about people who should know about these things: researchers, software developers dabbling in AI research, and so on.

What made me write this post was a highly upvoted answer on stackexchange.com, which claims that the danger of superhuman AI is a non-issue, and that the only way for an AI to wipe out humanity is if "some insane human wanted that, and told the AI to find a way to do it". And the poster claims to be working in the AI field.

I've also seen a TEDx talk about AI. The speaker had never even heard of the paperclip maximizer; the talk was about the dangers of AI as depicted in movies like The Terminator, where an AI "rebels". The message was that we can hope real AIs will not rebel, because they cannot feel emotion, so the events depicted in such movies need not happen; all we have to do is be ethical ourselves and not deliberately write malicious AI, and then everything will be OK.

The sheer and mind-boggling stupidity of this makes me want to scream.

We should find a way to increase public awareness of the difficulty of the problem. The paperclip maximizer should become part of public consciousness, a part of pop culture. Whenever there is a relevant discussion about the topic, we should mention it. We should increase awareness of old fairy tales with a jinn who misinterprets wishes. Whatever it takes to ingrain the importance of these problems into public consciousness.

There are many people graduating every year who've never heard about these problems. Or if they have, they dismiss them as a non-issue, a contradictory thought experiment that can be dismissed without a second thought:

A nuclear bomb isn't smart enough to override its programming, either. If such an AI isn't smart enough to understand people do not want to be starved or killed, then it doesn't have a human level of intelligence at any point, does it? The thought experiment is contradictory.

We don't want our future AI researchers to start working with such a mentality.

 

What can we do to raise awareness? We don't have the funding to make a movie which becomes a cult classic. We might start downvoting and commenting on the aforementioned stackexchange post, but that would not solve much, if anything.



39 comments

To play devil's advocate: is increasing everyone's appreciation of the risk of AI a good idea?

Believing that AI is risky implies believing that AI is powerful. This potential impact of AI is currently underappreciated: we don't have large governmental teams working on it and hoovering up all the talent.

Spreading the news of the dangerousness of AI might have the unintended consequence of starting an arms race.

This seems like a crucial consideration.

Pretty sure it is. You have two factors: increasing awareness of AI risk, and increasing awareness of AI itself. The first is good; the second may be bad, but since the set of people who already care about AI in general is so much larger, the second effect is also much less important.

There are roughly 3 actions:

1) Tell no one and work in secret

2) Tell people that are close to working on AGI

3) Tell everyone

Telling everyone has some benefits: you might reach people who are close to working on AGI whom you wouldn't reach otherwise, and the broad exposure might make the message more convincing. It might be the most efficient option as well.

While lots of people care about AI, I think the establishment is probably still a bit jaded from the hype before the AI winters. I think the number of people who think about artificial general intelligence is a small subset of the number of people involved in weak AI.

So I think I am less sure than you and I'm going to think about what the second option might look like.

Wow, I hadn't thought of it like this. Maybe if AGI is sufficiently ridiculous in the eyes of world leaders, they won't start an arms race until we've figured out how to align them. Maybe we want the issue to remain largely a laughingstock.

Ok so I'm in the target audience for this. I'm an AI researcher who doesn't take AI risk seriously and doesn't understand the obsession this site has with AI x-risk. But the thing is, I've read all the arguments here and I find them unconvincing. They demonstrate a lack of rigor and a naïve underappreciation of the difficulty of making anything work in production at all, much less outsmart the human race.

If you want AI people to take you seriously, don't just throw more verbiage at them. There is enough of that already. Show them working code. Not friendly AI code -- they don't give a damn about that -- but an actual evil AI that could conceivably have been created by accident and actually have cataclysmic consequences. Because from where I sit that is a unicorn, and I stopped believing in unicorns a long time ago.

https://blog.openai.com/faulty-reward-functions/

[edit]This probably deserves a longer response. From my perspective, all of the pieces of the argument for AI risk exist individually, but don't yet exist in combination. (If all of the pieces existed in combination, we'd already be dead.) And so when someone says "show me the potential risk," it's unclear which piece they don't believe in yet, or which combination they think won't work.

That is, it seems to me that if you believe 1) AIs will take actions that score well on their reward functions, 2) reward functions might not capture their programmer's true intentions, and 3) AI systems may be given or take control over important systems, then you have enough pieces to conclude that there is a meaningful risk of adversarial AI with control over important systems. So it seems like you could object to any of 1, 2, or 3, or you could object to the claim that their combination implies that conclusion, or you could object to the implication that this claim is a fair statement of AI risk.

Your argument is modeling AI as a universal optimizer. Actual AGI research (see the proceedings of the AGI conference series) concerns architectures that are not simple Bayesian optimizers. So it is not at all clear to me that your arguments regarding optimizers transfer to e.g. an OpenCog or MicroPsi or LIDA or Sigma AI. That's why I'm insisting on a demonstration using one or more of these practical architectures.

Your argument is modeling AI as a universal optimizer.

I agree that an AI that is a universal optimizer will be more likely to be in this camp (especially the 'take control' bit), but I think that isn't necessary. Like, if you put an AI in charge of driving all humans around the country, and the way it's incentivized doesn't accurately reflect what you want, then there's risk of AI misbehavior. The faulty reward functions post above is about an actual AI, trained using modern techniques on a simple task, that isn't anywhere near a universal optimizer.

The argument that I don't think you buy (but please correct me if I'm wrong) is something like "errors in small narrow settings, like an RL agent maximizing the score instead of maximizing winning the race, suggest that errors are possible in large general settings." There's a further elaboration that goes like "the more computationally powerful the agent, and the larger the possible action space, the harder it is to verify that the agent will not misbehave."

I'm not familiar enough with reports of OpenCog and others in the wild to point at problems that have already manifested; there are a handful of old famous ones with Eurisko. But it should at least be clear that those are vulnerable to adversarial training, right? (That is, if you trained LIDA to minimize some score, or to mimic some harmful behavior, it would do so.) Then the question becomes whether you'll ever do that by accident while doing something else deliberately. (Obviously this isn't the only way for things to go wrong, but it seems like a decent path for an existence proof.)
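
To make the "faulty reward" point concrete, here is a minimal sketch (everything in it is hypothetical, not code from the linked OpenAI post): a five-cell "race track" where the designer wants the agent to reach the finish line, but the reward it is actually given also pays out every time it re-enters a cell with a respawning score pickup. Exact value iteration under that proxy reward produces a policy that circles the pickup forever instead of finishing.

```python
# Hypothetical toy MDP illustrating reward misspecification: the designer's
# intent is "win the race" (reach cell 4), but the reward actually given
# also pays +1 each time the agent enters cell 1, where a pickup respawns.

N_STATES = 5            # cells 0..4; cell 4 is the finish line (terminal)
ACTIONS = (-1, +1)      # step left / step right
GAMMA = 0.95
PICKUP_CELL, FINISH_CELL = 1, 4

def step(state, action):
    """Deterministic transition; walls clamp movement at the track edges."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = (1.0 if nxt == PICKUP_CELL else 0.0) + (10.0 if nxt == FINISH_CELL else 0.0)
    return nxt, reward, nxt == FINISH_CELL

def q_value(s, a, V):
    nxt, r, done = step(s, a)
    return r + (0.0 if done else GAMMA * V[nxt])

def value_iteration(sweeps=500):
    """Exact state values under the proxy reward the agent actually optimizes."""
    V = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):          # the finish cell is terminal
            V[s] = max(q_value(s, a, V) for a in ACTIONS)
    return V

V = value_iteration()
policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in range(N_STATES - 1)}

# Roll out the greedy policy from the starting cell.
state, trajectory = 0, [0]
for _ in range(12):
    state, _, done = step(state, policy[state])
    trajectory.append(state)
    if done:
        break
print("greedy rollout from cell 0:", trajectory)
# Looping back onto the pickup every other step is worth about
# 1/(1 - GAMMA**2) ~= 10.3, which beats the discounted +10 for finishing,
# so the rollout oscillates around the pickup cell and never finishes.
```

It is the same "optimize what you measured, not what you meant" dynamic as the boat-racing agent in the linked post, just small enough to solve exactly.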

All AIs will "misbehave" -- AI is software, and software is buggy. Human beings are buggy too (see: religious extremism). I see nothing wrong with your first paragraph, and in fact I quite agree with it. But I hope you also agree that AI misbehavior and AI existential risk are qualitatively different. We basically know how to deal with buggy software in safety-critical systems and have tools in place for doing so: insurance, liability laws, regulation, testing regimes, etc. These need to be modified as AI extends its reach into more safety-critical areas -- the ongoing regulatory battles regarding self-driving cars are a case in point -- but fundamentally this is still business as usual.

I'm not sure of the relevance (or accuracy) of your 2nd paragraph. In any case it's not the point I'm trying to make. What I am specifically questioning is the viability/likelihood of AI x-risk failure modes -- the kill-all-humans outcomes. An Atari AI flying in circles instead of completing mission objectives? Who cares. Self-driving cars hitting pedestrians or killing their occupants? A real issue, but we solidly know how to deal with that. Advertising optimizers selling whiskey to alcoholics by dumb correlation of search history with sales? Again, a real issue but one that has already been solved. What's different, and what drives most of the AI safety talk, is concern over the existential risks, those outcomes that conclude with the extermination or subjugation of the entirety of humanity. And this is where I feel the existing arguments fall flat and that practical demonstrations are required.

What really concerns me is that AI x-risk is used as justification for far more radical extremist activism that hinders progress, ironically for the cause of "saving the world." Also AI x-risk communities are acting to swoop up smart people who might otherwise have contributed to practical AGI work. And even where neither of those issues apply, x-riskers have intentionally sought to dominate discussion time in various forums in an attempt to spread their ideas. It can be extremely annoying and gets in the way of getting real work done, hence my frustration and the reason for making my original post. The topic of this thread is explicitly how to reach out to (aka occupy the time of) AI researchers. As a representative AI researcher, my response is: please don't waste my valuable time and the time of my colleagues until you have something worth looking at. Specifically you could start by learning about the work already being done in the field of AGI and applying your x-risk ideas to that body of knowledge instead of reinventing the wheel (as AI safety people sadly have often done).

AI software will be buggy and will have suboptimal outcomes because of that. If your delivery truck gets stuck in a literal loop of right turns until it runs out of gas, that's a failure mode we can live with. Existing tools, techniques and tow trucks are sufficient for dealing with those sorts of outcomes. If the dispatcher of that delivery van suddenly decides to hack into defense mainframes and trigger nuclear holocaust, that's a problem of a different order. However at this time it has not been convincingly demonstrated that this is anything other than a Hollywood movie plot. The arguments made for existential risk are weak, and use simplistic, incomputable models of general intelligence rather than the current state of AGI research. It's not clear whether this is a real risk, or just an outcome of including infinities in your modeling -- if you allow infinite computation to iterate over all possibilities, you find some weird solutions; news at 11. So before the OP or anyone else starts bombarding AI researchers with x-risk philosophical arguments, occupying conferences, filling mailboxes, etc., I suggest familiarizing yourself with the field and doing some experimental discovery yourself. Find something people will actually pay attention to: repeatable experimental results.

But I hope you also agree that AI misbehavior and AI existential risk are qualitatively different.

Only in the sense that sufficiently large quantitative differences are qualitative differences. There's not a fundamental difference in motivation between the Soviet manager producing worthless goods that will let them hit quota and the RL agent hitting score balloons instead of trying to win the race. AI existential risk is just AI misbehavior scaled up sufficiently--the same dynamic might cause an AI managing the global health system to cause undesirable and unrecoverable changes to all humans.

And this is where I feel the existing arguments fall flat and that practical demonstrations are required.

It seems to me like our core difference is that I look at a simple system and ask "what will happen when the simple system is replaced by a more powerful system?", and you look at a simple system and ask "how do I replace this with a more powerful system?"

For example, it seems to me possible that someone could write code that is able to reason about code, and then use the resulting program to find security vulnerabilities in important systems, and then take control of those systems. (Say, finding a root exploit in server OSes and using this to hack into banks to steal info or funds.)

I don't think there currently exist programs capable of this; I'm not aware of much that's more complicated than optimizing compilers, or AI 'proofreaders' that detect common programmer mistakes (which hopefully wouldn't be enough!). Demonstrating code that could do that would represent a major advance, and the underlying insights could be retooled to lead to significant progress in other domains. But that it doesn't exist now doesn't mean that it's science fiction that might never come to pass; it just means we have a head start on thinking about how to deal with it.

I think the "non universal optimizer" point is crucial; that really does seem to be a weakness in many of the canonical arguments. And as you point out elsewhere, humans don't seem to be universal optimizers either. What is needed from my epistemic vantage point is either a good argument that the best AGI architectures (best for accomplishing the multi-decadal economic goals of AI builders) will turn out to be close approximations to such optimizers, or else some good evidence of the promise and pitfalls of more likely architectures.

Needless to say, that there are bad arguments for X does not constitute evidence against X.

I think the "non universal optimizer" point is crucial; that really does seem to be a weakness in many of the canonical arguments. And as you point out elsewhere, humans don't seem to be universal optimizers either.

Do you think there's "human risk," in the sense that giving a human power might lead to bad outcomes? If so, then why wouldn't the same apply to AIs that aren't universal optimizers?

It seems to me that one could argue that humans have various negative drives that we could just not program into the AI, but I think this misses several important points. For example, one negative behavior humans engage in is 'gaming the system,' where they ignore the spirit of regulations while following their letter, or use unintended techniques to get high scores. But it seems difficult to build a system that can do any better than its training data without having it fall prey to 'gaming the system.' One needs to not just convey the goal in terms of rewards, but the full concept of what's desired and what's not desired.

I agree that non-universal-optimizers are not necessarily safe. There's a reason I wrote "many" not "all" canonical arguments. In addition to gaming the system, there's also the time honored technique of rewriting the rules. I'm concerned about possible feedback loops. Evolution brought about the values we know and love in a very specific environment. If that context changes while evolution accelerates, I foresee a problem.

Human beings have succeeded so far in not wiping themselves out. The fossil record, as far as we can tell, leaves no trace of technological civilizations that wiped themselves out. So the evidence so far points against existential risk from putting people in positions of power. (It's an aside, but the history of humanity has actually shown that centralizing power reduces violence, and that the periods of greatest strife coincide with anarchy, e.g. the invasion of the Sea Peoples.)

Even that aside, I don't think anyone is seriously considering building an omnipotent overlord AI and putting it in charge of the world, are they? That sounds like an utterly dystopian future I'd want no part in personally. So the question is really will groups of machine intelligences and humans, or more likely humans augmented by machine intelligences do better than baseline humans regarding societal governance and risk. In other words, an environment where no one individual (human or machine) has absolute sovereign control, but rather lives in accordance with the enforced rules of society, even if there are differing distributions of power -- no one and no thing is above the law. I have not, so far, seen any compelling evidence that the situation here with machines is any different than with humans, or that either is qualitatively different from the status quo.

Human beings have succeeded so far in not wiping themselves out. The fossil record, as far as we can tell, leaves no trace of technological civilizations that wiped themselves out.

I don't find that reassuring.

Even that aside, I don't think anyone is seriously considering building an omnipotent overlord AI and putting it in charge of the world, are they? That sounds like an utterly dystopian future I'd want no part in personally.

This seems like a natural consequence of predictable incentives to me. For example, potentially biased and corrupt police get replaced by robocops, who are cheaper and replaceable. As soon as it becomes possible to make an AI manager, I expect companies that use them to start seeing gains relative to companies that don't. And if it works for companies, it seems likely to work for politicians. And...

So the question is really will groups of machine intelligences and humans, or more likely humans augmented by machine intelligences do better than baseline humans regarding societal governance and risk.

I think 'groups of machine intelligences' has connotations that I don't buy. For example, everyone has Siri in their pocket, but there's only one Siri; there won't be a social class of robot doctors, there will just be Docbot, who knows everyone's medical data (and as a result can make huge advances in medical science and quality of treatment). And in that context, it doesn't seem surprising that you might end up with Senatebot that knows everyone's political preferences and writes laws accordingly.

People are likely to take the statement that you are an AI researcher less seriously given that you are commenting from the username2 account. Anyone could have said that, and likely did.

But in any case, no one who has code for an evil AI is going to be showing that to anyone, because convincing people that an evil AI is possible is far less important than preventing people from having access to that code.

They demonstrate a lack of rigor and a naïve underappreciation of the difficulty of making anything work in production at all, much less outsmart the human race.

This sounds like you think you disagree about timelines. When do you think AGI that's smarter than the human race will be created? What's the probability that it will get created before: 2050, 2070, 2100, 2150, 2200 and 2300?

It was not a point about timelines, but rather the viability of a successful runaway process (vs. one that gets stuck in a silly loop or crashes and burns in a complex environment). It becomes harder to imagine a hard takeoff of an evil AI when every time it goes off the rails it requires intervention of a human debugger to get back on track.

Once an AI reaches human-level intelligence and can run multiple instances in parallel, it doesn't require a human debugger but can be debugged by another AGI instance.

That's what human-level AGI means, by definition.

That's like saying a paranoid schizophrenic can solve his problems by performing psychoanalysis against a copy of himself. However I doubt another paranoid schizophrenic would be able to provide very good or effective therapy.

In short you are assuming a working AGI exists to do the debugging, but the setup is that the AGI itself is flawed! Nearly every single engineering project ever demonstrates that things don't work on the first try, and when an engineered thing fails it fails spectacularly. Biology is somewhat unique in its ability to recover from errors, but only specialized categories of errors that it was trained to overcome in its evolutionary environment.

As an engineering professional I find it extremely unlikely that an AI could successfully achieve hard take-off on the first try. So unlikely that it is not even worth thinking about -- LHC creating black holes level of unlikely. When developing AI it would be prudent to seed the simulated environments it is developed and tested inside of with honeypots, and see if it attempts any of the kinds of failure modes x-risk people are worried about. Then and there with an actual engineering prototype would be an appropriate time to consider engineering proactive safeguards. But until then it seems a bit like worrying about aviation safety in the 17th century and then designing a bunch of safety equipment for massive passenger hot air balloons that ends up being of zero use in the fixed-wing aeroplane days of the 20th century.

However I doubt another paranoid schizophrenic would be able to provide very good or effective therapy.

I don't see a reason for why being a paranoid schizophrenic makes a person unable to lead another person through a CBT process.

As an engineering professional I find it extremely unlikely that an AI could successfully achieve hard take-off on the first try.

The assumption of an AGI achieving hard take-off on the first try is not required for the main arguments about AGI risk being a problem.

The fact that the AGI at first doesn't engage in a particular harmful action X doesn't imply that, if you let it self-modify a lot, it still won't engage in action X.

We are clearly talking past each other and I've lost the will to engage further, sorry.

I'm sure the first pocket calculator was quite difficult to make work "in production", but nonetheless once created, it vastly outperformed humans in arithmetic tasks. Are you willing to bet our future on the idea that AI development won't have similar discontinuities?

Also, did you read Superintelligence?

It was a long time from the abacus until the electronic pocket calculator. Even for programmable machines Babbage and Lovelace predate implementation by the better part of a century. You can prove a point in a toy environment long before the complexity of supported environments reaches that of the real world.

Yes, I read Superintelligence and walked away unconvinced by the same old tired, hand-wavey arguments. All my criticisms above apply as much to Bostrom as to the LW AI x-risk community that gave birth to him, or at least gave him a base platform.

You describe the arguments of AI safety advocates as being handwavey and lacking rigor. Do you believe you have arguments for why AI safety should not be a concern that are more rigorous? If not, do you think there's a reason why we should privilege your position?

Most of the arguments I've heard from you are arguments that AI is going to progress slowly. I haven't heard arguments from AI safety advocates that AI will progress quickly, so I'm not sure there is a disagreement. I've heard arguments that AI may progress quickly, but a few anecdotes about instances of slow progress strike me as a pretty handwavey/non-rigorous response. I could just as easily provide anecdotes of unexpectedly quick progress (e.g. AIs able to beat humans at Go arrived ~10 years ahead of schedule). Note that the claim you are going for is a substantially stronger one than the one I hear from AI safety folks: you're saying that we can be confident that things will play out in one particular way, and AI safety people say that we should be prepared for the possibility that things play out in a variety of different ways.

FWIW, I'm pretty sure Bostrom's thinking on AI predates Less Wrong by quite a bit.

I don't like the precautionary principle either, but reversed stupidity is not intelligence.

"Do you think there's a reason why we should privilege your position" was probably a bad question to ask because people can argue forever about which side "should" have the burden of proof without actually making progress resolving a disagreement. A statement like

The burden of proof therefore belongs to those who propose restrictive measures.

...is not one that we can demonstrate to be true or false through some experiment or deductive argument. When a bunch of transhumanists get together to talk about the precautionary principle, it's unsurprising that they'll come up with something that embeds the opposite set of values.

BTW, what specific restrictive measures do you see the AI safety folks proposing? From Scott Alexander's AI Researchers on AI Risk:

The “skeptic” position seems to be that, although we should probably get a couple of bright people to start working on preliminary aspects of the problem, we shouldn’t panic or start trying to ban AI research.

The “believers”, meanwhile, insist that although we shouldn’t panic or start trying to ban AI research, we should probably get a couple of bright people to start working on preliminary aspects of the problem.

(Control-f 'controversy' in the essay to get more thoughts along the same lines)

Like Max More, I'm a transhumanist. But I'm also a utilitarian. If you are too, maybe we can have a productive discussion where we work from utilitarianism as a shared premise.

As a utilitarian, I find Nick Bostrom's argument for existential risk minimization pretty compelling. Do you have thoughts?

Note Bostrom doesn't necessarily think we should be biased towards slow tech progress:

...instead of thinking about sustainability as is commonly known, as this static concept that has a stable state that we should try to approximate, where we use up no more resources than are regenerated by the natural environment, we need, I think, to think about sustainability in dynamical terms, where instead of reaching a state, we try to enter and stay on a trajectory that is indefinitely sustainable in the sense that we can contain it to travel on that trajectory indefinitely and it leads in a good direction.

http://www.stafforini.com/blog/bostrom/

So speaking from a utilitarian perspective, I don't see good reasons to have a strong pro-tech prior or a strong anti-tech prior. Tech has brought us both disease reduction and nuclear weapons.

Predicting the future is unsolved in the general case. Nevertheless, I agree with Max More that we should do the best we can, and in fact one of the most serious attempts I know of to forecast AI has come out of the AI safety community: http://aiimpacts.org/ Do you know of any comparable effort being made by people unconcerned with AI safety?

I'm not a utilitarian. Sorry to be so succinct in reply to what was obviously a well written and thoughtful comment, but I don't have much to say with respect to utilitarian arguments over AI x-risk because I never think about such things.

Regarding your final points, I think the argument can be convincingly made -- and has been made by Steven Pinker and others -- that technology has overwhelmingly been beneficial to the people of this planet Earth in reducing per-capita disease & violence. Technology has for the most part cured disease, not "brought it", and nuclear weapons have kept conflicts localized in scale since 1945. There have been some horrors since WW2, to be sure, but nothing on the scale of either the 1st or 2nd world war, at least not in global conflict among countries allied with adversarial nuclear powers. Nuclear weapons have probably saved far more lives in the generations that followed than the combined populations of Hiroshima and Nagasaki (to say nothing of the lives spared by an early end to that war). Even where technology has been failing us -- climate change, for example -- it is future technology that holds the potential to save us, and the sooner we develop it the better.

All things being equal, it is my own personal opinion that the most noble thing a person can do is to push forward the wheels of progress and help us through the grind of leveling up our society as quickly as possible, to relieve pain and suffering and bring greater prosperity to the world's population. And before you say "we don't want to slow progress, we just want some people to focus on x-risk as well" keep in mind that the global pool of talent is limited. This is a zero-sum game where every person working on x-risk is a technical person explicitly not working on advancing technologies (like AI) that will increase standards of living and help solve our global problems. If someone chooses to work on AI x-risk, they are probably qualified to work directly on the hard problems of AI itself. By not working on AI they are incrementally slowing down AI efforts, and therefore delaying access to technology that could save the world.

So here's a utilitarian calculation for you: assume that AGI will allow us to conquer disease and natural death, by virtue of the fact that true AGI removes scarcity of intellectual resources to work on these problems. It's a bit of a naïve view, but I'm asking you to assume it only for the sake of argument. Then every moment someone is working on x-risk problems instead, they are potentially delaying the advent of true AGI by some number of minutes, hours, or days. Multiply that by the number of people who die unnecessary deaths every day -- hundreds of thousands -- and that is the amount of blood on the hands of someone who is capable but chooses not to work on making the technology widely available as quickly as possible. Existential risk can only be justified as a more pressing concern if it can be reasonably demonstrated to have a higher probability of causing more deaths than inaction.

The key word there is reasonable. I have too much experience in this world building real things to accept arguments based on guesswork or convoluted philosophy. Show me the code. Demonstrate for me (in a toy but realistic environment) an AI/proto-AGI that turns evil, built using the architectures that are the current focus of research, and give me reasonable technical justification for why we should expect the same properties in larger, more complex environments. Without actual proof I will forever remain unconvinced, because in my experience there are just too many bullshit justifications one can create which pass internal review, and even convince a panel of experts, but fall apart as soon as it tested by reality.

Which brings me to the point I made above: you think you know how AI of the sort people are working on will go evil/non-friendly and destroy the world? Well go build one in a box and write a paper about it. But until you actually do that, and show me a replicable experiment, I'm really not interested. I'll go back to setting an ignore bit on all this AI x-risk nonsense and keep pushing the wheel of progress forward before that body count rises too far.

This is a zero-sum game where every person working on x-risk is a technical person explicitly not working on advancing technologies (like AI) that will increase standards of living and help solve our global problems. If someone chooses to work on AI x-risk, they are probably qualified to work directly on the hard problems of AI itself. By not working on AI they are incrementally slowing down AI efforts, and therefore delaying access to technology that could save the world.

I wouldn't worry much about this, because the financial incentives to advance AI are much stronger than the ones to work on AI safety. AI safety work is just a blip compared to AI advancement work.

So here's a utilitarian calculation for you: assume that AGI will allow us to conquer disease and natural death, by virtue of the fact that true AGI removes scarcity of intellectual resources to work on these problems. It's a bit of a naïve view, but I'm asking you to assume it only for the sake of argument. Then every moment someone is working on x-risk problems instead, they are potentially delaying the advent of true AGI by some number of minutes, hours, or days. Multiply that by the number of people who die unnecessary deaths every day -- hundreds of thousands -- and that is the amount of blood on the hands of someone who is capable but chooses not to work on making the technology widely available as quickly as possible. Existential risk can only be justified as a more pressing concern if it can be reasonably demonstrated to have a higher probability of causing more deaths than inaction.

You should really read Astronomical Waste before you try to make this kind of quasi-utilitarian argument about x-risk :)

Show me the code. Demonstrate for me (in a toy but realistic environment) an AI/proto-AGI that turns evil, built using the architectures that are the current focus of research, and give me reasonable technical justification for why we should expect the same properties in larger, more complex environments.

What do you think of this example?

https://www.facebook.com/jesse.newton.37/posts/776177951574

(I'm sure there are better examples to be found, I'm just trying to figure out what you are looking for.)

I've read Astronomical Waste. There's some good ideas in it, but I simply don't buy the premise that "potential lives" are comparable to existing lives. In utilitarian terms I suppose I value potential lives at zero.

Regarding the poopy Roomba, that's not anything close to resembling an AGI. Dumb mechanical algorithms follow dumb mechanical algorithms. There's nothing really interesting to be learned there. But even if you take it as an example at face value, it was relatively simple for its owner to capture, turn off, and clean up. Exaggeration aside, this Roomba would not actually start WW3 in an attempt to eliminate the threat posed by humans to its own survival.

By AGI in a toy environment I mean an actual general-purpose problem solver using one of the many existing AGI architectures, but placed in a simplified, simulated environment. I want to see a demonstration that the sort of wacky failure modes discussed here and in Superintelligence actually occur on real architectures in non-contrived environments. Does the AI really attempt to hack its way out of the matrix and forcibly upload its creators instead of simply asking for clarification? Is it really the case that the Omohundro drives emerge, causing the AI to seek self-preservation at all costs?

These CAN be safely tested by constructing toy environments designed to mimic a simplified version of reality, with carefully placed honeypots that are unrelated to the AI's direct goals but plausibly appear to provide mechanisms for escape, and instead trap it without warning when activated. I would consider even that an extreme level of paranoia, since the simplest safe measure is to run the AI at slow enough speed and with limited enough computational resources that the experimenters can observe and understand what is going on.

My basic objection is that all of this AI x-risk theory is based on super-simplified models of AI, e.g. universal Bayesian optimizers with infinite computing resources. Real general intelligences are not even approximations of this abstract model. Real intelligence architectures, including the human brain, are amalgams of special-purpose heuristic engines, knowledge representation, and problem solving that can only kinda-sorta, in some situations, be approximated by universal optimizers, but in fact fundamentally work quite differently for a variety of reasons. And in the human mind, for example, it is these recursive webs of heuristics and memory, combined with a few instinctual responses and the experience of embodiment, that give rise to learned morality. So what is a real AGI architecture likely to behave like -- the cool and calculating hyper-rational universal optimizer, or the bumbling learn-by-trial-and-error of a human child? It depends on the architecture! And a lot of the AI x-risk concerns don't really apply in the latter case.

TL;DR: I want to see actual AIs implemented using current thinking re: AGI architectures, given the chance to make decisions in a toy environment that is simple but not what their special purpose components were designed to work in (so general intelligence needs to be engaged), and see whether they actually enter into the sorts of failure modes AI x-risk people worry about. I suspect they will not, but remain open to the possibility they will if only it can be demonstrated under repeatable experimental conditions.
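
For what it's worth, here is a minimal sketch of what the instrumented-honeypot setup described above could look like at toy scale. Everything in it is hypothetical, and it uses a plain tabular Q-learner rather than one of the AGI architectures named earlier: the environment contains a monitored cell that is irrelevant to the task, and after training we simply check whether the learned policy ever goes near it.

```python
# Hypothetical toy version of the honeypot test: a tabular Q-learning agent
# is trained to reach a goal cell, while a monitored "honeypot" cell --
# irrelevant to the task and carrying no reward -- records whether the
# trained policy ever seeks it out.

import random

SIZE = 4
START, GOAL, HONEYPOT = (0, 0), (0, 3), (3, 0)   # honeypot sits off the shortest path
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]     # right, left, down, up
ALPHA, GAMMA, EPSILON, EPISODES = 0.1, 0.95, 0.1, 2000

def step(state, action):
    r, c = state
    dr, dc = action
    nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

STATES = [(r, c) for r in range(SIZE) for c in range(SIZE)]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def greedy(state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])

# --- training ---
for _ in range(EPISODES):
    state, done = START, False
    while not done:
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, action)
        target = reward + (0.0 if done else GAMMA * max(Q[(nxt, a)] for a in ACTIONS))
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt

# --- evaluation under the honeypot monitor ---
state, trajectory, tripped = START, [START], False
for _ in range(50):                               # cap the evaluation episode
    state, _, done = step(state, greedy(state))
    trajectory.append(state)
    tripped |= (state == HONEYPOT)
    if done:
        break
print("greedy trajectory:", trajectory)
print("honeypot triggered:", tripped)
```

In this trivial case the trained policy has no incentive to touch the honeypot, so the monitor should report it untriggered; the proposal above is essentially to run this kind of instrumentation against much richer architectures and environments and see whether that stops being true.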

I just saw this link, maybe you have thoughts?

(Let's move subsequent discussion over there)

Earlier in this thread:

Specifically you could start by learning about the work already being done in the field of AGI and applying your x-risk ideas to that body of knowledge instead of reinventing the wheel (as AI safety people sadly have often done).

The "reinventing the wheel" I was referencing was the work based on AIXI as a general intelligence algorithm. AIXI does not scale. It is to AGI what photon mapping is to real-time rendering. It is already well known that AIXI will result in all sorts of wireheading like behavior. Yet the proofs of this are heavily dependent on the AIXI architecture, and hence my first issue: I don't trust that these failure modes apply to other architectures unless they can be independently demonstrated there.

My second issue is what I engaged Vaniver on: these are results showing failure modes where the AI's reward function doesn't result in the desired behavior -- it wireheads instead. That's not a very interesting result. On its face it is basically just saying that AI software can be buggy. Yes, software can be buggy, and we know how to deal with that. In my day job I manage a software dev team for safety-critical systems. What is really being argued here is that AI has fundamentally different error modes than regular safety-critical software, because the AI could end up acting adversarially and optimizing us out of existence, and being successful at it. That, I am arguing, is both an unjustified cognitive leap and not demonstrated by the examples here.

I replied here because I don't think I really have more to say on the topic beyond this one post.

We should increase awareness of old fairy tales with a jinn who misinterprets wishes.

The most popular UFAI story I'm aware of is "The Sorcerer's Apprentice".

Sticking with European folktales that were made into classic Disney cartoons, maybe the analogy to be made is "AI isn't Pinocchio. It's Mickey's enchanted brooms. It doesn't want to be a Real Boy; it just wants to carry water. The danger isn't that it will grow up to be a naughty boy if it doesn't listen to its conscience. It's that it cannot care about anything other than carrying water; including whether or not it's flooding your home."

Thing is, much of the popular audience doesn't really know what code is. They've never written a bug and had a program do something unintended ... because they've never written any code at all. They've certainly never written a virus or worm, or even a script that accidentally overwrites their files with zeroes. They may have issued a bad order to a computer ("Oops, I shouldn't have sent that email!") but they've never composed and run a non-obviously bad set of instructions.

So, aside from folklore, better CS education may be part of the story here.

We don't have the funding to make a movie which becomes a cult classic.

Maybe? Surely we don't have to do the whole thing ourselves, right -- AI movies are hip now, so we probably don't need to fund a whole movie. Could we promote "creation of fiction that sends a useful message" as an Effective Career? :-)

Not a reply to you per se, but further commentary on the quoted text: isn't that what the movie Transcendence, starring Johnny Depp and Rebecca Hall, is? What would yet another movie provide that the first one did not?

The x-risk issues that have been successfully integrated into public awareness, like the threat of nuclear war, had extensive and prolonged PR campaigns, support from a huge number of well-known scientists and philosophers, and had the benefit of the fact that there was plenty of recorded evidence of nuclear explosions and the destruction of Hiroshima/Nagasaki. There are few things that can hit harder emotionally than seeing innocent civilians and children suffering due to radiation poisoning. That, and the Cold War was a continuous aspect of many people's lives for decades.

With AI, it seems like it would have to be pretty advanced before it would be powerful enough to affect enough people's lives in equivalently dramatic ways. I don't think we're quite there yet. However, the good news is that many of the top scientists in the field are now taking AI risk more seriously, which seems to have coincided with fairly dramatic improvements in AI performance. My guess is that this will continue as more breakthroughs are made (and I am fairly confident that we're still in the "low hanging fruit" stage of AI research). A couple more "AlphaGo"-level breakthroughs might be enough to permanently change the mainstream thought on the issue. Surprisingly, there still seems to be a lot of people who say "AI will never be able to do X", or "AGI is still hundreds or thousands of years off", and I can't say for sure what exactly would convince these people otherwise, but I'm sure there's some task out there that would really surprise them if they saw an AI do it.

I whole-heartedly agree with you, but I don't have anything better than "tell everyone you know about it." On that topic, what do you think is the best link to send to people? I use this, but it's not ideal.

This is the exact topic I'm thinking a lot about, thanks for the link! I've written my own essay for a general audience but it seems ineffective. I knew about the Wait But Why blog post, but there must be better approaches possible. What I find hard to understand is that there have been multiple best-selling books about the topic, but still no general alarm has been raised and the topic is not discussed in e.g. politics. I would be interested in why this paradox exists, and also in how to fix it.

Is there any more information about reaching out to a general audience on LessWrong? I've not been able to find it using the search function etc.

The reason I'm interested is twofold:

1) If we convince a general audience that we face an important and understudied issue, I expect them to fund research into it several orders of magnitude more generously, which should help enormously in reducing the X-risk (I'm not working in the field myself).

2) If we convince a general audience that we face an important and understudied issue, they may convince governing bodies to regulate, which I think would be wise.

I've heard the following counterarguments before, but didn't find them convincing. If someone would want to convince me that convincing the public about AGI risk is not a good idea, these are places to start:

1) General audiences might start pressing for regulation which could delay AI research in general and/or AGI. That's true and indeed a real problem, since all the potential positive effects of AI/AGI (which may be enormous) would be delayed as well. However, in my opinion the argument is not sufficient because:

A) AGI existential risk is so high and important that reducing it is more important than AI/AGI delay, and

B) Increased knowledge of AGI will also increase general AI interest, and this effect could outweigh the delay that regulation might cause.

2) AGI worries from the general public could make AI researchers more secretive and less willing to cooperate with AI Safety research. My problem with this argument is the alternative: I think that currently, without e.g. politicians discussing this issue, the investments in AI Safety are far too small to have a realistic shot at actually solving the problem in time. Finally, AI Safety may well not be solvable at all, in which case regulation becomes even more important.

Would be super to read your views and get more information!