All of Simulation_Brain's Comments + Replies

I think the main concern is that feed-forward nets are used as a component in systems that achieve full AGI. For instance, DeepMind's agent systems include a few networks and run a few times before selecting an action. Current networks are more like individual pieces of the human brain, like a visual system and a language system. Putting them together and getting them to choose and pursue goals and subgoals appropriately seems all too plausible.

Now, some people also think that just increasing the size of nets and training data sets will produce AGI, becaus... (read more)

I think those are perfectly good concerns. But they don't seem so likely that they make me want to exterminate humanity to avoid them.

I think you're describing a failure of corrigibility. Which could certainly happen, for the reason you give. But it does seem quite possible (and perhaps likely) that an agentic system will be designed primarily for corrigibility, or alternately, alignment by obedience.

The second seems like a failure of morality. Which could certainly happen. But I see very few people who both enjoy inflicting suffering, and who would continue to enjoy that even given unlimited time and resources to become happy themselves.

You are probably guessing correctly. I'm hoping that whoever gets ahold of aligned AGI will also make it corrigible, and that over time they'll trend toward a similar moral view to that generally held in this community. It doesn't have to be fast.

To be fair, I'm probably pretty biased against the idea that all we can realistically hope for is extinction. The recent [case against AGI alignment](https://www.lesswrong.com/posts/CtXaFo3hikGMWW4C9/the-case-against-ai-alignment) post was the first time I'd seen arguments that strong in that direction. I haven't ... (read more)

1Thane Ruthenis
The problem with that is that "corrigibility" may be a transient feature. As in, you train up a corrigible AI, it starts up very uncertain about which values it should enforce/how it should engage with the world. You give it feedback, and gradually make it more and more certain about some aspects of its behavior, so it can use its own judgement instead of constantly querying you. Eventually, you lock in some understanding of how it should extrapolate your values, and then the "corrigibility" phase is past and it just goes to rearrange reality to your preferences.

And my concern, here, is that in the rearranged reality, there may not be any places for you to change your mind.

Like, say you really hate people from Category A, and tell the AI to make them suffer eternally. Do you then visit their hell to gloat? Probably not: you're just happy knowing they're suffering in the abstract. Or maybe you do visit, and see the warped visages of these monsters with no humanity left in them, and think that yeah, that seems just and good.

Yes. But that seems awfully unlikely to me. What would it need to be, two years from now? AI hype is going to keep ramping up as ChatGPT and its successors are more widely used and improved.

If the odds of slipping it by governments and militaries are slight, wouldn't the conclusion be the opposite: that we should spread understanding of AGI alignment issues so that those in power have thought about them by the time they appropriate the leading projects?

This strikes me as a really practically important question. I personally may be rearranging my future based on ... (read more)

2Thane Ruthenis
The original post has been arguing that this leads to a hyperexistential catastrophe, and that it's better to let them destroy everything if they are the ones to win the race. But you have a different model implied here: can you describe in more detail how you picture this going? I think I can guess, and I have objections to that vision, but I'd prefer if you outline it first.

I think there's a possibility that their lives, or some of them, are vastly worse than death. See the recent "case against AI alignment" post for some pretty convincing concerns.

I totally agree with the core logic. I've been refraining from spreading these ideas, as much as I want to.

Here's the problem: Do you really think the whole government and military complex is dumb enough to miss this logic, right up to successful AGI? You don't think they'll roll in and nationalize the efforts as the power of AI keeps freaking people out more and more?

I think a lot of folks in the military are a lot smarter than you give them credit for. Or the issue will become much more obvious than you assume, as we get closer to gene... (read more)

2Thane Ruthenis
If the timelines are sufficiently short, and the takeoff sufficiently hard, they may not have time to update. (If they haven't already, that is.)

Really? Can you say a little more about why you think you have that value? I guess I'm not convinced that it's really a terminal value if it varies so widely across people of otherwise similar beliefs. Presumably that's what lalartu meant as well, but I just don't get it. I like myself, so I'd like more of myself in the world!

0DefectiveAlgorithm
I think a big part of it is that I don't really care about other people except instrumentally. I care terminally about myself, but only because I experience my own thoughts and feelings first-hand. If I knew I were going to be branched, then I'd care about both copies in advance as both are valid continuations of my current sensory stream. However, once the branch had taken place, both copies would immediately stop caring about the other (although I expect they would still practice altruistic behavior towards each other for decision-theoretic reasons).

I suspect this has also influenced my sense of morality: I've never been attracted to total utilitarianism, as I've never been able to see why the existence of X people should be considered superior to the existence of Y < X equally satisfied people.

So yeah, that's part of it, but not all of it (if that were the extent of it, I'd be indifferent to the existence of copies, not opposed to it). The rest is hard to put into words, and I suspect that even were I to succeed in doing so I'd only have succeeded in manufacturing a verbal rationalization. Part of it is instrumental, each copy would be a potential competitor, but that's insufficient to explain my feelings on the matter. This wouldn't be applicable to, say, the Many-Worlds Interpretation of quantum mechanics, and yet I'm still bothered by that interpretation as it implies constant branching of my identity. So in the end, I think that I can't offer a verbal justification for this preference precisely because it's a terminal preference.

Perhaps you're thinking of the dopamine spike when reward is actually given? I had thought the predictive spike was purely proportional to the odds of success and the amount of reward, which would indeed change with boring tasks, but not in any linear way. If you're right about that basic structure of the predictive spike I should know about it for my research; can you give a reference?
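
To be concrete about the model I had in mind, here's a minimal sketch (plain Python; the structure and numbers are my assumption for illustration, not anything from the lecture):

```python
# Expected-value model of the predictive (cue-evoked) dopamine spike:
# proportional to the odds of success times the reward size.
def predictive_spike(p_success: float, reward_magnitude: float) -> float:
    """Spike scales with p(success) * reward magnitude."""
    return p_success * reward_magnitude

# Under this model the relationship to probability is linear, with no
# bell-curve peak at intermediate uncertainty:
for p in (0.1, 0.5, 0.9):
    print(p, predictive_spike(p, reward_magnitude=1.0))
```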

0donjoe
Well, the relationship Sapolsky described wasn't linear, it was more like a bell curve. And no, he doesn't cite any particular study in that lecture, so all I have is his word on this one. I guess you could just ask him. :)

Less Wrong seems like the ideal community to think up better reputation systems. Doctorow's Whuffie is reasonably well thought out, but intended for a post-scarcity economy; still, its idea of distinguishing right-handed reputation (from people who generally agree with you) from left-handed reputation (from people who generally don't) seems like one useful ingredient. Reducing the influence of those who tend to vote together seems like another potential win.
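
For concreteness, here's a toy sketch of how those two ingredients might combine (the data structures and numbers are invented for illustration, not any existing system's API):

```python
# Toy reputation tally: votes from people who usually agree with you count
# toward a "right-handed" score, votes from people who usually don't count
# toward a "left-handed" score, and voters who tend to vote in lockstep
# with others are down-weighted. All inputs are hypothetical.
def reputation(votes, agreement, lockstep, threshold=0.5):
    """votes: {voter: +1 or -1}; agreement: {voter: fraction of past votes
    agreeing with you}; lockstep: {voter: 0..1 correlation with other voters}."""
    right = left = 0.0
    for voter, vote in votes.items():
        weight = 1.0 - lockstep.get(voter, 0.0)   # bloc voters count for less
        if agreement.get(voter, 0.0) >= threshold:
            right += weight * vote
        else:
            left += weight * vote
    return {"right_handed": right, "left_handed": left}

print(reputation({"a": +1, "b": +1, "c": -1},
                 agreement={"a": 0.9, "b": 0.4, "c": 0.2},
                 lockstep={"a": 0.8, "b": 0.1}))
```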

I like to imagine a face-based system; snap an image from a smartphone, and access reputation.

I hope to see more discussion, in particular of VAuroch's suggestion.

I think the example is weak; the software was not that dangerous, and the researchers were idiots who broke a vial they knew was insanely dangerous.

I think it dilutes the argument to broaden it to software in general; it could be very dangerous under exactly those circumstances (with terrible physical safety measures), but the dangers of superhuman AGI are vastly larger IMHO and deserve to remain the focus, particularly of the ultra-reduced bullet points.

I think this is as crisp and convincing a summary as I've ever seen; nice work! I also liked the book, but condensing it even further is a great idea.

3[anonymous]
As a side note, I was more convinced by my example at the time, but on rereading this I realized that I wasn't properly remembering how poorly I had expressed the context that substantially weakened the argument (the researchers accidentally breaking the vial).

Which actually identifies a simpler rhetoric improvement method: have someone tell you (or pretend to have someone tell you) that you're wrong, and then reread your original point, since rereading it under the impression that you screwed up will give you a fresh perspective on it compared to when you were writing it. I should take this as evidence that I need to do that more often on my posts.

"Pleased to meet you! Soooo... how is YOUR originating species doing?..."

That actually seems like an extremely reasonable question for the first interstellar meeting of superhuman AIs.

I disagree with EY on this one (I rarely do). I don't think it's so likely as to ensure that a rationally acting AI would be Friendly, but I do think that the possibility of encountering an equally powerful AI, and one with a head start on resource acquisition, shouldn't be dismissed by a rational actor.

I'm game. These are some of my favorite topics. I do computational cognitive neuroscience, and my principal concern with it is how it can/will be used to build minds.

0fowlertm
Head over to meetup.com and search for AI and Existential Risk, then join the group. We just had our inaugural meeting.

I may be confused, but it seems to me that the issue in generalizing from decision utility to utilitarian utility simply comes down to making an assumption that allows utilities among different people to be compared, that is, to put them on the same scale. I think there's a pretty strong argument that we can do so, springing from the fact that we are all running essentially the same neural hardware. Whatever experiential value is, it's made of patterns of neural firing, and we all have basically the same patterns. While we don't run our brains exactly the same, the ... (read more)

0blacktrance
That's a big leap. Why would weighing the quality of our own experiences more highly mean that there's no objective ethics?

I'm out of town or I'd be there. Hope to catch the next one.

Wow, I feel for you. I wish you good luck and good analysis.

4ialdabaoth
*nod* On an individual level, I appreciate the feels. In my case, I know computer programming, and I've just this week managed to claw my way out of five years of unemployment and back into a reasonably well-paying career job, so I should have access to the necessary resources shortly.

But remember that many, many people do not. As EY keeps pointing out, the world is hideously unfair, and there are all sorts of completely random and harsh events that can cause otherwise intelligent and creative and "deserving" people to fail to live up to their potential, or even permanently lose a portion of that potential. (Or, in the case of death, ALL of that potential.)

If we really want to see a world that is less crazy, those of us who have the power to might consider ways to build environments that don't throw people into such destructive, irrational feedback loops. "Here's how people who don't suck behave" is less useful for that than "here's what environments look like that don't make people who suck as often."

Ha! I was there the week prior. I hope this is going to happen again. Note also that I'm re-launching a defunct Singularity meetup group for Boulder/Broomfield if anyone is interested.

Sorry I missed it. I hope there will be more Boulder LW meetups?

Given how many underpaid science writers are out there, I'd have to say that ~50k/year would probably do it for a pretty good one, especially given the 'good cause' bonus to happiness that any qualified individual would understand and value. But is even 1k/week in donations realistic? What are the page view numbers? I'd pay $5 for a good article on a valuable topic; how many others would as well? I suspect the numbers don't add up, but I don't even have an order-of-magnitude estimate on current or potential readers, so I can't myself say.
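
For a rough sense of scale, here's the back-of-the-envelope version (all figures are the guesses above, plus an assumed posting rate, not real data):

```python
# Could $5 donations fund a ~$50k/year writer? Assumes roughly one
# article per week; every number here is a guess.
salary = 50_000        # USD per year
donation = 5           # USD per paying reader per article
articles_per_year = 52

readers_needed = salary / (donation * articles_per_year)
print(f"~{readers_needed:.0f} paying readers per article")  # ~192
```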

0somervta
You need not only a good science writer, but one who either already groks the problem, or can be made to do so with a quick explanation. Furthermore, they need to have the above qualifications without being capable of doing primary research on the problem (this is the issue with Eliezer - he would certainly be capable of doing it, but his time is better spent elsewhere.)

Upvoted; the issue of FAI itself is more interesting than whether Eliezer is making an ass of himself, and thereby of the SIAI message (probably a bit; claiming you're smart isn't really smart, but then he's also doing a pretty good job as publicist).

One form of productive self-doubt is to have the LW community critically examine Eliezer's central claims. Two of my attempted simplifications of those claims are posted here and here on related threads.

Those posts don't really address whether strong AI is feasible; I think most AI researchers agree that it will bec... (read more)

Not sure what you mean by 1), but certainly, recurrent neural nets are more powerful. 2) is no longer true; see for example the GeneRec algorithm. It does something much like backpropagation, but since no derivatives are explicitly calculated, there's no concern with recurrent loops.
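
For anyone curious, here's a minimal sketch of the flavor of weight update I mean (my paraphrase of a GeneRec-style rule, not a full implementation; see O'Reilly's GeneRec papers for the real thing):

```python
import numpy as np

# Sketch of a GeneRec-style update: weights move in proportion to the
# difference between "plus phase" activity (outputs clamped to the target)
# and "minus phase" activity (the network's own settled expectation).
# No derivatives are computed explicitly, so recurrent connections are
# not a special problem. The activations below are made up.
def generec_update(w, pre_minus, post_minus, post_plus, lrate=0.1):
    """w: (n_pre, n_post) weight matrix; the rest are activation vectors."""
    return w + lrate * np.outer(pre_minus, post_plus - post_minus)

w = np.zeros((3, 2))
w = generec_update(w,
                   pre_minus=np.array([1.0, 0.5, 0.0]),
                   post_minus=np.array([0.2, 0.8]),   # free-running output
                   post_plus=np.array([1.0, 0.0]))    # clamped to target
print(w)
```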

On the whole, neural net research has slowed dramatically based on the common view you've expressed; but progress continues apace, and neural nets are not far behind cutting-edge vision and speech processing algorithms, while working much more like the brain does.

0JoshuaZ
Thanks. GeneRec sounds very interesting. Will take a look. Regarding 1, I was thinking of something like the theorems in chapter 9 of *Perceptrons*, which show that there are strong limits on what topological features of the input a non-recursive neural net can recognize.

I think this is an excellent question. I'm hoping it leads to more actual discussion of the possible timeline of GAI.

Here's my answer, important points first, and not quite as briefly as I'd hoped.

1) Even if uFAI isn't the biggest existential risk, the very low investment and interest in it might make it the best marginal value for investment of time or money. As someone noted, having at least a few people thinking about the risk far in advance seems like a great strategy if the risk is unknown.

2) No one but SIAI is taking donations to mitigate the risk ... (read more)

5Wei Dai
See Organizations formed to prevent or mitigate existential risks. (FHI isn't listed there for some reason.) Besides FHI, I know at least Lifeboat Foundation is also taking donations. They endorse SIAI, but have their separate plans.

I work in this field, and was under approximately the opposite impression; that voice and visual recognition are rapidly approaching human levels. If I'm wrong and there are sharp limits, I'd like to know. Thanks!

0JoshuaZ
Thanks, it always is good to actually have input from people who work in a given field. So please correct me if I'm wrong, but I'm under the impression that 1) neural networks cannot in general detect connected components unless the network has some form of recursion, and 2) no one knows how to make a neural network with recursion learn in any effective, marginally predictable fashion. This is the sort of thing I was thinking of. Am I wrong about 1 or 2?
3timtyler
Machine intelligence has surpassed "human level" in a number of narrow domains. Already, humans can't manipulate enough data to do anything remotely like what a search engine or a stockbot can do.

The claim seems to be that in narrow domains there are often domain-specific "tricks" that wind up not having much to do with general intelligence - e.g. see chess and Go. This seems true - but narrow projects often broaden out. Search engines and stockbots really need to read and understand the web. The pressure to develop general intelligence in those domains seems pretty strong.

Those who make a big deal about the distinction between their projects and "mere" expert systems are probably mostly trying to market their projects before they are really experts at anything.

One of my videos discusses the issue of whether the path to superintelligent machines will be "broad" or "narrow": http://alife.co.uk/essays/on_general_machine_intelligence_strategies/

Now this is an interesting thought. Even a satisficer with several goals but no upper bound on each will use all available matter on the mix of goals it's working towards. But a limited goal (make money for GiantCo until you reach one trillion, then stop) seems as though it would be less dangerous. I can't remember this coming up in Eliezer's CFAI document, but suspect it's in there with holes poked in its reliability.
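
The difference is easy to state as toy objective functions (these are made up purely for illustration):

```python
# A maximizer always prefers more; a capped goal is indifferent past the
# target, so turning all available matter into money stops paying off.
CAP = 1e12  # "one trillion, then stop"

def maximizer_utility(money: float) -> float:
    return money

def capped_utility(money: float) -> float:
    return min(money, CAP)

print(maximizer_utility(2e12) > maximizer_utility(1e12))  # True: more is always better
print(capped_utility(2e12) > capped_utility(1e12))        # False: no gain past the cap
```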

I think the concern stands even without a FOOM; if AI gets a good bit smarter than us, however that happens (design plus learning, or self-improvement), it's going to do whatever it wants.

As for your "ideal Bayesian" intuition, I think the challenge is deciding WHAT to apply it to. The amount of computational power needed to apply it to every thing and every concept on earth is truly staggering. There is plenty of room for algorithmic improvement, and it doesn't need to get that good to outwit (and out-engineer) us.

I think there are very good questions in here. Let me try to simplify the logic:

First, the sociological logic: if this is so obviously serious, why is no one else proclaiming it? I think the simple answer is that a) most people haven't considered it deeply and b) someone has to be first in making a fuss. Kurzweil, Stross, and Vinge (to name a few that have thought about it at least a little) seem to acknowledge a real possibility of AI disaster (they don't make probability estimates).

Now to the logical argument itself:

a) We are probably at risk from the... (read more)

0jacob_cannell
I for one largely agree, but a few differences: We've had a strong exponential since the beginning of computing. Thinking that humans create computers is something of a naive anthropocentric viewpoint: humans don't create computers and haven't for decades. Human+computer systems create computers, and the speed of progress is largely constrained by the computational aspects even today (computers increasingly do more of the work, and perhaps already do the majority).

To understand this more, read this post from a former Intel engineer (and apparently AI lab manager). Enlightening inside knowledge, but for whatever reason he only got up to 7 karma and wandered away.

Also, if you plotted out the data points of brain complexity on earth over time, I'm near certain it also follows a strong exponential. The differences between all these exponentials are 'just' constants.

I find this dubious, mainly because physics tells us that using all available matter is actually highly unlikely to ever be a very efficient strategy. However, agreed about the potential danger of future hyper-intelligence.
6utilitymonster
I've heard a lot of variations on this theme. They all seem to assume that the AI will be a maximizer rather than a satisficer. I agree the AI could be a maximizer, but don't see that it must be. How much does this risk go away if we give the AI small ambitions?
3kodos96
The only part of the chain of logic that I don't fully grok is the "FOOM" part. Specifically, the recursive self-improvement. My intuition tells me that an AGI trying to improve itself by rewriting its own code would encounter diminishing returns after a point - after all, there would seem to be a theoretical minimum number of instructions necessary to implement an ideal Bayesian reasoner. Once the AGI has optimized its code down to that point, what further improvements can it do (in software)? Come up with something better than Bayesianism?

Now in your summary here, you seem to downplay the recursive self-improvement part, implying that it would 'help,' but isn't strictly necessary. But my impression from reading Eliezer was that he considers it an integral part of the thesis - as it would seem to be to me as well. Because if the intelligence explosion isn't coming from software self-improvement, then where is it coming from? Moore's Law? That isn't fast enough for a "FOOM", even if intelligence scaled linearly with the hardware you threw at it, which my intuition tells me it probably wouldn't.

Now of course this is all just intuition - I haven't done the math, or even put a lot of thought into it. It's just something that doesn't seem obvious to me, and I've never heard a compelling explanation to convince me my intuition is wrong.

I think the point is that not valuing non-interacting copies of oneself might be inconsistent. I suspect it's true: consistency requires valuing parallel copies of ourselves just as we value future variants of ourselves and so preserve our lives. Our future selves also can't "interact" with our current self.

3Morendil
The poll in the previous post had to do with a hypothetical guarantee to create "extra" (non-interacting) copies. In the situation presented here there is nothing justifying the use of the word "extra", and it seems analogous to quantum-lottery situations that have been discussed previously. I clearly have a reason to want the world to be such that (assuming MWI) as many of my future selves as possible experience a future that I would want to experience. As I have argued previously, the term "copy" is misleading anyway, on top of which the word "extra" was reinforcing the connotations linked to copy-as-backup, where in MWI nothing of the sort is happening. So, I'm still perplexed. Possibly a clack on my part, mind you.

Quality matters if you have a community that's interested in your work; you'll get more "nice job" comments if it IS a nice job.

I don't think the lack of an earth-shattering ka-FOOM changes much of the logic of FAI. Smart enough to take over the world is enough to make human existence way better, or end it entirely.

It's quite tricky to ensure that your superintelligent AI does anything like what you wanted it to. I don't share the intuition that creating a "homeostasis" AI is any easier than an FAI. I think one move Eliezer is making in his "Creating Friendly AI" strategy is to minimize the goals you're trying to give the machine: just CEV.

I think this makes... (read more)

2Mass_Driver
While CEV is an admirably limited goal compared to the goal of immediately bringing about paradise, it still allows the AI to potentially intervene in billions of people's lives. Even if the CEV is muddled enough that the AI wouldn't actually change much for the typical person, the AI is still being asked to 'check' to see what it's supposed to do to everyone. The AI has to have some instructions that give it the power to redistribute most of the Earth's natural resources, because it's possible that the CEV would clearly and immediately call for some major reforms. With that power comes the chance that the power could be used unwisely, which requires tremendously intricate, well-tested, and redundant safeguards.

By contrast, a homeostasis or shield AI would never contemplate affecting billions of people; it would only be 'checking' to see whether a few thousand AI researchers are getting too close. It would only need enough resources to, say, shut off the electricity to a lab now and then, or launch an EMP or thermite weapon. It would be given invariant instructions not to seize control of most of Earth's natural resources. That means, at least for some levels of risk-tolerance, that it doesn't need quite as many safeguards, and so it should be easier and faster to design.

Well, yes; it's not straightforward to go from brains to preferences. But for any particular definition of preference, a given brain's "preference" is just a fact about that brain. If this is true, it's important to understanding morality/ethics/volition.

4orthonormal
Hello! You seem to know your way around already, but it doesn't hurt to introduce yourself on the Welcome page...