All of joebiden's Comments + Replies

I kind of feel like it’s the opposite: people actually do anchor their imagination about the future on science fiction, and this is part of the problem here. Lots of science fiction features a world with a bunch of human-level AIs walking around but where humans are still comfortably in charge and non-obsolete, even though it’s hard to argue for why this would actually happen.

2Karl von Wendt
Yes, that's also true: there is always a lonely hero who in the end puts the AGI back into the box or destroys it. Nothing would be more boring than writing a novel about how in reality the AGI just kills everyone and wins. :( I think both are possible - that people imagine the wrong future and at the same time don't take it seriously.

The MIT AI-futurists (Moravec/Minsky/Kurzweil) believed that AI would be our "mind children", absorbing our culture and beliefs by default.

At this stage, this doesn’t seem obviously wrong. If you think that the path to AGI will come via LLM extension rather than via experiencing the world in an RL regime, then the AGI will have only our cultural output with which to make sense of the world.

I like “rogue AI” over “uncontrollable AI” because you get to substitute a one-syllable word for a five-syllable one, but otherwise I agree.

Also, my experience in talking with people about this topic is that most “normies” find AI scary & would prefer it not be developed, but for whatever reason the argument for a singularity or intelligence explosion - in which human-level artificial intelligence is expected to rapidly yield superhuman AGI - is unconvincing or silly-seeming to most people outside this bubble, including technical people. I’m not really sure why.

3Karl von Wendt
That's what I have experienced as well. I think one reason is that people find it difficult to imagine exponential growth - it's not something our brains are made for. If we think about the future, we intuitively look at the past and project a linear trend we seem to recognize.  I also think that if something is a frequent topic in science fiction books and movies, people see it as less likely to become real, so we SF writers may actually make it more difficult to think clearly about the future, even though sometimes developers are inspired by SF. Most of the time, people realize only in hindsight that some SF scenarios may actually come true. I think it's amazing how fast we go from "I don't believe that will ever be possible" to "that's just normal". I remember buying my first laptop computer with a color display in the nineties. If someone had told me that not much more than ten years later there would be an iPhone with the computing power of a supercomputer in my pocket, I'd have shaken my head in disbelief.

I've read a lot of the doomer content on here about AGI and am still unconvinced that alignment is difficult by default. I think if you generalize from the way humans are "aligned", the prospect of aligning an AGI well looks pretty good. The pessimistic views on this all seem to reach the opposite conclusion by arguing that "evolution failed to align humans, by its own standards". However:

  • Evolution isn't an agent attempting to align humans, or even a concrete active force acting on humans; instead, it is merely the effect of a repeatedly applied filter
...
1Aaron_Scher
My understanding of deep learning is that training is also roughly the repeated application of a filter. The filter is some loss function (or, potentially, the LLM evaluators you suggest) which repeatedly selects for a set of model weights that perform well according to that function, similar to how natural selection selects for individuals who are relatively fit. Humans designing ML systems can be careful about how they craft their loss functions, rather than letting arbitrary environmental factors determine what "fitness" means, but this does not guarantee that the models produced by this process actually do what we want. See inner misalignment for why models might not do what we want even if we put real effort into trying to get them to. Even within the analogy you propose, we have problems: parents raising their kids often fail to instill important ideas they want to (many kids raised in extremely religious households later convert away).
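A minimal sketch of that "filter" framing (a toy illustration, not from the comment above; the linear-regression setup and names like `true_w` are assumptions made for the example): gradient descent repeatedly asks the loss function to judge the current weights and keeps a nearby set that scores better, much as a fitness criterion repeatedly filters a population.

```python
# Toy sketch: training as a repeatedly applied "filter".
# The loss function plays the role of the fitness criterion; gradient descent
# repeatedly replaces the current weights with nearby weights that score better.
import numpy as np

rng = np.random.default_rng(0)

# A tiny "environment": noisy data generated by hidden true weights.
true_w = np.array([2.0, -3.0])
X = rng.normal(size=(100, 2))
y = X @ true_w + 0.1 * rng.normal(size=100)

def loss(w):
    # The filter: mean squared error says how well these weights "fit" the data.
    return np.mean((X @ w - y) ** 2)

w = np.zeros(2)   # start from arbitrary weights
lr = 0.05         # learning rate

for step in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # direction the filter pushes against
    w -= lr * grad                         # keep the better-scoring nearby weights

print(loss(w), w)  # low loss, w close to true_w: "fit" by the filter's own standard,
                   # which may or may not match what the designer actually wanted
```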
9jacob_cannell
Which is just blatantly ridiculous; the human population of nearly 10B vs a few M for other primates is one of evolution's greatest successes - by its own standards of inclusive genetic fitness. Evolution solved alignment on two levels: intra-aligning brains with the goal of inclusive fitness (massively successful), and also inter-aligning the disposable soma brains to distributed shared kin genes via altruism.

These examples seem like capabilities failures rather than alignment failures. Reading them doesn’t make me feel any more convinced that there will be rebellious AI, accidental paperclip maximizers, deceptive alignment, etc.

In the first example, the environment the AI is in suddenly changes, and the AI is not given the capability to learn and adapt to this change. So of course it fails.

In the second example, the AI is given the ability to continuously learn and adapt, and in this case, it actually succeeds at the intended goal. It almost depopulates the tr...