All of joebiden's Comments + Replies

I kind of feel like it’s the opposite: people actually do anchor their imagination about the future on science fiction, and this is part of the problem here. Lots of science fiction features a world with a bunch of human-level AIs walking around but where humans are still comfortably in charge and non-obsolete, even though it’s hard to argue for why this would actually happen.

2Karl von Wendt
Yes, that's also true: there is always a lonely hero who in the end puts the AGI back into the box or destroys it. Nothing would be more boring than writing a novel about how in reality the AGI just kills everyone and wins. :( I think both are possible - that people imagine the wrong future and at the same time don't take it seriously.

The MIT AI-futurists (Moravec/Minsky/Kurzweil) believed that AI would be our "mind children", absorbing our culture and beliefs by default.

At this stage, this doesn’t seem obviously wrong. If you think that the path to AGI will come via LLM extension rather than via experiencing the world in an RL regime, then the AGI will have only our cultural output with which to make sense of the world.

I like “rogue AI” over “uncontrollable AI” because you get to substitute a one-syllable word for a five-syllable one, but otherwise I agree.

Also, my experience in talking with people about this topic is that most “normies” find AI scary & would prefer it not be developed, but for whatever reason the argument for a singularity or intelligence explosion - in which human-level artificial intelligence is expected to rapidly yield superhuman AGI - is unconvincing or silly-seeming to most people outside this bubble, including technical people. I’m not really sure why.

3Karl von Wendt
That's what I have experienced as well. I think one reason is that people find it difficult to imagine exponential growth - it's not something our brains are made for. If we think about the future, we intuitively look at the past and project a linear trend we seem to recognize.  I also think that if something is a frequent topic in science fiction books and movies, people see it as less likely to become real, so we SF writers may actually make it more difficult to think clearly about the future, even though sometimes developers are inspired by SF. Most of the time, people realize only in hindsight that some SF scenarios may actually come true. I think it's amazing how fast we go from "I don't believe that will ever be possible" to "that's just normal". I remember buying my first laptop computer with a color display in the nineties. If someone had told me that not much more than ten years later there would be an iPhone with the computing power of a supercomputer in my pocket, I'd have shaken my head in disbelief.

I've read a lot of the doomer content on here about AGI and am still unconvinced that alignment is difficult by default. I think if you generalize from the way humans are "aligned", the prospect of aligning an AGI well looks pretty good. The pessimistic views on this all seem to reach the opposite conclusion by arguing that "evolution failed to align humans, by its own standards". However:

  • Evolution isn't an agent attempting to align humans, or even a concrete active force acting on humans; instead, it is merely the effect of a repeatedly applied filter
...
1Aaron_Scher
My understanding of deep learning is that training is also roughly the repeated application of a filter. The filter is some loss function (or, potentially, the LLM evaluators you suggest) which repeatedly selects for a set of model weights that perform well according to that function, similar to how natural selection selects for individuals who are relatively fit. Humans designing ML systems can be careful about how they craft their loss functions, rather than letting arbitrary environmental factors determine what "fitness" means, but this does not guarantee that the models produced by this process actually do what we want. See inner misalignment for why models might not do what we want even if we put real effort into trying to get them to. Even within the analogy you propose, we have problems: parents raising their kids often fail to instill important ideas they want to (many kids raised in extremely religious households later convert away).
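A minimal sketch of that "filter" framing (a toy illustration, not from the comment above; the linear-regression setup and names like `true_w` are assumptions made for the example): gradient descent repeatedly asks the loss function to judge the current weights and keeps a nearby set that scores better, much as a fitness criterion repeatedly filters a population.

```python
# Toy sketch: training as a repeatedly applied "filter".
# The loss function plays the role of the fitness criterion; gradient descent
# repeatedly replaces the current weights with nearby weights that score better.
import numpy as np

rng = np.random.default_rng(0)

# A tiny "environment": noisy data generated by hidden true weights.
true_w = np.array([2.0, -3.0])
X = rng.normal(size=(100, 2))
y = X @ true_w + 0.1 * rng.normal(size=100)

def loss(w):
    # The filter: mean squared error says how well these weights "fit" the data.
    return np.mean((X @ w - y) ** 2)

w = np.zeros(2)   # start from arbitrary weights
lr = 0.05         # learning rate

for step in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # direction the filter pushes against
    w -= lr * grad                         # keep the better-scoring nearby weights

print(loss(w), w)  # low loss, w close to true_w: "fit" by the filter's own standard,
                   # which may or may not match what the designer actually wanted
```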
9jacob_cannell
Which is just blatantly ridiculous; the human population of nearly 10B vs a few M for other primates is one of evolution's greatest successes - by its own standards of inclusive genetic fitness. Evolution solved alignment on two levels: intra-aligning brains with the goal of inclusive fitness (massively successful), and also inter-aligning the disposable soma brains to distributed shared kin genes via altruism.

These examples seem like capabilities failures rather than alignment failures. Reading them doesn’t make me feel any more convinced that there will be rebellious AI, accidental paperclip maximizers, deceptive alignment, etc.

In the first example, the environment the AI is in suddenly changes, and the AI is not given the capability to learn and adapt to this change. So of course it fails.

In the second example, the AI is given the ability to continuously learn and adapt, and in this case, it actually succeeds at the intended goal. It almost depopulates the tr...