Let’s say you have an idea in mind for how to align an AI with human values.
Go prep a slide with some e-coli, put it under a microscope, and zoom in until you can see four or five cells. Your mission: satisfy the values of those particular e-coli. In particular, walk through whatever method you have in mind for AI alignment. You get to play the role of the AI; with your sophisticated brain, massive computing power, and large-scale resources, hopefully you can satisfy the values of a few simple e-coli cells.
Perhaps you say “this is simple, they just want to maximize reproduction rate.” Ah, but that’s not quite right. That’s optimizing for the goals of the process of evolution, not optimizing for the goals of the godshatter itself. The e-coli has some frozen-in values which have evolved to approximate evolutionary fitness maximization in some environments; your job is to optimize for the frozen-in approximation, even in new environments. After all, we don’t want a strong AI optimizing for the reproductive fitness of humans - we want it optimizing for humans’ own values.
On the other hand, perhaps you say “these cells don’t have any consistent values, they’re just executing a few simple hardcoded algorithms.” Well, you know what else doesn’t have consistent values? Humans. Better be able to deal with that somehow.
Perhaps you say “these cells are too simple, they can’t learn/reflect/etc.” Well, chances are humans will have the same issue once the computational burden gets large enough.
This is the problem of AI alignment: we need to both define and optimize for the values of things with limited computational resources and inconsistent values. To see the problem from the AI’s point of view, look through a microscope.
It seems like this example would in some ways work better if the model organism were mice rather than bacteria, because bacteria probably do not have values to begin with (so inconsistency isn't the issue), nor any internal experience.
With, say, mice (though perhaps roundworms might work here, since it's more conceivable that they could actually have preferences), the answer to how to satisfy their values almost certainly seems to be just wireheading, since they don't have a complex enough mind to have preferences about the world distinct from their experiences.
So I'm not sure whether this type of approach works, because you probably need more intelligent social animals before satisfying their preferences stops being best achieved through wireheading.
Still, I suppose this does raise the question of how one might best satisfy the preferences/values of animals like corvids or primates, who lack some of the more complex human values but still share the most basic ones, like being socially validated (and caring about the mental states of other animals, which rules out experience-machine-like solutions).
While I consider wireheading only marginally better than oblivion, the more general issue is the extent to which you can really call something alignment if it leads to behavior that the overwhelming majority of people consider egregious and terrible in every way. It really doesn't make sense to talk about there being a "best" solution here anyway, because that basically begs the question with regard to certain moral philosophies.
>I'm also assuming you think if bacteria somehow became as intelligent as humans, they would also ag...