If Evolution were allowed to continue undisturbed, it could conceivably one day produce a pure inclusive genetic fitness maximizer by reworking our base desires. So the path would have been:
First replicators -> first brains -> base desires (roughly, first reinforcement learners) -> first consequentialists, unaligned by Evolution's standards -> aligned consequentialists via reworked base desires -> Evolution cannot optimize further, having reached the global optimum (or at least a very steep local one).
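To make that last optimization step concrete, here is a minimal toy simulation. It is entirely my own sketch, and every name and parameter in it (TRUE_FITNESS, act, mutate, the population sizes) is an assumption for illustration, not anything from the post: agents act on a proxy reward vector (their "base desires"), Evolution selects on a separate true objective, and over generations the surviving proxies tend to rotate toward that objective.

```python
# Toy model (hypothetical, illustrative only): Evolution as an outer
# optimizer over agents' "base desires". Each agent carries a proxy
# reward vector; it acts to maximize that proxy, but its reproductive
# fitness is scored by the true objective. Under selection, the proxy
# tends to drift toward the true objective -- the "reworking base
# desires" step in the chain above.
import random

N_FEATURES = 5
N_ACTIONS = 20
POP_SIZE = 50
GENERATIONS = 200
MUTATION_SCALE = 0.1

random.seed(0)
TRUE_FITNESS = [random.gauss(0, 1) for _ in range(N_FEATURES)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return dot(v, v) ** 0.5

def act(desires, actions):
    """The agent picks whichever action its base desires score highest."""
    return max(actions, key=lambda a: dot(desires, a))

def fitness(desires, actions):
    """Evolution scores the *chosen* action against the true objective."""
    return dot(TRUE_FITNESS, act(desires, actions))

def mutate(desires):
    return [w + random.gauss(0, MUTATION_SCALE) for w in desires]

population = [[random.gauss(0, 1) for _ in range(N_FEATURES)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    # A fresh random environment each generation: actions are feature vectors.
    actions = [[random.gauss(0, 1) for _ in range(N_FEATURES)]
               for _ in range(N_ACTIONS)]
    population.sort(key=lambda d: fitness(d, actions), reverse=True)
    survivors = population[:POP_SIZE // 2]  # truncation selection
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(POP_SIZE - len(survivors))]

best = population[0]
print("cosine(base desires, true objective):",
      round(dot(best, TRUE_FITNESS) / (norm(best) * norm(TRUE_FITNESS)), 3))
```

The cosine similarity printed at the end is the degree to which the base desires have been reworked toward the true objective; random initial desires start near zero.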
Does that mean that, given sufficiently long training time and care not to produce an agent that gets stuck with an imperfectly aligned goal mid-training, we could artificially create a consequentialist perfectly aligned to some goal we specify?
How likely are these steps to actually happen, both over the course of Evolution and during the training of an ML model?
Thoughts inspired by Thou Art Godshatter.
I think they would become consequentialists smart enough to actually act to maximize inclusive genetic fitness. I find Thou Art Godshatter convincing on this point.
Yeah, that's what I would expect.
I doubt that being governed by instincts can outperform a sufficiently smart agent reasoning from scratch in a sufficiently complicated environment. Instincts are just heuristics, after all...
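A toy sketch of that intuition (purely illustrative; the tree environment and both policies below are my own assumptions): an "instinct" is a fixed one-step heuristic, while a reasoner searches the whole decision tree. Exhaustive lookahead can never do worse than the greedy heuristic on a given tree, and on deep random trees it does strictly better on average.

```python
# Illustrative sketch: fixed heuristic ("instinct") vs. full lookahead
# ("reasoning from scratch") on random decision trees. All constants
# here are arbitrary choices for the demo.
import random

random.seed(1)
DEPTH = 8    # decisions per episode
BRANCH = 2   # options per decision

def random_tree(depth):
    """A decision tree of (reward, subtree) choices; [] marks a leaf."""
    if depth == 0:
        return []
    return [(random.uniform(0, 1), random_tree(depth - 1))
            for _ in range(BRANCH)]

def instinct_return(tree):
    """Fixed heuristic: always grab the largest immediate reward."""
    total = 0.0
    while tree:
        reward, tree = max(tree, key=lambda child: child[0])
        total += reward
    return total

def planner_return(tree):
    """Reasoning from scratch: exhaustive lookahead over all paths."""
    if not tree:
        return 0.0
    return max(reward + planner_return(sub) for reward, sub in tree)

episodes = [random_tree(DEPTH) for _ in range(500)]
avg_instinct = sum(instinct_return(t) for t in episodes) / len(episodes)
avg_planner = sum(planner_return(t) for t in episodes) / len(episodes)
print(f"avg return, instinct: {avg_instinct:.2f} vs planner: {avg_planner:.2f}")
```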
Ohhh interesting, I have no idea... it seems plausible that it could happen though!