This story was originally posted as a response to this thread.
It might help to imagine a hard takeoff scenario using only known sorts of NN & scaling effects...
In A.D. 20XX. Work was beginning. "How are you gentlemen !!"... (Work. Work never changes; work is always hell.)
Specifically, a MoogleBook researcher has gotten a pull request from Reviewer #2 on his new paper in evolutionary search in auto-ML, for error bars on the auto-ML hyperparameter sensitivity like larger batch sizes, because more can be different and there's high variance in the old runs with a few anomalously high performance values. ("Really? Really? That's what you're worried about?") He can't see why worry, and wonders what sins he committed to deserve this asshole Chinese (given the Engrish) reviewer, as he wearily kicks off yet another HQU experiment...
This is a really interesting point. It seems like it goes even further: if the agent were only trying to maximise future expected reward, not only would it be indifferent between temporary and permanent “Nirvana”, it would be indifferent between strategies which achieved Nirvana with arbitrarily different probabilities, right? (Maybe with some caveats about how it would behave if it predicted the strategy might lead to negative-infinite states.)
So if a sufficiently fleshed-out agent is going to assign a non-zero probability of Nirvana to every strategy - or at least most of them - since it's not impossible, then won't our agent just suddenly become incredibly apathetic and sit there as soon as it reaches a certain level of intelligence?
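A toy numerical sketch of that indifference (my own illustration, not anything from the original comment), treating Nirvana as a +infinity reward: once any strategy has a non-zero chance of reaching it, expected reward stops distinguishing strategies at all.

```python
# Toy illustration (assumed setup, not from the post): if "Nirvana" is worth
# +inf reward, then any non-zero chance of reaching it swamps everything else,
# so an expected-reward maximiser has no basis to prefer one strategy over another.
import math

def expected_reward(p_nirvana: float, finite_reward: float) -> float:
    """Expected reward when Nirvana pays +inf and every other outcome pays a finite amount."""
    return p_nirvana * math.inf + (1 - p_nirvana) * finite_reward

print(expected_reward(0.99, 10.0))    # inf
print(expected_reward(1e-12, 1000.0)) # inf -- identical, despite wildly different odds
```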
I guess a way around this is to just posit that however we build these things their rewards can only be finite, but that seems (a) something the agent could maybe undo, and (b) something that shuts us off from some potentially good reward functions - if an aligned AI valued happy human lives at 1 utilon each, it would seem strange for it to not value somehow bringing about infinitely many of them.
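To make that trade-off concrete (again my own toy sketch, with an assumed bounding function and scale, not anything proposed above): capping rewards keeps expected values finite and comparable, but it also drives the marginal value of each extra happy life toward zero, which is exactly the strangeness worried about here.

```python
# Toy bounded utility (assumed form): values stay in [0, 1), so no Nirvana-style
# infinities, but the value of additional happy lives saturates.
import math

def bounded_utility(n_happy_lives: float, scale: float = 1e9) -> float:
    """Bounded utility: monotone in the number of lives, but capped below 1."""
    return 1 - math.exp(-n_happy_lives / scale)

print(bounded_utility(1e9))   # ~0.632
print(bounded_utility(1e12))  # ~1.000 -- a thousand times more lives barely moves it
```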