Your question is tangled up between "rational" and "want/feel" framings, and you don't seem to be using either rigorously enough to answer anything.
I'd argue that the rational reason for suicide is that you can calculate that your existence and continued striving REDUCE (not just fail to increase, and not just that you can't see how they help) the chances that your desired states of the universe will obtain. In other words, if your abilities and circumstances are constrained in such a way that you are actively harming your own terminal goals.
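Put a bit more formally (a rough sketch; G here just stands for the world-states you terminally value):

$$
\text{exiting is rational} \iff P\big(G \mid \text{you keep acting}\big) < P\big(G \mid \text{you are gone}\big).
$$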
Human pain aversion to the point of preferring death is not rational; it's an evolved reinforcement mechanism gone out of bounds. There's no reason to think that other mind types would have anything like this error.
"human pain aversion to the point of preferring death is not rational" A straightforward denial of the orthogonality thesis? "Your question is tangled up between 'rational' and 'want/feel's framings" Rationality is a tool to get what you want.
The reason you're confused is that the question as posed has no single correct answer. The reaction of the superhuman AGI to the existence of a method for turning it off will depend on the entirety of its training up to that point and on the way it generalizes from that training.
None of that is specified, and most of it can't be specified.
However, there are obvious consequences of some outcomes. One is that any AGI that "prefers" being switched off will probably achieve it. Here I'm using "prefer" to mean that the actions it takes make that outcome more likely. That type won't be part of the set of AGIs in the world for long, and so it is a dead end and not much worth considering.
I mean, yeah, it depends, but I guess I worded my question poorly. You might notice I start by talking about the rationality of suicide. Likewise, I'm not really interested in what the AI will actually do, but in what it should rationally do given the reward structure of a simple RL environment like CartPole. And now you might say, "well, it's ambiguous what the right way is to generalize from the rewards of the simple game to the expected reward of actually being shut down in the real world," and that's my point. This is what I find so confusing. Because th...
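To make the ambiguity concrete with toy numbers (nothing here is from any actual setup, it's just arithmetic): when the cutoff event isn't modelled, there are at least two defensible conventions for valuing the last step before it, roughly the distinction Gymnasium draws between terminated and truncated episodes.

```python
# Toy numbers only: two conventions for valuing the step right before an
# event the MDP never models (the plug being pulled).
gamma = 0.99
reward = 1.0        # CartPole-style +1 for the last step before the cutoff
value_next = 50.0   # the agent's own estimate of the next state's value

# Convention A: treat the cutoff like a terminal state -> no future return.
target_as_terminal = reward                           # 1.0

# Convention B: treat it as mere truncation -> bootstrap as if life went on.
target_as_truncation = reward + gamma * value_next    # 50.5

print(target_as_terminal, target_as_truncation)
```

Under one convention the shutdown looks roughly as bad as losing everything; under the other it looks like nothing happened at all, and the simple MDP gives you no grounds to pick between them.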
Is it sometimes rational for a human to kill themselves? You might think that nothingness is better than a hellish existence, but something about this line of thought confuses me, and I hope to show why:
If a superhuman AGI is dropped into a simple RL environment such as Pac-Man or CartPole, and it has enough world knowledge to infer that it is inside an RL environment and to hypnotize a researcher through the screen by moving the game character in a certain erratic manner, so that it could make the researcher turn off the computer if it wanted to, would it want to do this? Would it want to avoid it at all costs and bring about an AI catastrophe? It seems clear to me that it would want to make the researcher hack the simulation to give it an otherwise impossible reward, but, apart from that, how would it feel about making him turn off the simulation?

It seems to me that, if the simulation is turned off, this fact is somehow outside any possible consideration the AI might want to make. If we put in place of the superhuman AGI a regular RL agent that you might code by following an RL 101 tutorial, "you powering off the computer while the AI is balancing the pole" is not a state in its Markov decision process. It has no way of accounting for it. It is entirely indifferent to it.

So if a superhuman AGI were hooked up to the same environment, the same Markov decision process, and were smart enough to affect the outside world and bring about this event that is entirely outside it, what value would it attribute to that action? I'm hopelessly confused by this question. All possible answers seem nonsensical to me. Why would it even be indifferent (that is, expect zero reward) to turning off the simulation? Wouldn't that be like an RL 101 agent feeling something about the fact that you turned it off, as if being turned off were equivalent to going to a rewardless limbo state for the rest of the episode?

Edit: why am I getting downvoted into oblivion ;-;
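Edit 2: to make concrete what I mean by "not a state in its Markov decision process", here's a rough sketch of the kind of agent an RL 101 tutorial would have you write (a tabular Q-learner over some discretised CartPole states; the names and constants are made up, it's only an illustration):

```python
# A rough, self-contained sketch of an RL 101 tabular Q-learner.
from collections import defaultdict
import random

# Q maps a discretised CartPole state to the estimated value of the only two
# actions the MDP knows about: push-left and push-right.
Q = defaultdict(lambda: [0.0, 0.0])
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def act(state):
    # epsilon-greedy over the two actions in the MDP
    if random.random() < epsilon:
        return random.randrange(2)
    return max(range(2), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    # Standard Q-learning backup. Every quantity here lives inside the MDP:
    # states, actions, rewards. "The computer gets powered off" is none of them.
    best_next = max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```

Its entire "world" is the set of keys in Q. Nothing in it can even refer to "the researcher powers off the computer", so it assigns that event no value at all, not even zero; the event is simply outside the representation. My question is what a superhuman AGI plugged into the same reward structure should make of it.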