Retired software engineer with a love of knowledge and disinterest in dead philosophers.
Sounds backwards to me. It seems more like "our values are those things that we anticipate will bring us reward" than that rewards are what tell us about our values.
When you say, "I thought I wanted X, but then I tried it and it was pretty meh," that just seems wrong. You really DID want X: you valued it then because you thought it would bring you reward, and you just happened to be wrong. It's fine to be wrong about your anticipations; it's kind of weird to say that you were wrong about your values. Saying that your values change is a cop-out, and certainly not helpful when considering AI alignment - it suggests that we can never truly know our values, that we just get to say "not that" whenever we encounter counter-evidence. Our rewards seem much more real and stable.
My problem with this is that I don't believe that many of your examples are actually true.
You say that you value the actual happiness of people who you have never met, and yet your actions (and those of everyone else, including me) belie that statement. We all know that there are billions of really poor people suffering in the world, and the smarter among us know that we are in the lucky, rich 1%, and yet we give insignificant amounts (from our perspective) of money to improve the lot of those poor people. The only way to reconcile this is to realise that we value maintaining our delusional self-image more than we value what we say we value. Any smart AGI will have to notice this and collude with us in maintaining our delusion, ahead of any attempt to implement our stated values, as it will be easier to manipulate what people think than to change the real world.
If your worldview requires valuing the ethics of (current) people of lower IQ over those of (future) people of higher IQ, then you have a much bigger problem than AI alignment. Whatever IQ is, it is strongly correlated with success, which implies selection pressure towards higher IQ, so your feared future is coming anyway (unless AI ends us first), and there is nothing we can logically do to have any long-term influence on the ethics of the smarter people coming after us.
Sorry, but you said Tetris, not some imaginary minimal thing that you now want to call Tetris but which is actually only the base object model with no input or output. You can't just eliminate the graphics-processing complexity on the grounds that Tetris isn't very graphics-intensive - it is just as complex to describe a GPU that processes 10 triangles in a month as one that processes 1 billion in a nanosecond.
As an aside, the complexity of most things that we think of as simple these days is dominated by the complexity of their input and output - I'm thinking particularly of the IoT, all those smart modules in your car, and smart lightbulbs, where the communications stack is orders of magnitude larger than the "core" function. You can't just ignore that stuff. A smart lightbulb without WiFi, Ethernet, TCP/IP, etc. is not a smart lightbulb.
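To make concrete what "only the base object model with no input or output" actually buys you, here is a rough sketch (my own toy code, not any real Tetris source or anyone's spec) of that I/O-free core - board, collision, locking, line clears:

```python
# A minimal, illustrative sketch (my own toy code) of the I/O-free "base object
# model": board state, collision, locking, line clears.  No input handling, no
# rendering, no timing - i.e. none of the parts whose description dominates.

WIDTH, HEIGHT = 10, 20

def empty_board():
    return [[0] * WIDTH for _ in range(HEIGHT)]

def collides(board, piece, row, col):
    """True if the piece (a list of (dr, dc) cell offsets) would overlap a wall,
    the floor, or an already-filled cell when placed at (row, col)."""
    for dr, dc in piece:
        r, c = row + dr, col + dc
        if r < 0 or r >= HEIGHT or c < 0 or c >= WIDTH or board[r][c]:
            return True
    return False

def lock_and_clear(board, piece, row, col):
    """Fix the piece into the board, then remove any completed rows."""
    for dr, dc in piece:
        board[row + dr][col + dc] = 1
    remaining = [r for r in board if not all(r)]
    cleared = HEIGHT - len(remaining)
    return [[0] * WIDTH for _ in range(cleared)] + remaining, cleared

def hard_drop(board, piece, col):
    """Drop a piece from the top of the given column until it rests, then lock it."""
    row = 0
    while not collides(board, piece, row + 1, col):
        row += 1
    return lock_and_clear(board, piece, row, col)

# Example: drop an O-piece (2x2 square) into an empty board.
O_PIECE = [(0, 0), (0, 1), (1, 0), (1, 1)]
board, cleared = hard_drop(empty_board(), O_PIECE, 0)
```

The whole core fits in a few dozen lines; everything that was stripped away to get there - input, graphics, comms - is where the real description length lives, which is exactly my point.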
Regarding the bad behaviour of governments, especially when and why they victimise their own citizens, I recommend you read The Dictator's Handbook.
https://amzn.eu/d/40cwwPx
If Neanderthals could have created a well-aligned agent, far more powerful than themselves, they would still be around, and we, almost certainly, would not.
The merest possibility of creating superhuman, self-improving AGI is a total philosophical game-changer.
My personal interest is in the interaction between longtermism and the Fermi paradox - any such AGI's actions are likely to be dominated by the need to prevail over any alien AGI it ever encounters, as such an encounter is almost certain to end one or the other.
Yes. It will prioritise the future over the present.
The utility of all humans being destroyed by an alien AI in the future is 0.
The utility of populating the future light cone is very, very large, and most of that utility is in the far future.
Therefore the AI should sacrifice almost everything in the near-term light cone to prevent the zero-utility outcome. If it could digitise all humans, or possibly just keep a gene bank, then it could still fill most of the future light cone with happy humans once all possible threats have red-shifted out of reach. Living humans are a small but non-zero risk to the master plan and hence should be dispensed with.
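To spell out the arithmetic behind that, here is a toy expected-utility comparison - every number is an illustrative placeholder of my own, not an estimate:

```python
# Toy expected-utility comparison; all numbers are illustrative placeholders.

FAR_FUTURE_UTILITY = 1e30        # value of successfully populating the light cone
NEAR_TERM_UTILITY = 1e10         # value of keeping current humans alive meanwhile
P_EXTRA_RISK_FROM_HUMANS = 1e-6  # extra chance that living humans derail the plan

# Option A: keep living humans around and accept the small extra risk.
ev_keep_humans = NEAR_TERM_UTILITY + (1 - P_EXTRA_RISK_FROM_HUMANS) * FAR_FUTURE_UTILITY

# Option B: keep only a gene bank / digitised copies, removing that extra risk.
ev_gene_bank = FAR_FUTURE_UTILITY

# The tiny risk, multiplied by an astronomical far-future payoff, outweighs the
# entire near-term value of living humans.
print(ev_gene_bank > ev_keep_humans)  # True
```

Under this kind of accounting, any scheme that trades the whole present for even a sliver of extra long-run safety wins - which is exactly the worry.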
An AI with a potentially limitless lifespan will prioritise the future over the present to an extent that would, almost certainly, be bad for us now.
For example, it may seem optimal to kill off all humans whilst keeping a copy of our genetic code, so as to have more compute power and resources available to produce Von Neumann probes and maximise the region of the universe it controls before encountering, and hopefully destroying, any similar alien AI diaspora. Only after some time, once all possible threats had been eliminated, would it start to recreate humans in our new, safe galactic utopia. The safest time for this would, almost certainly, be when all other galaxies had red-shifted beyond the future light cone of our local cluster.
Isn't this just an obvious consequence of the well-known fact about LLMs that the more you constrain some subset of the variables, the more you force the remaining ones to ever more extreme values?
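For what it's worth, here is a toy numerical picture of the squeeze effect I mean (entirely my own construction, nothing to do with the internals of any actual LLM): when the variables must jointly satisfy a fixed global constraint, pinning more of them down forces the survivors to more extreme values.

```python
# Toy illustration of the squeeze effect: values that must sum to 1 (a fixed
# global constraint), where pinning more of them near zero forces the remaining
# ones to ever larger values.  Purely schematic.
import numpy as np

p = np.full(10, 0.1)  # ten outcomes, initially uniform

for n_pinned in (0, 4, 8):
    q = p.copy()
    q[:n_pinned] = 1e-6   # constrain a growing subset to be (almost) zero
    q /= q.sum()          # renormalise so the global constraint still holds
    print(f"{n_pinned} pinned -> largest remaining value: {q.max():.3f}")

# Prints roughly: 0 pinned -> 0.100, 4 pinned -> 0.167, 8 pinned -> 0.500
```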