I'm a bridge engineer living in Iowa. I like to forecast and write things.
https://www.ryanbeckauthor.com/
https://twitter.com/BeckRyooan
Yeah, that's what I'd like to know: would an AI built on a number format with a built-in maximum pursue numbers higher than that maximum, or would it be "fulfilled" just by getting its reward number as high as the number format it's using allows?
Sorry, I'm using informal language. I don't mean it actually "cares," and I'm not trying to anthropomorphize. I mean "care" in this sense: how does it actually know that it's achieving a goal in the world, and why would it actually pursue that goal instead of just modifying the signals from its sensors so that the goal appears to be satisfied?
In the stamp collector example, why would an extremely intelligent AI bother creating all those stamps when its simulations show that, by tweaking its own software or hardware, it could make the signals it receives the same as if it had created all those stamps? That would be much easier than actually turning matter into a bunch of stamps.
My use of "reward" was just shorthand for whatever signals it needs to receive to consider its goal met. At some point it has to receive electrical signals telling it that its goal is met, right? So why wouldn't it just manipulate those electrical signals to match whatever its goal is?
How do you actually define its utility function over the state of the world? At some point the AI has to interpret the state of the world through electrical signals from sensors, so why wouldn't it be satisfied with manipulating those sensor signals to achieve its goal/reward?
I'm confused about why it cares about m if it can just manipulate its perception of what m is. Take your chess example: if m is which player wins at the end, the AI system "understands" m via an electrical signal. So what makes it care about m itself as opposed to just manipulating the electrical signal? In practice I would think it would take the path of least resistance. For something simple like chess, that would probably be m itself rather than manipulating the electrical signal, but for my more complex scenario it seems like it would arrive at 2) before 1). What am I missing?
Your last paragraph is really interesting and not something I'd thought much about before. In practice, is it likely to be unbounded? Aren't number formats in a typical computer system bounded? If so, would we expect an AI system to be using bounded numbers even if the programmers forgot to explicitly bound the reward in the code?
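For what it's worth, the bound on common number formats is easy to check directly. Here is a minimal Python sketch, assuming the reward is stored as a standard 64-bit float or a 32-bit signed integer. That is an assumption for illustration only, since nothing above says what format a real system would use, and the helper names are hypothetical.

```python
import sys

# Largest finite value of an IEEE 754 double, the format behind Python's float.
FLOAT_MAX = sys.float_info.max

# Largest value of a 32-bit signed integer, another common bounded format.
INT32_MAX = 2**31 - 1


def bump_float(reward: float) -> float:
    """Try to push a float reward higher by doubling it."""
    return reward * 2.0


def saturating_add_int32(reward: int, delta: int) -> int:
    """Add to an integer reward, clamping at the 32-bit maximum.

    Hypothetical helper: real systems may wrap around instead of clamping.
    """
    return min(reward + delta, INT32_MAX)


if __name__ == "__main__":
    print(FLOAT_MAX)                      # largest finite double
    print(bump_float(FLOAT_MAX))          # doubling overflows to infinity
    print(FLOAT_MAX + 1.0 == FLOAT_MAX)   # adding 1 is lost to rounding
    print(saturating_add_int32(INT32_MAX, 1) == INT32_MAX)
```

In this sketch the format itself caps the number even though nothing in the code sets an explicit bound. Whether an agent would be "fulfilled" at that cap is a separate question that the code does not answer.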
But wouldn't it be way easier for a sufficiently capable AI to make itself think that what's happening in m is what aligns with its reward function? Maybe not for something simple like chess, but if the goal requires doing something significant in the real world, it seems like it would be much easier for a superintelligent AI to fake the inputs to its sensors than to intervene in the world. If we're talking about paperclips or whatever, the AI can either 1) build a bunch of factories and convert all different kinds of matter into paperclips while fighting off humans who want to stop it, or 2) fake sensor data to give itself the reward, or just change its reward function to something much simpler that receives the reward all the time. I'm having a hard time understanding why 1) would ever happen before 2).
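The two options can be put into a toy comparison. This is only a sketch with made-up effort and paperclip numbers. The `Plan` class and both `pick_*` functions are hypothetical names, and it says nothing about how a real system would be built. It only shows that which option wins depends on whether plans are scored by the predicted sensor signal or by the predicted state of the world.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    effort: float            # hypothetical cost of carrying out the plan
    real_paperclips: int     # paperclips that actually exist afterwards
    sensed_paperclips: int   # paperclips the sensors report afterwards


# Made-up numbers purely for illustration.
PLANS = [
    Plan("build factories", effort=1000.0,
         real_paperclips=10**6, sensed_paperclips=10**6),
    Plan("fake sensor data", effort=1.0,
         real_paperclips=0, sensed_paperclips=10**6),
]


def pick_by_sensor_signal(plans):
    """Score plans by the signal the agent will receive; break ties by lower effort."""
    return max(plans, key=lambda p: (p.sensed_paperclips, -p.effort))


def pick_by_world_model(plans):
    """Score plans by what the world model predicts actually exists."""
    return max(plans, key=lambda p: (p.real_paperclips, -p.effort))


if __name__ == "__main__":
    print(pick_by_sensor_signal(PLANS).name)
    print(pick_by_world_model(PLANS).name)
```

In the sketch, the signal-scoring agent picks option 2) and the world-model-scoring agent picks option 1). Which of these a real superintelligent system would resemble is exactly the open question here.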
I don't see how this gets around the wireheading problem. If it's superintelligent enough to actually substantially increase the number of paperclips in the world in a way that humans can't stop, it seems to me like it would be pretty trivial for it to fake how large m appears to its reward function, and that would be substantially easier than trying to increase m in the actual world.
I'm way out of my depth here, but my thought is it's very common for humans to want to modify their utility functions. For example, a struggling alcoholic would probably love to not value alcohol anymore. There are lots of other examples too of people wanting to modify their personalities or bodies.
It depends on the type of AGI too, I would think. If superhuman AI ends up being like a paperclip maximizer that's just really good at following its utility function, then yeah, maybe it wouldn't mess with its utility function. But if superintelligence means it has emergent characteristics like opinions and self-reflection or whatever, it seems plausible it could want to modify its utility function, say after thinking about philosophy for a while.
Like I said, I'm way out of my depth though, so maybe that's all total nonsense.
Thanks, I appreciate you taking the time to answer my questions. I'm still skeptical that it could work like that in practice, but I also don't understand AI, so thanks for explaining that possibility to me.