I'm assuming there are other people (I'm a person too, honest!) up in here asking this same question, but I haven't seen them so far. I do see all these posts about AI "alignment", though, and I can't help but wonder: when did we discover an objective definition of "good"?
I've already mentioned it elsewhere here, but I think Nietzsche has some good (heh) thoughts about the nature of Good and Evil, namely that they are subjective concepts. Here's what ChatGPT has to say:
Nietzsche believed that good and evil are not fixed things, but rather something that people create in their minds. He thought that people create their own sense of what is good and what is bad, and that it changes depending on the culture and time period. He also believed that people often use the idea of "good and evil" to justify their own actions and to control others. So, in simple terms, Nietzsche believed that good and evil are not real things that exist on their own, but are instead created by people's thoughts and actions.
How does "alignment" differ? Is there a definition somewhere? From what I see, it's subjective. What is the real difference between "how to do X" and "how to prevent X"? One form is good and the other not— depending on what X is? But again, perhaps I misunderstand the goal, and what exactly is being proposed be controlled.
Is information itself good or bad? Or is it how the information is used that is good or bad (and as mentioned, relatively so)?
I do not know. I do know that I'm stoked about AI, as I have been since I was smol, and as I am about all the advancements we just-above-animals make. Biased for sure.
Since we're anthropomorphizing[1] so much, how do we align humans?
We're worried about AI getting too powerful, but logically that means humans are getting too powerful, right? Thus what we have to do to cover question 1 (how), regardless of question 2 (what), is control human behavior, correct?
How do we ensure that we churn out "good" humans? Gods? Laws? Logic? Communication? Education? This is not a new question per se, and I guess the scary thing is that, perhaps, it is impossible to ensure that literally every human is Good™ (we'll use a loose definition of "you know what I mean: not evil!").
This is only "scary" because humans are getting freakishly powerful. We no longer need an orchestra to play a symphony we've come up with, or multiple labs and decades to generate genetic treatments— and so on and so forth.
Frankly though, it seems kind of impossible to figure out a "how" if you don't know the "what", logically speaking.
I'm a fan of navel-gazing, so it's not like I'm saying this is a waste of time, but if people think they're doing substantive work by rehashing/restating fictional stories which cover the same ideas in more digestible and entertaining formats…
Meh, I dunno, I guess I was just wondering if there was any meat to this stuff, and so far I haven't found much. But I will keep looking.
I see a lot of people viewing AI from the "human" standpoint, using terms like "reward" to mean the human version of the idea rather than how a program would see it (a number that adjusts weights may be closer to the mark; often people seem to think these "rewards" are like a dopamine hit for the AI, which is just not a good analogy IMHO). I think that muddies the water, since by definition we're talking about non-human intelligence, theoretically… right? Or are we? Maybe the question is "what if the movie Lawnmower Man was real?" The human perspective seems to be the popular take (which makes sense, as most of us are human).
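To make that concrete, here's a minimal sketch of what a "reward" actually is mechanically: a scalar that scales how much the weights get nudged. This is a toy REINFORCE-style bandit I made up for illustration (the arm payouts, learning rate, etc. are all invented, and real systems are far more elaborate), but the role the reward number plays is the same idea.

```python
# Toy sketch (my own made-up example): in a REINFORCE-style update, the
# "reward" is just a number that scales a weight adjustment. No feelings.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)           # the policy's "weights" for a 2-armed bandit
true_payouts = [0.2, 0.8]      # made-up environment: arm 1 pays off more often
lr = 0.1

for _ in range(1000):
    probs = np.exp(logits) / np.exp(logits).sum()        # softmax policy
    action = rng.choice(2, p=probs)                       # pick an arm
    reward = float(rng.random() < true_payouts[action])   # the "reward": just 0.0 or 1.0

    # gradient of log pi(action) w.r.t. the logits, scaled by the reward
    grad = -probs
    grad[action] += 1.0
    logits += lr * reward * grad

print(probs)  # most of the probability mass should have drifted toward arm 1
```

Run it and the policy shifts toward the better arm over time; at no point does anything "feel" rewarded, it's all just arithmetic on the weights.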