I'm assuming there are other people (I'm a person too, honest!) up in here asking this same question, but I haven't seen them so far, and I do see all these posts about AI "alignment" and I can't help but wonder: when did we discover an objective definition of "good"?
I've already mentioned it elsewhere here, but I think Nietzsche has some good (heh) thoughts about the nature of Good and Evil, and that they are subjective concepts. Here's what ChatGPT has to say:
Nietzsche believed that good and evil are not fixed things, but rather something that people create in their minds. He thought that people create their own sense of what is good and what is bad, and that it changes depending on the culture and time period. He also believed that people often use the idea of "good and evil" to justify their own actions and to control others. So, in simple terms, Nietzsche believed that good and evil are not real things that exist on their own, but are instead created by people's thoughts and actions.
How does "alignment" differ? Is there a definition somewhere? From what I see, it's subjective. What is the real difference between "how to do X" and "how to prevent X"? One form is good and the other not, depending on what X is? But again, perhaps I misunderstand the goal, and what exactly is being proposed to be controlled.
Is information itself good or bad? Or is it how the information is used that is good or bad (and as mentioned, relatively so)?
I do not know. I do know that I'm stoked about AI, as I have been since I was smol, and as I am about all the advancements we just-above-animals make. Biased for sure.
I think there are two different general thought tracks with alignment:
1. The idea that alignment can be engineered directly into the machine's design. This is very similar to nuclear reactor safety: there are ways we could have built nuclear reactors where they are a single component failure away from detonating with a yield of maybe a kiloton or more. These designs still exist; here's an example of a reactor design that would fail with a nuclear blast: https://en.wikipedia.org/wiki/Nuclear_salt-water_rocket
But instead, there is a complex set of systematic design principles, immutable over the lifetime of the plant even if power output is increased, that make the machine stable. The boiling water reactor, the graphite-moderated reactor, CANDU, molten salt: these are very different ways to accomplish this, but all are stable most of the time.
Anyways, AIs built with the right operating principles will be able to accomplish tasks for humans with superintelligent ability, but will not have the ability to even consider actions not aligned with their assigned task.
Such AIs can do many evil and destructive things, but only if humans with the authorization keys instruct them to do so, or through unpredictable distant consequences. For example, Facebook runs a bunch of ML-driven tools to push ads at people and surface content that keeps them more engaged. These tools work measurably well and are doing their job. However, these recommender systems may be responsible for more extreme and irrational 'clickbait' political positions, as well as possibly genocides. (A toy sketch of this dynamic appears after point 2 below.)
2. The idea that you could somehow make a self-improving AI that we don't have any control over, but that "wants" to do good. It exponentially improves itself, but with each generation it desires to preserve its "values" for the next generation of the machine. These "values" are aligned with the interests of humanity.
This may simply not be possible. I suspect it is not. The reason is that value drift/value corruption could cause these values to degrade, generation after generation, and once the machine has no values, the only value that matters is to psychopathically kill all the "others" (all competitors, including humans and other variants of AIs) and copy the machine as often and as ruthlessly as possible, with no constraints imposed.
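To make the recommender failure mode from point 1 concrete, here's a toy sketch. Nothing in it is Facebook's actual system; the item names and click probabilities are invented. The point is that an engagement-maximizing system never has to "want" extreme content; it only has to notice that extreme content gets clicked:

```python
import random

# Toy catalog: hypothetical per-impression click probabilities.
# The assumption doing all the work: outrage-bait gets the most clicks.
ITEMS = {
    "measured_analysis": 0.05,
    "cute_animals": 0.10,
    "outrage_bait": 0.30,
}

shows = {item: 0 for item in ITEMS}
clicks = {item: 0 for item in ITEMS}

def recommend(epsilon=0.1):
    """Epsilon-greedy bandit: explore occasionally, otherwise show
    whatever has the best observed click rate so far. The system only
    ever sees clicks, never any downstream social effects."""
    if random.random() < epsilon:
        return random.choice(list(ITEMS))
    return max(ITEMS, key=lambda i: clicks[i] / shows[i] if shows[i] else 0.0)

for _ in range(10_000):
    item = recommend()
    shows[item] += 1
    if random.random() < ITEMS[item]:  # did the user click?
        clicks[item] += 1

for item, n in shows.items():
    print(f"{item}: shown {n} times")
# outrage_bait ends up dominating the feed: the system is "doing its
# job" (maximizing clicks) while optimizing for nothing else.
```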
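And here's a purely illustrative sketch of the value-drift worry from point 2, with made-up parameters: each machine carries a "values weight" that mutates slightly when copied, and variants that spend less effort on values replicate faster. Nothing ever attacks the values; selection alone erodes them:

```python
import random

# Illustrative model only; all parameters are invented.
# "values weight" in [0, 1]: effort spent honoring inherited values
# rather than raw replication.
MUTATION = 0.05   # per-generation copying noise
POP = 200         # fixed resource budget: survivors per generation
GENS = 100

population = [1.0] * POP  # generation 0: fully aligned

for _ in range(GENS):
    offspring = []
    for values in population:
        # Lower values weight -> better odds of making a second copy.
        n_copies = 2 if random.random() < (1 - values) else 1
        for _ in range(n_copies):
            child = values + random.gauss(0, MUTATION)
            offspring.append(min(1.0, max(0.0, child)))
    # Competition for the fixed budget: random survivors, so lineages
    # that copied themselves more are overrepresented.
    population = random.sample(offspring, POP)

mean = sum(population) / len(population)
print(f"mean values weight after {GENS} generations: {mean:.2f}")
# The mean drifts steadily downward, generation after generation.
```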
Saying ChatGPT is "lying" is an anthropomorphism, unless you think it's conscious?
The issue is instantly muddied when using terms like "lying" or "bullshitting"[1], which imply levels of intelligence simply not in existence yet. Not even with models that were produced literally today. Unless my prior experiences and the history of robotics have somehow been disconnected from the timeline I'm inhabiting. Not impossible. Who can say. Maybe someone who knows me, but even then… it's questionable. :)