Hi. First-time commenter.
I'm not sure where "newbie" questions go, but since this is a post about alignment, I'll ask here. I had a basic question that I originally posted in a group. I've been wondering about it and figured others here would know the answer.
"I'm not a programmer, but I had a question regarding AI alignment. I read the example on ACT of the AI assigned to guard a diamond and how it could resort to tricking the sensory apparatus. What if you had different AIs with different goals, i.e., different ways of detecting the diamond's presence, and a solution was only approved if it met all of the AIs' goals? For other things, you could have one AI whose goal is safety and which is given higher priority, a second AI with the original goal, and a third to "referee" edge cases. In general, the problem seems to be that AIs are good at doing what we tell them, but we're not good at telling them what we want. What about a "balance of powers"? Different AIs with different, competing goals, where a solution only "works" when it satisfies the goals of each AI, all of which are wary of being fooled by the others? It seems to me that balancing goals is more akin to "actual" intelligence. You could even have a top-level AI evaluate how the combined processes of the competing AIs worked, with the ability to alter their programs within set parameters, while that top AI is itself checked by other AIs. I'm sure I'm not the first person to have thought of this, as adversarial processes are used in other ways. Why wouldn't this work for alignment?"