CyberByte
CyberByte has not written any posts yet.

What does it mean for human values to be vulnerable to adversarial examples? When we say this about AI systems (e.g. image classifiers), I think it's either because their judgments on manipulated situations/images are misaligned with our human judgments, or perhaps because they get the "ground truth" wrong. But how can a value system be misaligned with itself, or different from the ground truth? For alignment purposes, isn't it itself the ground truth? It could of course fail to match "objective morality", if you believe in that, but in that case we should probably be trying to align our AI with that rather than with someone's human values.
I could (easily) imagine that...
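
(To make the classifier case concrete, here is a minimal sketch of the kind of adversarial example I have in mind. It is a toy illustration, not anything from the discussion itself: the data, dimensions, and parameters are all made up, and it uses sklearn's LogisticRegression. The point is that in a model with many input dimensions, a per-coordinate nudge far smaller than the noise on any single coordinate can still flip the model's judgment, even though a human would call it the same point.)

```python
# Toy illustration (not from the original discussion): an "adversarial example"
# against a linear classifier. All data, dimensions, and parameters are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 1000  # many input dimensions, as with image pixels

# Two classes that differ by a tiny shift in every coordinate, buried in unit noise.
X = rng.normal(size=(2000, d))
y = rng.integers(0, 2, size=2000)
X += np.where(y[:, None] == 1, 0.05, -0.05)

clf = LogisticRegression(max_iter=5000).fit(X, y)

# Pick a typical point the model confidently labels class 1.
scores = clf.decision_function(X)
idx = int(np.argsort(scores)[3 * len(scores) // 4])
x = X[idx:idx + 1]

# For a linear model the input gradient is just the weight vector, so nudge every
# coordinate slightly against its sign -- just enough to cross the decision boundary.
margin = clf.decision_function(x)[0]
eps = 1.05 * margin / np.abs(clf.coef_).sum()
x_adv = x - eps * np.sign(clf.coef_)

print("per-coordinate nudge:", round(eps, 3))  # far smaller than the unit noise level
print("prediction before/after:", clf.predict(x)[0], clf.predict(x_adv)[0])  # 1 -> 0
```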
Thanks for your reply!
Okay, that helps me understand what you're talking about a bit better. It sounds like the concept of a partial function, and in the ML realm like the notorious brittleness that makes systems incapable of generalizing or extrapolating outside of a limited training set. I understand why you're approaching this from the adversarial angle, though: I suppose you're concerned about the AI bringing about some state outside the domain of definition, one that just happens to yield a high, essentially random score.
...
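
(A minimal sketch of that failure mode, a state outside the domain of definition that happens to get a high score: a learned score model is only trustworthy on the region it was trained on, but an optimizer searching more widely tends to land far outside that region, where the high number it finds is just extrapolation. This is my own toy illustration, not anything from the discussion: a 1D score defined only on [0, 1], sklearn's MLPRegressor as the learned score, and a grid search standing in for the optimizer.)

```python
# Toy illustration (not from the original discussion): a learned "score" is only
# meaningful on its training region; an optimizer searching more widely can land on
# an out-of-distribution point whose high score is just extrapolation.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# The intended score is only defined (and sampled) on [0, 1], where it never exceeds 1.
X_train = rng.uniform(0.0, 1.0, size=(200, 1))
y_train = X_train[:, 0] ** 2

model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
model.fit(X_train, y_train)

# An "optimizer" that searches a much wider domain than the model was trained on.
candidates = np.linspace(-10.0, 10.0, 2001).reshape(-1, 1)
scores = model.predict(candidates)
best_x = candidates[np.argmax(scores), 0]

print(f"argmax of the learned score: x = {best_x:.1f}, predicted score = {scores.max():.1f}")
print("intended score is only defined on [0, 1], where it never exceeds 1.0")
# The argmax typically lands at the edge of the search range, far outside [0, 1],
# with a "score" that reflects the network's extrapolation, not the intended value.
```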