Obtaining desired AI behavior
It looks like you're drawing distinctions between ways of building something that has the desired behavior. That "how" would be the specification.
These concepts could be explicitly specified set-theoretically, specified by defining boundaries in some conceptual space, or, more generally, specified algorithmically as the product of an information-processing system together with its learning behavior and learning environment, without ever explicitly creating a conceptual representation.
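To make the three kinds of specification concrete, here is a toy sketch (entirely hypothetical, not from the paper): a concept given by an explicit predicate, a concept given by a boundary around a prototype in a conceptual space, and a concept that emerges algorithmically from a learner and its training examples.

```python
# Toy illustration of three ways to specify a concept.

# 1. Explicit, set-theoretic specification: membership by an exact predicate.
def is_even_explicit(n):
    return n % 2 == 0

# 2. Boundary in a conceptual space: membership by distance to a prototype.
def near_prototype(x, prototype=0.0, radius=1.0):
    return abs(x - prototype) <= radius

# 3. Algorithmic specification: the concept is the product of a learning
#    system and its environment. Here a nearest-centroid classifier learns
#    a boundary from labelled examples instead of being handed a definition.
def learn_concept(examples):
    """examples: list of (value, is_member) pairs."""
    members = [x for x, m in examples if m]
    non_members = [x for x, m in examples if not m]
    c_in = sum(members) / len(members)
    c_out = sum(non_members) / len(non_members)
    def classify(x):
        return abs(x - c_in) < abs(x - c_out)
    return classify

# The learned concept approximates a boundary that was never written down.
tall = learn_concept([(150, False), (160, False), (185, True), (195, True)])
```

In the third case no conceptual representation is authored up front; the boundary is implicit in the learner plus its data, which is the distinction being drawn above.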
It's not that one way is rigorous and the other is not, but that they are different ways of creating something with the desired behavior, or, in your particular case, different ways of creating the concepts you want to use in producing the desired behavior. The distinction between a conceptual specification and an algorithmic specification seems meaningful and useful to me.
I think this works as a drop-in replacement for the first two sentences:
Sophisticated autonomous AI may need to base its behavior on fuzzy concepts such as well-being or rights in order to obtain desired AI behavior. These concepts are notoriously difficult to define explicitly, but we explore defining them implicitly by generating them algorithmically.
I assumed that the type of AI design you're exploring is structurally committed to creating those concepts, rather than simply creating algorithms with the desired behavior; otherwise I would have made more general statements about functionality.
Whatever you think of my proposed wording, and even if you don't like the distinctions I've made, the crucial word I've added is "but", an adversative conjunction. "But", "while", "instead"... a word to balance the two things you're trying to distinguish, thereby identifying them. The meaning you intended in the first two sentences was a tension or conflict, but the grammar and sentence structure didn't reflect that.
Thanks. I ended up going with:
...Sophisticated autonomous AI may need to base its behavior on fuzzy concepts such as well-being or rights. These concepts cannot be given an explicit formal definition, but obtaining desired behavior still requires a way to instill the concepts in an AI system. To solve the problem, we review evidence suggesting that the human brain generates its concepts using a relatively limited set of rules and mechanisms. This suggests that it might be feasible to build AI systems that use similar criteria for generating their own concepts...
Abstract: Sophisticated autonomous AI may need to base its behavior on fuzzy concepts that cannot be rigorously defined, such as well-being or rights. Obtaining desired AI behavior requires a way to accurately specify these concepts. We review some evidence suggesting that the human brain generates its concepts using a relatively limited set of rules and mechanisms. This suggests that it might be feasible to build AI systems that use similar criteria and mechanisms for generating their own concepts, and could thus learn similar concepts as humans do. We discuss this possibility, and also consider possible complications arising from the embodied nature of human thought, possible evolutionary vestiges in cognition, the social nature of concepts, and the need to compare conceptual representations between humans and AI systems.
I just got word that this paper was accepted for the AAAI-15 Workshop on AI and Ethics; I've uploaded a preprint here. I'm hoping that this could help seed a possibly valuable new subfield of FAI research. Thanks to Steve Rayhawk for invaluable assistance while I was writing this paper: it probably wouldn't have gotten done without his feedback motivating me to work on it.
Comments welcome.