[Hastily written, caveat emptor.]
One way to think about AI (alignment) is like how you think about fixing a car, or about an ordinary computer program. Let's say this is the "objectual" way of thinking ("objectual" just meaning "as an object"). In the objectual way of thinking, you emphasize the object and deemphasize yourself. You're concerned with the self-contained causes of the object, its autonomous internal dynamics and laws. "How do I assemble the parts of this thing so that, without further intervention from me, the internal dynamics of it will produce the consequences I want?"
Another way of thinking is more like how you relate to another person. Let's say this is the "relational" way of thinking. When you relate to a person, you expect to reprogram yourself to communicate with them, accommodate their needs, and get gains from joint strategies, and you expect them to do the same. You expect that in most cases they'll understand things roughly the way you understand things; and in many cases they'll want the same things as you, and behave similarly to you. You can understand them not by forming a little simulation of them, but by tuning some dials on yourself to match the settings in the other person, and then just looking at yourself. "How do I continuously rearrange myself in response to the other person's self-rearrangement, so that the other person's ongoing self-rearrangement in response to my self-rearrangement, and mine in response to theirs, will together have good results?"
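To caricature the two stances in code, here's a toy sketch; the "dials" and update rules are invented purely for illustration, and the only point is where the intervention lives: all up front in the objectual case, ongoing and mutual in the relational case.

```python
# Toy caricature: a "mind" is one dial, and adaptation is nudging dials.

def objectual_design(target, steps=100, rate=0.1):
    """Objectual stance: tune the object's internals up front,
    then step away and let its autonomous dynamics run."""
    dial = 0.0
    for _ in range(steps):
        dial += rate * (target - dial)  # the designer intervenes here...
    return dial                         # ...and then not at all afterward

def relational_coupling(mine, yours, rounds=100, rate=0.1):
    """Relational stance: both parties keep rearranging themselves in
    response to the other's ongoing self-rearrangement."""
    for _ in range(rounds):
        mine, yours = (mine + rate * (yours - mine),    # I tune toward you
                       yours + rate * (mine - yours))   # as you tune toward me
    return mine, yours  # neither side was "assembled"; they converged jointly

print(objectual_design(target=1.0))              # ~1.0: the object lands where aimed
print(relational_coupling(mine=0.0, yours=1.0))  # ~(0.5, 0.5): a joint fixed point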
This distinction is more of a fuzzy spectrum or dimension than a dichotomy. In math, you're sort of programming yourself, and you're trying to understand something that isn't you (the mathematical objects), but the only access you have to the math is configurations of your thinking... More generally, in natural science, you're trying to think of things objectually, but to do that you have to think relationally: how can I get more information about the object, which concepts can I use to understand it, where am I confused? Training an animal is another intermediate case.
Relational thinking emphasizes different questions than objectual thinking does. The relational questions about AGI seem to me more interesting for alignment than the objectual ones.
For example, in the theory around HCH, IIUC (which I maybe don't), there's an emphasis on using learning theory to get as much mileage (info, values, robustness, judgement) out of the limited-bandwidth channel of human approval. The relational viewpoint would emphasize, as a distinct (but intertwined) problem, that of humans "learning to be a part of the approval system". E.g., humans learning to understand the mind-stuff of HCH, and humans learning to be the sort of thing that HCH can usefully, legibly imitate / be approved of by. (This doesn't mean the objectual thinking isn't relevant; maybe the strongest claim I'd venture is that we won't be able to solve alignment without some serious relational thinking.)
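To make that split concrete, here's another toy sketch (the one-bit approval rule and the "legibility" update are, again, invented for illustration, not a claim about how HCH actually works): the objectual half squeezes what it can out of the approval channel, while the relational half has the human also changing, to become more usefully imitable.

```python
import random

def approves(human, proposal, tolerance=0.25):
    """A deliberately limited channel: one bit of human approval."""
    return abs(proposal - human) < tolerance

def co_adapt(human=0.8, imitator=0.0, rounds=300, legibility_rate=0.01):
    rng = random.Random(0)
    for _ in range(rounds):
        proposal = imitator + rng.uniform(-0.3, 0.3)
        # Objectual half: extract what mileage we can from the 1-bit channel.
        if approves(human, proposal):
            imitator = proposal
        # Relational half: the human also rearranges themselves, learning to
        # be the sort of thing the system can usefully, legibly imitate.
        human += legibility_rate * (imitator - human)
    return human, imitator

print(co_adapt())  # both endpoints move; the outcome is a joint trajectory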