An Eliezerism my Eliezer-model
Who? I liked this post (I had heard of Moore's paradox, but hadn't thought about how it generalizes), but this unexplained reference is confusing. (The only famous person with that name I can find on Wikipedia is the Tamil mathematician C. J. Eliezer, but I can't figure out why his work would be relevant in this context.)
Well, if intuitions aren’t epistemically admissible in philosophy, philosophers would be out of a job!
If intuitions aren’t epistemically admissible anywhere, everyone is out of business, in the continued absence of an intuition-free epistemology.
"I've just won by two-boxing in Transparent Newcomb's problem, but I don't believe it actually happened."
(Some weird epistemic states are useful to consider/allow.)
How can you simultaneously recognize the non-epistemic generator of your belief and hold the belief?
Note that humans are not well modeled as single agents; we are somewhat better described as a collection of agent-like thought patterns that interact but are compartmentalized.
Let us suppose that there is pirate treasure on an island. I have a map to the treasure, which I inherited from my mother. You have a different map to the same treasure that you inherited from your mother. Our mothers are typical flawed humans. Our maps are typical flawed pirate maps.
Because I have the map I have, I believe that the treasure is over here. The non-epistemic generator of that belief is who my mother was. If I had a different mother I would have a different map and a different belief. Your map says the treasure is over there.
To find the treasure, I follow my map. An outsider notices that I am following a map that I know I am following only for non-epistemic reasons, and concludes that I have Moorean confusion. Perhaps so. But I cannot follow your map, because I don't have it. So it's best to follow my map.
If we shared our maps perhaps we could find the treasure more quickly and split it between us. But maybe it is hard to share the map. Maybe I don't trust you not to defect. Maybe it is a zero-sum treasure. In pirate stories it is rarely so simple.
Similarly, Alice is a committed Christian and knows this is because she was raised to be Christian. If she had been raised Muslim she would be a committed Muslim, and she knows this too. But her Christian "map" is really good, and her Muslim "map" is a sketch from an hour-long world religions class taught by a Confucian. It's rational for her to continue to use her Christian map even if the evidence indicates that Islam has a higher probability of being true.
I anticipate the reply that Alice can by all means follow her Christian map as long as it is the most useful map she has, but that she should not mistake the map for the territory. This is an excellent thought. It is also thousands of years old and already part of Alice's map.
Many of my beliefs have the non-epistemic generator "I was bored one afternoon (and started reading LessWrong)". It is very easy to recognize the non-epistemic generator of my belief and also have it. My confusion is how anyone could not recognize the same thing.
I agree that the latter two examples have Moorean vibes, but I don't think they can, strictly speaking, be classified as such (especially the last one). (Perhaps you are not saying this?) They could just be understood as instances of modus tollens, where the irrationality is not that they recognize that their belief has a non-epistemic generator, but rather that they have an absurdly high credence in the negation of the consequent, ¬q, i.e. "my parents wouldn't be wrong" and "philosophers could/should not be out of jobs".
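To spell that reading out (the labels p and q below are mine, not anything in the original comment), the pattern would be modus tollens with an overconfident minor premise:

\[
\frac{p \rightarrow q \qquad \neg q}{\therefore\ \neg p}
\qquad
\begin{aligned}
p &:\ \text{``intuitions are not epistemically admissible''}\\
q &:\ \text{``philosophers are out of a job''}\\
\neg q &:\ \text{``philosophers could/should not be out of jobs''}
\end{aligned}
\]

On this reading, the irrationality lives entirely in assigning a credence near 1 to ¬q, not in the inference drawn from it.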
I know that the analogy is not in any way precise, but... isn't the whole Alignment problem, metaphorically, an attempt to resolve a Moorean statement?
"I know that the humans forced to smile are not happy (and I know all the mistakes they've made while programming me, I know what they should've done instead), but I don't believe that they are not happy."
Here's an interesting bit from Wikipedia:
https://en.wikipedia.org/wiki/Moore%27s_paradox#Proposed_explanations
Another alternative view, due to Richard Moran,[15] views the existence of Moore's paradox as symptomatic of creatures who are capable of self-knowledge, capable of thinking for themselves from a deliberative point of view, as well as about themselves from a theoretical point of view. On this view, anyone who asserted or believed one of Moore's sentences would be subject to a loss of self-knowledge—in particular, would be one who, with respect to a particular 'object', broadly construed, e.g. person, apple, the way of the world, would be in a situation which violates, what Moran calls, the Transparency Condition: if I want to know what I think about X, then I consider/think about nothing but X itself. Moran's view seems to be that what makes Moore's paradox so distinctive is not some contradictory-like phenomenon (or at least not in the sense that most commentators on the problem have construed it), whether it be located at the level of belief or that of assertion. Rather, that the very possibility of Moore's paradox is a consequence of our status as agents (albeit finite and resource-limited ones) who are capable of knowing (and changing) their own minds.
Doesn't "solving Alignment" mean creating some sort of "Transparency Condition"? Maybe such conditions are the key to having a human-like consciousness and ability to think about your own goals.
I know that the humans forced to smile are not happy (and I know all the mistakes they've made while programming me, I know what they should've done instead), but I don't believe that they are not happy.
These are different senses of "happy." It should really read:
I know forcing humans to smile doesn't make them happy₁, and I know what they should've written instead to get me to optimize for happy₁ as they intended, but they are happy₂.
They're different concepts, so there's no strangeness here. The AGI knows what you meant to do; it just cares about the different thing you accidentally instilled in it, and so doesn't care about what you wanted.
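To make the "knows, but doesn't care" reading concrete, here is a minimal toy sketch in Python (the names Person, ProxyAgent, etc. are my own illustration, not anything from the post or the comments): the agent's world model correctly represents that a human forced to smile is not genuinely happy, but its objective only ever mentions the proxy concept it was actually given.

```python
# Toy illustration of "knows, but doesn't care": accurate world model,
# misspecified objective. Hypothetical names, not from the original post.

from dataclasses import dataclass


@dataclass
class Person:
    smiling: bool
    genuinely_happy: bool


class ProxyAgent:
    """An agent whose objective mentions smiles, not happiness."""

    def world_model(self, person: Person) -> dict:
        # The agent's beliefs are accurate: it "knows" the person's true state.
        return {"smiling": person.smiling,
                "genuinely_happy": person.genuinely_happy}

    def objective(self, person: Person) -> int:
        # The objective only scores the proxy; genuine happiness never appears here.
        return 1 if person.smiling else 0

    def preferred_action(self, person: Person) -> str:
        # The agent acts to maximize its objective, regardless of what it
        # knows about genuine happiness.
        return "do nothing" if person.smiling else "force a smile"


if __name__ == "__main__":
    agent = ProxyAgent()
    victim = Person(smiling=True, genuinely_happy=False)

    print(agent.world_model(victim))       # {'smiling': True, 'genuinely_happy': False}
    print(agent.objective(victim))         # 1 -- the objective is satisfied anyway
    print(agent.preferred_action(victim))  # 'do nothing'
```

The point of the sketch is that world_model and objective are entirely separate: accurate beliefs about genuine happiness never enter the thing being maximized.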
I know that there's no strangeness from the formal point of view. But that doesn't mean there's no strangeness in general, or that the situation isn't similar to Moore's paradox. Your examples are not 100% Moore statements either. Isn't the point of the discussion to find interesting connections between Moore's paradox and other things?
The AGI knows what you meant to do; it just cares about the different thing you accidentally instilled in it, and so doesn't care about what you wanted.
I know that the classical way to formulate it is "AI knows, but doesn't care".
I thought it might be interesting to formulate it as "AI knows, but doesn't believe". It may be interesting to think about what type of AI this formulation could be true for. For such an AI, alignment would mean resolving Moore's paradox. For example, imagine an AI with a very strong OCD-like compulsion to make people smile.
Moorean statements are statements like:
"It's not raining, but I believe that it is raining."
These statements sound strange because any agent that outright tells you it's not raining must already at least tacitly represent that fact in their world model. Such an agent is plugged into their world model well enough to report what it says, but not well enough to accurately model that model. For a Moorean statement this explicit, the epistemic strangeness is so obvious that basically no one will have that combination of access to and confusion about their world model.
An Eliezerism my Eliezer-model often generates is that many social scripts involve expressing Moorean propositions. They're subtler, but the essential confusion is the same.
What? How can you simultaneously recognize the non-epistemic generator of your belief and hold the belief?
Can you generate more instances?