TheAncientGeek comments on An overall schema for the friendly AI problems: self-referential convergence criteria - Less Wrong
Comments (110)
Yes, of course, but then the question is: what is the difference between modelling it correctly and solving moral philosophy? A correct model has to get a bunch of counterfactuals correct, and not just match an empirical dataset.
Well, attempting to account for your grammar and figure out what you meant...
Yes, and? Causal modelling techniques get counterfactuals right-by-design, in the sense that a correct causal model by definition captures counterfactual behavior, as studied across controlled or intervened experiments.
I mean, I agree that most currently-in-use machine learning techniques don't bother to capture causal structure, but on the upside, that precise failure to capture and compress causal structure is why those techniques can't lead to AGI.
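To make the distinction concrete, here is a minimal sketch using a toy structural causal model. The variables `Z`, `X`, `Y`, the coefficients, and the helper names are all assumptions made up for illustration, not anything from a real dataset or library. A hidden common cause `Z` drives both `X` and `Y`, so a purely correlational fit of `Y` on `X` overstates what would happen if `X` were actually intervened on.

```python
import random

def sample(n, do_x=None, seed=0):
    """Draw n (x, y) pairs from a toy causal model: Z -> X, Z -> Y, X -> Y.

    Passing do_x replaces X's usual mechanism with a fixed value,
    which is what an intervention (a controlled experiment) does.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)  # unobserved common cause
        x = (z + rng.gauss(0, 0.1)) if do_x is None else do_x
        y = 1.0 * x + 3.0 * z + rng.gauss(0, 0.1)  # true effect of X on Y is 1.0
        rows.append((x, y))
    return rows

def slope(rows):
    """Least-squares slope of y on x: what a correlational fit would learn."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    cov = sum((x - mx) * (y - my) for x, y in rows)
    var = sum((x - mx) ** 2 for x, _ in rows)
    return cov / var

def mean_y(rows):
    """Average outcome across the sampled rows."""
    return sum(y for _, y in rows) / len(rows)

# Fit to the observational data only.
observational_slope = slope(sample(20000))

# Counterfactual contrast: average Y under do(X=1) minus average Y under do(X=0).
causal_effect = mean_y(sample(20000, do_x=1.0)) - mean_y(sample(20000, do_x=0.0))

print("observational slope:", round(observational_slope, 2))
print("interventional effect:", round(causal_effect, 2))
```

In this toy setup the observational slope lands near 4, while the interventional contrast recovers the true coefficient of 1. A model that only matches the observational data would get the counterfactual question wrong.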
I think it's more accurate to say that we're trying to dissolve moral philosophy in favor of a scientific model of human evaluative cognition. Surely to a moral philosopher this will sound like a moot distinction, but the precise difference is that the latter thing creates and updates predictive models which capture counterfactual, causal knowledge, and which thus can be elaborated into an explicit theory of morality that doesn't rely on intuition or situational framing to work.
As far as I can tell, human intuition is the territory you would be modelling here, particularly when dealing with counterfactuals, since it would be unethical to actually set up trolley problems.
BTW, there is nothing to stop moral philosophy being predictive, etc.
No, we're trying to capture System 2's evaluative cognition, not System 1's fast-and-loose, bias-governed intuitions.
Wrong kind of intuition
If you have an external standard, as you do with probability theory and logic, System 2 can learn utilitarianism, and its performance can be checked against the external standard.
But we don't have an agreed standard to compare System 1 ethical reasoning against, because we haven't solved moral philosophy. What we have is System 1 coming up with speculative theories, which have to be checked against intuition, meaning an internal standard.
Again, the whole point of this task/project/thing is to come up with an explicit theory to act as an external standard for ethics. Ethical theories are maps of the evaluative-under-full-information-and-individual+social-rationality territory.
And that is the whole point of moral philosophy... so it's sounding like a moot distinction.
You don't like the word intuition, but the fact remains that while you are building your theory, you will have to check it against humans' ability to give answers without knowing how they arrived at them. Otherwise you end up with a clear, consistent theory that nobody finds persuasive.