eli_sennesh comments on An overall schema for the friendly AI problems: self-referential convergence criteria - Less Wrong

17 Post author: Stuart_Armstrong 13 July 2015 03:34PM

Comments (110)

Comment author: [deleted] 21 July 2015 12:37:01PM *  0 points [-]

Well, attempting to account for your grammar and figure out what you meant...

A correct model has to get a bunch of counterfactuals correct, and not just match an empirical dataset.

Yes, and? Causal modelling techniques get counterfactuals right-by-design, in the sense that a correct causal model by definition captures counterfactual behavior, as studied across controlled or intervened experiments.

I mean, I agree that most currently-in-use machine learning techniques don't bother to capture causal structure, but on the upside, that precise failure to capture and compress causal structure is why those techniques can't lead to AGI.

what is the difference between modelling it currently, and solving moral philosophy?

I think it's more accurate to say that we're trying to dissolve moral philosophy in favor of a scientific model of human evaluative cognition. Surely to a moral philosopher this will sound like a moot distinction, but the precise difference is that the latter thing creates and updates predictive models which capture counterfactual, causal knowledge, and which thus can be elaborated into an explicit theory of morality that doesn't rely on intuition or situational framing to work.

Comment author: TheAncientGeek 21 July 2015 01:25:35PM 0 points [-]

As far as I can tell, human intuition is the territory you would be modelling, here. In particular, when dealing with counterfactuals, since it would be unethical to actually set up trolley problems.

BTW, there is nothing to stop moral philosophy being predictive, etc.

Comment author: [deleted] 21 July 2015 01:32:03PM 0 points [-]

As far as I can tell, human intuition is the territory you would be modelling, here.

No, we're trying to capture System 2's evaluative cognition, not System 1's fast-and-loose, bias-governed intuitions.

Comment author: TheAncientGeek 21 July 2015 08:11:51PM *  0 points [-]

Wrong kind of intuition

If you have an external standard, as you do with probability theory and logic, system 2 can learn utilitarianism, and its performance can be checked against the external standard.

But we don't have an agreed standard to compare system 1 ethical reasoning against, because we haven't solved moral philosophy. What we have is system 1 coming up with speculative theories, which have to be checked against intuition, meaning an internal standard.

Comment author: [deleted] 21 July 2015 11:23:20PM 0 points [-]

Again, the whole point of this task/project/thing is to come up with an explicit theory to act as an external standard for ethics. Ethical theories are maps of the evaluative-under-full-information-and-individual+social-rationality territory.

Comment author: TheAncientGeek 22 July 2015 07:45:58AM *  0 points [-]

Again, the whole point of this task/project/thing is to come up with an explicit theory to act as an external standard for ethics. 

And that is the whole point of moral philosophy... so it's sounding like a moot distinction.

Ethical theories are maps of the evaluative-under-full-information-and-individual+social-rationality territory.

You don't like the word intuition, but the fact remains that while you are building your theory, you will have to check it against humans' ability to give answers without knowing how they arrived at them. Otherwise you end up with a clear, consistent theory that nobody finds persuasive.