eli_sennesh comments on An overall schema for the friendly AI problems: self-referential convergence criteria - Less Wrong

17 points · Post author: Stuart_Armstrong 13 July 2015 03:34PM


Comments (110)


Comment author: [deleted] 16 July 2015 05:11:29AM 2 points

What do I mean by that? Well, imagine you're trying to reach reflective equilibrium in your morality. You do this by using good meta-ethical rules, zooming up and down at various moral levels, making decisions on how to resolve inconsistencies, etc... But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup.

Wait... what? No.

You don't solve the value-alignment problem by trying to write down your confusions about the foundations of moral philosophy, because writing down confusion still leaves you fundamentally confused. No amount of intelligence can solve an ill-posed problem in some way other than pointing out that the problem is ill-posed.

You solve it by removing the need to do moral philosophy and instead specifying a computation that corresponds to your moral psychology and its real, actually-existing, specifiable properties.

And then telling metaphysics to take a running jump to boot, and crunching down on Strong Naturalism brand crackers, which come in neat little bullet shapes.

Comment author: hairyfigment 19 July 2015 05:32:38PM 0 points

Near as I can tell, you're proposing some "good meta-ethical rules," though you may have skipped the difficult parts. And I think the claim, "you stop when your morality is perfectly self-consistent," was more a factual prediction than an imperative.

Comment author: [deleted] 20 July 2015 01:19:03PM 0 points

I didn't skip the difficult bits, because I didn't propose a full solution. I stated an approach to dissolving the problem.

Comment author: hairyfigment 22 July 2015 06:00:14AM 0 points

And do you think that approach differs from the one you quoted?

Comment author: [deleted] 22 July 2015 12:43:21PM 0 points

It involves reasoning about facts rather than metaphysics.

Comment author: TheAncientGeek 19 July 2015 04:35:40PM 0 points

And will that model have the right counterfactuals? Will it evolve under changing conditions the same way that the original would?

Comment author: [deleted] 20 July 2015 01:18:35PM 0 points

If you modelled the real thing correctly, then yes, of course it will.

Comment author: TheAncientGeek 21 July 2015 07:42:47AM 0 points

Yes, of course, but then the question is: what is the difference between modelling it correctly and solving moral philosophy? A correct model has to get a bunch of counterfactuals correct, and not just match an empirical dataset.

Comment author: [deleted] 21 July 2015 12:37:01PM 0 points

Well, attempting to account for your grammar and figure out what you meant...

A correct model has to get a bunch of counterfactuals correct, and not just match an empirical dataset.

Yes, and? Causal modelling techniques get counterfactuals right-by-design, in the sense that a correct causal model by definition captures counterfactual behavior, as studied across controlled or intervened experiments.

I mean, I agree that most currently-in-use machine learning techniques don't bother to capture causal structure, but on the upside, that precise failure to capture and compress causal structure is why those techniques can't lead to AGI.
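The claim that causal models get counterfactuals "right by design" can be made concrete with a minimal sketch (my own illustration, not anything from the thread) of Pearl-style counterfactual inference over an assumed toy mechanism Y := 2X + U_Y. The mechanism, variable names, and numbers here are all made up for illustration; the point is only the abduction–action–prediction recipe that a structural causal model supports and a purely correlational fit does not.

```python
# Toy structural causal model: Y := 2*X + U_Y, where U_Y is exogenous noise.
# Because the model stores the *mechanism*, it can answer the counterfactual
# "what would Y have been, had X been different?" in three steps
# (abduction -> action -> prediction).

def f_y(x, u_y):
    # Mechanism for Y. A correlational model would only store E[Y|X],
    # not this function of the noise term.
    return 2 * x + u_y

def counterfactual_y(x_obs, y_obs, x_new):
    """Counterfactual Y under do(X = x_new), given the factual (x_obs, y_obs)."""
    # 1. Abduction: recover the exogenous noise consistent with what was seen.
    u_y = y_obs - 2 * x_obs
    # 2. Action: intervene, setting X := x_new (severing X's own mechanism).
    # 3. Prediction: re-run the mechanism with the recovered noise held fixed.
    return f_y(x_new, u_y)

# Factual world: we observed X = 1.0 together with Y = 2.5,
# so the model infers U_Y = 0.5 and carries it into the counterfactual world.
print(counterfactual_y(1.0, 2.5, 3.0))  # 6.5
```

Holding the abduced noise fixed across the intervention is exactly what "capturing counterfactual behavior" means here: two models that fit the same observational data equally well can still disagree on this query if they encode different mechanisms.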

what is the difference between modelling it correctly, and solving moral philosophy?

I think it's more accurate to say that we're trying to dissolve moral philosophy in favor of a scientific model of human evaluative cognition. To a moral philosopher this will doubtless sound like a moot distinction, but the precise difference is that the latter creates and updates predictive models which capture counterfactual, causal knowledge, and which can thus be elaborated into an explicit theory of morality that doesn't rely on intuition or situational framing to work.

Comment author: TheAncientGeek 21 July 2015 01:25:35PM 0 points

As far as I can tell, human intuition is the territory you would be modelling, here. In particular, when dealing with counterfactuals, since it would be unethical to actually set up trolley problems.

BTW, there is nothing to stop moral philosophy being predictive, etc.

Comment author: [deleted] 21 July 2015 01:32:03PM 0 points

As far as I can tell, human intuition is the territory you would be modelling, here.

No, we're trying to capture System 2's evaluative cognition, not System 1's fast-and-loose, bias-governed intuitions.

Comment author: TheAncientGeek 21 July 2015 08:11:51PM 0 points

Wrong kind of intuition.

If you have an external standard, as you do with probability theory and logic, System 2 can learn utilitarianism, and its performance can be checked against the external standard.

But we don't have an agreed standard to compare System 1 ethical reasoning against, because we haven't solved moral philosophy. What we have is System 2 coming up with speculative theories, which have to be checked against intuition, meaning an internal standard.

Comment author: [deleted] 21 July 2015 11:23:20PM 0 points

Again, the whole point of this task/project/thing is to come up with an explicit theory to act as an external standard for ethics. Ethical theories are maps of the evaluative-under-full-information-and-individual+social-rationality territory.

Comment author: TheAncientGeek 22 July 2015 07:45:58AM 0 points

Again, the whole point of this task/project/thing is to come up with an explicit theory to act as an external standard for ethics. 

And that is the whole point of moral philosophy... so it's sounding like a moot distinction.

Ethical theories are maps of the evaluative-under-full-information-and-individual+social-rationality territory.

You don't like the word intuition, but the fact remains that while you are building your theory, you will have to check it against humans' ability to give answers without knowing how they arrived at them. Otherwise you end up with a clear, consistent theory that nobody finds persuasive.