
An overall schema for the friendly AI problems: self-referential convergence criteria

Post author: Stuart_Armstrong 13 July 2015 03:34PM 17 points

A putative new idea for AI control; index here.

After working for some time on the Friendly AI problem, it's occurred to me that a lot of the issues seem related. Specifically, all the following seem to have commonalities:

Speaking very broadly, there are two features they all share:

  1. The convergence criteria are self-referential.
  2. Errors in the setup are likely to cause false convergence.

What do I mean by that? Well, imagine you're trying to reach reflective equilibrium in your morality. You do this by using good meta-ethical rules, zooming up and down at various moral levels, making decisions on how to resolve inconsistencies, etc... But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup. In other words, the stopping point (and the convergence to the stopping point) is entirely self-referentially defined: the morality judges itself. It does not include any other moral considerations. You input your initial moral intuitions and values, and you hope this will cause the end result to be "nice", but the definition of the end result does not include your initial moral intuitions (note that some moral realists could see this process dependence as a positive - except for the fact that these processes have many convergent states, not just one or a small grouping).

So when the process goes nasty, you're pretty sure to have achieved something self-referentially stable, but not nice. Similarly, a nasty CEV will be coherent and have no desire to further extrapolate... but that's all we know about it.

The second feature is that any process has errors - computing errors, conceptual errors, errors due to the weakness of human brains, etc... If you visualise this as noise, you can see that noise in a convergent process is more likely to cause premature convergence, because if the process ever reaches a stable self-referential state, it will stay there (and if the process is a long one, then early noise will cause great divergence at the end). For instance, imagine you have to reconcile your belief in preserving human cultures with your beliefs in human individual freedom. A complex balancing act. But if, at any point along the way, you simply jettison one of the two values completely, things become much easier - and once jettisoned, the missing value is unlikely to ever come back.
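
As a toy illustration (everything below is invented - the update rule, the "jettison" noise, and the numbers are all crude stand-ins for real moral deliberation), here is a process whose only stopping criterion is self-referential consistency. Without noise it converges while keeping every starting value; with a little noise it typically still reaches a perfectly stable endpoint, just one with values missing:

```python
import random

def gap(values):
    """Toy self-consistency measure: total pairwise disagreement."""
    return sum(abs(a - b) for a, b in zip(values, values[1:]))

def extrapolate(values, noise=0.0, rate=0.1, tol=1e-3, max_steps=10_000):
    """Iterate until the surviving values are self-consistent -- a purely
    self-referential stopping rule: the values judge only themselves."""
    values = list(values)
    kept = [True] * len(values)              # which values survive the process
    for _ in range(max_steps):
        live = [v for v, k in zip(values, kept) if k]
        if gap(live) < tol:
            break                            # stable: no urge to change anything
        mean = sum(live) / len(live)         # stand-in for resolving inconsistencies
        values = [v + rate * (mean - v) if k else 0.0
                  for v, k in zip(values, kept)]
        # Noise: occasionally a value is simply jettisoned, which makes the
        # balancing act easier -- and a jettisoned value never comes back.
        kept = [k and random.random() > noise for k in kept]
    return [round(v, 2) for v in values], kept

random.seed(1)
start = [0.9, 0.7, 0.8, 0.6]            # stand-in for initial moral intuitions
print(extrapolate(start, noise=0.0))     # stable endpoint, all four values kept
print(extrapolate(start, noise=0.01))    # typically also stable, but with values lost on the way
```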

Or, more simply, the system could get hacked. When exploring a potential future world, you could become so enamoured of it that you overwrite any objections you had. It seems very easy for humans to fall into these traps - and again, once you lose something of value in your system, you don't tend to get it back.

 

Solutions

And again, very broadly speaking, there are several classes of solutions to deal with these problems:

  1. Reduce or prevent errors in the extrapolation (eg solving the agent tiling problem).
  2. Solve all or most of the problem ahead of time (eg traditional FAI approach by specifying the correct values).
  3. Make sure you don't get too far from the starting point (eg reduced impact AI, tool AI, models as definitions).
  4. Figure out the properties of a nasty convergence, and try to avoid them (eg some of the ideas I mentioned in "crude measures", general precautions that are done when defining the convergence process).

 

Comments (110)

Comment author: shminux 13 July 2015 03:15:44PM 10 points [-]

As you mention, so far every attempt by humans to have a self-consistent value system (the process also known as decompartmentalization) has resulted in less-than-desirable outcomes. What if the end goal of having a thriving long-lasting (super-)human(-like) society is self-contradictory, and there is no such thing as both "nice" and "self-referentially stable"? Maybe some effort should be put into figuring out how to live, and thrive, while managing the unstable self-reference and possibly avoiding convergence altogether.

Comment author: Kaj_Sotala 15 July 2015 06:59:46AM 10 points [-]

A thought I've been thinking of lately, derived from a reinforcement learning view of values, and also somewhat inspired by Nate's recent post on resting in motion... - value convergence seems to suggest a static endpoint, with some set of "ultimate values" we'll eventually reach and have ever after. But so far societies have never reached such a point, and if our values are an adaptation to our environment (including the society and culture we live in), then it would suggest that as long as we keep evolving and developing, our values will keep changing and evolving with us, without there being any meaningful endpoint.

There will always (given our current understanding of physics) be only a finite amount of resources available, and unless we either all merge into one enormous hivemind or get turned into paperclips, there will likely be various agents with differing preferences on what exactly to do with those resources. As the population keeps changing and evolving, the various agents will keep acquiring new kinds of values, and society will keep rearranging itself to a new compromise between all those different values. (See: the whole history of the human species so far.)

Possibly we shouldn't so much try to figure out what we'd prefer the final state to look like, but rather what we'd prefer the overall process to look like.

(The bias towards trying to figure out a convergent end-result for morality might have come from LW's historical tendency to talk and think in terms of utility functions, which implicitly assume a static and unchanging set of preferences, glossing over the fact that human preferences keep constantly changing.)

Comment author: jacob_cannell 16 July 2015 08:38:46PM 0 points [-]

This. Values evolve, like everything else. Evolution will continue in the posthuman era.

Comment author: Lumifer 16 July 2015 08:52:03PM 1 point [-]

Evolution requires selection pressure. The failures have to die out. What will provide the selection pressure in the posthuman era?

Comment author: jacob_cannell 17 July 2015 03:51:31PM 1 point [-]

Economics. Posthumans still require mass/energy to store/compute their thoughts.

Comment author: gjm 17 July 2015 04:20:33PM 0 points [-]

The failures have to die out.

I'm not sure that's true. Imagine some glorious postbiological future in which people (or animals or ideas or whatever) can reproduce without limit. There are two competing replicators A and B, and the only difference is that A replicates slightly faster than B. After a while there will be vastly more of A around than of B, even if nothing dies. For many purposes, that might be enough.
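
(A toy calculation, with made-up copy rates, just to make the point concrete:)

```python
# Nothing ever dies; A simply copies itself a little faster than B.
a, b = 1.0, 1.0
for _ in range(1000):
    a *= 1.02   # A replicates 2% per step (made-up rate)
    b *= 1.01   # B replicates 1% per step (made-up rate)
print(a / (a + b))   # ~0.99995 -- A now makes up almost the whole population
```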

Comment author: Lumifer 17 July 2015 04:22:29PM 0 points [-]

After a while there will be vastly more of A around than of B

So, in this scenario, what evolved?

Comment author: gjm 17 July 2015 04:53:08PM 1 point [-]

The distribution of A and B in the population.

Comment author: Lumifer 17 July 2015 04:54:58PM -1 points [-]

I don't think this is an appropriate use of the word "evolution".

Comment author: gjm 17 July 2015 06:09:36PM 4 points [-]

Why not? It's a standard one in the biological context. E.g.,

"In fact, evolution can be precisely defined as any change in the frequency of alleles within a gene pool from one generation to the next."

which according to a talk.origins FAQ is from this textbook: Helena Curtis and N. Sue Barnes, Biology, 5th ed. 1989 Worth Publishers, p.974

Comment author: hyporational 17 July 2015 05:52:08AM *  0 points [-]

If there are mistakes made or the environment requires adaptation, a sufficiently flexible intelligence can mediate the selection pressure.

Comment author: Lumifer 17 July 2015 02:39:08PM 1 point [-]

The end result still has to be for the failures to die or be castrated.

There is no problem with saying that values in the future will "change" or "drift", but "evolve" is more specific and I'm not sure how it will work.

Comment author: jacob_cannell 17 July 2015 03:52:15PM 1 point [-]

Memetic evolution, not genetic.

Comment author: Lumifer 17 July 2015 03:55:20PM 0 points [-]

I understand that. Memes can die or be castrated, too :-/

Comment author: jacob_cannell 17 July 2015 08:34:26PM 0 points [-]

In your earlier comment you said "evolution requires selection pressure". There is of course selection pressure in memetic evolution. Completely eliminating memetic selection pressure is not even wrong - because memetic selection is closely connected to learning or knowledge creation. You can't get rid of it.

Comment author: gjm 17 July 2015 04:41:58PM 0 points [-]

"Evolve" has (at least) two meanings. One is the Darwinian one where heritable variation and selection lead to (typically) ever-better-adapted entities. But "evolve" can also just mean "vary gradually". It could be that values aren't (or wouldn't be, in a posthuman era) subject to anything much like biological evolution; but they still might vary. (In biological terms, I suppose that would be neutral drift.)

Comment author: Lumifer 17 July 2015 04:52:49PM 1 point [-]

Well, we are talking about the Darwinian meaning, aren't we? "Vary gradually", aka "drift" is not contentious at all.

Comment author: gjm 17 July 2015 04:57:34PM 1 point [-]

I'm not sure we are talking specifically about the Darwinian meaning, actually. Well, I guess you are, given your comment above! But I don't think the rest of the discussion was so specific. Kaj_Sotala said:

if our values are an adaptation to our environment (including the society and culture we live in), then it would suggest that as long as we keep evolving and developing, our values will keep changing and evolving with us, without there being any meaningful endpoint.

which seems to me to describe a situation of gradual change in our values that doesn't need to be driven by anything much like biological evolution. (E.g., it could happen because each generation's people constantly make small more-or-less-deliberate adjustments in their values to suit the environment they find themselves in.)

(Kaj's comment does actually describe a resource-constrained situation, but the resource constraints aren't directly driving the evolution of values he describes.)

Comment author: Lumifer 17 July 2015 05:06:58PM 2 points [-]

We're descending into nit-pickery. The question of whether values will change in the future is a silly one, as the answer "Yes" is obvious. The question of whether values will evolve in the Darwinian sense in the posthuman era (with its presumed lack of scarcity, etc.) is considerably more interesting.

Comment author: gjm 17 July 2015 06:13:21PM 2 points [-]

I agree that it's more interesting. But I'm not sure it was the question actually under discussion.

Comment author: David_Bolin 18 July 2015 01:56:42PM 1 point [-]

This sounds like Robin Hanson's idea of the future. Eliezer would probably agree that in theory this would happen, except that he expects one superintelligent AI to take over everything and impose its values on the entire future of everything. If Eliezer's future is definitely going to happen, then even if there is no truly ideal set of values, we would still have to make sure that the values that are going to be imposed on everything are at least somewhat acceptable.

Comment author: [deleted] 18 July 2015 07:26:55PM 0 points [-]

Possibly we shouldn't so much try to figure out what we'd prefer the final state to look like, but rather what we'd prefer the overall process to look like.

Well, the general Good Idea in that model is that events or actions shouldn't be optimized to drift faster or more discontinuously than people's valuations of those events, so that the society existing at any given time is more-or-less getting what it wants while also evolving towards something else.

Of course, a compromise between the different "values" (scare-quotes because I don't think the moral-philosophy usage of the word points at anything real) of society's citizens is still a vast improvement on "a few people dominate everyone else and impose their own desires by force and indoctrination", which is what we still have to a great extent.

Comment author: hairyfigment 13 July 2015 07:29:02PM 5 points [-]

Godsdammit, people, "thrive" is the whole problem.

Comment author: shminux 13 July 2015 09:23:51PM 1 point [-]

Yes, yes it is. Even once you can order all the central examples of thriving, the "mere addition" operation will tip them toward the noncentral repugnant ones. Hence why one might have to live with the lack of self-consistency.

Comment author: [deleted] 16 July 2015 04:05:19PM *  3 points [-]

You could just not be utilitarian, especially in the specific form of not maximizing a metaphysical quantity like "happy experience", thus leaving you with no moral obligations to counterfactual (ie: nonexistent) people, thus eliminating the Mere Addition Paradox.

Ok, I know that given the chemistry involved in "happy", it's not exactly a metaphysical or non-natural quantity, but it bugs me that utilitarianism says to "maximize Happy" even when, precisely as in the Mere Addition Paradox, no individual consciousness will actually experience the magnitude of Happy attained via utilitarian policies. How can a numerical measure of a subjective state of consciousness be valuable if nobody experiences the total numerical measure? It seems more sensible to restrict yourself to only moralizing about people who already exist, thus winding up closer to post-hoc consequentialism than traditional utilitarianism.

Comment author: shminux 16 July 2015 04:37:56PM 0 points [-]

How can a numerical measure of a subjective state of consciousness be valuable if nobody experiences the total numerical measure?

The mere addition paradox also manifests for a single person. Imagine the state you are in. Now imagine if it can be (subjectively) improved by some means (e.g. fame, company, drugs, ...). Keep going. Odds are, you would not find a maximum, not even a local one. After a while, you might notice that, despite incremental improvements, the state you are in is actually inferior to the original, if you compare them directly. Mathematically, one might model this as the improvement drive being non-conservative, so that no map from states to scalar utility exists. Whether it is worth pushing this analogy any further, I am not sure.
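
(A minimal way to make the "non-conservative" point concrete, with an invented four-state improvement cycle: if each step is a subjective improvement but the steps loop, no assignment of scalar utilities can respect all of them.)

```python
import itertools

# Made-up preference cycle: each state "improves on" the previous one,
# but the last improvement leads back to the first state.
improvements = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]  # (worse, better)

def scalar_utility_exists(improvements):
    """Brute-force check: is there any ranking of states such that every
    'improvement' really increases utility? Only the ordering matters,
    so trying all permutations of ranks is enough."""
    states = sorted({s for pair in improvements for s in pair})
    for ranks in itertools.permutations(range(len(states))):
        u = dict(zip(states, ranks))
        if all(u[better] > u[worse] for worse, better in improvements):
            return True
    return False

print(scalar_utility_exists(improvements))  # False: no map from states to scalar utility
```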

Comment author: [deleted] 17 July 2015 01:27:48AM 0 points [-]

The mere addition paradox also manifests for a single person. Imagine the state you are in. Now imagine if it can be (subjectively) improved by some means (e.g. fame, company, drugs, ...). Keep going. Odds are, you would not find a maximum, not even a local one.

Hill climbing always finds a local maximum, but that might well look very disappointing, wasteful of effort, and downright stupid when compared to some smarter means of spending the effort on finding a way to live a better life.

Comment author: chaosmage 13 July 2015 03:05:06PM *  3 points [-]

Impressive.

Couldn't another class of solutions be that resolutions of inconsistencies cannot reduce the complexity of the agent's morality? I.e. morality has to be (or tend to become) not only (more) consistent, but also (more) complex, sort of like an evolving body of law rather than like the Ten Commandments?

Comment author: Stuart_Armstrong 13 July 2015 03:58:56PM 2 points [-]

Actually, I have suggested something like that, I now recall... It's the line "Require them to be around the same expected complexity as human values." in Crude Measures.

Comment author: Stuart_Armstrong 13 July 2015 03:15:38PM 2 points [-]

That is a solution that springs to mind - once we've thought of it in these terms :-) To my knowledge, it hasn't been suggested before.

Comment author: Kaj_Sotala 13 July 2015 05:14:16PM 1 point [-]

morality has to be (or tend to become) not only (more) consistent, but also (more) complex

It's not clear to me that one can usefully distinguish between "more consistent" and "less complex".

Suppose that someone felt that morality dictated one set of behaviors for people of one race, and another set of behaviors for people of another race. Eliminating that distinction to have just one set of morals that applied to everyone might be considered by some to increase consistency, while reducing complexity.

That said, it all depends on what formal definition one adopts for consistency in morality: this doesn't seem to me a well-defined concept, even though people talk about it as if it was. (Clearly it can't be the same as consistency in logic. An inconsistent logical system lets you derive any conclusion, but even if a human is inconsistent WRT some aspect of their morality, it doesn't mean they wouldn't be consistent in others. Inconsistency in morality doesn't make the whole system blow up the way logical inconsistency does.)

Comment author: royf 17 July 2015 11:23:19AM *  2 points [-]

It seems that your research is coming around to some concepts that are at the basis of mine. Namely, that noise in an optimization process is a constraint on the process, and that the resulting constrained optimization process avoids the nasty properties you describe.

Feel free to contact me if you'd like to discuss this further.

Comment author: Stuart_Armstrong 17 July 2015 11:33:35AM 2 points [-]

I fear I will lack time for many months :-( Send me another message if you want to talk later.

Comment author: [deleted] 16 July 2015 05:11:29AM *  2 points [-]

What do I mean by that? Well, imagine you're trying to reach reflective equilibrium in your morality. You do this by using good meta-ethical rules, zooming up and down at various moral levels, making decisions on how to resolve inconsistencies, etc... But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup.

Wait... what? No.

You don't solve the value-alignment problem by trying to write down your confusions about the foundations of moral philosophy, because writing down confusion still leaves you fundamentally confused. No amount of intelligence can solve an ill-posed problem in some way other than pointing out that the problem is ill-posed.

You solve it by removing the need to do moral philosophy and instead specifying a computation that corresponds to your moral psychology and its real, actually-existing, specifiable properties.

And then telling metaphysics to take a running jump to boot, and crunching down on Strong Naturalism brand crackers, which come in neat little bullet shapes.

Comment author: hairyfigment 19 July 2015 05:32:38PM 0 points [-]

Near as I can tell, you're proposing some "good meta-ethical rules," though you may have skipped the difficult parts. And I think the claim, "you stop when your morality is perfectly self-consistent," was more a factual prediction than an imperative.

Comment author: [deleted] 20 July 2015 01:19:03PM 0 points [-]

I didn't skip the difficult bits, because I didn't propose a full solution. I stated an approach to dissolving the problem.

Comment author: hairyfigment 22 July 2015 06:00:14AM 0 points [-]

And do you think that approach differs from the one you quoted?

Comment author: [deleted] 22 July 2015 12:43:21PM 0 points [-]

It involves reasoning about facts rather than metaphysics.

Comment author: TheAncientGeek 19 July 2015 04:35:40PM *  0 points [-]

And will that model have the right counterfactuals? Will it evolve under changing conditions the same way that the original would?

Comment author: [deleted] 20 July 2015 01:18:35PM 0 points [-]

If you modelled the real thing correctly, then yes, of course it will.

Comment author: TheAncientGeek 21 July 2015 07:42:47AM *  0 points [-]

Yes, of course, but then the question is: what is the difference between modelling it correctly and solving moral philosophy? A correct model has to get a bunch of counterfactuals correct, and not just match an empirical dataset.

Comment author: [deleted] 21 July 2015 12:37:01PM *  0 points [-]

Well, attempting to account for your grammar and figure out what you meant...

A correct model has to get a bunch of counterfactuals correct, and not just match an empirical dataset.

Yes, and? Causal modelling techniques get counterfactuals right-by-design, in the sense that a correct causal model by definition captures counterfactual behavior, as studied across controlled or intervened experiments.
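
(A minimal sketch of what "right-by-design" means here, using an invented three-variable toy model rather than anything from the moral-psychology setting: the observational association between x and y even has the wrong sign, but the structural model answers the interventional question correctly by construction.)

```python
import random

# An invented structural model: z is a hidden common cause of x and y,
# and x also has a direct positive effect on y.
def sample(do_x=None):
    z = random.gauss(0, 1)                                  # confounder
    x = z + random.gauss(0, 0.1) if do_x is None else do_x  # observe or intervene
    y = 2 * x - 3 * z + random.gauss(0, 0.1)
    return x, y

random.seed(0)
data = [sample() for _ in range(100_000)]

# Fitting the observational association gives a *negative* slope (~ -1),
# because x mostly tracks the confounder z.
xs, ys = zip(*data)
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
print(round(slope, 1))          # about -1.0

# The structural model answers the interventional question by construction:
# setting x (rather than observing it) raises y by 2 per unit.
y0 = sum(sample(do_x=0.0)[1] for _ in range(100_000)) / 100_000
y1 = sum(sample(do_x=1.0)[1] for _ in range(100_000)) / 100_000
print(round(y1 - y0, 1))        # about 2.0
```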

I mean, I agree that most currently-in-use machine learning techniques don't bother to capture causal structure, but on the upside, that precise failure to capture and compress causal structure is why those techniques can't lead to AGI.

what is the difference between modelling it correctly and solving moral philosophy?

I think it's more accurate to say that we're trying to dissolve moral philosophy in favor of a scientific model of human evaluative cognition. Surely to a moral philosopher this will sound like a moot distinction, but the precise difference is that the latter thing creates and updates predictive models which capture counterfactual, causal knowledge, and which thus can be elaborated into an explicit theory of morality that doesn't rely on intuition or situational framing to work.

Comment author: TheAncientGeek 21 July 2015 01:25:35PM 0 points [-]

As far as I can tell, human intuition is the territory you would be modelling, here. In particular, when dealing with counterfactuals, since it would be unethical to actually set up trolley problems.

BTW, there is nothing to stop moral philosophy being predictive, etc.

Comment author: [deleted] 21 July 2015 01:32:03PM 0 points [-]

As far as I can tell, human intuition is the territory you would be modelling, here.

No, we're trying to capture System 2's evaluative cognition, not System 1's fast-and-loose, bias-governed intuitions.

Comment author: TheAncientGeek 21 July 2015 08:11:51PM *  0 points [-]

Wrong kind of intuition

If you have an external standard, as you do with probability theory and logic, system 2 can learn utilitarianism, and its performance can be checked against the external standard.

But we don't have an agreed standard to compare system 1 ethical reasoning against, because we haven't solved moral philosophy. What we have is system 2 coming up with speculative theories, which have to be checked against intuition, meaning an internal standard.

Comment author: [deleted] 21 July 2015 11:23:20PM 0 points [-]

Again, the whole point of this task/project/thing is to come up with an explicit theory to act as an external standard for ethics. Ethical theories are maps of the evaluative-under-full-information-and-individual+social-rationality territory.

Comment author: TheAncientGeek 22 July 2015 07:45:58AM *  0 points [-]

Again, the whole point of this task/project/thing is to come up with an explicit theory to act as an external standard for ethics. 

And that is the whole point of moral philosophy..... so it's sounding like a moot distinction.

Ethical theories are maps of the evaluative-under-full-information-and-individual+social-rationality territory.

You don't like the word intuition, but the fact remains that while you are building your theory, you will have to check it against humans' ability to give answers without knowing how they arrived at them. Otherwise you end up with a clear, consistent theory that nobody finds persuasive.

Comment author: Pentashagon 23 July 2015 03:57:50AM 1 point [-]

But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup.

Or once you lose your meta-mortal urge to reach a self-consistent morality. This may not be the wrong (heh) answer along a path that originally started toward reaching self-consistent morality.

Or, more simply, the system could get hacked. When exploring a potential future world, you could become so enamoured of it that you overwrite any objections you had. It seems very easy for humans to fall into these traps - and again, once you lose something of value in your system, you don't tend to get it back.

Is it a trap? If the cost of iterating the "find a more self-consistent morality" loop for the next N years is greater than the expected benefit of the next incremental change toward a more consistent morality for those same N years, then perhaps it's time to stop. Just as an example, if the universe can give us 10^20 years of computation, at some point near that 10^20 years we might as well spend all computation on directly fulfilling our morality instead of improving it. If at 10^20 - M years we discover that, hey, the universe will last another 10^50 years, that tradeoff will change and it makes sense to compute even more self-consistent morality again.
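
(The stopping rule I'm gesturing at, in miniature and with entirely invented numbers:)

```python
# Entirely invented numbers. Keep refining morality only while the expected
# improvement, applied over the remaining years, beats the value forgone
# during the refinement itself.
def keep_refining(years_left, refine_cost_years, gain_per_year):
    expected_benefit = gain_per_year * (years_left - refine_cost_years)
    return expected_benefit > refine_cost_years

print(keep_refining(10**20, 10**12, 1e-6))  # True: a long horizon pays for refinement
print(keep_refining(10**14, 10**12, 1e-6))  # False: near the end, just fulfil values
print(keep_refining(10**50, 10**12, 1e-6))  # True again if the horizon turns out longer
```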

Similarly, if we end up in a siren world it seems like it would be more useful to restart our search for moral complexity by the same criteria; it becomes worthwhile to change our morality again because the cost of continued existence in the current morality outweighs the cost of potentially improving it.

Additionally, I think that losing values is not a feature of reaching a more self-consistent morality. Removing a value from an existing moral system does not make the result consistent with the original morality; it is incompatible with reference to that value. Rather, self-consistent morality is approached by better carving reality at its joints in value space; defining existing values in terms of new values that are the best approximation to the old value in the situations where it was valued, while extending morality along the new dimensions into territory not covered by the original value. This should make it possible to escape from siren worlds by the same mechanism; entering a siren world is possible only if reality was improperly carved so that the siren world appeared to fulfill values along dimensions that it eventually did not, or that the siren world eventually contradicted some original value due to replacement values being an imperfect approximation. Once this disagreement is noticed it should be possible to more accurately carve reality and notice how the current values have become inconsistent with previous values and fix them.

Comment author: Stuart_Armstrong 23 July 2015 09:59:09AM 0 points [-]

Or once you lose your meta-mortal urge to reach a self-consistent morality. This may not be the wrong (heh) answer along a path that originally started toward reaching self-consistent morality.

The problem is that un-self-consistent morality is unstable under general self improvement (and self-improvement is very general, see http://lesswrong.com/r/discussion/lw/mir/selfimprovement_without_selfmodification/ ).

The main problem with siren worlds is that humans are very vulnerable to certain types of seduction/trickery, and it's very possible AIs with certain structures and goals would be equally vulnerable to (different) tricks. Defining what is a legit change and what isn't is the challenge here.

Comment author: Pentashagon 25 July 2015 03:39:12AM 1 point [-]

The problem is that un-self-consistent morality is unstable under general self improvement

Even self-consistent morality is unstable if general self improvement allows for removal of values, even if removal is only a practical side effect of ignoring a value because it is more expensive to satisfy than other values. E.g. we (Westerners) generally no longer value honoring our ancestors (at least not many of them), even though it is a fairly independent value and roughly consistent with our other values. It is expensive to honor ancestors, and ancestors don't demand that we continue to maintain that value, so it receives less attention. We also put less value on the older definition of honor (as a thing to be defended and fought for and maintained at the expense of convenience) that earlier centuries had, despite its general consistency with other values for honesty, trustworthiness, social status, etc. I think this is probably for the same reason; it's expensive to maintain honor and most other values can be satisfied without it. In general, if U(more satisfaction of value 1) > U(more satisfaction of value 2) then maximization should tend to ignore value 2 regardless of its consistency. If U(make values self-consistent) > U(satisfying any other value) then the obvious solution is to drop the other values and be done.
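
(The same point in miniature, with made-up payoffs: if one value is uniformly cheaper to satisfy per unit of effort, a straight maximizer never allocates anything to the other value, however consistent it is.)

```python
# Made-up constant marginal payoffs per unit of effort.
payoff = {"honor ancestors": 1.0, "other values": 3.0}
effort = {name: 0 for name in payoff}
for _ in range(10):                      # ten units of effort to allocate
    best = max(payoff, key=payoff.get)   # a maximizer always picks the bigger payoff
    effort[best] += 1
print(effort)   # {'honor ancestors': 0, 'other values': 10} -- the costlier value gets nothing
```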

A sort of opposite approach is "make reality consistent with these pre-existing values" which involves finding a domain in reality state space under which existing values are self-consistent, and then trying to mold reality into that domain. The risk (unless you're a negative utilitarian) is that the domain is null. Finding the largest domain consistent with all values would make life more complex and interesting, so that would probably be a safe value. If domains form disjoint sets of reality with no continuous physical transitions between them then one would have to choose one physically continuous sub-domain and stick with it forever (or figure out how to switch the entire universe from one set to another). One could also start with preexisting values and compute a possible world where the values are self-consistent, then simulate it.

Comment author: Stuart_Armstrong 27 July 2015 04:02:02PM 2 points [-]

It is expensive to honor ancestors, and ancestors don't demand that we continue to maintain that value, so it receives less attention.

That's something different - a human trait that makes us want to avoid expensive commitments while paying them lip service. A self consistent system would not have this trait, and would keep "honor ancestors" in it, and do so or not depending on the cost and the interaction with other moral values.

If you want to look at even self-consistent systems being unstable, I suggest looking at social situations, where other entities reward value-change. Or a no-free-lunch result of the type "This powerful being will not trade with agents having value V."

Comment author: [deleted] 27 July 2015 03:53:40AM 0 points [-]

E.g. we (Westerners) generally no longer value honoring our ancestors (at least not many of them), even though it is a fairly independent value and roughly consistent with our other values. It is expensive to honor ancestors, and ancestors don't demand that we continue to maintain that value, so it receives less attention.

This sweeps the model-dependence of "values" under the rug. The reason we don't value honoring our ancestors is that we don't believe they continue to exist after death, and so we don't believe social relations of any kind can be carried on with them.

Comment author: CCC 27 July 2015 08:51:38AM *  0 points [-]

The reason we don't value honoring our ancestors is that we don't believe they continue to exist after death

This could be a case of typical mind fallacy. I can point to a number of statistical studies that show that a large number of Westerners claim that their ancestors do continue to exist after death.

Anyone who believes that some sort of heaven or hell exists.

And a lot of these people nonetheless don't accord their ancestors all that much in the way of honour...

Comment author: Jiro 27 July 2015 06:23:23PM 0 points [-]

I can point to a number of statistical studies that show that a large number of Westerners claim that their ancestors do continue to exist after death.

They may believe it, but they don't alieve it.

Comment author: Lumifer 27 July 2015 06:29:08PM 2 points [-]

How do you know?

Comment author: Jiro 27 July 2015 06:36:07PM *  0 points [-]

Because the things that people would do if they believed in and acted as though they believe in life after death are profoundly weird, and we don't see any of that around. Can you imagine the same people who say that the dead "went to a better place" being sad that someone has not died, for instance? (Unless they're suffering so much or causing so much suffering that death is preferable even without an afterlife.)

Comment author: Lumifer 27 July 2015 06:39:29PM 2 points [-]

Because the things that people would do if they believed in and acted as though they believe in life after death are profoundly weird, and we don't see any of that around.

I don't see why they need to be "profoundly weird". Remember, this subthread started with "honoring ancestors". The Chinese culture is probably the most obvious one where honoring ancestors is a big thing. What "profoundly weird" things does it involve?

Comment author: [deleted] 28 July 2015 12:30:52AM 0 points [-]

What "profoundly weird" things does it involve?

Given that this is the Chinese we're talking about, expecting one's ancestors to improve investment returns in return for a good sacrifice.

Comment author: Jiro 27 July 2015 06:53:25PM 0 points [-]

Sorry, I don't know enough about Chinese culture to answer. But I'd guess that either they do have weird beliefs (that I'm not familiar with so I can't name them), or they don't and honoring ancestors is an isolated thing they do as a ritual. (The answer may be different for different people, of course.)

Comment author: David_Bolin 28 July 2015 06:18:38AM 1 point [-]

You are assuming that human beings are much more altruistic than they actually are. If your wife has the chance of leaving you and having a much better life where you will never hear from her again, you will not be sad if she does not take the chance.

Comment author: CCC 28 July 2015 08:07:35AM 0 points [-]

Because the things that people would do if they believed in and acted as though they believe in life after death are profoundly weird

Okay, now I'm curious. What exactly do you think that people would do if they believed in life after death?

Comment author: Jiro 28 July 2015 02:39:43PM *  1 point [-]

-- Be happy that people have died and sad that they remain alive (same qualifiers as before: person is not suffering so much that even nothingness is preferable, etc.) and the reverse for people who they don't like

-- Want to kill people to benefit them (certainly, we could improve a lot of third world suffering by nuking places, if they have a bad life but a good afterlife. Note that the objection "their culture would die out" would not be true if there is an afterlife.)

-- In the case of people who oppose abortions because fetuses are people (which I expect overlaps highly with belief in life after death), be in favor of abortions if the fetus gets a good afterlife

-- Be less willing to kill their enemies the worse the enemy is

-- Do extensive scientific research trying to figure out what life after death is like.

-- Genuinely think that having their child die is no worse than having their child move away to a place where the child cannot contact them

-- Drastically reduce how bad they think death is when making public policy decisions; there would be still some effect because death is separation and things that cause death also cause suffering, but we act as though causing death makes some policy uniquely bad and preventing it uniquely good

-- Not oppose suicide

Edit: Support the death penalty as more humane than life imprisonment.

(Some of these might not apply if they believe in life after death but also in Hell, but that has its own bizarre consequences.)

Comment author: [deleted] 28 July 2015 12:29:16AM *  -1 points [-]

I can point to a number of statistical studies that show that a large number of Westerners claim that their ancestors do continue to exist after death.

No, they believe-in-the-belief that their ancestors continue to exist after death. They rarely, and doubtingly, if ever, generate the concrete expectation that anything they can do puts them in causal contact with the ghosts of their ancestors, such that they would expect to see something different from their ancestors being permanently gone.

Comment author: [deleted] 23 July 2015 12:15:10PM *  0 points [-]

The main problem with siren worlds

Actually, I'd argue the main problem with "Siren Worlds" is the assumption that you can "envision", or computationally simulate, an entire possible future country/planet/galaxy all at once, in detail, in such time that any features at all would jump out to a human observer.

That kind of computing power would require, well, something like the mass of a whole country/planet/galaxy and then some. Even if we generously assume a very low fidelity of simulation, comparable with mere weather simulations or even mere video games, we're still talking whole server/compute farms being turned towards nothing but the task of pretending to possess a magical crystal ball for no sensible reason.

Comment author: Pentashagon 25 July 2015 03:01:28AM 0 points [-]

tl;dr: human values are already quite fragile and vulnerable to human-generated siren worlds.

Simulation complexity has not stopped humans from implementing totalitarian dictatorships (based on divine right of kings, fundamentalism, communism, fascism, people's democracy, what-have-you) due to envisioning a siren world that is ultimately unrealistic.

It doesn't require detailed simulation of a physical world, it only requires sufficient simulation of human desires, biases, blind spots, etc. that can lead people to abandon previously held values because they believe the siren world values will be necessary and sufficient to achieve what the siren world shows them. It exploits a flaw in human reasoning, not a flaw in accurate physical simulation.

Comment author: [deleted] 26 July 2015 06:43:02PM 0 points [-]

That's shifting the definition of "siren world" from "something which looks very nice when simulated in high-resolution but has things horrendously wrong on the inside" to a very standard "Human beings imagine things in low-resolution and don't always think them out clearly."

You don't need to pour extra Lovecraft Sauce on your existing irrationalities just for your enjoyment of Lovecraft Sauce.

Comment author: Stuart_Armstrong 23 July 2015 04:12:05PM 0 points [-]

It depends a lot on how the world is being shown. If the AI is your "guide", it can show you the seductive features of the world, or choose the fidelity of the simulation in just the right ways in the right places, etc... Without needing a full fledged simulation. You can have a siren world in text, just through the AI's (technically accurate) descriptions, given your questions.

Comment author: [deleted] 24 July 2015 01:55:32AM 0 points [-]

You're missing my point, which is that proposing you've got "an AI" (with no dissolved understanding of how the thing actually works underneath what you'd get from a Greg Egan novel) which "simulates" possible "worlds" is already engaging in several layers of magical thinking, and you shouldn't be surprised to draw silly conclusions from magical thinking.

Comment author: Wei_Dai 25 July 2015 06:49:07AM 0 points [-]

I think I'm not getting your point either. Isn't Stuart just assuming standard decision theory, where you choose actions by predicting their consequences and then evaluating your utility function over your predictions? Are you arguing that real AIs won't be making decisions like this?

Comment author: [deleted] 27 July 2015 03:49:15AM 0 points [-]

Isn't Stuart just assuming standard decision theory, where you choose actions by predicting their consequences and then evaluating your utility function over your predictions? Are you arguing that real AIs won't be making decisions like this?

While I do think that real AIs won't make decisions in this fashion, that aside, as I had understood Stuart's article, the point was not to address decision theory, which is a mathematical subject, but instead that he hypothesized a scenario in which "the AI" was used to forecast possible future events, with humans in the loop doing the actual evaluation based on simulations realized in high detail, to the point that the future-world simulation would be as thorough as a film might be today, at which point it could appeal to people on a gut level and bypass their rational faculties, but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.

The "But also..." part is the bit I actually object to.

Comment author: Stuart_Armstrong 27 July 2015 01:08:47PM 1 point [-]

Let's focus on a simple version, without the metaphors. We're talking about an AI presenting humans with consequences of a particular decision, with humans then making the final decision to go along with it or not.

So what is happening is that various possible future worlds will be considered by the AI according to its desirability criteria, these worlds will be described to humans according to its description criteria, and humans will choose according to whatever criteria we use. So we have a combination of criteria that result in a final decision. A siren world is a world that ranks very high in these combined criteria but is actually nasty.

If we stick to that scenario and assume the AI is truthful, the main siren world generator is the ability of the AI to describe them in ways that sound very attractive to humans. Since human beliefs and preferences are not clearly distinct, this ranges from misleading (incorrect human beliefs) to actively seductive (influencing human preferences to favour these worlds).

The higher the bandwidth the AI has, the more chance it has of "seduction", or of exploiting known or unknown human irrationalities (again, there's often no clear distinction between exploiting irrationalities for beliefs or preferences).

One scenario - Paul Christiano's - is a bit different but has essentially unlimited bandwidth (or, more precisely, has an AI estimating the result of a setup that has essentially unlimited bandwidth).

but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.

This category can include irrationalities we don't yet know about, better exploitation of irrationalities we do know about, and a host of speculative scenarios about hacking the human brain, which I don't want to rule out completely at this stage.

Comment author: [deleted] 28 July 2015 12:27:47AM -2 points [-]

We're talking about an AI presenting humans with consequences of a particular decision, with humans then making the final decision to go along with it or not.

No. We're not. That's dumb. Like, sorry to be spiteful, but that is already a bad move. You do not treat any scenario involving "an AI", without dissolving the concept, as desirable or realistic. You have "an AI", without having either removed its "an AI"-ness (in the LW sense of "an AI") entirely or guaranteed Friendliness? You're already dead.

Comment author: Stuart_Armstrong 28 July 2015 10:23:55AM 1 point [-]

Can we assume that, since I've been working all this time on AI safety, I'm not an idiot? When presenting a scenario ("assume AI contained, and truthful") I'm investigating whether we have safety within the terms of that scenario. Which here we don't, so we can reject attempts aimed at that scenario without looking further. If/when we find a safe way to do that within the scenario, then we can investigate whether that scenario is achievable in the first place.

Comment author: hairyfigment 28 July 2015 05:13:22AM 0 points [-]

I'm puzzled. Are you sure that's your main objection? Because,

  • you make a different objection (I think) in your response to the sibling, and

  • it seems to me that since any simulation of this kind will be incomplete, and I assume the AI will seek the most efficient way to achieve its programmed goals, the scenario you describe is in fact horribly dangerous; the AI has an incentive to deceive us. (And somewhat like Wei Dai, I thought we were really talking about an AI goal system that talks about extrapolating human responses to various futures.)

It would be completely unfair of me to focus on the line, "as thorough as a film might be today". But since it's funny, I give you Cracked.com on Independence Day.

Comment author: [deleted] 30 July 2015 03:00:12PM 0 points [-]

To be honest, I was assuming we're not talking about a "contained" UFAI, since that's, you know, trivially unsafe.

Comment author: Wei_Dai 27 July 2015 07:15:29AM 0 points [-]

as I had understood Stuart's article, the point was not to address decision theory, which is a mathematical subject, but instead that he hypothesized a scenario in which "the AI" was used to forecast possible future events, with humans in the loop doing the actual evaluation based on simulations realized in high detail, to the point that the future-world simulation would be as thorough as a film might be today, at which point it could appeal to people on a gut level and bypass their rational faculties

It's true that Stuart wrote about Oracle AI in his Siren worlds post, but I thought that was mostly just to explain the idea of what a Siren world is. Later on in the post he talks about how Paul Christiano's take on indirect normativity has a similar problem. Basically the problem can occur if an AI tries to model a human as accurately as possible, then uses the model directly as its utility function and tries to find a feasible future world that maximizes the utility function.

It seems plausible that even if the AI couldn't produce a high resolution simulation of a Siren world W, it could still infer (using various approximations and heuristics) that with high probability its utility function assigns a high score to W, and choose to realize W on that basis. It also seems plausible that an AI eventually would have enough computing power to produce high resolution simulations of Siren worlds, e.g., after it has colonized the galaxy, so the problem could happen at that point if not before.

but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.

What extra-scary features are you referring to? (Possibly I skipped over the parts you found objectionable since I was already familiar with the basic issue and didn't read Stuart's post super carefully.)

Comment author: [deleted] 26 July 2015 06:53:55PM 0 points [-]

Are you arguing that real AIs won't be making decisions like this?

Yes. I think that probabilistic backwards chaining, aka "planning as inference", is the more realistic way to plan, and better represented in the current literature.

Comment author: ChristianKl 23 July 2015 12:28:36PM 0 points [-]

Actually, I'd argue the main problem with "Siren Worlds" is the assumption that you can "envision", or computationally simulate, an entire possible future country/planet/galaxy all at once, in detail, in such time that any features at all would jump out to a human observer.

That's not needed for a siren world. Putting human brains into vats and stimulating their pleasure centers doesn't require much computing power.

Comment author: [deleted] 23 July 2015 03:05:55PM *  0 points [-]

Wireheading isn't a siren world, though. The point of the concept is that it looks like what we want, when we look at it from the outside, but actually, on the inside, something is very wrong. Example: a world full of people who are always smiling and singing about happiness because they will be taken out and shot if they don't (Lily Weatherwax's Genua comes to mind). If the "siren world" fails to look appealing to (most) human sensibilities in the first place, as with wireheading, then it's simply failing at siren.

The point is that we're supposed to worry about what happens when we can let computers do our fantasizing for us in high resolution and real time, and then put those fantasies into action, as if we could ever actually do this, because there's a danger in letting ourselves get caught up in a badly un-thought-through fantasy's nice aspects without thinking about what it would really be like.

The problem being, no, we can't actually do that kind of "automated fantasizing" in any real sense, for the same reason that fantasies don't resemble reality: to fully simulate some fantasy in high resolution (ie: such that choosing to put it into action would involve any substantial causal entanglement between the fantasy and the subsequent realized "utopia") involves degrees of computing power we just won't have and which it just wouldn't even be efficient to use that way.

Backwards chaining from "What if I had a Palantir?" does lead to thinking, "What if Sauron used it to overwhelm my will and enthrall me?", which sounds wise except that, "What if I had a Palantir?" really ought to lead to, "That's neither possible nor an efficient way to get what I want."

Comment author: TheAncientGeek 19 July 2015 04:27:41PM 0 points [-]

The common thread I am noticing is the assumption of singletonhood.

Technologically, if you have a process that could go wrong, you run several in parallel.

In human society, an ethical innovator can run an idea past the majority to see if it sounds like an improved version of what they believe already.

It's looking, again, like group rationality is better.


Comment author: Stuart_Armstrong 20 July 2015 09:39:05AM 0 points [-]

Groups converge as well. We can't assume AI groups will have the barriers to convergence that human groups currently do (just as we can't assume that AIs have the barriers to convergence that humans do).

Comment author: TheAncientGeek 21 July 2015 08:24:54AM *  0 points [-]

I'm not doubting that groups converge, I am arguing that when a group achieves reflective equilibrium, that is much more meaningful than a singleton doing so, at least as long as there is variation within the group.

Comment author: Stuart_Armstrong 21 July 2015 10:35:45AM 0 points [-]

There are bad ways to achieve group convergence.

Comment author: TheAncientGeek 21 July 2015 12:38:11PM 0 points [-]

In absolute terms, maybe, but that doesn't stop it being relatively better.

Comment author: Stuart_Armstrong 21 July 2015 01:49:43PM 0 points [-]

What you are trying to do is import positive features from the convergence of human groups (eg the fact that more options are likely to have been considered, the fact that productive discussion is likely to have happened...) into the convergence of AI groups, without spelling them out precisely. Unless we have a clear handle on what, among humans, causes these positive features, we have no real reason to suspect they will happen in AI groups as well.

Comment author: TheAncientGeek 21 July 2015 04:49:25PM *  0 points [-]

The two concrete examples you gave weren't what I had in mind. I was addressing the problem of an AI "losing" values during extrapolation, and it looks like a real reason to me. If you want to prevent an AI undergoing value drift during extrapolation, keep an extrapolated one as a reference. Two is a group, minimally.

There may well be other advantages to doing rationality and ethics in groups, and yes, that needs research, and no, that isn't a show stopper.

Comment author: Sniffnoy 14 July 2015 12:08:14AM *  0 points [-]

I don't think anyone has proposed any self-referential criteria as being the point of Friendly AI? It's just that such self-referential criteria as reflective equilibrium are a necessary condition which lots of goal setups don't even meet. (And note that just because you're trying to find a fixpoint, doesn't necessarily mean you have to try to find it by iteration, if that process has problems!)

Comment author: dankane 14 July 2015 04:55:42PM 2 points [-]

It's just that such self-referential criteria as reflective equilibrium are a necessary condition

Why? The only example of adequately friendly intelligent systems that we have (i.e. us) don't meet this condition. Why should reflective equilibrium be a necessary condition for FAI?

Comment author: Stuart_Armstrong 15 July 2015 09:53:31AM 0 points [-]

Because FAIs can change themselves very effectively in ways that we can't.

It might be that a human brain running as computer software would have the same issues.

Comment author: Kaj_Sotala 15 July 2015 01:02:16PM *  2 points [-]

Because FAIs can change themselves very effectively in ways that we can't.

Doesn't mean the FAI couldn't remain genuinely uncertain about some value question, or consider it not worth solving at this time, or run into new value questions due to changed circumstances, etc.

All of those could prevent reflective equilibria, while still being compatible with the ability for extensive self-modification.

Comment author: Stuart_Armstrong 15 July 2015 03:34:23PM 0 points [-]

All of those could prevent reflective equilibria, while still being compatible with the ability for extensive self-modification.

It's possible. They feel very unstable, though.