
Comment author: DataPacRat 04 August 2015 12:59:25AM 0 points [-]

What are the forms of math called where you can compare numbers, such as to say that 3 is bigger than 2, but can't necessarily add numbers - that is, 2+2 may or may not be bigger than 3?

Comment author: Stuart_Armstrong 04 August 2015 09:06:03AM 0 points [-]

As banx said, take ordinal numbers (and remove ordinal addition). Classical ordinal numbers can be added, but they can't be scaled - you can't generally have x% of an ordinal number.
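
A minimal sketch in standard ordinal arithmetic, just to make that point concrete:

```latex
% Ordinals are totally ordered, so comparison always works:
2 < 3 < \omega < \omega + 1 < \omega \cdot 2
% Addition exists but behaves oddly (it is not commutative):
1 + \omega = \omega \quad\text{but}\quad \omega + 1 > \omega
% and there is no general scaling: no ordinal $x$ satisfies
x + x = \omega + 1,
% so ``half of $\omega + 1$'' is simply undefined.
```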

Comment author: CronoDAS 03 August 2015 11:34:03PM *  3 points [-]

My own resolution to the "repugnant conclusion" is that the goodness of a population isn't a state function: you can't tell whether one population is better than another simply by looking at the well-being of each currently existing person. Instead, you have to know the history of the population as well as its current state.

Comment author: Stuart_Armstrong 04 August 2015 09:02:23AM 1 point [-]

Very integral reasoning ^_^

Comment author: Dagon 03 August 2015 03:29:40PM *  1 point [-]

I suspect much of the problem is that humans aren't very good at consistency or calculation. Scope insensitivity (and other errors) causes us to accept steps that lead to incorrect results once aggregated. If you can actually define your units and measurements, I strongly expect that the sum of the steps will equal the conclusion, and you will be able to identify the steps that are unacceptable (or accept the conclusion).

I'd advise against the motivated reasoning of "if you don't like the conclusion, you have to find a step to reject", in favour of "I notice I'm confused that I have different evaluations of the steps and the aggregate, so I've probably miscalculated."

And if this is the case (the mismatch is caused by compounded rounding errors rather than a fundamental disconnect), then it seems unlikely to be a useful solution to AI problems - unless we fix the problems in our calculation, rather than just reusing a method we've shown doesn't work.
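
A minimal toy sketch of that mismatch (all numbers invented): each step's true cost sits below the threshold we can perceive, so every step looks acceptable, while the true aggregate is clearly bad.

```python
# Toy illustration of per-step errors compounding into a large aggregate mismatch.
# The numbers are made up; only the structure matters.

STEPS = 1_000_000
true_step_value = -0.001      # each step is actually slightly bad
perception_threshold = 0.01   # changes smaller than this feel like "no difference"

step_looks_acceptable = abs(true_step_value) < perception_threshold
true_aggregate = STEPS * true_step_value

print(f"each step looks acceptable: {step_looks_acceptable}")   # True
print(f"true aggregate value:       {true_aggregate:.0f}")      # -1000
```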

Comment author: Stuart_Armstrong 04 August 2015 09:01:41AM 0 points [-]

"I notice I'm confused that I have different evaluations of the steps and the aggregate, so I've probably miscalculated."

But then you have to choose either to correct the steps to get in line with the aggregate (integral reasoning), or the aggregate to get in line with the steps (differential reasoning).

Integral vs differential ethics, continued

5 Stuart_Armstrong 03 August 2015 01:25PM

I've talked earlier about integral and differential ethics in the context of population ethics. The idea is that the argument for the repugnant conclusion (and its associate, the very repugnant conclusion) depends on a series of trillions of steps, each of which is intuitively acceptable (adding happy people, making happiness more equal), but which together reach a conclusion that is intuitively bad - namely, that we can improve the world by creating trillions of people in torturous and unremitting agony, as long as we balance it out by creating enough happy people as well.
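
A minimal numerical sketch of that stepwise structure (a toy total-welfare bookkeeping with invented numbers, reproducing only the shape of the argument):

```python
# Toy version of the mere-addition steps: each step adds people with lives barely
# worth living, then equalises welfare. Each addition harms no one, and each
# equalisation preserves total welfare, yet the end state is a vast, barely-happy
# population.

population = [(1_000, 100.0)]          # (number of people, welfare per person)

for _ in range(20):                    # stand-in for the "trillions of steps"
    population.append((10_000, 1.0))   # add many people with small positive welfare
    total = sum(n * w for n, w in population)
    size = sum(n for n, w in population)
    population = [(size, total / size)]  # equalise: same total, shared evenly

size, avg = population[0]
print(f"{size:,} people at average welfare {avg:.2f} (started: 1,000 at 100.00)")
```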

Differential reasoning accepts each step, and concludes that the repugnant conclusions are actually acceptable, because each step is sound. Integral reasoning accepts that the repugnant conclusion is repugnant, and concludes that some step along the way must therefore be rejected.

Notice that key word, "therefore". Some intermediate step is rejected, not for intrinsic reasons, but purely because of the consequence. There is nothing special about the rejected step; it's just a relatively arbitrary barrier to stop the process (compare the paradox of the heap).

Indeed, things can go awry when people attempt to fix the repugnant conclusion (a conclusion they rejected through integral reasoning) using differential methods. Views like the "person-affecting view" have their own absurdities and paradoxes (it's fine to bring a baby into the world even if it will have a miserable life; we don't need to care about future generations if we randomise conceptions; and so on), and I would posit that this is because they try to fix global/integral issues using local/differential tools.

The relevance of this? It seems that integral tools might be better suited to dealing with the problem of bad AI convergence. We could set up plausibly intuitive differential criteria (such as self-consistency), but institute integral criteria that can override these if they go too far. I think there may be some interesting ideas in that area. The cost is that integral ideas are generally seen as less elegant, or harder to justify.
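
As a minimal sketch of what that could look like (an entirely invented toy mechanism, just to show the shape of the idea): every local change passes a differential test, but a global test on the aggregate drift can still veto the result.

```python
# Toy "differential criteria plus an integral override". Each local step is small
# enough to pass the per-step check, but the overriding global check compares the
# end state to the starting point and rejects the run if it has drifted too far.

def local_step(state: float) -> float:
    return state + 0.01                       # each step changes things only slightly

def differential_ok(old: float, new: float) -> bool:
    return abs(new - old) < 0.05              # per-step criterion: "small changes are fine"

def integral_ok(start: float, end: float) -> bool:
    return abs(end - start) < 1.0             # overriding criterion on the aggregate

state = start = 0.0
for _ in range(500):
    candidate = local_step(state)
    assert differential_ok(state, candidate)  # every individual step passes
    state = candidate

print("integral check passes:", integral_ok(start, state))   # False: drifted by 5.0
```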

Comment author: eli_sennesh 03 August 2015 04:09:25AM 0 points [-]

Firstly, let me state that I'm being entirely serious here:

This is a reason to suspect it will not be easy to distinguish human beliefs and values ^_^

What makes you think that any such thing as "values" actually exists, i.e. that your brain actually implements some function that assigns a real number to world-states?

Comment author: Stuart_Armstrong 03 August 2015 11:22:47AM 0 points [-]

It's clear that people have ordinal preferences over certain world-states, and that many of these preferences are quite stable from day to day. And people have some ability to trade these off with probabilities, suggesting cardinal preferences as well. It seems correct and useful to refer to this as "values", at least approximately.
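
For concreteness, this is just the standard construction from the decision-theory textbooks (a sketch of the in-principle derivation, not a claim about how brains actually do it):

```latex
% Suppose someone has the ordinal preferences A \succ B \succ C, and is indifferent
% between getting B for certain and a lottery giving A with probability p, C with
% probability 1 - p. Normalising
u(C) = 0, \qquad u(A) = 1,
% the indifference point pins down a cardinal value for B:
u(B) = p\,u(A) + (1 - p)\,u(C) = p.
```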

On the other hand, it's clear that our brains do not implement some function that assigns a real number to world-states. That's one of the reasons that it's so hard to distinguish human values in the first place.

Comment author: eli_sennesh 30 July 2015 03:56:35PM 0 points [-]

Ah. Then here's the difference in assumptions: I don't believe a contained, truthful UFAI is safe in the first place. I just have an incredibly low prior on that. So low, in fact, that I didn't think anyone would take it seriously enough to imagine scenarios which prove it's unsafe, because it's just so bloody obvious that you do not build UFAI for any reason, because it will go wrong in some way you didn't plan for.

Comment author: Stuart_Armstrong 31 July 2015 08:34:53AM 0 points [-]

See the point on Paul Christiano's design. The problem I discussed applies not only to UFAIs but also to other designs that seek to get around the UFAI problem but use potentially unrestricted search.

Comment author: eli_sennesh 28 July 2015 12:27:47AM -2 points [-]

We're talking about an AI presenting humans with consequences of a particular decision, with humans then making the final decision to go along with it or not.

No. We're not. That's dumb. Like, sorry to be spiteful, but that is already a bad move. You do not treat any scenario involving "an AI", without dissolving the concept, as desirable or realistic. You have "an AI", without having either removed its "an AI"-ness (in the LW sense of "an AI") entirely or guaranteed Friendliness? You're already dead.

Comment author: Stuart_Armstrong 28 July 2015 10:23:55AM 1 point [-]

Can we assume, since I've been working all this time on AI safety, that I'm not an idiot? When presenting a scenario ("assume the AI is contained and truthful") I'm investigating whether we have safety within the terms of that scenario. Here we don't, so we can reject attempts aimed at that scenario without looking further. If/when we find a safe way to do that within the scenario, we can then investigate whether the scenario is achievable in the first place.

Comment author: Stuart_Armstrong 27 July 2015 04:12:22PM 2 points [-]

I've always considered the psychological critiques of AI risk (e.g. "the singularity is just rapture of the nerds") to be very weak ad hominems. However, they might be relevant for parts of the AI risk thesis that depend on the judgements of the people presenting it. The most relevant part would be in checking whether people have fully considered the arguments against their position, and gone out to find more such arguments.

Comment author: Pentashagon 25 July 2015 03:39:12AM 1 point [-]

The problem is that un-self-consistent morality is unstable under general self improvement

Even self-consistent morality is unstable if general self improvement allows for removal of values, even if removal is only a practical side effect of ignoring a value because it is more expensive to satisfy than other values. E.g. we (Westerners) generally no longer value honoring our ancestors (at least not many of them), even though it is a fairly independent value and roughly consistent with our other values. It is expensive to honor ancestors, and ancestors don't demand that we continue to maintain that value, so it receives less attention. We also put less value on the older definition of honor (as a thing to be defended and fought for and maintained at the expense of convenience) that earlier centuries had, despite its general consistency with other values for honesty, trustworthiness, social status, etc. I think this is probably for the same reason; it's expensive to maintain honor and most other values can be satisfied without it.

In general, if U(more_satisfaction_of_value_1) > U(more_satisfaction_of_value_2) then maximization should tend to ignore value_2 regardless of its consistency. If U(make_values_self_consistent_value) > U(satisfying_any_other_value) then the obvious solution is to drop the other values and be done.
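
A minimal sketch of that neglect effect (invented utilities and costs, and constant marginal returns, which is what makes the neglect total rather than partial):

```python
# Toy maximiser with a fixed effort budget. It spends effort where a unit buys the
# most utility; a value that is perfectly consistent but expensive ends up with nothing.

values = {
    "everyday_comfort": {"utility_per_unit": 5.0, "cost_per_unit": 1.0},
    "honor_ancestors":  {"utility_per_unit": 2.0, "cost_per_unit": 4.0},  # consistent, but costly
}

budget = 100.0
allocation = {name: 0.0 for name in values}

# With constant marginal returns, all effort goes to the value with the best
# utility-per-cost ratio.
best = max(values, key=lambda v: values[v]["utility_per_unit"] / values[v]["cost_per_unit"])
allocation[best] = budget / values[best]["cost_per_unit"]

print(allocation)   # {'everyday_comfort': 100.0, 'honor_ancestors': 0.0}
```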

A sort of opposite approach is "make reality consistent with these pre-existing values" which involves finding a domain in reality state space under which existing values are self-consistent, and then trying to mold reality into that domain. The risk (unless you're a negative utilitarian) is that the domain is null. Finding the largest domain consistent with all values would make life more complex and interesting, so that would probably be a safe value. If domains form disjoint sets of reality with no continuous physical transitions between them then one would have to choose one physically continuous sub-domain and stick with it forever (or figure out how to switch the entire universe from one set to another). One could also start with preexisting values and compute a possible world where the values are self-consistent, then simulate it.

Comment author: Stuart_Armstrong 27 July 2015 04:02:02PM 2 points [-]

It is expensive to honor ancestors, and ancestors don't demand that we continue to maintain that value, so it receives less attention.

That's something different - a human trait that makes us want to avoid expensive commitments while paying them lip service. A self-consistent system would not have this trait: it would keep "honor ancestors" among its values, and act on it or not depending on the cost and the interaction with other moral values.

If you want to see even self-consistent systems being unstable, I suggest looking at social situations where other entities reward value change, or at no-free-lunch results of the type "this powerful being will not trade with agents having value V."

Comment author: eli_sennesh 27 July 2015 03:49:15AM 0 points [-]

Isn't Stuart just assuming standard decision theory, where you choose actions by predicting their consequences and then evaluating your utility function over your predictions? Are you arguing that real AIs won't be making decisions like this?

While I do think that real AIs won't make decisions in this fashion, that's beside the point. As I understood Stuart's article, the point was not to address decision theory, which is a mathematical subject. Instead, he hypothesized a scenario in which "the AI" is used to forecast possible future events, with humans in the loop doing the actual evaluation based on simulations realized in high detail - detailed enough that the future-world simulation would be as thorough as a film might be today, at which point it could appeal to people on a gut level and bypass their rational faculties, but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.

The "But also..." part is the bit I actually object to.

Comment author: Stuart_Armstrong 27 July 2015 01:08:47PM 1 point [-]

Let's focus on a simple version, without the metaphors. We're talking about an AI presenting humans with consequences of a particular decision, with humans then making the final decision to go along with it or not.

So what happens is this: various possible future worlds are considered by the AI according to its desirability criteria, these worlds are described to humans according to its description criteria, and humans choose according to whatever criteria we use. We therefore have a combination of criteria that results in a final decision. A siren world is a world that ranks very high on these combined criteria but is actually nasty.

If we stick to that scenario and assume the AI is truthful, the main siren-world generator is the AI's ability to describe worlds in ways that sound very attractive to humans. Since human beliefs and preferences are not clearly distinct, this ranges from the misleading (creating incorrect human beliefs) to the actively seductive (influencing human preferences to favour these worlds).
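
A toy sketch of that selection effect (all distributions and numbers invented; it only shows why the top of an "appeal" ranking can be occupied by something nasty when the appeal channel is hackable):

```python
import random
random.seed(0)

def ordinary_world():
    true_value = random.gauss(0, 1)              # how good the world actually is
    appeal = true_value + random.gauss(0, 0.5)   # how it comes across in the description
    return true_value, appeal

worlds = [ordinary_world() for _ in range(100_000)]

# A few "siren" candidates: actually nasty, but described so attractively that their
# appeal exceeds anything an honest description of a merely-good world would produce.
worlds += [(-10.0, 8.0)] * 3

chosen_value, chosen_appeal = max(worlds, key=lambda w: w[1])
print(f"chosen world: appeal {chosen_appeal:.1f}, true value {chosen_value:.1f}")
# With enough candidates and a hackable appeal score, the top of the ranking is
# dominated by worlds optimised for appeal rather than for actual value.
```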

The higher the bandwidth the AI has, the more chance it has of "seduction", or of exploiting known or unknown human irrationalities (again, there's often no clear distinction between exploiting irrationalities for beliefs or preferences).

One scenario - Paul Christiano's - is a bit different but has essentially unlimited bandwidth (or, more precisely, has an AI estimating the result of a setup that has essentially unlimited bandwidth).

but also have a bunch of other extra-scary features above and beyond other scenarios of people being irrational, just because.

This category can include irrationalities we don't yet know about, better exploitation of irrationalities we do know about, and a host of speculative scenarios about hacking the human brain, which I don't want to rule out completely at this stage.
