Comment author: agilecaveman 12 May 2015 05:56:53PM 0 points [-]

hmm, looks like the year is wrong and the delete button has failed to work :(

Comment author: agilecaveman 11 March 2015 04:59:20AM 8 points [-]

Maybe this has been said before, but here is a simple idea:

Directly specify a utility function U which you are not sure about, but also subtract a penalty for the AI's own power as part of it. So the new utility function is U - power(AI), where power is a fast-growing function of a mix of the AI's source code complexity, intelligence, hardware, and electricity costs. One needs to be careful about how to define "self" in this case, since a careful redefinition by the AI would remove the controls.

One also needs to make sure that any sub-agents the AI creates get proper utilities as well, since in a naive implementation sub-agents will just optimize U, without restrictions.

This is likely not enough, but has the advantage that the AI does not have a will to become stronger a priori, which is better than boxing an AI which does.
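
To make the U - power(AI) idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the resource scores, the weights, the plan names, and the choice of an exponential as the "fast-growing" function are all hypothetical, and it does nothing about the "self" and sub-agent problems mentioned above.

```python
import math


def power_penalty(resources, weights, rate=1.0):
    """Hypothetical fast-growing penalty on the AI's own power.

    `resources` and `weights` are dicts keyed by resource name. The
    exponential is just one arbitrary choice of "fast-growing" function.
    """
    mix = sum(weights[name] * amount for name, amount in resources.items())
    return math.expm1(rate * mix)  # exp(x) - 1, so zero power costs nothing


def penalized_utility(base_utility, resources, weights):
    """U - power(AI), with the value of U supplied by the caller."""
    return base_utility - power_penalty(resources, weights)


# Made-up weights and plans, purely for illustration.
WEIGHTS = {"code_complexity": 1.0, "intelligence": 1.0,
           "hardware": 1.0, "electricity": 1.0}

plans = {
    # name: (raw utility U, resources the plan requires)
    "modest": (10.0, {"code_complexity": 0.5, "intelligence": 0.5,
                      "hardware": 0.5, "electricity": 0.5}),
    "power_grab": (50.0, {"code_complexity": 1.5, "intelligence": 1.5,
                          "hardware": 1.5, "electricity": 1.5}),
}

scores = {name: penalized_utility(u, res, WEIGHTS)
          for name, (u, res) in plans.items()}
best_plan = max(scores, key=scores.get)
print(best_plan)  # modest
```

With these made-up numbers, the higher raw U of "power_grab" is swamped by the penalty, so "modest" wins. How the weights and the growth rate should actually be set is exactly the part that is not solved here.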

Comment author: Vaniver 08 February 2015 12:45:58AM 1 point [-]

life expectancy (DALY or QALY), since to me, it is easier to measure than happiness.

Whoa, how are you measuring the disability/quality adjustment? That sounds like sneaking in 'happiness' measurements, and there are a bunch of challenges: we already run into issues where people who have a condition rate it as less bad than people who don't have it. (For example, sighted people rate being blind as worse than blind people rate being blind.)

if you could be born in any society on earth today, what one number would be most congruent with your preference? Average life expectancy captures very well which societies are good to be born in.

There's a general principle in management that really ought to be a larger part of the discussion of value learning: Goodhart's Law. Right now, life expectancy is higher in better places, because good things are correlated. But if you directed your attention to optimizing towards life expectancy, you could find many things that make life less good but longer (or your definition of "QALY" needs to include the entirety of what goodness is, in which case we have made the problem no easier).

However, I'd rather have an approximate starting point for direct specification than give up on the approach altogether.

But here's where we come back to Goodhart's Law: regardless of what simple measure you pick, it will be possible to demonstrate a perverse consequence of optimizing for that measure, because simplicity necessarily cuts out complexity that we don't want to lose. (If you didn't cut out the complexity, it's not simple!)

Comment author: agilecaveman 08 February 2015 06:53:25AM 0 points [-]

Well, I get where you are coming from with Goodhart's Law, but that's not the question. Formally speaking, if we take the set of all utility functions with complexity below some fixed bound N, then one of them is going to be the "best", i.e. most correlated with the "true utility" function, which we can't compute.

As you point out, if we are selecting utilities that are too simple, such as straight-up life expectancy, then even the "best" function is not "good enough" to just punch into an AGI, because it will likely overfit and produce bad consequences. However, we can still reason about "better" or "worse" measures of societies. People might complain about the unemployment rate, but it's a crappy metric on which to base a decision about which societies are better overall than others, plus it's easier to game.

The point of at least "trying" to formalize values is that we end up with a set of metrics we might care about that's not too large, for use in arguments like: "but the AGI reduced GDP", "well, it also reduced the suicide rate". Which is more important? Without some simple guidance about what we value, it's going to be a long and unproductive debate.
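
As a rough illustration of both halves of this exchange, here is a toy Python sketch: pick the proxy most correlated with a stand-in "true utility" among candidates under a complexity bound, then let an optimizer loose on it. All the numbers, the complexity scores, and the stand-in true utility (life expectancy times a quality factor) are invented for the example, not measurements.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up societies: (life_expectancy, employment_rate, quality_of_life).
OBSERVED = [
    (55, 0.90, 0.50),
    (65, 0.95, 0.60),
    (72, 0.88, 0.70),
    (78, 0.93, 0.80),
    (82, 0.91, 0.85),
]


def true_utility(society):
    """Stand-in for the "true utility" that we supposedly cannot compute."""
    life, _employment, quality = society
    return life * quality


# Candidate proxies: name -> (hypothetical complexity score, function).
CANDIDATES = {
    "life_expectancy": (1, lambda s: s[0]),
    "employment_rate": (1, lambda s: s[1]),
    "life_times_quality": (5, lambda s: s[0] * s[2]),  # too complex to allow
}


def best_proxy(max_complexity):
    """Most correlated candidate among those with complexity < max_complexity."""
    truth = [true_utility(s) for s in OBSERVED]
    allowed = {name: fn for name, (c, fn) in CANDIDATES.items()
               if c < max_complexity}
    return max(allowed,
               key=lambda name: pearson([allowed[name](s) for s in OBSERVED],
                                        truth))


chosen = best_proxy(max_complexity=3)

# Goodhart step: an optimizer can reach options outside the observed range.
OPTIONS = {
    "healthy_long_life": (82, 0.91, 0.85),
    "long_but_miserable": (120, 0.50, 0.20),
}
proxy_fn = CANDIDATES[chosen][1]
proxy_pick = max(OPTIONS, key=lambda k: proxy_fn(OPTIONS[k]))
true_pick = max(OPTIONS, key=lambda k: true_utility(OPTIONS[k]))
print(chosen, proxy_pick, true_pick)
```

On this toy data, life expectancy is the best simple proxy across the observed societies, yet maximizing it picks "long_but_miserable" while the stand-in true utility prefers "healthy_long_life". So "best under a complexity bound" and "safe to optimize" come apart, even though the ranking of proxies is still informative.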

Comment author: agilecaveman 07 February 2015 11:27:18PM 0 points [-]

Regarding 2: So, I am a little surprised that step 2, "Valuable goals cannot be directly specified", is taken as a given.

If we consider an AI as a rational optimizer of the ONE TRUE UTILITY FUNCTION, we might want to look for the best available approximations of it in the short term. The function I have in mind is life expectancy (DALY or QALY), since to me, it is easier to measure than happiness. It also captures a lot of intuition when you ask a person the following hypothetical:

if you could be born in any society on earth today, what one number would be most congruent with your preference? Average life expectancy captures very well which societies are good to be born in.

I am also aware of a ton of problems with this, since one has to be careful to consider humans vs human/cyborg hybrids, and time spent in cryo-sleep or normal sleep vs experiential mind-moments. However, I'd rather have an approximate starting point for direct specification than give up on the approach altogether.

Regarding 5: There is an interesting "problem" with "do what I would want if I had more time to think" that happens not in the case of failure, but in the case of success. Let's say we have our happy-go-lucky, life-expectancy-maximizing, death-defeating FAI. It starts to look at society and sees that some widely accepted acts are totally horrifying from its perspective. Its "morality" surpasses ours, which is just an obvious consequence of its intelligence surpassing ours. For example, the amount of time we make children sit at their desks at school might destroy their health to the point of ruling out immortality. This particular example might not be so hard to convince people of, but there could be others. At this point, it would go against a large number of people and try to create its own schools, which teach how bad the other schools are (or something). The governments don't like this and shut it down, because we still can for some reason.

Basically the issue is: this AI is behaving in a friendly manner, which we would understand if we had enough time and intelligence. But we don't. So we don't have enough intelligence to determine whether it is actually friendly or not.

Regarding 6: I feel that you haven't even begun to approach the problem of a sub-group of people controlling the AI. The issue gets into the question of peaceful transitions of that power over the long term. There is also the issue that even if you come up with a genuinely good scheme for who gets to call the shots around the AI, convincing people that it is a good idea, instead of the default "let the government do it", is in itself a problem. It's similar in principle to 5.

Comment author: agilecaveman 17 January 2015 08:08:03PM 2 points [-]

Note: I may be in over my head here in math logic world:

For procrastination paradox:

There seems to be a desire to formalize

T proves G => G, which messes with completeness. Why not straight up try to formalize:

T proves G at time t => T proves G at time t+1 for all t > 0

That way: G => button gets pressed at some time X and wasn't pressed at X-1

However, if T proves G at X-1, it must also prove G at X, for all X > 1; therefore it won't press the button, unless X = 1.

Basically, instead of reasoning about whether proving something makes it true, reason about whether proving something at some point leads to re-proving it at another point, or just formalize the very intuition that makes us understand the paradox.
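
The argument above can be sketched as a toy simulation in Python. This is only a cartoon under strong assumptions: "T proves G" is modeled as a single boolean flag rather than anything like real provability, and the function name and the rule "an agent that holds a proof of G defers the press" are my own simplifications.

```python
from typing import List


def simulate(horizon: int, press_immediately: bool) -> List[int]:
    """Return the time steps (1..horizon) at which the button gets pressed."""
    pressed_at: List[int] = []
    holds_proof = False  # toy stand-in for "T proves G" at the current step
    for t in range(1, horizon + 1):
        if holds_proof:
            # Persistence rule: a proof held at t is still held at t + 1,
            # and an agent that already "knows" G will happen defers again.
            continue
        if t == 1 and press_immediately:
            pressed_at.append(t)
            break
        # Otherwise the agent convinces itself that a later step will press.
        holds_proof = True
    return pressed_at


print(simulate(10, press_immediately=True))   # [1]
print(simulate(10, press_immediately=False))  # []
```

In this toy model the button is either pressed at the first step or never within the horizon, which is the "unless X = 1" conclusion. It says nothing about whether the time-indexed rule can be formalized consistently in an actual proof system.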