All of red75's Comments + Replies

Probabilistic inference for general belief networks is NP-hard (see The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks (PDF)). Thus the straightforward approach is not an option. The problem is more like finding a computationally tractable yet sufficiently powerful subclass of belief networks.

1moridinamael
This only implies that the required computation time scales poorly with the number of graph nodes. It seems like for any reasonable number of beliefs that could be input by a single person you wouldn't run into any practical difficulty. Perhaps if one tried to extend this to a web based application with a world-wide, constantly updated belief net you would run into scaling issues, and then you simply make practical decisions about how complex you're willing to let things get.
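A minimal Python sketch of why the straightforward approach blows up: exact inference by enumeration sums the joint over every assignment of the hidden variables, which is exponential in their number. The toy network and its probabilities below are my own illustration, not taken from the linked paper; tractable subclasses such as polytrees restrict the graph structure so this exhaustive sum can be avoided.

# Sketch: exact inference by enumeration over a tiny (hypothetical) belief network.
# Runtime grows exponentially in the number of hidden variables, which is why the
# straightforward approach fails for general networks.
from itertools import product

# Conditional probability tables: P(node=True | parent assignment)
cpt = {
    "Rain":      {(): 0.2},
    "Sprinkler": {(True,): 0.01, (False,): 0.4},
    "WetGrass":  {(True, True): 0.99, (True, False): 0.9,
                  (False, True): 0.8, (False, False): 0.0},
}
parents = {"Rain": (), "Sprinkler": ("Rain",), "WetGrass": ("Rain", "Sprinkler")}
order = ["Rain", "Sprinkler", "WetGrass"]

def joint(assignment):
    # P(full assignment) as a product of local conditionals.
    p = 1.0
    for var in order:
        pa = tuple(assignment[q] for q in parents[var])
        p_true = cpt[var][pa]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

def query(var, value, evidence):
    # P(var=value | evidence) by summing the joint over all hidden assignments.
    hidden = [v for v in order if v != var and v not in evidence]
    num = den = 0.0
    for values in product([True, False], repeat=len(hidden)):  # exponential loop
        a = dict(zip(hidden, values), **evidence)
        for target in (True, False):
            a[var] = target
            p = joint(a)
            den += p
            if target == value:
                num += p
    return num / den

print(query("Rain", True, {"WetGrass": True}))  # posterior belief in Rain given wet grass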

What bothers me about The Basic AI Drives is its complete lack of quantitative analysis.

The temporal discount rate isn't even mentioned. There's no analysis of the self-improvement/getting-things-done tradeoff. The influence of the explicit/implicit utility function dichotomy on self-improvement isn't considered.

2multifoliaterose
I find some of your issues with the piece legitimate, but stand by my characterization of the most serious existential threat from AI as being of the type described therein.

The diversity of a population plays a role too. If I'm well below Feynman level (and I am), then there's a possibility that I can slightly improve my cognitive abilities without any negative consequences.

My experience with nootropics (racetams) seems to support this, as far as anecdotal evidence can.

That's valuable information, thanks. I underestimated the relative weight of communication style in the feedback I got.

Thank you. It is something I can use for improvement.

Can you point out the flaws? I can see that my sentence structure is overcomplicated, but I don't know how it reads to native English speakers. Foreigner? Dork? Grammatically illiterate? I'd appreciate any feedback. Thanks.

0homunq
Actually, a bit of all three. The one you can control the most is probably "dork", which unpacks as "someone with complex ideas who is too impatient/show-offy to explain their idiosyncratic jargon". I'm a native English speaker, and I know that I still frequently sound "dorky" in that sense when I try to be too succinct.
0TimS
Respectfully, I don't know what this sentence means. In particular, I don't know what "most common still life" meant. That made it difficult to decipher the rest of the comment. ETA: Thanks to the comment below, I understand a little better, but now I'm not sure what motivates invoking the possibility of other agents, given that the discussion was about proving Friendliness.

One year and one level-up (thanks to ai-class.com) after this comment, I'm still in the dark about why the above comment was downvoted.

I'm sorry for whining, but my curiosity got the better of me. Any comments?

4homunq
It wasn't me, but I suspect the poor grammar didn't help. It makes it hard to understand what you were getting at.
0[anonymous]
Since you asked: your downvoted comment seems like word salad to me; I can't see what sensible reasons would motivate it.

Problem 2, by Bayes' rule.

N is a random variable (RV): the number of filled envelopes.

C is the RV indicating that the selected envelope contains a coin. P(C) means P(C=true) where appropriate.

Prior distribution

P(N=n) = 1/(m+1)

by the problem setup

P(C|N=n) = n/m 

by the rule of total probability

P(C) = sum_n P(C|N=n)P(N=n) = sum_n n/(m(m+1)) = (m(m+1)/2)/(m(m+1)) = 1/2

by Bayes' rule

P(N=n|C) = P(C|N=n)P(N=n)/P(C) = 2n/(m(m+1))

Let C' be the RV of picking a filled envelope the second time.

by the problem statement

P(C'|N=n,C) = (n-1)/m

by the rule of total probability

P(C'|C)=sum_n P(C'|N=n,C)P(N=n|
... (read more)
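A quick Monte Carlo sketch of the setup as I read it (m envelopes, the number of filled ones uniform on 0..m, one envelope picked at random), checking the derived P(C) = 1/2 and P(N=n|C) = 2n/(m(m+1)); the parameter values below are arbitrary.

# Monte Carlo check of the derivation above.
import random
from collections import Counter

m, trials = 10, 200_000
hits = Counter()      # counts of N=n among trials where the pick was filled
filled_picks = 0

for _ in range(trials):
    n = random.randint(0, m)                  # prior: P(N=n) = 1/(m+1)
    picked_filled = random.random() < n / m   # P(C | N=n) = n/m
    if picked_filled:
        filled_picks += 1
        hits[n] += 1

print("P(C) ~", filled_picks / trials, "(expected 0.5)")
for n in range(1, m + 1):
    print(f"P(N={n}|C) ~ {hits[n] / filled_picks:.3f}",
          f"(expected {2 * n / (m * (m + 1)):.3f})")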

I suspect it's highly relevant that if someone were to actually grow up in a grayscale environment, they wouldn't be capable of experiencing blue.

Results of gene therapy for color blindness suggest otherwise. Maybe those monkeys and mice cannot experience colors, but they react as if they can.

I really want to try this myself. An infrared-sensitive opsin in the retina, isn't it wonderful?

I don't understand what the question is getting at.

I am getting there. There's a phenomenon called blindsight type 1. Try to imagine that you have "color blindsight", i.e. you can't differentiate between colors, but you can guess above chance what color something is. In this condition you lack the qualia of colors.

I doubt that you think about rods and cones when you are deciding whether it's safe to cross the road. The question is: is there something in your perception of an illuminated traffic light that allows you to say that it is red or green or yellow? Or do you just know that it is green or yellow, but can't see any difference other than position and luminosity?

0[anonymous]
I don't understand what the question is getting at. You're right that I don't think about cones when I check which color a light is, but this is the mechanism by which it enters my brain: since different lights enter my brain in different ways it is no surprise I can differentiate between them.

Maybe it's better to start with obvious things. Color experience, for example. Can you tell which traffic light is illuminated without using the position of the light and without asking yourself which color it is? Is there something in your perception of the different lights that allows you to tell that they are different?

0[anonymous]
The cones in the eye detect three different aspects of light (redness, greenness, blueness) and these are sent to the brain in three different fibers. By this mechanism we see there's nothing magic going on in telling the difference between two colors. I guess the rods (which detect variation in brightness) are more relevant to the question of which light is on though.

I think communication cost isn't the main reason for P's failure. O, for example, defects on the last 3 turns even when playing against itself (rule 1 has the highest priority). The reasons are too strong a punishment of other strategies (and consequently of itself) and too strict a check for self-identity.

The strategy I described here should perform much better when n is in the range 80-95.

Huh? Do you think that selfishness unambiguously means: dominate Earth (or what's left of it) as fast as possible?

0wedrifid
No.

Try another strategy. I[n]: play TfT until turn n, defect on turn n, and on later turns check whether the result on turn n was (defect, defect); if so, play TfT, otherwise defect. The idea is self-cooperation.
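A minimal Python sketch of I[n] under my reading of it, where the turn-n mutual defection is forgiven so that two copies actually resume cooperating; the interface and names are my own.

# Sketch of the I[n] strategy described above (my own reading of it).
C, D = "C", "D"

def I_strategy(n, my_history, their_history):
    turn = len(my_history)                      # index of the turn being played
    if turn < n:
        return their_history[-1] if their_history else C   # plain tit-for-tat
    if turn == n:
        return D                                # identification defection
    if my_history[n] == D and their_history[n] == D:
        # Opponent passed the self-identity check: forgive turn n and
        # go back to tit-for-tat (cooperate right after the mutual defection).
        return C if turn == n + 1 else their_history[-1]
    return D                                    # failed the check: defect forever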

Moreover, the simulations I ran using your rules for the evolutionary tournament show that one strategy quickly dominates and the others go extinct. DefectBot is among the strategies that are fastest to go extinct (even in the presence of CooperateBot), as it feeds off over-altruistic strategies, which in turn fail to compete with tit-for-tat. So I doubt that the evolutionary tournament, at least, will converge to Nash.

I predict that a strategy that plays tit-for-tat for 99 turns and defects on the 100th will win the evolutionary tournament, given that tit-for-tat is also in the population.

ETA: I've sent another strategy.

In the meantime I've run my own simulation, studying a group of strategies which perform as tit-for-tat except that on a specific turn they defect, and then use the result of that turn either to switch to defect-stone or to continue tit-for-tatting. Thus they recognize copies of themselves and cooperate with them. Such a strategy can be exploited by switching to defect-stone before it does, or by mimicking its behavior (a second defection check after the first; this case I didn't analyze).

It leads to interesting results in the evolutionary tournament. Second fairest (second longest per... (read more)

0prase
Define strategy S[n] as TfT until turn n and defect ever since. In the limit of an infinite population having a non-zero initial number of S[n] for each n, S[0], i.e. DefectBot, eventually dominates. Starting with equal subpopulations, initially the most successful is S[99], which preys on S[100] and finally drives it to extinction. But then S[98] gains an advantage over S[99], and so on. With a not-so-big population, however, the more defectorish strategies die out sooner than the environment becomes suitable for them. (I have done it with a population of 2000 strategies, and the lowest surviving after several hundred generations was S[80] or so.)
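A rough Python sketch of the kind of replicator simulation described in this thread; the payoff matrix, the pool of S[n] values, and the fitness-proportional update rule are my own assumptions rather than the tournament's actual rules.

# S[n] plays tit-for-tat until turn n and defects from turn n on
# (S[0] is DefectBot, S[100] is plain TfT over a 100-round match).
ROUNDS = 100
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def S(n, my_hist, their_hist):
    if len(my_hist) >= n:
        return "D"
    return their_hist[-1] if their_hist else "C"

def match(n1, n2):
    h1, h2, p1, p2 = [], [], 0, 0
    for _ in range(ROUNDS):
        a, b = S(n1, h1, h2), S(n2, h2, h1)
        da, db = PAYOFF[(a, b)]
        h1.append(a); h2.append(b); p1 += da; p2 += db
    return p1, p2

kinds = list(range(0, 101, 10)) + [99]               # a small pool of S[n] values
payoff_cache = {(a, b): match(a, b) for a in kinds for b in kinds}
pop = {n: 1.0 / len(kinds) for n in kinds}           # equal initial shares

for gen in range(200):
    fitness = {n: sum(pop[m] * payoff_cache[(n, m)][0] for m in kinds)
               for n in kinds}
    total = sum(pop[n] * fitness[n] for n in kinds)
    pop = {n: pop[n] * fitness[n] / total for n in kinds}   # replicator update

print(sorted(pop.items(), key=lambda kv: -kv[1])[:3])       # dominant strategies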

Then the goal of LessWrong (in this framework) seems to be to make the brain act as if it contains a command-and-control center which corrects for errors caused by other parts of the brain. And the list of errors includes the idea that the brain contains a command-and-control center. Sophisticated.

0DSimon
Careful, you're merging two different metaphors from the article. As you point out, the brain does not have a central module that is in control of all the others. But the brain does have a large collection of semi-distinct modules, many of which appear to have significant control over various other modules. So yeah, to become more rational, you're adjusting some parts of your brain's modules to compensate for and/or override some of the lousy data coming out of other modules. But that doesn't make the adjusted modules take command over the non-adjusted ones; a sense of irrational fear of spiders might come from your hindbrain and be adjusted by your forebrain, but that doesn't mean that your forebrain is also taking over or overriding the hindbrain's job of noticing when you've stubbed your toe.
1Nisan
Hm, yes. The brain is like an egalitarian cooperative, some of whose members are literate. We want the cooperative to write down goals and policies in a guiding document (or several documents, in several languages), which the literate members can consult and use to guide their behavior and the behavior of their peers.

I wonder why a rational consequentialist agent should do anything but channel all available resources into the instrumental goal of finding a way to circumvent heat death. Mixed strategies are obviously suboptimal, as the expected utility of circumventing heat death is infinite.

Below is a very unpolished chain of thought, based on a vague analogy with the symmetrical state of two indistinguishable quantum particles.

When a participant is told ze is a decider, ze can reason: let's suppose that before the coin was flipped I changed places with someone else; will it make a difference? If the coin came up heads, then I'm the sole decider and there are 9 swaps which make a difference in my observations. If the coin came up tails, then there's one swap that makes a difference. But if it doesn't make a difference it is effectively one world, so there are 20 worlds I... (read more)

-2cousin_it
Um, the probability-updating part is correct, don't spend your time attacking it.

I'm still unsure whether it is something more than an intuition pump. Anyway, I'll share any interesting thoughts.

7 of 10. I underestimated the area of the Asian (Eurasian?) continent by a factor of 4 (safety margin one order of magnitude), the quantity of US dollars by a factor of 10 (safety margin 3 orders of magnitude), and the volume of the Great Lakes by a factor of 0.1 (safety margin 3 orders of magnitude). Other safety margins were 3 orders of magnitude for the Titanic, the Pacific coast (fractal-like curves can be very long), and book titles, and 0.5 from the mean value for the others. Sigh, I thought I'd get 90%.

Hm, I estimated the area of the Asian continent as the area of a triangle with a 10000 km base (12 time zones for 20000 km and a factor of 0.5 for pole proximity) and a 10000 km height (north pole to equator), and lost one order of magnitude in the calculation.
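For reference, the arithmetic of that estimate works out to

0.5 * 10000 km * 10000 km = 5*10^7 km^2

which is close to Asia's actual area of roughly 4.4*10^7 km^2; losing an order of magnitude in the calculation turns it into about 5*10^6 km^2, hence the underestimate.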

Do you have in mind something like 0.9*1000/9 + 0.1*100/1 = 110? This doesn't look right

This can be justified by a change of rules: deciders get their part of the total sum (to donate it, of course). Then the expected personal gain beforehand:

for "yea": 0.5*(0.9*1000/9+0.1*0)+0.5*(0.9*0+0.1*100/1)=55  
for "nay": 0.5*(0.9*700/9+0.1*0)+0.5*(0.9*0+0.1*700/1)=70

Expected personal gain for a decider:

for "yea": 0.9*1000/9+0.1*100/1=110
for "nay": 0.9*700/9+0.1*700/1=140

Edit: corrected an error in the value of the first expected benefit.

Edit:... (read more)
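A quick recomputation of the figures above, under my reading of the setup (10 participants; heads gives 1 decider, tails gives 9; "yea" pays 1000 on tails and 100 on heads, "nay" pays 700 either way; deciders split the donated sum):

# Check of the 55/70 and 110/140 figures quoted above.
yea_before = 0.5 * (0.9 * 1000 / 9 + 0.1 * 0) + 0.5 * (0.9 * 0 + 0.1 * 100 / 1)
nay_before = 0.5 * (0.9 * 700 / 9 + 0.1 * 0) + 0.5 * (0.9 * 0 + 0.1 * 700 / 1)
yea_decider = 0.9 * 1000 / 9 + 0.1 * 100 / 1   # P(tails | I am a decider) = 0.9
nay_decider = 0.9 * 700 / 9 + 0.1 * 700 / 1
print(yea_before, nay_before)     # ~55.0  ~70.0
print(yea_decider, nay_decider)   # ~110.0 ~140.0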

And here's a reformulation of Counterfactual Mugging in the same vein. Find two subjects who don't care about each other's welfare at all. Flip a coin to choose one of them who will be asked to give up $100. If ze agrees, the other one receives $10000.

This is very similar to a rephrasing of the Prisoner's Dilemma known as the Chocolate Dilemma. Jimmy has the option of taking one piece of chocolate for himself, or taking three pieces and giving them to Jenny. Jenny faces the same choice: take one piece for herself or three pieces for Jimmy. This formulation... (read more)

0[anonymous]
It's pure coordination game.
2cousin_it
This is awesome! Especially the edit. Thanks.

all else is fantasy

I am not sure that I am correct. But there seems to be another possibility.

If we assume that the world is a model of some formal theory, then counterfactuals are models of different formal theories whose models have finite isomorphic subsets (the reality accessible to the agent before it makes a decision).

Thus counterfactuals aren't inconsistent, as they use different formal theories, and they are important because the agent cannot determine which one applies to the world before it makes its decision.

The person in the space ship will experience time twice as slow as people on earth. So the person in the spaceship would expect people on earth to age twice as quickly.

I targeted this part of your reasoning. Time on the spaceship is moving slower (in a sense) than time on earth in the reference frame where earth is stationary, yes, but it doesn't follow that time on earth therefore moves faster than time on the spaceship in the reference frame of the spaceship; quite the opposite.

t'=\gamma(t-vx/c^2) 

It is valid both when t is measured in the reference frame of the spaceship and in ... (read more)

If we stick to situations where special relativity is applicable, then we have no way to directly measure the difference between the time passed on earth and on the spaceship, as their clocks can be synchronized only once (when they are in the same place). Thus the question of where time goes slower is meaningless.

What they will see is a different question. When the spaceship moves away from earth, the astronauts will see that processes on earth take longer than usual (simply from the Doppler effect with relativistic corrections), and so will the earthlings. When the spaceship moves toward earth, the astronauts see that processes on earth go faster than usual.

Edit: Sorry for very tangential post.

-3Davorak
Are you saying your argument is true under the strict application of only SR, or that it is true in reality? I would say it cannot be true in reality, because muons and other particles take a measurably longer time to decay as their speed increases.
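For illustration, here are the two effects discussed above at an arbitrarily chosen speed of v = 0.6c: the time-dilation factor gamma is the same for both observers, while the relativistic Doppler factor, which governs what each side actually sees, flips between recession and approach.

# Time dilation vs. observed (Doppler) rates at an illustrative speed.
beta = 0.6                                              # v / c (my own choice)
gamma = 1 / (1 - beta ** 2) ** 0.5                      # time-dilation factor, 1.25
doppler_receding = ((1 + beta) / (1 - beta)) ** 0.5     # observed stretching: 2.0
doppler_approaching = ((1 - beta) / (1 + beta)) ** 0.5  # observed speed-up: 0.5
print(gamma, doppler_receding, doppler_approaching)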

I'm not sure I understand you. The values of the original agent specify a class of programs it can become. Which program in this class should deal with observations?

It's not better to forget some component of values.

Forget? Is this about "too smart to optimize"? That's not the meaning I intended.

When the computer encounters the borders of the universe, it will have an incentive to explore every possibility that it is not the true border of the universe, such as: active deception by an adversary, different rules of the game's "physics" for the rest of the universe, the possibility ... (read more)

Then it seems better to demonstrate it on a toy model, as I've already done for the no-closed-form case.

[...] the computer [operating within a Conway's Game of Life universe] is given the goal of tiling the universe with the most common still life in it, and the universe is possibly infinite.

One way I can think of to describe the closed/non-closed distinction is that the latter requires an unknown amount of input to be able to compute a final/unchanging ordering over (internal representations of) world-states, while the former requires no input at all, or a predictable amount of input, to do the... (read more)

2Vladimir_Nesov
I understand the grandparent comment now. Open/closed distinction can in principle be extracted from values, so that values of the original agent only specify what kind of program the agent should self-improve into, while that program is left to deal with any potential observations. (It's not better to forget some component of values.)

A direct question; I cannot infer the answer from your posts. If human values do not exist in closed form (i.e. they do include updates on future observations, including observations which in fact aren't possible in our universe), then is it better to have an FAI operating on some closed form of values instead?

4Vladimir_Nesov
I don't understand the question. Unpack closed form/no closed form, and where updating comes in. (I probably won't be able to answer, since this deals with observations, which I don't understand still.)

Also, an interesting thing happens if, by the whim of the creator, the computer is given the goal of tiling the universe with the most common still life in it, and the universe is possibly infinite. It can be expected that the computer will send out a slower-than-light "investigation front" to count the still lifes it encounters. Meanwhile it will have more and more space to put into predicting possible threats to its mission. If it is sufficiently advanced, then it will notice the possibility of the existence of other agents, and that will naturally lead it to simulating possible int... (read more)


Shouldn't AI researchers precommit to not building an AI capable of this kind of acausal self-creation? This would lower the chances of disaster both causally and acausally.

And please define how you tell moral heuristics and moral values apart. E.g., which is "don't change the moral values of humans by wireheading"?

They advocate (possibly wrong) opinions to signal that they stand out from the crowd. Did I unpack that right?

6Desrtopa
Not to ones that they themselves suspect are wrong, if that's what you mean. But it's hard to signal high intellectual status while expressing the same beliefs as all of your peers, so if you want to signal, you have a motive to find a point of disagreement.

I see two overlapping problems with applying the Litany of Tarski in this context.

First. The litany should be relatively short for practical reasons, and as such its statement is a simplification of the real state of affairs when it is applied to a complex system such as a human and his/her social interactions. Thus the litany implicitly suggests believing in this simplified version, even if it was supposed to represent some complex mental image. And that leads us to

Second. Beliefs about oneself are a tricky thing, since if they aren't compartmentalized (and we don't want them... (read more)

If you don't mind, count me in. PM'd email to cousin_it. Primary skill: programmer.

Not exactly. My version is incorrect, yes. But there is, um, a controversial way of consistently assigning truth values to Yablo's statements.

In my version, the n-th step of loop unrolling is

S'(n) = not not ... {n times} ... S

or

S'(n)=not S'(n+1)

Yablo's version

S(n)=not exists m>n such that S(m)=true

or

S(n)=(not S(n+1)) && (not exists m>n+1 such that S(m)=true)

If we extend the set of natural numbers by an element omega such that

forall n in N : (omega>n),
not exists n in N : (n+1=omega),
omega=omega+1

Then we can assign S(n)=false for all n in ... (read more)

Yablo's version looks like an unrolled infinite loop of the function

s :: Bool
s=not s
2cousin_it
Not to me it doesn't. Yablo's version has a "forall" that your translation misses. So in Yablo's version there's no consistent way to assign truth values to S(n), but in your version we could make S(n) = "n is odd" or something.
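A small Python check of cousin_it's point about the unrolled version: its only constraints are S'(n) = not S'(n+1), and the assignment S(n) = "n is odd" satisfies all of them over any finite range, so no paradox arises there.

# Verify that S(n) = "n is odd" satisfies S(n) = not S(n+1) for n = 0..N-1.
N = 1000
S = [n % 2 == 1 for n in range(N + 1)]
print(all(S[n] == (not S[n + 1]) for n in range(N)))   # True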

Ok. There's no one who can truthfully say "I am both copies".

Let's ban the word "truth". There's no one who can say "I am both copies" and prove it has the same sense as in "I am I (this living thing that speaks these words)" (e.g. I can control my body; I can react to external events in a way I've previously described or agreed to).

There is no uncertainty about which copy is you, you are both.

However there's no one who can say "I am both copies".

-1Vladimir_Nesov
I am both copies.

Shouldn't it be: if some or all branches of technology in the current sociopolitical environment bring more harm than good according to the shared values of group X, then we should want to believe that?

[...] you just make that choice.

It works for a pure consequentialist, but if one's values have deontology in the mix, then your suggestion effectively requires changing one's values.

And I doubt that an instrumental value that will change terminal values can be called instrumental. An agent that adopts this value (persistence of values) will end up with different terminal values than an agent that does not.

  1. The case of agents in conflict. Keep your values and be destroyed, or change them and get the world partially optimized for your initial values.

  2. The case of an unknown future. You know the class of worlds you want to be in. What you don't know yet is that to reach them you must make choices incompatible with your values. And, to make things worse, all the choices you can make ultimately lead to worlds you definitely don't want to be in.

2wedrifid
1. Yes. That is the general class that includes 'Omega rewards you if you make your decision irrationally'. It applies whenever the specific state of your cognitive representation interacts significantly with the environment by means independent of your behaviour. 2. No. You don't need to edit yourself to make unpleasant choices. Whenever you wish you were a different person than who you are so that you could make a different choice, you just make that choice.

Let Pg be the RAI's estimate of the probability of successful completion of the computation without converting Earth into computronium. Let Pb be the RAI's estimate of the probability of success if it converts Earth into computronium.

Case 1. The RAI modifies itself so that it doesn't understand simulation threats.

When the RAI is not simulated, it ignores all threats and proceeds to convert Earth into computronium, thus increasing its probability of success by Pb-Pg (it becomes protected from accidental/intentional shutdowns, etc.).

When the RAI is simulated, it fails bargaining a... (read more)

I already mentioned this problem. And I (07/2010) thought about ways to ensure that the FAI will prefer my/our/the rational way of extrapolating.

Now I think it would be better if the FAI selected a coherent subset of the volitions of all reflectively consistent extrapolations. I suspect it would be something like: protect humanity from existential risk, but don't touch it beyond that.

It [FAI] doesn't have a judgement of its own.

[...]

And if it [FAI] can reliably figure out what a wiser version of us would say, it substitutes that person's judgement for ours.

[...]

I would direct the person to LessWrong, [...] until they're a good enough rationalist [...] -- then ask them again.

It seems there's a flaw in your reasoning. You would direct a person to LessWrong; someone else would direct a person to church. And the FAI must somehow figure out which direction a person should take to become wiser, without a judgment of its own.

3nshepperd
That's true. According to the 2004 paper, Eliezer thinks (or thought, anyway) "what we would decide if we knew more, thought faster, were more the people we wished we were, had grown up farther together..." would do the trick. Presumably that's the part to be hard-coded in. Or you could extrapolate (using the above) what people would say "wisdom" amounts to and use that instead. Actually, I can't imagine someone who knew and understood both the methods of rationality (having been directed to LessWrong) and all the teachings of the church (having been directed to church) would then direct a person to church. Maybe the FAI can let a person take both directions to become wiser. ETA: Of course, in FAI 'maybe' isn't good enough...

So in your design, you'd have to figure out a way to prevent self-halting under all possible input conditions, under all possible self-modifications of the machine.

Self-modifications are performed by the machine itself. Thus we (and/or the machine) don't need to prove that all possible modifications aren't "suicidal". The machine can be programmed to perform only provably (within reasonable time) non-suicidal self-modifications. Rice's theorem doesn't apply in this case.

Edit: However, this leaves the meta-level unpatched. The machine can self-modify into a non-suicidal... (read more)

0stanislavzza
The kind of constraint you propose would be very useful. We would have to first prove that there is a kind of topology under general computation (because the machine can change its own language, so the solution can't be language specific) that only allows non-suicidal trajectories under all possible inputs and self-modifications. (Or perhaps at least with low probability, but this is not likely to be computable.) I have looked, but not found such a thing in existing theory. There is work on topology of computation, but it's something different from this. I may just be unaware of it, however. Note that in the real-world scenario, we also have to worry about entropy battering around the design, so we need a margin of error for that too. Finally, the finite-time solution is practical, but ultimately not satisfying. The short term solution to being in a building on fire may be to stay put. The long term solution may be to risk short-term harm for long-term survival. And so with only short-term solutions, one may end up in a dead end down the road. A practical limit on short-term advance simulation is that one still has to act in real time while the simulation runs. And if you want the simulation to take into account that simulations are occurring, we're back to infinite regress...

In case anyone is interested: this extension doesn't seem to lead to anything of interest.

If we map the continuum of UDASSA multiverses into [0;1), then the Lebesgue measure of the set of multiverses which run a particular program is 1/2.

Let the binary number 0.b1 b2 ... bn ... be the representation of multiverse M, where for all n: bn=1 iff M runs program number n, and bn=0 otherwise.

It is easy to see that the image of the set of multiverses which run program number n is the collection of intervals [(2i-1)/2^n; 2i/2^n) for i=1..2^(n-1), each of length 1/2^n. Thus its Lebesgue measure is 2^(n-1)/2^n = 1/2.
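A quick numeric check of that measure computation, with an arbitrary choice of n:

# Total length of the intervals [(2i-1)/2^n, 2i/2^n), i = 1..2^(n-1).
n = 5
intervals = [((2 * i - 1) / 2 ** n, (2 * i) / 2 ** n)
             for i in range(1, 2 ** (n - 1) + 1)]
print(sum(b - a for a, b in intervals))   # 0.5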

Should we stop at UDASSA? Can we consider a universe that consists of a continuum of UDASSAs, each running some (infinite) subset of the set of all possible programs?


It even depends on philosophy. Specifically, on whether the following equality holds.

I survive = There (not necessarily in our universe) exists someone who remembers everything I remember now, plus the failed suicide I'm going to attempt now.

or

I survive = There exists someone who doesn't remember everything I remember now, but who acts as I would act if I remembered what he remembers. (I'm not sure whether I expressed the subjunctive mood correctly.)

It seems that what I call an indirectly self-referential value function may be a syntactic preference as defined by Vladimir Nesov.

Well, I tried to precisely define the toy model I use. As for utilons, I took the word that is common here without thinking much about it. It doesn't seem to blur the meaning of the post significantly.

6Jack
Yeah, I don't have a problem with you using what is common around here. I just would like to change what is common around here.

The future is not a world-state; it is a sequence of world-states. Thus your statement must be reformulated somehow.

Either (1) we must define a utility function over the set of (valid) sequences of world-states, or (2) we must define what it means for a sequence of world-states to be optimized for a given U, [edit] and that means this definition should be a part of U itself, as U is all we care about. [/edit]

And option 1 is either impossible, if the rules of the world don't permit an agent to hold the full history of the world, or we can define an equivalent utility function over world-s... (read more)
