All of acgt's Comments + Replies

acgt
50

I predict that this would look at least as weird and nonhuman as those deep dream images if not more so

This feels like something we should just test? I don’t have access to any such model but presumably someone does and can just run the experiment? Because it seems like people’s hunches are varying a lot here

acgt
50

if you ask them for the faciest possible thing, it's not very human!facelike

Is this based on how these models actually behave, or just on what the OP expects? Because it seems to just be begging the question if the latter

4Quintin Pope
Also, “ask for the most X-like thing” is basically how classifier-guided diffusion models work, right?
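For concreteness, a minimal sketch of what classifier guidance does during sampling. `diffusion_model` and `classifier` below are hypothetical stand-ins (not any particular implementation), and the timestep-dependent scaling factor from the usual derivation is folded into `guidance_scale` to keep the sketch short:

```python
import torch

def classifier_guided_eps(diffusion_model, classifier, x_t, t, target_class,
                          guidance_scale=5.0):
    # Unconditional noise prediction from the (hypothetical) diffusion model.
    eps = diffusion_model(x_t, t)

    # Gradient of log p(target_class | x_t) with respect to the noisy image.
    x_in = x_t.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x_in, t), dim=-1)
    grad = torch.autograd.grad(log_probs[:, target_class].sum(), x_in)[0]

    # Nudging the denoising direction by this gradient is the "ask for the most
    # X-like thing" knob: a larger guidance_scale pushes samples toward whatever
    # the classifier scores as maximally target_class-like.
    return eps - guidance_scale * grad
```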
acgt
10

This doesn’t seem true, at least in the sense of a strict ranking? In the EDT case: if Omega’s policy is to place a prime in Box 1 whenever Omicron chooses a composite number (instead of matching Omicron when possible), then it predicts the EDT agent will choose only Box 1, so this is a stable equilibrium. But since Omega also always places a different prime whenever Omicron chooses a prime, the EDT agent never sees matching numbers and so always one-boxes, and therefore its expected earnings are no less than FDT’s

2Ege Erdil
That's right :) There's a type of problem that is recognizably a "1/e problem", in the sense that you expect the answer will somehow involve 1/e, but I haven't been able to make this intuition precise. What properties about a problem make it likely that 1/e shows up? Some of the above problems involve use of the inclusion-exclusion principle, which is a typical way 1/e can show up in these problems, but e.g. the secretary problem does not.
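As a quick numerical illustration of two of the places 1/e shows up - the derangement probability (a textbook inclusion-exclusion result) and the optimal stopping rule in the secretary problem - here's a small Monte Carlo check (illustration only, not a proof of anything):

```python
import math
import random

def derangement_fraction(n=10, trials=100_000):
    # Fraction of uniformly random permutations with no fixed point.
    hits = 0
    for _ in range(trials):
        perm = random.sample(range(n), n)
        hits += all(perm[i] != i for i in range(n))
    return hits / trials

def secretary_success(n=100, trials=100_000):
    # Reject the first ~n/e candidates, then take the first one better than all of them.
    cutoff = round(n / math.e)
    wins = 0
    for _ in range(trials):
        ranks = random.sample(range(n), n)  # lower is better; 0 is the best candidate
        best_seen = min(ranks[:cutoff])
        chosen = next((r for r in ranks[cutoff:] if r < best_seen), ranks[-1])
        wins += chosen == 0
    return wins / trials

print(derangement_fraction(), 1 / math.e)  # both ~0.368
print(secretary_success(), 1 / math.e)     # both ~0.37
```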
acgt
20

Curious what the solution to this one is? Couldn’t figure it out

2paulfchristiano
I posted it in another reply.
acgt
*40

Doesn’t this argument also work against the idea that it would self-modify in the “normal” finite way? It can’t currently represent the number which it’s building a ton of new storage to help contain, so it can’t make a pairwise comparison to say the latter is better, nor can it simulate the outcome of doing this and predict the reward it would get

Maybe you say it’s not directly making a pairwise comparison but making a more abstract step of reasoning like “I can’t predict that number but I know it’s gonna be bigger than what I have now, me with augmente... (read more)

acgt
20

This is a really interesting point. It seems like it goes even further - if the agent was only trying to maximise future expected reward, not only would it be indifferent between temporary and permanent “Nirvana”, it would be indifferent between strategies which achieved Nirvana with arbitrarily different probabilities, right? (Maybe with some caveats about how it would behave if it predicted the strategy might lead to negative-infinite states.)

So if a sufficiently fleshed out agent is going to assign a non-zero probability of Nirvana to every - or at least mos... (read more)
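To spell the indifference out (writing $R$ for total future reward, purely for illustration): for any strategy whose probability of reaching Nirvana (reward $+\infty$) is $p > 0$, with some bounded reward $c$ otherwise,

$$\mathbb{E}[R] \;=\; p \cdot \infty + (1 - p)\,c \;=\; \infty,$$

so every such strategy gets the same expected reward no matter how small $p$ is - and this is also why even a tiny probability of a negative-infinity outcome would make the comparison undefined rather than merely a matter of indifference.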

acgt
130

Your comments here and some comments Eliezer has made elsewhere seem to imply he believes he has, at least in large part, “solved” consciousness. Is this fair? And if so, is there anywhere he has written up this theory/analysis in depth - because surely, if correct, this would be hugely important

I’m kind of assuming that whatever Eliezer’s model is, the bulk of the interestingness isn’t contained here and still needs to be cashed out, because the things you/he list (needing to examine consciousness through the lens of the cognitive algorithms causing our discu... (read more)

acgt
10

Yeah sure, there's a logical counterfactual strand of the argument, but that's not the topic I'm really addressing here - I find those a lot less convincing, so my issue here is around the use of Löbian uncertainty specifically. There's a step very specific to this species of argument: proving □P will make P true when P is about the outcomes of the bets, because you will act based on the proof of P.

This is invoking Löb's Theorem in a manner which is very different from the standard counterpossible principle-of-explosion stuff. And I'm re... (read more)
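(For reference, the internalised form of Löb's Theorem being leaned on here is

$$\vdash \Box(\Box P \rightarrow P) \rightarrow \Box P,$$

i.e. if the system proves "a proof of P would make P true", then it proves P outright. The step I'm flagging is how the premise $\Box(\Box P \rightarrow P)$ gets supplied - namely by the argument that P concerns the outcomes of my bets and I act on proofs of P.)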

acgt
10

I think there’s a sense in which some problems can be uncomputable even with infinite compute, no? For example, if the Halting problem were computable even with literally infinite time, then we could construct a machine that halted when given its own description iff it ran forever when given its own description - a contradiction. I do think there’s a distinction beyond just “arbitrarily large finite compute vs. infinite compute”. It seems like either some problems have to be uncomputable even by a hyper-computer, or else the concept of infinite compute time is less straightfor... (read more)
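A sketch of the diagonal construction, with `halts` as a purely hypothetical oracle (none of this is anyone's actual API - the point is that whatever machine class the oracle decides halting for, `D` would be a member of it):

```python
def halts(program_source: str, input_data: str) -> bool:
    """Hypothetical: returns True iff the given program halts on the given input."""
    raise NotImplementedError("no such total decider exists for its own machine class")

D_SOURCE = "<the source code of D itself>"  # stand-in for quine-style self-reference

def D(input_data: str) -> None:
    if halts(D_SOURCE, input_data):
        while True:   # if the oracle says D halts on this input, loop forever
            pass
    # if the oracle says D runs forever on this input, halt immediately

# Feeding D its own description makes the oracle wrong whichever answer it gives.
# The same construction relativises: a hypercomputer's own halting problem is
# undecidable *for it*, which is the point gjm makes below.
```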

2gjm
The halting problem is computable with literally-infinite time. But, to be precise, what this means is that a hypercomputer could determine whether a (nonhyper)computer halts; in a universe containing hypercomputers, we would not be very interested in that, and we'd be asking for something that determines whether a given hypercomputer halts (or something like that; I haven't given much thought to what corresponds to "halting" for any given model of hypercomputation...) which would be impossible for the same sort of reasons as the ordinary halting problem is impossible for ordinary computers. But I think it's only fair to describe this by saying "the halting problem is impossible even with infinite computational resources" if you acknowledge that then "the halting problem" isn't a single problem, it's a thing that varies according to what computational resources you've got, getting harder when you have more resources to throw at it.
acgt
10

I think the point is even stronger than that - Solomonoff induction requires not just infinite compute/time but doing something literally logically impossible - the prior is straight up uncomputable, not in any real-world tractability sense but as uncomputable as the Halting problem is. There’s a huge qualitative gulf between “we can’t solve this problem without idealised computers with unbounded time” and “we can’t solve this on a computer by definition”. Makes a huge difference to how much use the approach is for “crispening” ideas IMO
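For concreteness, the prior in question is (in one standard presentation, over a universal prefix machine $U$; conventions vary):

$$M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|}$$

where the sum is over all programs $p$ whose output begins with $x$. Even deciding which programs contribute to the sum requires answering halting-type questions about $U$, which is why $M$ is only lower-semicomputable (approximable from below) rather than computable - no amount of finite compute gets you the exact value.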

2gjm
Yup, actual Solomonoff induction is uncomputable. I'm not sure what you mean by "not just infinite compute/time", though; given truly infinite computation you absolutely could do it. (Though in a world where that was possible, you'd really want your Solomonoff inductor to consider possible explanations that likewise require an infinite amount of computation, and then you'd be back with the same problem again.) I guess the distinction you're making is between "requires a finite but absurdly large amount of computation" and "requires literally infinite computation", and I agree that the latter is what Solomonoff induction requires. I think that reduces the credibility of the claim "Solomonoff induction is the One True Way to do inference, at least ideally speaking". But I think the following weaker proposal is intact:

* Given two rival explanations of our observations, in the happy (and pretty much unheard-of) case where they have both been expressed precisely in terms of computer programs, all else equal we should consider the shorter program "simpler", "better", and "more likely right".
* One way for all else not to be equal is if one program is shorter than the other just because it's been more highly optimized in some way that doesn't have much to do with the actual theory it embodies. So rather than "shorter program better" we should really say something more like "shorter program, after making it as small as possible, better".
* Obviously, coming up with an actual program that perfectly explains all our observations is unrealistic; those observations include reading scientific papers, so it seems like the program would need to include a complete Theory Of Everything in physics; those observations include interactions with other humans, so it seems like the program would need to include at least as much intelligence as any person we encounter; these are both famously hard problems that the human race has not yet cracked.
* But given two proposals for explai
acgt
*30

On the last example with the XOR temporal inference - since the partitions/queries we’re asking about are also possible factors, doesn’t the temporal data in terms of history etc depend on which choice of factorisation we go with?

We have a choice of 2 out of 3 factors, each of which corresponds to one of the partitions in question, so surely by factorising in different ways we can make any two of the variables have a history of 1 and thus be automatically orthogonal?
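For concreteness, here's a quick check of the "2 out of 3" observation on the bare 4-element sample space {0,1}^2 - deliberately ignoring the larger S and the independence requirement that a full FFS model imposes, which is where Scott's reply below dissolves the apparent tension:

```python
from itertools import combinations, product

# Two partitions factor S iff every pair of cells, one from each, meets in exactly one point.
S = set(product([0, 1], repeat=2))

def partition_by(f):
    cells = {}
    for s in S:
        cells.setdefault(f(s), set()).add(s)
    return list(cells.values())

parts = {
    "X":       partition_by(lambda s: s[0]),
    "Y":       partition_by(lambda s: s[1]),
    "X XOR Y": partition_by(lambda s: s[0] ^ s[1]),
}

def factorizes(p, q):
    return all(len(a & b) == 1 for a in p for b in q)

for (na, pa), (nb, pb) in combinations(parts.items(), 2):
    print(na, "+", nb, "factorize S:", factorizes(pa, pb))  # all three pairs: True
```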

2Scott Garrabrant
So we are allowing S to have more than 4 elements (although we don't need that in this case), so it is not just looking at a small number of factorizations of a 4-element set. This is because we want an FFS model, not just a factorization of the sample space. If you factor in a different way, X will not be before Y, but if you do this it will not be the case that X is orthogonal to X XOR Y. The theorem in this example is saying that X being orthogonal to X XOR Y implies that X is before Y.
acgt
Ω030

I’m confused about what necessary work the factorisation is doing in these temporal examples - in your example A and B are independent and C is related to both, so the only assignment of “upstream/downstream” relations that makes sense is that C is downstream of both.

Is the idea that factorisation is what carves your massive set of possible worlds up into these variables in the first place? I feel like I’m in a weird position where the math makes sense but I’m missing the motivational intuition for why we want to switch to this framework in the first place.

4Scott Garrabrant
I am not sure what you are asking (indeed, I am not sure if you are responding to me or cousin_it). One thing that I think is going on is that I use "factorization" in two places: once when I say Pearl is using factorization data, and once where I say we are inferring an FFS. I think this is a coincidence. "Factorization" is just a really general and useful concept. So the carving into A and B and C is a factorization of the world into variables, but it is not the kind of factorization that shows up in the FFS, because disjoint factors should be independent in the FFS.

As for why to switch to this framework, the main reason (to me) is that it has many of the advantages of Pearl while also being able to talk about some variables being coarse abstract versions of other variables. This is largely because I am interested in embedded agency applications. Another reason is that we can't tell a compelling story about where the variables came from in the Pearlian story. Another reason is that sometimes we can infer time where Pearl cannot.
acgt
Ω130

What would such a distribution look like? The version where X XOR Y is independent of both X and Y makes sense but I’m struggling to envisage a case where it’s independent of only 1 variable.

4Scott Garrabrant
It looks like X and V are independent binary variables with different probabilities in general position, and Y is defined to be X XOR V (and thus V = X XOR Y).
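A quick numerical check of this construction - the specific probabilities below are arbitrary "general position" choices, nothing more:

```python
from itertools import product

p_x, p_v = 0.3, 0.6  # arbitrary generic choices for P(X=1) and P(V=1)

# Joint distribution over (x, y, v) with Y := X XOR V, so V = X XOR Y.
joint = {}
for x, v in product([0, 1], repeat=2):
    px = p_x if x else 1 - p_x
    pv = p_v if v else 1 - p_v
    joint[(x, x ^ v, v)] = px * pv

def marginal(idx):
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def independent(i, j, tol=1e-12):
    pi, pj, pij = marginal((i,)), marginal((j,)), marginal((i, j))
    return all(abs(pij.get((a, b), 0.0) - pa * pb) < tol
               for (a,), pa in pi.items() for (b,), pb in pj.items())

# indices: 0 = X, 1 = Y, 2 = V = X XOR Y
print("X XOR Y independent of X:", independent(2, 0))  # True
print("X XOR Y independent of Y:", independent(2, 1))  # False
```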