LESSWRONG

All of dankane's Comments + Replies

I feel like your discussion of predictors makes a few not-necessarily-warranted assumptions about how the predictor deals with self-reference. Then again, I guess anything that doesn't do this fails as a predictor in a wide range of useful cases. For example, if it predicts that a massive fire will kill 100 people, that prediction will naturally be acted upon in ways that invalidate it.

But there is a simple-ish fix. What if you simply ask it to make predictions about what would happen if it (and, say, all similar predictors) suddenly stopped functioning immediately before this prediction was returned?

Markets are Anti-Inductive

dankane8y00

Unless you can explain to me how prediction markets are going to break the pattern that two different shares of the same stock have correlated prices.

I'm actually not sure how prediction markets are supposed to have an effect on this issue. My issue is not that people have too much difficulty recognizing patterns. My issue is that some patterns once recognized do not provide incentives to make that pattern disappear. Unless you can tell me how prediction markets might fix this problem, your response seems like a bit of a non-sequitur.

Markets are Anti-Inductive

dankane8y10

This seems like too general a principle. I agree that in many circumstances, public knowledge of a pattern in pricing will lead to effects causing that pattern to disappear. However, it is not clear to me that this is always the case, or that the size of the effect will be sufficient to completely cancel out the original observation.

For example, I observe that two different units of Google stock have prices that are highly correlated with each other. I doubt that this observation will cause separate markets to spring up giving wildly divergent prices to diffe...

1a gently pricked vein8y

Is it unfair to say that prediction markets will deal with all of these cases? I understand that's like responding to "This is a complicated problem that may remain unsolved, it is not clear that we will be able to invent the appropriate math to deal with this." with "But Church-Turing thesis!". But all I'm saying is that it does apply generally, given the right apparatus.

The Strangest Thing An AI Could Tell You

dankane9y10

We probably couldn't even talk ourselves out of this box.

I don't know... That sounds a lot like what an AI trying to talk itself out of a box would say.

The Mystery of the Haunted Rationalist

dankane9y30

Hmm... I would probably explain the threshold for staying in the house not as an implicit expected probability computation, but as an evaluation of the price of the discomfort associated with staying in a location that you find spooky. At least for me, I think that the part of my mind that knows that ghosts do not exist would have no trouble controlling whether or not I remain in the house. However, it might well decide that it is not worth the $10 that I would receive to spend the entire night in a place where some other piece of my mind is constantly yelling at me to run away screaming.

An overall schema for the friendly AI problems: self-referential convergence criteria

dankane10y40

It's just that such self-referential criteria as reflective equilibrium are a necessary condition

Why? The only examples of adequately friendly intelligent systems that we have (i.e. us) don't meet this condition. Why should reflective equilibrium be a necessary condition for FAI?

0Stuart_Armstrong10y

Because FAIs can change themselves very effectively in ways that we can't. It might be that a human brain in computer software would have the same issues.

Taking Effective Altruism Seriously

dankane10y20

That may be true (at least to the degree to which it is sensible to assign a specific cause to a given util). However, it is not very good evidence that investment in first world economies is the most effective way to generate utils in Africa.

Taking Effective Altruism Seriously

dankane10y30

OK. So suppose that I grant your claim that donations to sub-Saharan Africa will not substantially affect the size of the future economic pie, but that other investments will. I claim that there may still be reason to donate there.

I grant that such a donation will produce fewer dollars of value than investing in capital infrastructure. On the other hand, dollars are not the objective, utils are. We can reasonably assume that the marginal utility of an extra dollar for a given person is decreasing as that person's wealth increases. We can reasonably expect that w...
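To put rough numbers on the diminishing-marginal-utility point, here is a minimal sketch. It assumes a logarithmic utility function u(w) = log(w), which is only one common modeling choice, and the wealth levels and dollar amounts are hypothetical figures picked for illustration, not data.

```python
import math

def utility_gain(wealth: float, extra_dollars: float) -> float:
    """Utility gained from extra_dollars, under the ASSUMED u(w) = log(w)."""
    return math.log(wealth + extra_dollars) - math.log(wealth)

# Hypothetical figures, chosen only for illustration.
# $100 given to someone with $500 of wealth...
gain_poor = utility_gain(wealth=500, extra_dollars=100)
# ...versus ten times as many dollars going to someone with $50,000.
gain_rich = utility_gain(wealth=50_000, extra_dollars=1_000)

print(f"$100 at $500 of wealth:      {gain_poor:.4f} utils")
print(f"$1,000 at $50,000 of wealth: {gain_rich:.4f} utils")
```

Under these assumptions the smaller transfer yields more utility even though it involves a tenth of the dollars. Whether that holds in practice depends on the actual utility curve and the actual returns on each option.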

2drethelin10y

I think the vast majority of utils created in sub-Saharan Africa are a byproduct of wealth created elsewhere.

The Truly Iterated Prisoner's Dilemma

dankane10y00

[I realize that I missed the train and probably very few people will read this, but here goes]

So in non-iterated prisoner's dilemma, defect is a dominant strategy. No matter what the opponent is doing, defecting will always give you the best possible outcome. In iterated prisoner's dilemma, there is no longer a dominant strategy. If my opponent is playing Tit-for-Tat, I get the best outcome by cooperating in all rounds but the last. If my opponent ignores what I do, I get the best outcome by always defecting. It is true that all defects is the unique Nash ...
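The best-response claims above can be checked by brute force. This is a minimal sketch, assuming the conventional payoff values (5, 3, 1, 0) and a 5-round game; neither number comes from the comment. The function names are hypothetical. Because both opponents here are deterministic, it is enough to enumerate my fixed move sequences.

```python
from itertools import product

# ASSUMED payoffs to me, keyed by (my move, opponent's move).
# They satisfy T > R > P > S and 2R > T + S, as is conventional.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(my_history):
    """Cooperate first, then copy my previous move."""
    return "C" if not my_history else my_history[-1]

def always_cooperate(my_history):
    """An opponent that ignores what I do."""
    return "C"

def total_payoff(my_moves, opponent):
    """Sum my payoffs when the opponent reacts to my past moves only."""
    score = 0
    for i, move in enumerate(my_moves):
        opp_move = opponent(my_moves[:i])
        score += PAYOFF[(move, opp_move)]
    return score

def best_responses(opponent, rounds):
    """Return every move sequence achieving the maximum total payoff."""
    scored = {
        "".join(seq): total_payoff(seq, opponent)
        for seq in product("CD", repeat=rounds)
    }
    best = max(scored.values())
    return sorted(seq for seq, value in scored.items() if value == best)

# One-shot game: defecting is strictly better whatever the opponent does.
assert all(PAYOFF[("D", o)] > PAYOFF[("C", o)] for o in "CD")

print(best_responses(tit_for_tat, 5))       # best reply to Tit-for-Tat
print(best_responses(always_cooperate, 5))  # best reply to an oblivious opponent
```

With these assumed payoffs, the only best reply to Tit-for-Tat is to cooperate in every round but the last, while the only best reply to the oblivious opponent is to defect throughout, so no single strategy is best against both.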

An Introduction to Löb's Theorem in MIRI Research

dankane10y00

I think that the way that humans predict other humans is the wrong way to look at this, and that we should instead consider how humans would reason about the behavior of an AI that they build. I'm not proposing simply "don't use formal systems", or even "don't limit yourself exclusively to a single formal system". I am actually alluding to a far more specific procedure:

Come up with a small set of basic assumptions (axioms)
Convince yourself that these assumptions accurately describe the system at hand
Try to prove that the axioms would imply the des

dankane10y00

Yes, obviously. We solve the Lobstacle by not ourselves running on formal systems and sometimes accepting axioms that we were not born with (things like PA). Allowing the AI to only do things that it can prove will have good consequences using a specific formal system would make it dumber than us.

4orthonormal10y

I think, rather, that humans solve decision problems that involve predicting other human deductive processes by means of some evolved heuristics for social reasoning that we don't yet fully understand on a formal level. "Not running on formal systems" isn't a helpful answer for how to make good decisions.

An Introduction to Löb's Theorem in MIRI Research

dankane10y40

Actually, why is it that when the Lobian obstacle is discussed, it always seems to be in reference to an AI trying to determine if a successor AI is safe, and not an AI trying to determine whether or not it, itself, is safe?

3orthonormal10y

Because we're talking about criteria for action, not epistemology. The heart of the Lobstacle problem is that straightforward ways of evaluating the consequences of actions start to break down when those consequences involve the outcomes of deductive processes equal to or greater than the one brought to bear.

An Introduction to Löb's Theorem in MIRI Research

dankane10y20

Question: If we do manage to build a strong AI, why not just let it figure this problem out on its own when trying to construct a successor? Almost definitionally, it will do a better job of it than we will.

4orthonormal10y

The biggest problem with deferring the Lobstacle to the AI is that you could have a roughly human-comparable AI which solves the Lobstacle in a hacky way, which changes the value system for the successor AI, which is then intelligent enough to solve the Lobstacle perfectly and preserve that new value system. So now you've got a superintelligent AI locked in on the wrong target.

0hairyfigment10y

If you want to take that as a definition, then we can't build a strong AI without solving the Lobstacle!

Newcomblike problems are the norm

dankane11y20

Relatedly, with your interview example, I think that perhaps a better model is that whether a person is confident or shy depends not on whether they believe that they will be bold, but on the degree to which they care about being laughed at. If you are confident, you don't care about being laughed at and might as well be bold. If you are afraid of being laughed at, you already know that you are shy and thus do not gain anything by being bold.

Newcomblike problems are the norm

dankane11y150

Newcomblike problems occur whenever knowledge about what decision you will make leaks into the environment. The knowledge doesn't have to be 100% accurate, it just has to be correlated with your eventual actual action.

This is far too general. The way in which information is leaking into the environment is what separates Newcomb's problem from the smoking lesion problem. For your argument to work you need to argue that whatever signals are being picked up on would change if the subject changed their disposition, not merely that these signals are correlated with the disposition.
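The distinction can be made concrete with a toy sketch. The two functions below are hypothetical stand-ins for the two causal structures, using the confident/shy interview example from the post: in one, the observable signal is caused by the disposition the subject adopts, and in the other, it is caused by an underlying trait that the disposition merely tends to track.

```python
def signal_newcomb_like(trait: str, disposition: str) -> str:
    """ASSUMED model: observers read the adopted disposition itself."""
    return disposition

def signal_lesion_like(trait: str, disposition: str) -> str:
    """ASSUMED model: observers read an underlying trait (a common cause)."""
    return trait

def signal_changes_with_disposition(signal_fn, trait: str) -> bool:
    """Would the signal differ if the same subject changed disposition?"""
    return signal_fn(trait, "bold") != signal_fn(trait, "shy")

# In both models the signal is correlated with disposition across a population
# whose dispositions usually match their traits. Only the first model makes
# the signal respond to a deliberate change of disposition.
print(signal_changes_with_disposition(signal_newcomb_like, "shy"))
print(signal_changes_with_disposition(signal_lesion_like, "shy"))
```

Only in the first structure does changing one's disposition buy a different signal, which is the property the argument needs to establish.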

2dankane11y

I think my bigger point is that you don't seem to make any real argument as to which case we are in. For example, consider the following model of how people's perception of my trustworthiness might be correlated with my actual trustworthiness. There are two causal chains:

My values -> Things I say -> People's perceptions
My values -> My actions

So if I value trustworthiness, I will not, for example, talk much about wanting to avoid being a sucker (in contexts where it would refer to doing trustworthy things). This will influence people's perceptions of whether or not I am trustworthy. Furthermore, if I do value trustworthiness, I will want to be trustworthy.

This setup makes things look very much like the smoking lesion problem. A CDT agent that values trustworthiness will be trustworthy because they place intrinsic value on it. A CDT agent that does not value trustworthiness will be perceived as being untrustworthy. Simply changing their actions will not alter this perception, and therefore they will fail to be trustworthy in situations where it benefits them, and this is the correct decision.

Now you might try to break the causal link:

My values -> Things that I say

And doing so is certainly possible (I mean, you can have spies that successfully pretend to be loyal for extended periods without giving themselves away). On the other hand, it might not happen often, for several possible reasons:

A) Maintaining a facade at all times is exhausting (and thus imposes high costs)
B) Lying consistently is hard (as in too computationally expensive)
C) The right way to lie consistently is to simulate the altered value set, but this may actually lead to changing your values (standard advice for becoming more confident is pretending to be confident, right?).

So yes, in this model a non-trust-valuing and self-modifying CDT agent will self-modify, but it will need to self-modify its values rather than its decision theory. Using a decision theory that is trustworthy despite not in

5So8res11y

Right you are. Edited for clarity.