All of alokja's Comments + Replies

In both cases, the AI behaves (during training) in a way that looks a lot like trying to make people happy. Then the AI described in (1) is unfriendly because it was optimizing the wrong concept of "happiness", one that lined up with yours when the AI was weak, but that diverges in various edge-cases that matter when the AI is strong. By contrast, the AI described in (2) was never even really trying to pursue happiness; it had a mixture of goals that merely correlated with the training objective, and that balanced out right around where you wanted the

[…]

Therefore rational beliefs are contagious, among honest folk who believe each other to be honest. And it’s why a claim that your beliefs are not contagious—that you believe for private reasons which are not transmissible—is so suspicious. If your beliefs are entangled with reality, they should be contagious among honest folk.

 

I don't get this inference. Isn't the belief itself the evidence? You entangle your friend with the object of your belief just by telling them your belief, regardless of whether you can explain the reasons. (Private beliefs seem suspicious to me on other grounds.)

Rafael Harth
If your friend trusts that you arrived at your belief through rational means, you are correct. But often when someone can't give a reason, it's because there is no good reason. Hence "suspicious".
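
One way to make this exchange concrete is a rough Bayesian sketch (all numbers below are assumptions for illustration, not anything from the thread): treat the friend's report as evidence whose strength depends on how likely their belief-forming process is to be entangled with reality, and treat "can't give reasons" as something that lowers that probability.

```python
# Sketch: how much a friend's reported belief should move you, as a function
# of how likely their belief-forming process is to track reality.
# All numbers are assumptions chosen for illustration.

def posterior_given_report(prior_h, p_reliable, hit=0.9, miss=0.1, noise=0.5):
    """P(H | friend reports believing H).

    If the friend's process is reliable, they report H with prob `hit` when H
    is true and `miss` when it is false; if unreliable, they report H with
    prob `noise` either way.
    """
    p_report_given_h = p_reliable * hit + (1 - p_reliable) * noise
    p_report_given_not_h = p_reliable * miss + (1 - p_reliable) * noise
    numer = p_report_given_h * prior_h
    return numer / (numer + p_report_given_not_h * (1 - prior_h))

prior = 0.5
print(posterior_given_report(prior, p_reliable=0.9))  # trusted process -> ~0.86
print(posterior_given_report(prior, p_reliable=0.3))  # "no reasons given" -> ~0.62
```

On these numbers the report is evidence either way (both posteriors exceed the prior), which matches the point that stating a belief already entangles the listener with its object; how far it moves you depends on `p_reliable`, which is exactly what an inability to give reasons undermines.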

A knowledge explosion itself -- to the extent that it is happening -- seems like it could be a great thing. So, for what it's worth, my guess is that it makes sense to focus on mitigating the specific threats it creates (insofar as it does) so that we get the benefits too.

Phil Tanny
It's certainly true that many benefits will continue to flow from the knowledge explosion, no doubt about it. The 20th century is a good real-world example of the overall picture:

* TONS of benefits from the knowledge explosion, and...
* Now a single human being can destroy civilization in just minutes.

This pattern illustrates the challenge presented by the knowledge explosion. As the scale of the emerging powers grows, the room for error shrinks, and we are ever more in the situation where one bad day can erase all the very many benefits the knowledge explosion has delivered.

In 1945 we saw the emergence of what is arguably the first existential threat technology. To this day, we still have no idea how to overcome that threat. And now in the 21st century we are adding more existential threats to the pile, and we don't really know how to manage those threats either. And the 21st century is just getting underway. With each new threat added to the pile, the odds of us being able to defeat each and every existential threat (required for survival) go down.

Footnote: I'm using "existential threat" to refer to a possible collapse of civilization, not human extinction, which seems quite unlikely short of an astronomical event.

I feel like I don't understand how this model explains the biggest mystery: experiences sometimes having the reverse impact on your beliefs from the impact they should have.

The more technical version of this same story is that habituation requires a perception of safety, but (like every other perception) this one depends on a combination of raw evidence and context. The raw evidence (the Rottweiler sat calmly wagging its tail) looks promising. But the context is a very strong prior that dogs are terrifying. If the prior is strong enough, it overwhelms the real exper

[…]
Evzen
It could also be that the brain uses weights that are greater than 1 when weighting the priors. That way, we don't lose the gradation.
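
A minimal sketch of that mechanism in log-odds form (the weight on the prior and all numbers are assumptions for illustration): if the prior gets a weight greater than 1 when combined with the evidence, a strongly negative prior can more than cancel a mildly positive experience, so the belief ends up worse than it started.

```python
import math

# Sketch of an over-weighted prior (all numbers assumed for illustration).
# The belief "dogs are safe" is combined with one calm-Rottweiler observation
# in log-odds: posterior_logit = w * prior_logit + evidence_llr.

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

prior_p = 0.05      # strong prior that dogs are terrifying: P(safe) = 0.05
evidence_llr = 1.0  # mildly reassuring observation, as a log-likelihood ratio

for w in (1.0, 1.5):  # w = 1 is ordinary Bayes; w > 1 over-weights the prior
    posterior = sigmoid(w * logit(prior_p) + evidence_llr)
    print(f"w = {w}: P(safe) goes from {prior_p:.2f} to {posterior:.3f}")
```

With w = 1 the calm encounter nudges P(safe) up (0.05 to about 0.12); with w = 1.5 the same encounter leaves it lower than before (about 0.03), which reproduces the "reverse impact" pattern the model is supposed to explain.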