Sébastien Larivée

Comments

I like this story; it touches on certainty bounds relative to superintelligence, which I think is an underexplored concept. That is: if you're an AGI that knows every bit of information relevant to a certain plan, and you have the "cognitive" power to make use of that information, what would you estimate your plan's chance of success to be? Can it ever reach 100%?
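As a toy intuition pump for that last question (my own made-up numbers, not anything from the story): even with near-complete information, any residual per-step uncertainty compounds over a long plan, so the overall estimate stays strictly below 100%.

```python
# Toy calculation with assumed numbers: a tiny residual uncertainty per step
# compounds across a long, multi-step plan.
per_step_confidence = 0.9999   # assumed probability each step goes as modelled
steps = 10_000                 # assumed number of steps in a complex plan

plan_success = per_step_confidence ** steps
print(f"Estimated plan success: {plan_success:.3f}")  # ~0.368, well short of 100%
```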

To be fair, I don't think answers to this question are very tractable right now; afaict we don't have a detailed enough picture of physics to usefully estimate our world's level of determinism. But this feature of the universe seems especially relevant to how likely an AGI is to attempt complex plans (both aligned and unaligned) that carry some probability of it being turned off forever.

If anyone knows of existing works that discuss this in more depth, I'd love some recommendations!

Insight volume/quality doesn't seem meaningfully correlated with hours worked (see June Huh for an extreme example); high-insight people tend to have work schedules optimized for their mental comfort. I don't think encouraging someone who's producing insights at 35 hours per week to work 60 hours per week will result in more alignment progress, and I also doubt that the only people producing insights are those working 60 hours per week.

EDIT: this of course relies on the prior belief that more insights are what we need for alignment right now.

This seems to support Reward is Enough.

More specifically:

DeepMind simulates a lower-fidelity version of real-world physics -> applies real-world AI methods -> achieves generalised AI performance.

This is a pretty concrete demonstration that current AI methods are sufficient to achieve generality; we just need more real-world data to match the more complex physics of reality.
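A minimal sketch of that pipeline, heavily simplified (the tiny gridworld and tabular Q-learning below are stand-ins of my own choosing, nothing like DeepMind's actual simulator or methods): train a completely standard RL agent in a cheap simulator across many task variations, then check how it does on variations it never saw during training.

```python
import random
from collections import defaultdict

# Toy stand-in for the pipeline above (not DeepMind's actual setup):
# a cheap "low-fidelity" simulator (a gridworld), a completely standard
# RL method (Q-learning), evaluated on task variants unseen in training.

SIZE = 7
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # four grid moves

def run_episode(goal, Q, eps, alpha=0.1, gamma=0.95, max_steps=50, learn=True):
    pos = (random.randrange(SIZE), random.randrange(SIZE))
    for _ in range(max_steps):
        obs = (goal[0] - pos[0], goal[1] - pos[1])        # relative offset to goal
        if obs == (0, 0):
            return True                                    # reached the goal
        if random.random() < eps:
            a = random.randrange(len(ACTIONS))             # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[obs][i])  # greedy
        dx, dy = ACTIONS[a]
        pos = (min(SIZE - 1, max(0, pos[0] + dx)),
               min(SIZE - 1, max(0, pos[1] + dy)))
        new_obs = (goal[0] - pos[0], goal[1] - pos[1])
        reward = 1.0 if new_obs == (0, 0) else -0.01
        if learn:                                          # standard Q-learning update
            Q[obs][a] += alpha * (reward + gamma * max(Q[new_obs]) - Q[obs][a])
    return False

random.seed(0)
Q = defaultdict(lambda: [0.0] * len(ACTIONS))
all_goals = [(x, y) for x in range(SIZE) for y in range(SIZE)]
random.shuffle(all_goals)
train_goals, held_out_goals = all_goals[:30], all_goals[30:]  # unseen task variants

for _ in range(5000):                                      # train in the cheap simulator
    run_episode(random.choice(train_goals), Q, eps=0.2)

trials = [run_episode(g, Q, eps=0.0, learn=False)          # evaluate on held-out goals
          for g in held_out_goals for _ in range(20)]
print(f"Success rate on held-out goals: {sum(trials) / len(trials):.0%}")
```

The point isn't the toy itself, just that nothing beyond off-the-shelf RL machinery plus a crude simulator is doing the work; scaling the simulator's fidelity and the data is the remaining gap.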