Tom Davidson - LessWrong

What is it to solve the alignment problem?

I enjoyed reading this, thanks.

I think your definition of solving alignment here might be too broad?

If we have superintelligent agentic AI that tries to help its user but we end up missing out on the benefits of AI because of catastrophic coordination failures, or because of misuse, then I think you're saying we didn't solve alignment because we didn't elicit the benefits?

You discuss this, but I prefer to separate out control and alignment: I wouldn't count us as having solved alignment if we only elicit behavior via intense/exploitative control schemes. So I'd adjust your alignment definition with the extra requirement that we avoided takeover without using control schemes that are super-intense relative to what is acceptable to do to humans today. That is a higher bar, and it separates alignment from the thing we care about (avoiding takeover and eliciting benefits), but I think that's a better definition.

Conflicts between emotional schemas often involve internal coercion

Tom Davidson · 3mo · 20

I enjoyed it and think the ideas are important, but I found it hard to follow at points.

Some suggestions:

Explain more about why self-criticism allows one part to assert control.
Give more examples throughout, especially in the second half. I think some paragraphs don't have examples and are harder to understand.
Flesh out the examples to make them longer and more detailed.

An illustrative model of backfire risks from pausing AI research

Tom Davidson · 1y · 41

I think your model will underestimate the benefits of ramping up spending quickly today.

You model the size of the $ overhang as constant. But in fact it's doubling every couple of years as global spending on producing AI chips grows. (The overhang relates to the fraction of chips used in the largest training run, not the fraction of GWP spent on the largest training run.) That means that ramping up spending quickly (on training runs or software or hardware research) gives that $ overhang less time to grow.
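To make the arithmetic concrete, here is a minimal sketch of how much larger the overhang gets if the ramp-up is delayed. It is not the post's model: the function name is hypothetical, and the 2-year doubling time is only an assumed reading of "every couple of years".

```python
def overhang_multiplier(delay_years: float, doubling_time_years: float = 2.0) -> float:
    """Factor by which the $ overhang grows if the spending ramp-up starts
    `delay_years` later, assuming steady exponential growth.

    The 2-year default doubling time is an assumption, not a measured value.
    """
    return 2.0 ** (delay_years / doubling_time_years)


# A constant-overhang model implicitly uses a multiplier of 1.0 for every delay.
for delay in (0, 2, 4, 6):
    growth = overhang_multiplier(delay)
    print(f"delay of {delay} years: overhang is {growth:.1f}x its size today")
```

Under these assumptions, each extra two years of delay doubles the overhang that is eventually spent down, which is the effect a constant-overhang model leaves out.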

But why would the AI kill us?

Tom Davidson · 1y · 10

Why are you at 50% that AI kills >99% of people, given the points you make in the other direction?

Richard Ngo's Shortform

Tom Davidson · 2y · 10

So, far causally upstream of the human evaluator's opinion? E.g. an AI counselor optimizing for getting to know you?

Richard Ngo's Shortform

Tom Davidson · 2y · 10

I think the "soup of heuristics" stories (where the AI is optimizing something far causally upstream of reward instead of something that is downstream or close enough to be robustly correlated) don't lead to takeover in the same way

Why does it not lead to takeover in the same way?

On the Diplomacy AI

Tom Davidson · 2y · 60

AI understands that the game ends after 1908 and modifies accordingly.

Does it? In the game you link it seems like the bot doesn't act accordingly in the last move phase. Turkey misses a chance to grab Rumania, Germany misses a chance to grab London, and I think France misses something as well.

Towards a Formalisation of Returns on Cognitive Reinvestment (Part 1)

Tom Davidson · 2y · 20

Glad you added these empirical research directions! If I were you I'd prioritize these over the theoretical framework.

What can the principal-agent literature tell us about AI risk?

Tom Davidson · 5y · 20

So either one must claim that AI-related unawareness is of a very different type or scale from ordinary human cases in our world today, or one must implicitly claim that unawareness modeling would in fact be a contribution to the agency literature.

I agree that the Bostrom/Yudkowsky scenario implies AI-related unawareness is of a very different scale from ordinary human cases. From an outside view perspective, this is a strike against the scenario. However, this deviation from past trends does follow fairly naturally (though not necessarily) from the hypothesis of a sudden and massive intelligence gap

What can the principal-agent literature tell us about AI risk?

Tom Davidson · 5y · 20

Re the difference between monopoly rents and agency rents: monopoly rents would be eliminated by competition between firms, whereas agency rents would be eliminated by competition between workers. So they're different in that sense.
