Tom Davidson


I meant at any point, but yes, I was imagining the period around full automation. Why do you ask?

I'll post about my views on different numbers of OOMs soon.

Sorry, in my comments on this post I've been using "software only singularity?" to mean "will the parameter r > 1 when we first fully automate AI R&D", not as a threshold for some number of OOMs. That's what Ryan's analysis seemed to be referring to.

 

I separately think that even if initially r > 1, the software explosion might not go on for that long.
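To make the r > 1 condition concrete, here's a toy simulation of the feedback loop (my own rough sketch of the standard framing, not Ryan's actual model): cumulative research input R grows at a rate proportional to current software efficiency S, and S scales as R^r. With r > 1 the loop is self-accelerating; with r < 1 it fizzles out.

```python
# Toy model of the software-only feedback loop (a sketch, not the actual model).
# Assumptions: log(software efficiency S) = r * log(cumulative research R),
# and research accumulates at a rate proportional to S once AI R&D is
# fully automated.
def simulate(r, total_time=2.0, dt=0.01):
    R, S = 1.0, 1.0
    t = 0.0
    while t < total_time:
        R += S * dt      # automated researchers work at a speed ~ S
        S = R ** r       # returns to research: sub-unit r = diminishing
        t += dt
    return S

for r in (0.7, 1.0, 1.5):
    print(f"r = {r}: software efficiency multiplier after the run = {simulate(r):.3g}")
```

The qualitative point is just that r > 1 makes progress compound on itself; how long that lasts depends on how quickly r falls as you approach limits, which is the separate point above.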

> Obviously the numbers in the LLM case are much less certain given that I'm guessing based on qualitative improvement and looking at some open source models,

Sorry, I don't follow. Why are they less certain?

 

> based on some first principles reasoning and my understanding of how returns diminished in the semi-conductor case

I'd be interested to hear more about this. The semiconductor case is hard as we don't know how far we are from physical limits, but if we use Landauer's limit then I'd guess you're right. There's also uncertainty about how much algorithmic progress we will make and have already made.
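For concreteness, the rough arithmetic I have in mind (my own back-of-the-envelope numbers, and only a naive comparison):

```python
# Back-of-the-envelope headroom estimate relative to Landauer's limit.
# The GPU figures below are rough assumptions, not measured values.
import math

k_B = 1.380649e-23                            # Boltzmann constant, J/K
T = 300                                       # room temperature, K
landauer_j_per_bit = k_B * T * math.log(2)    # ~2.9e-21 J per bit erasure

# Assumption: a datacenter GPU drawing ~700 W at ~1e15 FLOP/s,
# i.e. ~7e-13 J per FLOP.
current_j_per_flop = 700 / 1e15

headroom_ooms = math.log10(current_j_per_flop / landauer_j_per_bit)
print(f"Landauer limit: {landauer_j_per_bit:.2e} J/bit")
print(f"Naive headroom: ~{headroom_ooms:.1f} OOMs")
# Caveat: a FLOP involves many bit operations, so the real headroom to the
# Landauer bound is several OOMs smaller than this naive comparison suggests.
```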

Why are they more recoverable? It seems like a human who seized power would seek ASI advice on how to cement their power.


Thanks for this!

Compared to you, I more strongly expect that we'll see evidence of scheming if it exists.

You argue weak schemers might just play nice. But if so, we can use them to do loads of intellectual labour to build fancy behavioral red teaming and interpretability tools to catch out the next generation of AIs.

More generally, the plan of bootstrapping to increasingly complex behavioral tests and control schemes seems likely to work. If one model has spent a lot of thinking time designing a control scheme, then another model would have to be much smarter to cause a catastrophe zero-shot without the scheme detecting it. E.g., analogies with humans suggest this.

I agree the easy vs hard worlds influence the chance of AI taking over. 

But are you also claiming it influences the badness of takeover conditional on it happening? (That's the subject of my post.)

So you predict that if Claude were in a situation where it knew it had complete power over you and could make you say that you liked it, it would stop being nice? I think it would continue to be nice in any situation of that rough kind, which suggests it's actually nice, not just narcissistically pretending.

But a human could instruct an aligned ASI to help them take over and do a lot of damage.

The structural difference you point to seems massive. The reputational downsides of bad behavior will be multiplied 100-fold or more for AIs, as any bad behavior reflects on millions of instances and on the company's reputation.

 

And it will be much easier to record and monitor AI thinking and actions to catch bad behaviour.


Why is it unlikely that we can detect selfishness? Why can't we bootstrap from human-level?
