Comments

Nnotm · 10

I think it approaches it from a different level of abstraction though. Alignment faking is the strategy used to achieve goal guarding. I think both can be useful framings.

Nnotm · 63

Hah, I've listened to Half an Hour Before Dawn in San Francisco a lot, and I only realized just now that the AI by itself read "65,000,000" as "sixty-five thousand thousand", which I always thought was an intentional poetic choice.

Nnotm · 23

Note that the penultimate paragraph of the post says

> We do still need placebo groups.

Nnotm · 4 · -2

In principle, I prefer sentient AI over non-sentient bugs. But the concern is that if non-sentient superintelligent AI is developed, it's an attractor state that is hard or impossible to get out of. Bugs certainly aren't bound to evolve into sentient species, but at least there's a chance.

Nnotm · 5 · -1

Bugs could potentially result in a new sentient species many millions of years down the line. With super-AI that happens to be non-sentient, there is no such hope.

Nnotm · 20

Thank you for this! I had listened to the LessWrong audio of the last one just before seeing your comment about making your version, and I waited before listening to this one in the hope that you would post yours.

Nnotm · 117

Missed opportunity to replace "if the box contains a diamond" with the more thematically appropriate "if the chest contains a treasure", though

Nnotm · 21

FWIW the AI audio seems to not take that into account

Nnotm · 21

Thanks, I've found this pretty insightful. In particular, I hadn't considered that even fully understanding static GPT doesn't necessarily bring you close to understanding dynamic GPT - this makes me update towards mechinterp being slightly less promising than I was thinking.

Quick note:
> a page-state can be entirely specified by 9628 digits or a 31 kB file.

I think it's a 31 kb file, but a 4 kB file?
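The arithmetic behind the units quibble can be sketched as follows (a minimal check, assuming the quoted post's "k" prefix means 1024):

```python
import math

# 9628 decimal digits carry 9628 * log2(10) bits of information.
bits = 9628 * math.log2(10)   # ~31,983 bits

kb = bits / 1024              # kilobits: ~31 kb, matching the quoted figure
kB = bits / 8 / 1024          # kilobytes: ~3.9 kB, i.e. roughly 4 kB

print(round(kb), round(kB))   # prints: 31 4
```

So the quoted "31" is right only if the unit is kilobits; in kilobytes the file is about 4 kB.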

Nnotm · 30

I think an important difference between humans and these Go AIs is memory: If we find a strategy that reliably beats human experts, they will either remember losing to it or hear about it and it won't work the next time someone tries it. If we find a strategy that reliably beats an AI, that will keep happening until it's retrained in some way.
