Comments

Nnotm · 10

I think it approaches it from a different level of abstraction though. Alignment faking is the strategy used to achieve goal guarding. I think both can be useful framings.

Nnotm · 63

Hah, I've listened to Half an Hour Before Dawn in San Francisco a lot, and I only realized just now that the AI by itself read "65,000,000" as "sixty-five thousand thousand", which I always thought was an intentional poetic choice.

Nnotm · 23

Note that the penultimate paragraph of the post says

> We do still need placebo groups.

Nnotm · 4 · -2

In principle, I prefer sentient AI over non-sentient bugs. But the concern is that if non-sentient superintelligent AI is developed, it's an attractor state that is hard or impossible to get out of. Bugs certainly aren't bound to evolve into sentient species, but at least there's a chance.

Nnotm · 5 · -1

Bugs could potentially result in a new sentient species many millions of years down the line. With super-AI that happens to be non-sentient, there is no such hope.

Nnotm · 20

Thank you for this! I had listened to the LessWrong audio of the last one just before seeing your comment about making your version, and I waited before listening to this one in the hope that you would post yours.

Nnotm · 117

Missed opportunity to replace "if the box contains a diamond" with the more thematically appropriate "if the chest contains a treasure", though

Nnotm · 21

FWIW the AI audio seems to not take that into account

Nnotm · 21

Thanks, I've found this pretty insightful. In particular, I hadn't considered that even fully understanding static GPT doesn't necessarily bring you close to understanding dynamic GPT - this makes me update towards mechinterp being slightly less promising than I was thinking.

Quick note:
> a page-state can be entirely specified by 9628 digits or a 31 kB file.

I think it's a 31 kb file, but a 4 kB file?
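The arithmetic behind the units quibble can be sketched as follows (a minimal check, assuming the quoted post's "k" prefix means 1024):

```python
import math

# 9628 decimal digits carry 9628 * log2(10) bits of information.
bits = 9628 * math.log2(10)   # ~31,983 bits

kb = bits / 1024              # kilobits: ~31 kb, matching the quoted figure
kB = bits / 8 / 1024          # kilobytes: ~3.9 kB, i.e. roughly 4 kB

print(round(kb), round(kB))   # prints: 31 4
```

So the quoted "31" is right only if the unit is kilobits; in kilobytes the file is about 4 kB.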

Nnotm · 30

I think an important difference between humans and these Go AIs is memory: If we find a strategy that reliably beats human experts, they will either remember losing to it or hear about it and it won't work the next time someone tries it. If we find a strategy that reliably beats an AI, that will keep happening until it's retrained in some way.
