Does anyone know why GPT-4.5 seems to get stuck on the word "explicitly", repeating it continuously after it encounters it once? Is this only happening in ChatGPT? It looks like some sort of context collapse.
Sightings in the wild: https://x.com/KelseyTuoc/status/1902132078378189198 https://x.com/Josikinz/status/1901840144363082047 https://x.com/4confusedemoji/status/1895613332662730832 https://x.com/Westoncb/status/1895615564313448781 https://x.com/noself86/status/1901230843240370287 https://x.com/0x440x46/status/1900855229068829139 https://x.com/GusarichOnX/status/1900184434806059072
"Ever wanted to mindwipe an LLM?
Our method, LEAst-squares Concept Erasure (LEACE), provably erases all linearly-encoded information about a concept from neural net activations. It does so surgically, inflicting minimal damage to other concepts.
...
LEACE has a closed-form solution that fits on a T-shirt. This makes it orders of magnitude faster than popular concept erasure methods like INLP and R-LACE, which require gradient-based optimization. And the solution can be efficiently updated to accommodate new data."
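To make the "closed-form" claim concrete, here is a minimal NumPy sketch. It assumes the paper's formulation of the eraser, r(x) = x − W⁺ P W (x − μ), where W is the whitening transform (Σ_XX^{1/2})⁺ and P is the orthogonal projection onto the column space of W Σ_XZ. The names `fit_leace` and `erase` are hypothetical and are not the API of any released library. The sketch fits on one batch of activations `X` and concept labels `Z`, and it does not show the incremental-update trick mentioned in the quote.

```python
import numpy as np


def fit_leace(X: np.ndarray, Z: np.ndarray, tol: float = 1e-10):
    """Hypothetical sketch: fit a LEACE-style linear eraser.

    X: activations, shape (n, d). Z: concept labels, shape (n,) or (n, k).
    Returns (mu, M) such that erase(x) = x - M @ (x - mu).
    """
    Z = Z.reshape(len(Z), -1).astype(float)
    n = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu
    Zc = Z - Z.mean(axis=0)
    sigma_xx = Xc.T @ Xc / n  # (d, d) empirical covariance of X
    sigma_xz = Xc.T @ Zc / n  # (d, k) empirical cross-covariance of X and Z

    # Whitening W = (Sigma_XX^{1/2})^+ and its pseudo-inverse Sigma_XX^{1/2},
    # built from an eigendecomposition; near-zero eigenvalues are dropped.
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > tol * evals.max()
    V = evecs[:, keep]
    sqrt_ev = np.sqrt(evals[keep])
    W = V @ np.diag(1.0 / sqrt_ev) @ V.T
    W_pinv = V @ np.diag(sqrt_ev) @ V.T

    # Orthogonal projection onto the column space of W @ Sigma_XZ.
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > tol * s.max()]
    proj = U @ U.T

    # Un-whiten after projecting: M = W^+ P W.
    M = W_pinv @ proj @ W
    return mu, M


def erase(X: np.ndarray, mu: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Apply x - M (x - mu) to every row of X."""
    return X - (X - mu) @ M.T
```

A quick sanity check is that, on the data it was fit to, the erased activations have (numerically) zero cross-covariance with `Z`, while the edit itself is only rank k.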
+1ing 5 specifically
mfw you didn't add the final addendum (https://twitter.com/ESYudkowsky/status/1642216007552106496)
What I do not understand is why Apple and Google haven’t taken care of this for us.
Palmer Luckey has this talking point about how China has all the big tech companies (Apple in particular) by the balls. That + Google maybe not wanting to seem monopolistic by banning their competition seems to be a sufficient explanation.
Why was this promoted to the frontpage?
Is "behavior vector space" referencing something? If not, what do you mean by it?
Unrelated to the post's content itself: will LW get in trouble for hosting this excerpt?
Responding to the last line: to be clear, I'm not claiming I have one. More wondering if the AI risk community should try to find one as a desperate hail mary given they have ~0 hope for their current research directions.
aka I'm wondering whether trying to find one even counts as a desperate hail mary
Wait, what? Do you mean colloquial hieratic (just literally priestly) or his hieratic:
hieratic, adj. Of computer documentation, impenetrable because the author never sees outside his own intimate knowledge of the subject and is therefore unable to identify or meet the expository needs of newcomers. It might as well be written in hieroglyphics.
Cuz the latter seems extremely close to sazeny, if maybe additionally connoting blame on the author.
I'm in the middle of writing a nonfiction book whose central conceit is something like "an abridged dictionary of Kadhamic." Not literally the actual canonical Alexandrian Kadhamic, but the idea is to present some hundred-or-so concepts that are long and complicated and difficult to convey in English, but which are not fundamentally more complicated than things we sum up with a single word like "basketball" or "gaslighting" or "cringe."
Very interested for when this comes out :O