All of charlieoneill's Comments + Replies

Thank you for laying out a perspective that balances real concerns about misaligned AI with the assurance that our sense of purpose needn’t be at risk. It’s a helpful reminder that human value doesn’t revolve solely around how “useful” we are in a purely economic sense.

If advanced AI really can shoulder the kinds of tasks that drain our energy and attention, we might be able to redirect ourselves toward deeper pursuits—whether that’s creativity, reflection, or genuine care for one another. Of course, this depends on how seriously we approach ethical issues... (read more)

You raise a good point: sometimes relentlessly pursuing a single, rigid “point of it all” can end up more misguided than having no formal point at all. In my more optimistic moments, I see a parallel in how scientific inquiry unfolds.

What keeps me from sliding into pure nihilism is the notion that we can hold meaning lightly but still genuinely. We don’t have to decide on a cosmic teleology to care deeply about each other, or to cherish the possibility of building a better future—especially now, as AI’s acceleration broadens our horizons and our worries. Perhaps the real “point” is to keep exploring, keep caring, and stay flexible in how we define what we’re doing here.

I really appreciate your perspective on how much of our drive for purpose is bound up in social signalling and the mismatch between our rational minds and the deeper layers of our psyche. It certainly resonates that many of the individuals gathered at NeurIPS (or any elite technical conference) are restless types, perhaps even deliberately so. Still, I find a guarded hope in the very fact that we keep asking these existential questions in the first place—that we haven’t yet fully succumbed to empty routine or robotic pursuit of prestige.

The capacity to ref... (read more)

I agree - you need to actually measure the specificity and sensitivity of your circuit identification. I'm currently doing this with attention heads specifically, rather than just the layers. However, I will object to the notion of "overfitting", because the VQ-VAE is essentially fully unsupervised - it's not really about the DT overfitting, because as long as training and eval error are similar, you are simply looking for codes that distinguish positive from negative examples. If iterating over these codes also finds the circuit responsible for the positi... (read more)
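
To make concrete what I mean by measuring this, here's a minimal sketch (the head sets and model size are hypothetical, not taken from the paper) that treats circuit identification as binary classification over attention heads:

```python
# Minimal sketch (hypothetical head sets): score a recovered circuit against a
# reference circuit, with attention heads indexed as (layer, head) pairs.

def circuit_metrics(identified, reference, all_heads):
    """Treat circuit identification as binary classification over heads."""
    identified, reference = set(identified), set(reference)
    tp = len(identified & reference)                   # circuit heads correctly flagged
    fn = len(reference - identified)                   # circuit heads missed
    fp = len(identified - reference)                   # non-circuit heads flagged
    tn = len(set(all_heads) - identified - reference)  # non-circuit heads correctly ignored
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity

# Hypothetical example for a 12-layer, 12-head model.
all_heads = [(l, h) for l in range(12) for h in range(12)]
reference = [(0, 1), (3, 0), (5, 5), (9, 6), (9, 9), (10, 0)]  # "true" circuit
identified = [(0, 1), (5, 5), (9, 6), (9, 9), (11, 10)]        # heads recovered via codes

sens, spec = circuit_metrics(identified, reference, all_heads)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```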

1wassname
It's just that eval and training are so damn similar, and all other problems are so different. So while it is technically not overfitting (to this problem), it is certainly overfitting to this specific problem, and it certainly isn't measuring generalization in any sense of the word - certainly not in the sense of helping us debug alignment for all problems. This is an error that, imo, all papers currently make though! So it's not a criticism so much as an interesting debate, and a nudge to use a harder test or OOD set in your benchmarks next time.

Yeah, good point. I just can't help but think there must be a way of using unsupervised learning to force a compressed, human-readable encoding. Going uncompressed just seems wasteful, and like it won't scale. But I can't think of a machine-learnable, unsupervised, human-readable coding scheme. Any ideas?
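
For reference, the kind of discrete bottleneck under discussion is roughly a standard VQ-VAE quantisation step. Here is a minimal PyTorch sketch (made-up sizes, not the actual architecture from the post) of how continuous activations get mapped to a small set of codes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantiser(nn.Module):
    """Minimal VQ bottleneck: map each latent vector to its nearest codebook entry."""
    def __init__(self, num_codes=64, dim=16, beta=0.25):  # sizes are illustrative only
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):
        # z: (batch, dim) continuous latents from some encoder
        dists = torch.cdist(z, self.codebook.weight)   # distance to every code
        indices = dists.argmin(dim=-1)                  # one discrete code per example
        z_q = self.codebook(indices)
        # codebook + commitment losses, straight-through estimator for gradients
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, indices, loss

# Example: quantise a batch of hypothetical activations
vq = VectorQuantiser()
z = torch.randn(8, 16)
z_q, codes, loss = vq(z)
print(codes.tolist())   # the discrete "codes" one would then iterate over
```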

@Ruby @Raemon @RobertM I've had a post waiting to be approved for almost two weeks now (https://www.lesswrong.com/posts/gSfPk8ZPoHe2PJADv/can-quantised-autoencoders-find-and-interpret-circuits-in, username: charlieoneill). Is this normal? Cheers!

4habryka
Huh, definitely not normal, and I don't remember anything in the queue. It seems to be approved now.