I'm confused because this sounds extremely trivial, and that doesn't seem right. It reads to me like the theorem is just saying:
In other words, it sounds like the theorem is assuming the conclusion. Am I missing something?
I was about to try this, but then realized that Internal Double Crux (IDC) was a better tool for my specific dilemma. I guess here's a reminder to everyone that IDC exists.
I've talked to a lot of people about mech interp, so I can enumerate some counterarguments. Generally, I've been surprised by how well people in AI safety can defend their own research agendas. Of course, deciding whether the counterarguments outweigh your arguments is a lot harder than just listing them, so that'll be an exercise for readers.
Interp is hard
I think researchers already believe this. Recently I read https://www.darioamodei.com/post/the-urgency-of-interpretability, and in it, Dario expects mech interp to take 5-10 years before it's as good as an MRI.
Forall quantifiers
Forall quantifiers are nice, but a lot of empirical sciences like medicine or economics have been pretty successful without them. We don't really know how most...
99% of random[3] reversible circuits, no such π exists.
Do you mean 99% of circuits that don't satisfy P? Because there probably are distributions of random reversible circuits that satisfy P exactly 1% of the time, and that would make V's job as hard as NP = coNP.
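For reference, here is the quantifier structure I have in mind for the conjecture, written out in terms of the post's P and V (my paraphrase, not an exact quote):

```latex
% Paraphrased quantifier structure (my reading, not a quote from the post).
% V: polynomial-time verifier, C: reversible circuit, \pi: advice string.
\[
\exists V \text{ (poly-time) s.t. }
\begin{cases}
  \forall C \text{ satisfying } P:\ \exists \pi \ \text{with}\ V(C,\pi)=1, \\[2pt]
  \text{for} \geq 99\% \text{ of random reversible } C:\ \text{no } \pi \ \text{has}\ V(C,\pi)=1.
\end{cases}
\]
```

The question above is whether the second line quantifies over all random circuits or only over the ones that don't satisfy P.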
Have you felt this from your own experience trying to get funding, or from others, or both? Also, I'm curious what you think their specific kind of bullshit is, and whether there are things you think are real that others dismissed as bullshit.
I disagree, because to me this just looks like LLMs are one algorithmic improvement away from having executive function, similar to how they couldn't do system-2-style reasoning until this year, when RL on math problems started working.
For example, being unable to change its goals on the fly: if a kid kept trying to go forward when his Pokémon were too weak, he would keep losing, get upset, and hopefully, in a moment of mental clarity, learn the general principle that he should step back and reconsider his goals every so often. I think most children learn some form of this from playing around as toddlers, and reconsidering goals is still something we improve at as adults.
Unlike us, Claude doesn't have training data for executive functions like these, I think, but I wouldn't be surprised if some smart ML researchers solved this within a year.
There's a lot of discussion about evolution as an example of inner and outer alignment.
However, we could instead view the universe as the outer optimizer that maximizes entropy, or power, or intelligence. From this view, both evolution and humans are inner optimizers, and the difference between evolution's and our optimization targets is more of an alignment success than a failure.
Before evolution, the universe increased entropy by having rocks in space crash into each other. When life and evolution finally came around, they were way more effective than rock collisions at increasing entropy, even though entropy isn't part of their optimization target. If there were an SGD loop around the universe to maximize entropy,...
I've had caps lock remapped to escape for a few years now, and I also remapped a bunch of symbol keys like parentheses to be easier to type when coding. On other people's computers it is slower for me to type text with symbols or use vim, but I don't mind, since all of my deeply focused work (when the mini-distraction of reaching for a difficult key is most costly) happens on my own computers.
I'm skeptical of the claim that the only things that matter are the ones that have to be done before AGI.
Ways it could be true:
Small note: A circuit of random Toffoli gates (without any ancilla 1-bits) always maps the all-zeros input to the all-zeros output, and would make the no-coincidence conjecture trivially true. I'm not sure how Gay et al. 2025 construct random Toffoli circuits to avoid this, but they must do it somehow in order for their Theorem 2 to be true.
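To make the all-zeros point concrete, here's a minimal Python sketch of a toy Toffoli-only circuit without ancilla bits (my own toy model, not the construction from Gay et al. 2025): on the all-zeros input, no gate's two control bits are ever both 1, so no target bit is ever flipped.

```python
import random

def random_toffoli_circuit(n_wires, n_gates, rng):
    """Each gate is (control1, control2, target): flip target iff both controls are 1."""
    return [tuple(rng.sample(range(n_wires), 3)) for _ in range(n_gates)]

def run(circuit, bits):
    bits = list(bits)
    for c1, c2, t in circuit:
        if bits[c1] and bits[c2]:
            bits[t] ^= 1
    return bits

rng = random.Random(0)
n = 8
for _ in range(1000):
    circuit = random_toffoli_circuit(n, n_gates=50, rng=rng)
    # With no ancilla bits fixed to 1, the all-zeros input is always a fixed point.
    assert run(circuit, [0] * n) == [0] * n
print("Every sampled circuit fixed the all-zeros input, as expected.")
```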