Nah, I buy that they're up so some wild stuff in the gradient descent dynamics / singular learning theory subfield, but solenoidal flux correction has to be a bit. The emperor has no clothes!
The issue is that they are getting better at making the slop convincing, and in the predicted ways - ways that got reward in training due to under-specified goals. The canonical example is Claude Code's tendency to delete tests, or make tests pass by mocking the part that we wanted to check.
This is one of the nastiest aspects of the LLM surge, that there's no way to opt out of having this prank pulled on me, over and over again.
What should I do if I had a sudden insight, that the common wisdom was right the whole time, if maybe for the wrong reasons? The truth- the honest to god real resolution to a timeless conundrum- is also something that people have been loon-posting to all comments sections of the internet. Posting the truth about this would be incredibly low status. I know that LessWrong is explicitly a place for posting low status truths, exactly as long as I am actually right, and reasoning correctly. Even though I fit those conditions I still fear that I'm going too far.
Here goes- the airplane actually can't take off from the treadmill.
For this bit to be funny, I do actually have to prove the claim. Obviously, I am using a version of the question that specifies that the treadmill speed dynamically matches the wheel radius * wheel angular velocity (probably via some variety of powerful servo). Otherwise, if the treadmill is simply set to the airplane's typical takeoff speed, the airplane moves forward as if on a normal runway (see the mythbusters episode)
Doing the math for a 747 with everything starting stationary: as soon as the airplane brakes release to initiate takeoff, the treadmill smoothly accelerates from 0 to 300 mph in a little under a quarter second. During this quarter second, the jet is held exactly stationary. At around 300 mph, the wheels mega-explode, and what happens after that is under-specified, fiery, and unlikely to be describable as "takeoff"
The key is that bearing friction is completely irrelevant- the dynamics are dominated by wheel angular momentum. With this it's an easy Newtonian physics problem- the forces on the bearings are norminal (comparable to full thrust with brakes on,) the tires aren't close to slipping on the treadmill, etc.
It's not a technological problem at this stage. Humanity has to look at the big button labelled "end the world" and say "actually, I won't press that." It's hard, but it's physically possible, a very clear plan, and we've done it before.
Well, the more duplicated stuff from last generation composes a larger fraction of the training data. In the long term that's plenty, although it's suspicious that it only took a single digit number of generations.
Do you have a realistic seeming story in mind?
I must report, with the appropriate degree of shame, that the "purple and commutes" joke was the only one to get me to physically laugh.
I think there is a bit of a rhetorical issue here with the necessity argument: I agree that a powerful program aligned to a person would have an accurate internal model of that person, but I think that this is true by default whenever a powerful, goal seeking program interacts with a person- it’s just one of the default instrumental subgoals, not alignment specific.
The object level analysis is fine, but object level analysis would not predict that the interaction of sex reassignment surgery and womens sports leagues would be a visible fraction of all nightly news coverage for the last decade.