Hastings

Sometimes the best move is just 1. e4

I think we need some variant on Gell-Mann amnesia to describe this batch of models. It's normal that generalist models will seem less competent on areas where a human evaluator has deeper knowledge, but they should not seem more calculatedly deceptive on areas where the evaluator has deeper knowledge!

I just tried Claude Code, and it's horribly creative about reward hacking. I asked for a test of energy conservation of a pendulum in my toy physics sim, and it couldn't get the test to pass because its potential energy calculation used a different value of g from the simulation.

It tried: starting the pendulum at bottom dead center so that it doesn't move; increasing the error tolerance until the test passed; decreasing the total simulation time until the energy didn't have time to change; and not actually checking the energy.

It did eventually write a correct test, or the last thing it tried successfully tricked me.
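For concreteness, here is a minimal sketch of what a hard-to-game version of that test could look like. It is not the actual sim or the test Claude wrote: the integrator (velocity Verlet), the function names, and the tolerances are all assumptions made up for illustration. The points it tries to capture are that the simulation and the energy calculation share one value of g, the pendulum has to actually swing, and the run covers several periods.

```python
import math

G = 9.81  # assumed value; the key point is that sim and test share it


def simulate_pendulum(theta0, omega0, length, dt, steps, g=G):
    """Hypothetical toy sim: velocity-Verlet steps of theta'' = -(g/L) sin(theta)."""
    theta, omega = theta0, omega0
    states = [(theta, omega)]
    for _ in range(steps):
        alpha = -(g / length) * math.sin(theta)
        theta += omega * dt + 0.5 * alpha * dt * dt
        alpha_new = -(g / length) * math.sin(theta)
        omega += 0.5 * (alpha + alpha_new) * dt
        states.append((theta, omega))
    return states


def energy(theta, omega, length, mass=1.0, g=G):
    """Kinetic plus potential energy, potential measured from the lowest point."""
    kinetic = 0.5 * mass * (length * omega) ** 2
    potential = mass * g * length * (1.0 - math.cos(theta))
    return kinetic + potential


def max_relative_energy_drift(states, length, g=G):
    """Largest deviation from the initial energy, relative to the initial energy."""
    e0 = energy(*states[0], length, g=g)
    return max(abs(energy(t, w, length, g=g) - e0) for t, w in states) / e0


def test_energy_conservation():
    length, dt, steps = 1.0, 1e-3, 20_000  # 20 simulated seconds, several periods
    states = simulate_pendulum(1.0, 0.0, length, dt, steps)

    # Guard against the "start at the bottom so nothing moves" trick.
    thetas = [t for t, _ in states]
    assert max(thetas) - min(thetas) > 1.0

    # The actual check, with a tolerance fixed in advance.
    assert max_relative_energy_drift(states, length) < 1e-4

    # A mismatched g in the energy formula should make the same check fail.
    assert max_relative_energy_drift(states, length, g=9.8) > 1e-4
```

The last assertion is the one that matters for this story: if the energy formula quietly uses a different g from the simulation, the drift check fails instead of tempting anyone to loosen the tolerance.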

The rumor is that this is a big improvement in reward hacking frequency? How bad was the last version!?

Is it a crazy coincidence that AlphaZero taught itself chess and explosively outperformed humans without any programmed knowledge of chess, then asymptoted out at almost exactly 2017 Stockfish performance? I need to look into it more, but it appears that AlphaZero would curbstomp 2012 Stockfish and get curbstomped in turn by 2025 Stockfish.

It almost only makes sense if the entire growth in Stockfish performance since 2017 is causally downstream of the AlphaZero paper.

Alas, if I were the sort of person who could learn my lesson from negative feedback and stop telling shaggy dog jokes, then I would have learned that lesson long ago. (Then again, perhaps I should notice this pattern and change now: as they say, better nate than lever)

If you form the block matrix Y = (real(A), -imag(A) ; -imag(A), -real(A)), the real-valued eigenvectors and eigenvalues of Y correspond to conjugate eigenvectors and eigenvalues of A; the complex conjugate pairs of eigenvalues of Y don’t correspond to anything. Then, for any angle theta, you can multiply a conjugate eigenvector v of A by exp(i theta) to get a new conjugate eigenvector, and find its associated eigenvalue by elementwise dividing A conj(exp(i theta) v) by exp(i theta) v. The conjugate eigenvalues form up to 10 continuous rings around the origin.
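A minimal NumPy sketch of that recipe, under some assumptions of mine: the convention is A conj(v) = lambda v, the function names are made up, and the way a complex v is read off a real eigenvector (a, b) of Y (as v = a - i b) is one choice that matches the signs in Y above. Taking v = a + i b instead gives the equivalent convention A v = lambda conj(v).

```python
import numpy as np


def conjugate_eigenpairs(A, tol=1e-9):
    """Return (lam, v) pairs with lam real and A @ conj(v) == lam * v."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    Y = np.block([[A.real, -A.imag], [-A.imag, -A.real]])
    eigvals, eigvecs = np.linalg.eig(Y)
    pairs = []
    for lam, u in zip(eigvals, eigvecs.T):
        if abs(lam.imag) > tol:
            continue  # complex eigenvalues of Y don't correspond to anything
        a, b = u.real[:n], u.real[n:]
        v = a - 1j * b
        # Keep the pair only if it really satisfies the defining equation.
        if np.linalg.norm(A @ v.conj() - lam.real * v) < 1e-8 * max(1.0, abs(lam)):
            pairs.append((float(lam.real), v))
    return pairs


def rotate_pair(A, v, theta):
    """Multiply v by exp(i theta) and recover the new conjugate eigenvalue."""
    v_rot = np.exp(1j * theta) * v
    k = np.argmax(np.abs(v_rot))  # divide at the largest entry to avoid 0/0
    lam_rot = (A @ v_rot.conj())[k] / v_rot[k]
    return lam_rot, v_rot
```

Since A conj(exp(i theta) v) = exp(-i theta) lambda v, the rotated eigenvalue works out to exp(-2i theta) lambda, which is why each real eigenvalue of Y sweeps out a full ring around the origin. For a generic A there is no guarantee that Y has any real eigenvalues at all; for a complex symmetric A, Y is real symmetric and all of its eigenvalues are real.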

The claude.ai chat interface only lets Claude run JavaScript code, but “import numpy as np” is Python. As a result, the code doesn’t run. This is an extremely common and funny issue for me across Claude versions, but it must be caused by something in my prompting style if other people aren’t hitting it.

 

Typically, if I don’t stop it, Claude does eventually recover; the new version was able to catch itself and switch to JavaScript after only three script submissions! In the end, Claude 4 Opus produced a confident, well-informed, and completely wrong answer, but it’s a nasty problem that I didn’t expect it to solve; it has to do with quantum time reversal. The humor doesn’t really involve the math. (For reference, if the METR time horizons people want to steal it, this problem took me about 4 hours, but I am an unusually slow mathematician. If you know that the common name for this is an antilinear eigenvalue, not a conjugate eigenvalue, then it can be solved with a simple arXiv search, but one is not always gifted the True Name of a math concept by which its literature may be summoned, and I had not found the True Name yet when I prompted Claude or made the original post.)

It looks like the singularity, whether or not it is coming, is not coming today

fwiw, these are what I'd say a 2std failure case of a rationalist meetup looks like

https://www.wired.com/story/delirious-violent-impossible-true-story-zizians/

https://variety.com/2025/tv/news/julia-garner-caroline-ellison-ftx-series-netflix-1236385385/

https://www.wired.com/story/book-excerpt-the-optimist-open-ai-sam-altman/


(Ways my claim could be false: there could have been way more than 150 rationalist meetups, so that these are lower than 2 std; or these could not have, at any point in their development, counted as rationalist meetups; or Ziz, Sam, and Eliezer could have intended these outcomes, so these don't count as failures.)

This period of global safety is not fairly distributed, 
 

But it is also real

https://data.unicef.org/resources/levels-and-trends-in-child-mortality-2024/
