Apparently, some (compelling?) evidence of life on an exoplanet has been found.
I have no ability to judge how seriously to take this or how significant it might be. To my untrained eye, it seems like it might be a big deal! Does anybody with more expertise or bravery feel like wading in with a take?
Link to a story on this:
https://www.nytimes.com/2025/04/16/science/astronomy-exoplanets-habitable-k218b.html
simply instruct humans to kill themselves
This is obviously not the most important thing in this post, but it confused me a little. What do you mean by this? That an ASI would be persuasive enough to make humans kill themselves or what?
Note: I am extremely open to other ideas on the below take and don't have super high confidence in it
It seems plausible to me that successfully applying interpretability techniques to increase capabilities might be net-positive for safety.
You want to align the incentives of the companies training/deploying frontier models with safety. If interpretable systems are more economically valuable than uninterpretable systems, that seems good!
It seems very plausible to me that if interpretability never has any marginal benefit to capabilities, the little nuggets of interpretability we do have will be optimized away.
For instance, if you can improve capabilities slightly by allowing models to reason in latent space instead of in a chain of thought, that will probably end up being the default.
There's probably a good deal of path dependence on the road to AGI, and if capabilities are going to increase inevitably, perhaps it's a good idea to nudge that progress in the direction of interpretable systems.
o3 lies much more blatantly and confidently than other models, in my limited experiments.
Over a number of prompts, I have found that it lies, and when corrected on those lies, it apologizes and then tells other lies.
This is obviously not scientific, more of a vibes-based analysis, but its aggressive lying and fabrication of sources is really noticeable to me in a way it hasn't been for previous models.
Has anyone else felt this way at all?