Hjalmar_Wijk
I do threat modeling and 'risk assessment' at METR, and often get asked what threat models I'm most focused on. I recently wrote a quick tweet thread with some rough thoughts which may be of interest to people here:
https://x.com/HjalmarWijk/status/1988070278149353894
Note that, as mentioned in the thread, I expect many of my colleagues to disagree with my perspectives here.