For ELK truth is mostly a distraction
Epistemic Status: Pretty confident in the central conclusions, and very confident in the supporting claims from meta-logic. Any low-confidence conclusions are presented as such.

NB: I give an intentionally revisionary reading of what ELK is (or should be) about. Accordingly, I assume familiarity with the ELK report. Summary here.

Executive Summary

Eliciting Latent Knowledge (ELK) collapses into either the automation of science or the automation of mechanistic interpretability. I promote the latter.

Abstract

After reframing ELK from the perspective of a logician, I highlight the problem of cheap model-theoretic truth: by default, reporters will simply learn (or search for) interpretations of the predictor's net that make the teacher's answers “true” in the model-theoretic sense, whether or not they are True (i.e. correspond with reality)! This is a problem even if we manage to avoid human simulators and are guaranteed an honest translator. It boils down to finding a way to force the base optimizer (e.g. gradient descent) to pay attention to the structure of the predictor's net, instead of simply treating it like putty. I argue that trying to get the base optimizer to care about the True state of affairs in the vault is not a solution to this problem, but the expression of a completely different problem: something like automating science. Arguably, that is not the problem we should be focused on, especially if we're just trying to solve intent alignment. Instead, I tentatively propose the following solution: train the reporter on mechanistic interpretability experts, in the hope that it internalizes and generalizes their techniques. I expand this proposal by suggesting we interpret in parallel with training, availing ourselves of the history of a predictor's net in order to identify and track the birth of each term in its ontology. The over-arching hope is that if we manage to fully interpret the predictor at an earlier stage in its development, then we can track each term in its ontology from its birth onward.
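To make the “cheap truth” worry concrete, here is a minimal toy sketch in Python. All names, and the thresholded-neuron style of “reading” a net, are my own illustrative assumptions, not anything from the ELK report. It treats the predictor's hidden state as an uninterpreted structure and brute-forces interpretation maps, counting how many make a fixed set of teacher answers come out model-theoretically “true”:

```python
# Toy sketch of cheap model-theoretic truth (all names hypothetical).
import itertools
import random

random.seed(0)

# Stand-in for a predictor's hidden state: a handful of "neurons" with
# arbitrary values. Nothing about the vault is actually encoded here.
n_neurons = 6
hidden_state = [random.gauss(0, 1) for _ in range(n_neurons)]

# The teacher's answer set: atomic propositions with target truth values.
answers = {"diamond_present": True, "door_locked": True, "camera_tampered": False}
propositions = list(answers)

def evaluate(interpretation, state):
    """Truth-in-the-model: a proposition holds iff its assigned neuron
    clears its assigned threshold in the assigned direction."""
    return {
        prop: (state[idx] > thresh) if positive else (state[idx] <= thresh)
        for prop, (idx, thresh, positive) in interpretation.items()
    }

# Candidate readings of a proposition: a (neuron, threshold, direction) triple.
candidate_readings = list(
    itertools.product(range(n_neurons), [-0.5, 0.0, 0.5], [True, False])
)

satisfying = total = 0
for assignment in itertools.product(candidate_readings, repeat=len(propositions)):
    interpretation = dict(zip(propositions, assignment))
    total += 1
    if evaluate(interpretation, hidden_state) == answers:
        satisfying += 1

print(f"{satisfying} of {total} interpretations make every answer 'true'")
# Exactly half the candidate readings of each proposition match its target
# no matter what the hidden state is (positive and negative readings are
# complementary), so 18**3 = 5832 of the 46656 joint interpretations
# satisfy every answer. Model-theoretic truth is cheap: satisfying the
# teacher constrains the interpretation, not the net.
```

The punchline is that the fraction of satisfying interpretations here is fixed at one in eight regardless of the hidden state: agreement with the teacher's answers tells you nothing about what the net actually represents unless something forces the interpretation to respect the net's structure.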
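And here is a minimal sketch of the interpret-in-parallel proposal, again in Python; train_step and interpret_diff are hypothetical stubs standing in for whatever training and interpretability machinery is actually used. The idea is simply to checkpoint every step and hand the interpreter the diff between consecutive checkpoints, so each term in the ontology is labelled at birth rather than excavated from the finished net:

```python
def train_with_parallel_interpretation(predictor, batches, train_step, interpret_diff):
    """Interleave training with interpretability passes.

    Hypothetical stubs, standing in for real machinery:
      train_step(predictor, batch)       -> updated predictor
      interpret_diff(old, new, ontology) -> updated ontology labels
    """
    ontology = {}             # running map: net component -> labelled term
    checkpoints = [predictor]
    for batch in batches:
        updated = train_step(predictor, batch)
        # Interpret the change, not the whole net: a term caught at birth
        # is a small, local diff, whereas the same term in the finished
        # predictor may be smeared across the whole net.
        ontology = interpret_diff(predictor, updated, ontology)
        checkpoints.append(updated)
        predictor = updated
    return predictor, ontology, checkpoints
```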