Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai

Comment Permalink

I have only just come across this discussion (the original article referred to my work). The article

Fenton, N.E. and Neil, M. (2011), 'Avoiding Legal Fallacies in Practice Using Bayesian Networks'

was published in the Australian Journal of Legal Philosophy 36, 114-151, 2011 (Journal ISSN 1440-4982) A pre-publication pdf can be found here:

https://www.eecs.qmul.ac.uk/~norman/papers/fenton_neil_prob_fallacies_June2011web.pdf

The point about the use of the likelihood ratio (to enable us to evaluate the probative value of evidence without having to propose subjective prior probabilities) is something that I am increasingly having grave doubts about. This idea has been oversold by the forensic statistics community. I am currently writing a paper which will show that, in practice, the likelihood ratio as a measure of evidence value can be fundamentally wrong. The example I focus on is the Barry George case. Here is a summary of what the article says:

One way to determine the probative value of any piece of evidence E (such as some forensic match of an item found at the crime scene to an item belonging to the defendant) is to use the likelihood ratio (LR). This is the ratio of two probabilities, namely the probability of E given the prosecution hypothesis (which might be ‘item at crime scene belongs to defendant’) divided by the probability of E given the alternative defence hypothesis (which might be ‘item at crime scene does not belong to defendant’). By Bayes’ theorem, if the LR is greater than 1 then the evidence supports the prosecution hypothesis and if it is less than 1 it supports the defence hypothesis. If the LR is 1, i.e. the probabilities are equal, then the evidence is considered to be ‘neutral’ – it favours neither hypothesis over the other and so offers no probative value. The simple relationship between the LR and the notion of ‘probative value of evidence’ actually only works when the two alternative hypotheses are mutually exclusive and exhaustive (i.e. each is the negation of the other). This is often not clearly stated by proponents of the LR leading to widespread confusion about the notion of value of evidence. In many realistic situations it is extremely difficult to determine suitable hypotheses that are mutually exclusive. Often an LR analysis is performed against hypotheses that are assumed to be mutually exclusive but which are not. In such cases the LR has a much more complex impact on the probative value of evidence than assumed. We show (using Bayes’ theorem and Bayesian networks applied to simple, non-contentious examples) that for sensible alternative hypotheses – which are not exactly mutually exclusive – it is possible to have evidence with an LR of 1 that still has significant probative value. It is also possible to have evidence whose LR strongly favours one hypothesis, but whose probative value strongly favours the alternative hypothesis. We consider the ramifications on the case of Barry George. The successful appeal against his conviction for the murder of Jill Dando was based primarily on the argument that the firearm discharge residue (FDR) evidence that was assumed to support the prosecution hypothesis at the original trial actually had an LR equal to 1 and hence was ‘neutral’. However, our review of the appeal transcript shows numerous inconsistencies and poorly defined hypotheses and evidence such that it is not clear that the relevant elicited probabilities could have been based on mutually exclusive hypotheses. Hence, contrary to the Appeal conclusion, the probative value of the FDR evidence may not have been neutral

30

Bayesian justice

30

30