Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS)
tl;dr

Contrast consistent search (CCS)[1] is a method by Burns et al. that consists of two parts:

1. Generate contrast pairs by adding pseudolabels to an unlabelled dataset.
2. Use the contrast pairs to search for a direction in representation space that satisfies logical consistency properties.

In discussions with other researchers, I've repeatedly heard (2) as the explanation for how CCS works; I've heard almost no mention of (1).

In this post, I want to emphasize that the contrast pairs drive almost all of the empirical performance in Burns et al. Once we have the contrast pairs, standard unsupervised learning methods attain comparable performance to the new CCS loss function.

In the paper, Burns et al. do a nice job comparing the CCS loss function to different alternatives. The simplest such alternative runs principal component analysis (PCA) on contrast pair differences, and then it uses the top principal component as a classifier. Another alternative runs linear discriminant analysis (LDA) on contrast pair differences. These alternatives attain 97% and 98% of CCS's accuracy!

"[R]epresentations of truth tend to be salient in models: ... they can often be found by taking the top principal component of a slightly modified representation space," Burns et al. write in the introduction. If I understand this statement correctly, it's saying the same thing I want to emphasize in this post: the contrast pairs are what allow Burns et al. to find representations of truth. Empirically, once we have the representations of contrast pair differences, their variance points in the direction of truth. The new logical consistency loss in CCS isn't needed for good empirical performance.

Notation

We'll follow the notation of the CCS paper.

Assume we are given a data set $\{x_1, x_2, \ldots, x_n\}$ and a feature extractor $\phi(\cdot)$, such as the hidden state of a pretrained language model.

First, we will construct a contrast pair for each datapoint $x_i$. We add “label: positive” and “label: ne