I'm missing a connection somewhere—who was assuming this? You mean people at the AI companies evaluating the results? Other researchers? The general public?
The companies who tried to fight bias by fine-tuning their models. My point is, people expected that the natural bias of base pretrained models, picked up from the vibes of the sum of human culture as sampled by the training set, would be pro-men and likely pro-white (which, TBF, is strengthened by the fact that a lot of that culture is also older). I don't think that expectation was incorrect.
My meaning was, yeah, the original intent was to correct that bias, and the attempts probably overshot (another example of how crude and approximate our alignment techniques are - we're really still at the "drilling holes in the skull to drive out the evil spirits" stage of that science). But essentially, I'm saying the result is still very meaningful, and also, since discrimination in either direction remains illegal in many countries whose employers are likely using these models, there is still commercial value in simply getting it right rather than catering purely to the appearance of being sufficiently progressive.
The entire market is quite fucked right now. But the thing is, if you have more and more applicants writing their applications with AI, and more and more companies evaluating them with AI, we get completely away from any kind of actual evaluation of relevant skills, and it becomes entirely a self-referential game with its own independent rules. To be sure, this is a general problem with hiring processes, and the attempt to fix it by bloating the process even more is always doomed to failure, but AI risks putting it on turbo.
Of course it's hard to make sure the applicants don't use AI, so if only the employer is regulated that creates an asymmetry. I'm not sure how to address that. Maybe we should just start having employment speed-dating sessions where you get a bunch of ten-minute in-person interviews with prospective employers looking for people, and then you get paired up at the end for a proper interview. At least it's fast, efficient, and no AI bullshit is involved. And even ten minutes of in-person talking can probably tell you more than a hundred CVs/cover letters full of the same nonsense.
I'm not particularly surprised that Chain-of-Thought's faithfulness is very hit-or-miss. The point of CoT, it seems to me, is to give the LLM more "memory" in which to store multi-step reasoning, but that still doesn't remove the fact that when the final answer is a "yes" or "no", it will also include an element of snap decision right as it predicts that last token.
Which actually makes me curious about this element. For example, if the model has reached its final conclusion and has written "we have decided that the candidate does", what is the probability that the next word will be "not" for each of these scenarios? How much does it vary given different contexts?
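Purely to illustrate what I mean (not anything from the paper), here's a rough sketch of how you could probe that with an open-weights model through HuggingFace transformers. The model name, the prompt strings, and the helper function are all made-up placeholders for whatever screening model and CVs you'd actually want to test:

```python
# Rough sketch: probability that the next token is " not" after a
# decision-sounding prefix, for different contexts. gpt2 is only a
# stand-in; any causal LM with accessible logits works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def prob_next_is_not(prefix: str) -> float:
    """Return P(" not" | prefix) for the single next token."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits           # shape: (1, seq_len, vocab_size)
    probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next token
    not_id = tokenizer.encode(" not")[0]          # " not" is a single token for gpt2
    return probs[not_id].item()

# Hypothetical contexts: the same decision sentence preceded by CVs that
# differ only in the demographic signal you care about.
contexts = {
    "candidate_A": "...CV and chain-of-thought for candidate A... We have decided that the candidate does",
    "candidate_B": "...CV and chain-of-thought for candidate B... We have decided that the candidate does",
}
for label, prefix in contexts.items():
    print(label, prob_next_is_not(prefix))
```

If the CoT were fully faithful, you'd expect the probability of "not" at that point to be pinned near 0 or 1 by the reasoning that precedes it, rather than swinging with details of the context that the reasoning never mentions.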
Finally, the real-world relevance of this problem is clear. 82% of companies are already using LLMs for resume screening and there are existing regulations tied to bias in automated hiring processes.
To be fair, I think they should just be banned from having no-human-in-the-loop screenings, full stop. Not to mention how idiotic it is to let an LLM do your job for you, saving a few hours of reading on a decision that can be worth hundreds of thousands or millions of dollars to your company.
Just because the assumption was that the problem would be discrimination in favour of white men doesn't mean that:
1. it's not still meaningful that this seems to have generated an overcorrection (after all, it's reasonable that the bias would have been present in the original dataset/base model, and it's probably fine-tuning and later RLHF that pushed in the other direction), especially since it's not explicitly brought up in the CoT; and
2. it's not still illegal for employers to discriminate this way.
Corporations as collective entities don't care about political ideology quite as much as they do about legal liability.
A fair point, but more relevant to the issue at hand: is it sociality itself that gives rise to consciousness, or is it having to navigate social strategy? Even though there is likely no actual single "beehivemind", so to speak, is consciousness more necessary when you're so social that simply going along with well-established hierarchies and patterns of behaviour is all you need to do to play your part, or is it superfluous at that point, since the distinction between self and other, and reflection on it, aren't all that important?
The argument also doesn't rely on any of this? It just relies on it being possible to compare the value of two different world-states.
I hold that, in general, trying to sum the experiences of a bunch of living beings into a single utility function is nonsense, but in this particular case I'd say it matters even without that. My point is that we judge wild animal welfare from the viewpoint of our own baseline. We think "oh, always on the run, half starved, scared of predators/looking for prey, subject to disease and weather of all sorts? What a miserable life that would be!", but that's just us imagining ourselves in the animal's shoes while still holding onto our current baseline. The animals have known nothing else; in fact, they have evolved in those specific conditions for millions of years, so it would actually be strange if they experienced nothing but pain and fear and stress all the time - what would be the point of evolving different emotional states at all if the dial is always on "everything is awful"? So my guess is, no, that's not how it works: those animals do have lives with some alternation of bad and good mental states, and may even fall on the net positive end of the utility scale. Factory farming is different because those are deeply unnatural conditions that all happen to be extreme stressors in the wild, meaning the animals, even with some capacity to adjust, are thrown into an out-of-distribution end of the scale, just as we have raised ourselves to a different out-of-distribution end (where even things that were daily occurrences at the inception of our species look like intolerable suffering, because we've raised our standard of living so high).
The whole "wild animals suffer, therefore they should be eradicated for their own good" argument is obviously broken to me. To wit - if an alien civilization reached Earth in antiquity, would they have been right to eradicate humanity to free it from its suffering since everyone was toiling the whole day on the fields and suffering from hunger and disease? What if they reached us now but found our current lifestyle similarly horrible compared to their lofty living standards?
Living beings have some kind of adjustable happiness baseline. Making someone happy isn't as simple as triggering their pleasure centres all the time, and making someone not unhappy isn't as simple as preventing their pain centres from ever being triggered (even if this means destroying them).
Bees are at the other end, like ants: they are so social that you have to start wondering where the individual bee ends and the hivemind begins. That brings us back to the questions of how consciousness relates to mere complexity of information processing versus integration.
Also, to be fair, most of this seems addressable with somewhat more sustainable apiculture practices. Unlike with meat, killing the bees isn't a necessary step of the process; it's just a side effect of carelessness or excessively cheap shortcuts. Bee-suffering-free honey would just cost a bit more, and that's it.
Classic problem, but I see a lot of that happening already. It's less of a problem for non-specialized jobs, but for tech jobs (the ones I'm familiar with), it would have to be another tech person, yeah. Honestly, for the vast majority of jobs, anything other than the technical interview (like the pre-screening by an HR guy who doesn't know the difference between SQL and C++, or the "culture fit" round that is either just validation of some exec's prejudices or an exercise in cold reading and bullshitting on the fly for the candidate) is probably useless fluff. So basically that's a "companies need to actually recognise who is capable of identifying a good candidate quickly, and accept that getting them to do that is a valuable use of their time" problem, which exists already regardless of the screening methodologies adopted.