Why should the smell yield to the argument instead of vice versa?
That's the $64,000(-a-year) question, and I don't have an answer I'm happy with for the general case. Here's roughly what I think for this specific situation.
As you say, my nose can't describe what it smells. It might be a genuine problem or a false alarm. To find out which, I have to consciously poke around for an overlooked counterargument or a weak spot in the original argument, something to corroborate the bad smell. I did that here and couldn't find a killer gap in 80,000 Hours' arguments, nor a strong counterargument for why I should disregard them.
For simplicity, consider a stripped-down version of the decision problem where I have only two options: becoming a rich banker vs. getting a normal job paying the median salary.
Suppose I disapprove of the banker option for whatever reason. If I hold my nose and become a banker anyway, it seems very likely to me that (1) I would nonetheless prefer that to having someone with different values in my place instead, and (2) even if taking a banking job worked against my values or goals, I could compensate for that by hiring other people to further them.
I had thought that reference class forecasting might warn against the get-rich-and-give strategy: people with more income give a smaller percentage of it to charity, so by entering banking one might opt into a less generous reference class. But quick Googling reveals that people with higher incomes give more in absolute terms, at least in the UK, the US, and Canada.
Putting aside the chance of my being wrong, what about the disutility of being wrong? Well, I agree with your final paragraph, so that doesn't seem to weigh heavily against the 80,000 Hours point of view either.
All in all my nose seems to have overreacted on this one. Maybe it raised the alarm because 80,000 Hours' conclusion failed a quick universalizability test, namely "would this still be the best choice if everyone else in the same boat made it too?" But that test itself seems to fail here.
I doubt my thoughts on this are bulletproof; there's a good chance I'm missing part of the puzzle or just plain wrong on some fundamental issue. Maybe I've built a convenient, clever meta-argument for arguing myself into something I'd probably want to do anyway! Still, ultimately I can devote only so much thought (and self-distrust) to this. I have to make a judgement call, and this is the best one I can make, whatever the risk of motivated cognition.
The current issue of the Oxford Left Review has a debate between socialist Pete Mills and two 80,000 Hours people, Ben Todd and Sebastian Farquhar: The Ethical Careers Debate, pp. 4-9. I'm interested in it because I want to understand why people object to the ideas of 80,000 Hours. A paraphrasing:
As a socialist, Mills really doesn't like the argument that the best way to help the world's poor is probably to work in heavily capitalist industries. He seems to be avoiding engaging with Todd and Farquhar's arguments, especially replaceability. He also really doesn't like looking at things in terms of numbers, I think because numbers suggest certainty. When I calculate that in 50 years of giving away $40K a year you save 1000 lives at $2K each, that's not saying the number is exactly 1000. It's saying 1000 is my best guess, and unless I can come up with a better guess it's the estimate I should use when choosing between this career path and other ones. He also doesn't seem to understand prediction and probability: "every revolution is impossible, until it is inevitable" may be how it feels for those living under an oppressive regime but it's not our best probability estimate. [1]
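The arithmetic behind that 1000 figure can be written out as a minimal sketch. The function name `lives_saved` is my own, and the alternative costs per life ($1K and $4K) are purely illustrative assumptions, not figures from any charity evaluator; they are there only to show that the output is a best guess that moves with its inputs rather than a claim of certainty.

```python
def lives_saved(years, donation_per_year, cost_per_life):
    """Point estimate of lives saved: total donated divided by cost per life."""
    total_donated = years * donation_per_year
    return total_donated / cost_per_life


# The figures from the text: 50 years, $40K a year, $2K per life.
best_guess = lives_saved(50, 40_000, 2_000)
print(best_guess)  # 1000.0

# Hypothetical alternative costs per life (assumed for illustration only),
# showing how the estimate shifts if the cost is half or double the guess.
for cost in (1_000, 2_000, 4_000):
    print(cost, lives_saved(50, 40_000, cost))
```

Even if the true cost per life were double my guess, the estimate would still be in the hundreds, which is what matters when comparing this career path against others.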
In a previous discussion a friend was also misled by calculations. When I said "one can avert infant deaths for about $500 each" their response was "What do they do with the 500 dollars? That doesn't seem to make sense. Do they give the infant a $500 anti-death pill? How do you know it really takes a constant stream of $500 for each infant?". Have other people run into this? Bad calculations also tend to be distributed widely, with people saying things like "one pint of blood can save up to three lives" when the expected number of marginal lives saved is actually tiny. Maybe we should focus less on estimates of effectiveness in smart-giving advocacy? Is there a way to show the huge difference in effect between the best charities and most charities without using these?
Maybe I should have way more of these discussions, enough that I can collect statistics on which arguments and examples work and which don't.
(I also posted this on my blog)
[1] Which is not to say you can't have big jumps in probability estimates. I could put the chance of revolution somewhere at 5% based on historical data, but then hear some new information about how one has just started and sounds really promising, which bumps my estimate up to 70%. But expected value calculations for jobs can work with numbers like these; it's just "impossible" and "inevitable" that break estimates.