I believe your Amazon rankings example refers to Ryan North's (and his coauthors') Machine of Death.
Thanks for explaining - I think I understand your view better now.
I guess I just don't see the trolley problem as asking "Is it right or wrong, under all possible circumstances matching this description, to pull the lever?" I agree that would be an invalid question, as you rightly demonstrated. My interpretation is that it asks "Is it right or wrong, summed over all possible circumstances matching this description, weighted by probability, to pull the lever?" I.e. it asks for your prior, absent any context whatsoever, which is a valid question.
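(In symbols, since that might be clearer - this is just my own gloss, not any standard formalisation of the problem: pulling is the right answer under this reading iff

$$\sum_{c} P(c)\,U(\text{pull} \mid c) \;>\; \sum_{c} P(c)\,U(\text{don't pull} \mid c),$$

where the sum runs over every circumstance $c$ consistent with the problem description and $P(c)$ is your prior over those circumstances.)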
Under that interpretation, the correct answer of "sometimes pull the lever" gets split into "probably pull the lever" and "probably don't pull the lever", which are the same in effect as "pull" and "don't pull". The supposition is that you have a preference in most cases, not that your answer is the same in all cases.
(This is still a false dichotomy - there's also the option of "pull the lever exactly as often as not" - but I've never heard of anyone genuinely indifferent to the trolley problem.)
The first interpretation seems sensible enough, though, in the sense that many people who pose the trolley problem probably do mean it that way, and the correct response to those people is to reject the question as invalid. But I don't think most people mean it that way - most are asking for your best guess.
Edit: On further reflection I think the usual interpretation is closer to "Is it better, as a general policy over all possible circumstances matching this description, to pull the lever or not?" I think this is closer to your interpretation, but I don't think it should produce a different answer from mine.
Thanks for the welcome!
I disagree that anyone who poses an ethical thought experiment has a burden to supply a realistic amount of context - simplified thought experiments can be useful. I'd understand your viewpoint better if you could explain why you believe they have that burden.
The trolley problem, free from any context, is sufficient to illustrate a conflict between deontology and utilitarianism, which is all that it's meant to do. It's true that it's not a realistic problem, but it makes a valid (if simple) point that would be destroyed by requiring additional context.
It's easy to respond to a question that doesn't contain much information with "It depends" (which is equivalent to saying "I don't know"), but you still have to make a guess. All else being equal, it's better to let 1 person die than 5. Summed over all possible worlds that fall under that description, the greatest utility comes from saving the most people. Discovering that the 1 is your friend and the 5 are SS officers should cause you to update your probability estimate of the situation, and with it the value your utility function assigns to each action. Further finding out that the SS officers are traitors on their way to assassinate Hitler, and that your friend is secretly trying to stop them, should cause another update. There's always some evidence you could potentially hear that would change your understanding; refusing to decide in the absence of more evidence is a mistake. Make your best estimate and update it as you can.
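To make that concrete, here's a toy version of the calculation I have in mind. Every number below is made up purely for illustration, and the "utility" columns are crude stand-ins for whatever your actual values say about each outcome - the point is the shape of the updates, not the figures:

```python
# Toy sketch of the "prior over all circumstances" reading of the trolley problem.
# All probabilities and utilities are made-up illustrative numbers, not claims
# about anyone's actual values.

# Each world: (prior probability, utility of pulling, utility of not pulling)
worlds = {
    "five strangers vs. one stranger":        (0.90,  -1,  -5),
    "the one is my friend, the five are SS":  (0.09, -10,  -2),
    "SS are traitors off to kill Hitler":     (0.01, +20, -50),
}

def expected_utilities(worlds):
    """Probability-weighted utility of each action, summed over all worlds."""
    pull = sum(p * u_pull for p, u_pull, _ in worlds.values())
    stay = sum(p * u_stay for p, _, u_stay in worlds.values())
    return {"pull": pull, "don't pull": stay}

print(expected_utilities(worlds))
# -> pulling wins under this prior

# Learning that the one is my friend and the five are SS officers rules out
# the first world; renormalise the remaining probability mass and recompute.
after_first_update = {
    "the one is my friend, the five are SS":  (0.95, -10,  -2),
    "SS are traitors off to kill Hitler":     (0.05, +20, -50),
}
print(expected_utilities(after_first_update))
# -> now not pulling wins

# Learning about the assassination plot flips it again.
after_second_update = {
    "SS are traitors off to kill Hitler":     (1.00, +20, -50),
}
print(expected_utilities(after_second_update))
# -> pulling wins once more
```

None of the particular numbers matter; what matters is that "it depends" cashes out as "my probability-weighted answer would change if the probabilities changed", which is exactly what updating on new evidence does.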
The only additional complexity that ethical questions have that empirical questions don't is in your utility function. It's equally valid to say "answers to empirical questions are usually context-dependent", which I take to mean something like "I would update my answer if I saw more evidence". But you still need a prior, which is what the trolley problem is meant to draw out: what prior utility estimates do utilitarianism/deontology/virtue ethics give over the action space of pulling/not pulling the lever? Discovering and comparing these priors is useful. The Harvard students are correct in answering the question as asked. Treating the thought experiment as a practical problem in which you expect more information is missing the point.
I commend your vision of LessWrong.
I expect that if something like it is someday achieved, it'll mostly be done the hard way through moderation, example-setting and simply trying as hard as possible to do the right thing until most people do the right thing most of the time.
But I also expect that the design of LessWrong at the software level will go a long way towards enabling, enforcing and encouraging the kinds of cultural norms you describe. There are plenty of examples of a website's culture being heavily shaped by its design choices - Twitter's 280-character limit, and the way it punishes nuance, comes to mind. It seems probable that LessWrong's design could be changed in ways that improve its culture.
So here are some of my own Terrible Ideas for improving LessWrong - ideas I wouldn't implement as they stand, but that might be worth tweaking or prototyping in some form.
(Having scanned the comments section, I see that most of the changes I thought of have already been suggested, but I've decided to outline them alongside my reasoning anyway.)
Aggregating these into a total score isn't terrible, but it does lead to the behaviour you describe - "upvoting then commenting to point out specific problems with the comment so as to avoid a social motte-and-bailey". Commenting will always be necessary to point out specific flaws, but more general feedback like "this comment makes a valuable point but is poorly written and somewhat misleading" could be expressed much more easily if 'value', 'writing quality' and 'clarity' were voted on separately (a rough sketch of what I mean follows below).
These could have a probability or probability distribution attached, if appropriate.
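As a very rough sketch of the separate-axes idea above - the axis names and the data model are placeholders of mine, not anything about LessWrong's actual implementation:

```python
# Rough sketch of multi-axis voting. Axis names and weights are placeholders,
# not a proposal for LessWrong's real data model.
from dataclasses import dataclass, field
from statistics import mean

AXES = ("value", "writing_quality", "clarity")

@dataclass
class CommentVotes:
    # Each axis collects independent scores, e.g. -1 / 0 / +1 per voter.
    scores: dict[str, list[int]] = field(default_factory=lambda: {a: [] for a in AXES})

    def vote(self, axis: str, score: int) -> None:
        self.scores[axis].append(score)

    def summary(self) -> dict[str, float]:
        # Per-axis averages, reported separately rather than collapsed into one total.
        return {axis: mean(votes) if votes else 0.0 for axis, votes in self.scores.items()}

votes = CommentVotes()
votes.vote("value", +1)             # "makes a valuable point..."
votes.vote("writing_quality", -1)   # "...but is poorly written..."
votes.vote("clarity", -1)           # "...and somewhat misleading"
print(votes.summary())
# {'value': 1, 'writing_quality': -1, 'clarity': -1}
```

The point is just that "valuable but poorly written" becomes a single cheap interaction instead of requiring a full comment.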
Most of these would make interacting with the site require extra effort, but (if done right) that's a feature, not a bug. Sticking to solid cultural norms takes effort, while writing destructive posts and comments is easy if due process isn't enforced.
Still, getting these kinds of changes right is very difficult and would require extensive testing to ensure that the costs and incentives encourage cultural norms that are worth encouraging.