This is an idea that just occurred to me. We have a large community of people who think about scientific problems recreationally, many of whom are in no position to investigate them. Hopefully, however, some other community members are in such a position, or know people who are. The idea here is to let people propose relatively specific ideas for experiments, which can be upvoted if people think they are wise, and commented on and refined by others. Grouping them together in an easily identifiable, organized way, where people can provide approval and suggestions, seems like it may actually help advance human knowledge. With its high sanity waterline and (kind of) diverse group of readers, this community seems like an excellent place to implement this idea.
These should be relatively practical, with an eye towards providing some aspiring grad student or professor with enough of an idea that they could go implement it. You should explain the general field (physics, AI, evolutionary psychology, economics, psychology, etc.) as well as the question the experiment is designed to investigate, in as much detail as you are reasonably capable of.
If this is a popular idea, a new thread can be started every time one of these reaches 500 comments, or quarterly, depending on its popularity. I expect this to provide help for people refining their understanding of various sciences, and if it ever gets turned into even a few good experiments, it will prove immensely worthwhile.
I think it's best to make these distinct from the general discussion thread because they have a very narrow purpose. I'll post an idea or two of my own to get things started. I'd encourage people to post not only experiment ideas, but also criticism and suggestions regarding this thread concept. Please also upvote or downvote this post depending on whether you think this is a good or bad idea, to better establish whether future implementations will be worthwhile.
If being a good or bad programmer is an intrinsic quality that is independent of the task, then you could just give the same subject different tasks to solve. So you take N programmers, and give each of them all K tasks to solve. Then you can determine the mean difficulty of each task as well as the mean quality of each programmer. Given that, you should be able to infer the variance.
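To make the N-by-K design concrete, here is a minimal Python sketch. It assumes an additive model (score = task difficulty + programmer effect + noise), which is only one of the possibilities, and the names `decompose`, `between_programmer_variance`, and the `times` table are made up for illustration.

```python
import statistics


def decompose(times):
    """Split an N x K table of scores into task means and programmer effects.

    times[p][t] is the hypothetical score of programmer p on task t
    (for example, time to completion). Assumes an additive model:
    score = task difficulty + programmer effect + noise.
    """
    n_tasks = len(times[0])
    grand_mean = statistics.mean([x for row in times for x in row])
    # Mean difficulty of each task: average over all programmers.
    task_means = [
        statistics.mean([row[t] for row in times]) for t in range(n_tasks)
    ]
    # Quality of each programmer: their average relative to the grand mean.
    programmer_effects = [statistics.mean(row) - grand_mean for row in times]
    # Whatever is left once both effects are removed.
    residuals = [
        [row[t] - task_means[t] - effect for t in range(n_tasks)]
        for row, effect in zip(times, programmer_effects)
    ]
    return task_means, programmer_effects, residuals


def between_programmer_variance(times):
    """Sample variance of the estimated programmer effects."""
    _, effects, _ = decompose(times)
    return statistics.variance(effects)
```

Note that the variance of the estimated effects also contains a share of the per-task noise, so with few tasks per programmer it will tend to overstate the true spread between programmers.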
There are some details to be worked out. For example, is task difficulty multiplicative or additive? I.e., if task A is 5 times as hard as task B, will the standard deviation also be 5 times as large? But that can be resolved with enough data and proper prior probabilities over the different models.
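A rough way to look at the additive-versus-multiplicative question, short of a full comparison of models with priors, is to check how the spread of scores on each task scales with that task's mean. The sketch below is a simple diagnostic I am substituting for the Bayesian approach, not an implementation of it; the function name and the `times` table (rows are programmers, columns are tasks, all scores positive) are assumptions.

```python
import math
import statistics


def spread_vs_mean_slope(times):
    """Least-squares slope of log(sd) against log(mean) across tasks.

    A slope near 0 suggests the spread does not depend on task difficulty
    (additive); a slope near 1 suggests the spread grows in proportion to
    difficulty (multiplicative). Needs at least two tasks with different
    means, at least two programmers, and positive scores.
    """
    n_tasks = len(times[0])
    log_means, log_sds = [], []
    for t in range(n_tasks):
        column = [row[t] for row in times]
        log_means.append(math.log(statistics.mean(column)))
        log_sds.append(math.log(statistics.stdev(column)))
    mean_x = statistics.mean(log_means)
    mean_y = statistics.mean(log_sds)
    sxx = sum((x - mean_x) ** 2 for x in log_means)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(log_means, log_sds)
    )
    return sxy / sxx
```

With real data the slope will land somewhere in between and be noisy, so this only indicates which family of models deserves more prior weight.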