The moderators sound like a tech-savvy group of folks. Which makes the situation you describe all the more disappointing - the mods sound collectively like they are bikeshedding things to death. You can't reach any agreement among the mods? Then fork and experiment! Let factions run their own particular subforum and see what happens. Code is Law and many things can be reversed.
Discussion of almost every item on your list could be moved forward with simple experiments, little more complex than those required to develop the models that the group is devoted to.
non-paying
Srsly? But, are you not sure whether the off-topic area should be public or not? Then set it public on alternating months until the data (post numbers, post quality, etc.) is clear on which way works better.
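To make the alternating-months idea concrete, here is a minimal sketch in Python. It assumes you can export one post count per month from the forum software, and that even-numbered months in the list were public and odd-numbered months private. The function name and the numbers are placeholders I made up for illustration, not real EJ data, and post count is only one of the measures you might compare.

```python
from statistics import mean


def compare_alternating_months(monthly_posts):
    """Compare average monthly post counts under the two settings.

    monthly_posts: post counts in chronological order. Assumption: months at
    even indexes (0, 2, ...) were public, months at odd indexes were private.
    """
    public = monthly_posts[0::2]
    private = monthly_posts[1::2]
    if not public or not private:
        raise ValueError("need at least one month of each setting")
    return {
        "public_mean": mean(public),
        "private_mean": mean(private),
        "difference": mean(public) - mean(private),
    }


# Placeholder numbers, purely illustrative (not real forum data).
example_counts = [120, 90, 130, 100, 125, 95]
result = compare_alternating_months(example_counts)
```

A difference in means over a handful of months is weak evidence on its own, so you would want to keep alternating until the gap is clearly larger than the month-to-month noise.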
You have problems with people not being willing to update models/simulations? This sounds like they are rat's nests of copy-paste balls of dirt, and you have collectively failed to develop good abstractions or domain-specific languages for the bosses. (My suspicions are further heightened by the mention of spreadsheets - as we Haskellers like to say, Excel is the world's most popular zeroth-order functional programming language.) I doubt the dungeons are that complex, so you should be able to refactor and abstract until the models/spreadsheets are so transparent any fool could update them.
Or maybe you lack good tutorials, going step by step from nothing to a working model. These can be written by one of the regulars developing a new model (you just need to copy all the intermediate steps and later you can write down the justifications and motivations behind each transformation; I flatter myself that my Haskell tutorials are good examples of this).
People are penalized for criticizing models without iron-clad evidence? Then figure out how to make it cheaper & easier to test the models, or offer anonymous forms of feedback. Voting & polls come to mind as things supported by many forum software packages for exactly this sort of purpose.
Even in WoW, hard problems remain that resist quantification - How do you motivate 25 people to keep battling a dragon that's been killing them for the last 2 hours? How do you identify the recruits that will best fit into an existing group and its culture? How do you balance redundancy and responsibility in leadership?
WoW seems to offer good data on player activities, so that means it offers good data to experiment with. Maybe you can motivate people by swapping them out. Maybe good recruits can be identified by slowly rising in level per hour played (eg. because they are spending time building social bonds and exploring the world, not level-grinding and burning out). Maybe you can just download player profiles and turn ML software loose on the data to see what tendencies correlate with 'being part of many boss-slaying groups'. These problems may resist quantification, but you look like you haven't even tried!
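As a rough sketch of the "levels per hour played" idea, the Python below computes a levelling rate per player and its Pearson correlation with how many boss-slaying groups that player has been part of. The profile fields (level, hours_played, raid_groups) and the sample records are hypothetical; I am assuming the downloaded profiles can be massaged into something like this, not describing any real export format.

```python
from math import sqrt


def levelling_rate(profile):
    """Levels gained per hour played (hypothetical profile fields)."""
    return profile["level"] / profile["hours_played"]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (spread_x * spread_y)


# Made-up records for illustration only (not real player data).
profiles = [
    {"name": "a", "level": 80, "hours_played": 400, "raid_groups": 6},
    {"name": "b", "level": 80, "hours_played": 200, "raid_groups": 3},
    {"name": "c", "level": 80, "hours_played": 100, "raid_groups": 1},
]

rates = [levelling_rate(p) for p in profiles]
groups = [p["raid_groups"] for p in profiles]
r = pearson(rates, groups)
```

A negative r on real data would be consistent with the slow-leveller guess, though correlation alone would not tell you why.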
How long did it take me to think of this new thing? "Less than a minute, sensei..."
I fear I've fallen into the historian's trap of implying intentionality in the course of presenting a selection of events as a narrative. Your underlying assertion is that we did a poor job planning our application architecture in advance of the grand project of modeling WoW; the reality is that we didn't know we had undertaken such a project until we were in the middle of it, until the community consensus had emerged that Elitist Jerks is where the theorycrafting happens.
A good comparison is open-source software. There's no editorial control preventing so...
In response to: http://lesswrong.com/lw/c1/wellkept_gardens_die_by_pacifism/
I'm a moderator at Elitist Jerks (http://www.elitistjerks.com), a World of Warcraft discussion forum. Within the WoW community, EJ has always been known for its strict moderation standards. We're exactly the sort of 'well-kept garden' that EY's post is about. You can see the fruit of the mod team's labor here: http://elitistjerks.com/f34/ I'll give some of the site's backstory for non-WoW players, describe the crossroads that we're currently at, and then give some caveats before you generalize too much from our example.
EJ's initial community came together to discuss WoW's most challenging content, known as "raids". In order to optimally outfit our characters for maximum performance in raids, both empirical and theoretical work was necessary: the game's combat mechanics were reverse engineered and detailed models for each character class were created. Within a couple of years, this "theorycrafting" work became the forum's primary purpose - refining and updating models as new game patches were released. Throughout the forum's life, high moderation standards have been maintained in order to protect our high signal/noise discussion. Primarily, asking for help is forbidden when the resources to answer your question already exist.
However, we're starting to wonder if we've performed our task too well.
So here we moderators sit on our porch, having kept our garden tidy for six years now. The questions we're asking are "Is this the community we meant to create?" and "What happens to a community formed to solve a problem once the problem is effectively solved?"
Caveats:
edit: fixed some link formatting