Disadvantages of Card Rebalancing

Zvi

Previously: Artifact Embraces Card Balance Changes, Card Collection and Ownership, Card Balance and Artifact, Card Rebalancing, Card Oversupply and Economic Considerations in Digital Card Games, Advantages of Card Rebalancing

This is the last post in this sequence, although we will doubtless return to related topics in the future.

IX. Non-Economic Disadvantages of Card Rebalancing

Last time, I explored eight reasons why card rebalancing was great. Now it is time to turn those reasons on their head, and see how disaster might strike.

There are four central reasons why I worry about card rebalancing. They are card and game economics; destruction of history, work and memory; a desire to ‘overbalance’; and Goodhart’s Law.

Economics I’ve already considered. For considerations of card ownership, in-game economics and related matters, see Card Collection and Ownership and Card Rebalancing, Card Oversupply and Economic Considerations in Digital Card Games. I won’t consider such issues here.

Category 1: Destruction of history, work and memory.

1A. Players cannot rely on their knowledge of what cards do.

I know the exact abilities of thousands of Magic cards and hundreds of Artifact cards, plus thousands of cards in other games, both collectible and otherwise. I know the rules of thousands of board games, card games and video games.

Now imagine if those abilities and rules were constantly changing. If every turn and every action I had to worry that the rule wasn’t what I thought it was, or the card had different abilities.

Magic players already have a taste of this with the grand creature type update. In that case, the problem was redoubled by not being able to know the change by reading the card. Players know that looking at a sufficiently old card’s printed creature type is not a good guide to its actual creature type. It can be obvious that Elvish Archers is now an Elf and an Archer, and most other changes are similarly guessable and logical.

The good news in a digital game is that one can always read the card, and various visual aids can be introduced to warn a player that a card has changed within some time frame, or since the last time the card was played against them. The other good news is that one periodically expects to face new cards, so facing new variations of existing cards is a reasonable punch to roll with.

Still, not being able to trust your knowledge of a game is off-putting at best. When one is keeping close track of a game, making it a primary focus of time, this isn’t a big deal. When one wants to take breaks and return to games later or occasionally, it is a much bigger one.

I have always had trouble learning foreign languages. The closest I’ve come is being able to learn card games and their associated vocabulary. Not being a

1B. Players cannot rely on their knowledge of the strategic landscape.

The flip side of keeping a game fresh and new all the time, and giving people new strategies to explore, is that you invalidate what everyone has learned.

I will speak to this issue from my experience as a professional competitor.

When strategic situations move slowly, or shift periodically with set releases, one can make a medium-term investment in strategic acumen. Practice and study can be gotten out of the way, allowing one to be ready for and focus on the game itself (along with any drafts or sealed deck builds). Once familiar with a game and a format, knowledge will continue to advance and the meta-game may shift, and there is always room to go deeper and improve, but with confidence that one’s work will continue to have relevance, and mostly not go to waste.

When strategic situations move too fast, taking even a week off of ‘doing the work’ gets severely punished. It becomes impossible to ‘stay in fighting shape’ without continuous work. Magic, due to Magic Online and various events, even without cards changing, has reached a point where you must prepare for the constructed portion of events during the final week, and you can’t change your list in the last two days, leaving a very narrow window that leaves zero flexibility for players’ lives. All previous work was necessary to be in position to adapt but still mostly wasted.

This is in sharp contrast to the old era, when people prepared mostly in private over the course of a month. Then, I would work hard but often have several weeks between finishing my work and the event itself.

This is all, of course, the flip side of making innovation and interesting new things possible. It’s not like we want a world in which there are not new things. But by effectively forcing everyone to constantly relearn things to stay in shape, while having constant internet sources of basic knowledge and strategy, you force those looking to not follow the herd to put in absurd amounts of work that won’t last. Cards changing all the time would make this that much worse.

See Mark Cuban’s recent notes about why he declined to buy an e-sports team. Constantly changing rules and elements force insanely long work weeks and lead to player burnout. I speak from experience that burnout is a real problem, and that minor tweaks that wipe out your existing knowledge and assumptions are a big contributor to that, in a way that adding additional elements every so often is distinct from.

What about more casual players? Is this a problem confined to a relatively small percent of players at the top? Based on my experience in other games that I was not taking as seriously, the same problem is real there as well, on longer time scales, and losing solidity is a big deal.

Below a certain level of precision and seriousness this should presumably fall off in magnitude. Even better, for those who only play during a short window, the changes won’t feel like changes at all, so none of this matters unless a player stays for a longer time frame. That also makes it difficult to measure the effect.

1C. Destruction of history.

A lot of the joy of Magic lies in its history. We have a quarter century of cards, games, mistakes, brilliancies, broken ideas, innovations, theories, decks, choices, competitions, friendships, communities, arguments, formats and stories. Investing in Magic now pays off not only in fun and learning how to think. It pays off in a community, in deep friendships and shared stories and experiences. It pays off in older decks and formats that provide permanent sources of throwback fun, that are a much underused source whose surface Magic Online has only begun to scratch in its throwback offerings.

When we alter things such that this history, our stories and the related formats and decks no longer make sense or function properly, because their components have all been changed, we severely damage this.

I know this may seem like a petty special-interest concern. I do not think it is. It and other hard-to-notice risks and costs get completely ignored and crushed once Goodhart’s Law considerations start operating.

The default of the internet age is to periodically break or expire everything from the past, such that it ceases to function, in many ways that make me deeply sad. Servers shut down and games become unavailable, websites go silent. Often this could have been avoided at little cost, but such considerations were not taken into account at all, so over time we all get poorer than we could have been.

Getting the throwback NES Classic and SNES Classic consoles these past few years has been a great joy, and the PlayStation Classic and Sega Genesis Classic failed to join them only due to failure of execution. I spend a substantial portion of my gaming time on experiences created during those ancient times, and variations therefrom, and if technologically easier I would spend more. Remakes of these experiences, or modern takes on those experiences that impose similar restrictions, I often find to be superior. Remakes can greatly enhance the experience, by offering quality-of-life improvements, but can also make difficulty adjustments that destroy the real game experience. This brings us to the second category of concerns.

Category 2: Goodhart’s Law considerations

2A. Choices are Bad

When cards can change, there will be higher expectations for every aspect of them, and more blame when things are not exactly to people’s preferences. People will compare what exists to what might exist, and not forgive the designers their (in the player’s opinion) mistakes. This will be especially bad given that mistakes are, to an extent, good for a game, as we’ll get to later.

Quirky things will be cries for fixes rather than quirky things to adjust around. When a player loses to a thing, they will more often cry the thing is unfair and to nerf it, rather than looking around for how to beat it.

There are two big potential downsides here.

The first is that for any given game experience delivered, players (who care about things enough to think at this level) will be less happy, because they are judging against a different standard.

The second is that there will be tremendous pressure to optimize a variety of metrics, and to judge changes on short time horizons based on success on these metrics.

Potential targets include number of active players, time played, card prices, revenue, estimated lifetime revenue per customer, reported player satisfaction or posted reviews, what people say on Reddit or Google News or elsewhere, diversity of metagame on a variety of skill and card access levels, win rates for various strategies, number of players playing a variety of decks, or other things I’m not thinking about right now.

Hello, Goodhart’s Law. Hello, short termism.

2B. Players are wrong about what they want

People are very bad at knowing what they want. They are even worse at figuring out what impacts changes would have, short term or long term, on a game.

Magic’s player base reliably had too much support, too fast, for bans and emergency actions, for decades. This may have eventually been fixed after decades of explanations and data, or it might not have been. I would expect this to be the pattern for almost all games, and for balance changes to follow the pattern even more.

There are lots of other things that are short term popular but long term detrimental and not so popular. Power creep, and using up more design space, are two easy examples. In the interests of space and avoiding side arguments I won’t go further here.

Magic has managed to educate its player base to dampen this effect somewhat, but only somewhat.

2C. Game risks focusing developer time on the wrong things

Focus risks being on tweaking existing things, and making short term improvements, rather than on creating new things and making long term improvements.

When creating new cards and formats, it will be impossible to plan for them fully in advance, as key elements already in print will change all the time. The equivalent of the Magic “Future Future League” will give far less valuable data. Secondary and older formats like Vintage and Legacy (or I suspect even Modern) in Magic will not be properly considered when changes are made, either.

2D. Goodhart’s Law hill climbs destroy art and end poorly

I strongly believe that Goodhart’s Law problems are getting worse every year, and causing many, perhaps most, of our internet experiences to become incrementally worse rather than better, in ways that major corporations that do many brilliant things have proven unable to overcome. Some day I want to focus in on Netflix, and some day I want to spend a ton of time getting this case exactly right, but for now I want to avoid such sidetracks.

I do not trust anyone, myself included, to handle this well. Give a person enough knobs, and enough immediate feedback that feels important, and they will turn those knobs until their numbers go up. Other considerations that are harder to measure will be discarded. Visions and virtues will be compromised. Utilitarian, consequentialist philosophies will de facto dominate the discussion.
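The knob-turning dynamic can be shown with a toy sketch. Everything in it is an assumption made up for illustration: a single `knob`, a `measured_metric` that always rises as the knob is turned, and a `hidden_cost` that nobody is tracking. It is not a model of any real game's data, only a picture of how climbing the visible number can leave the unmeasured total worse off.

```python
# Toy Goodhart's Law sketch. All functions and numbers are hypothetical.

def measured_metric(knob):
    """The number on the dashboard: it goes up whenever the knob is turned up."""
    return 10.0 * knob


def hidden_cost(knob):
    """Hard-to-measure damage (lost trust, burnout) that grows faster than the metric."""
    return 2.0 * knob ** 2


def true_value(knob):
    """What we actually care about, which no dashboard reports."""
    return measured_metric(knob) - hidden_cost(knob)


def turn_knob_until_numbers_stop_rising(start=0.0, step=0.5, knob_max=10.0):
    """Greedy hill climb that looks only at the measured metric."""
    knob = start
    while knob + step <= knob_max and measured_metric(knob + step) > measured_metric(knob):
        knob += step
    return knob


tuned_knob = turn_knob_until_numbers_stop_rising()
# For comparison: the setting that is best once the hidden cost is counted.
best_knob = max((i * 0.5 for i in range(21)), key=true_value)

print("metric-chasing knob:", tuned_knob, "true value:", true_value(tuned_knob))
print("best knob:", best_knob, "true value:", true_value(best_knob))
```

In this sketch the climber never has a reason to stop short of the maximum setting, because the only signal it sees keeps improving.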

2E. Old man yells at cloud

I want to stop here to say, yes, I totally, totally get this is a thing and I might well be doing a lot of that thing. And that my inability to be more convincing, or in places more concrete, with these arguments is a symptom that points to that. That I liked things better back when everything was worse, time has in some ways passed me by, and other such things.

I feel that way about a lot of things, sometimes. I get old, same as everyone. It’s hard to tell how much of this is me getting old and set in ways and nostalgic, versus how much is actual problems and civilizational decline and social media and Goodhart’s Law and [other things] destroying everything good. How much of this is me having very strong links to the benefits and deep implications of systems that no longer make sense, and making errors in magnitude? Is this the same impulse that leads to NIMBYism and hatred of capitalists and creative destruction everywhere, and is almost always valid in important ways but centrally and more importantly wrong? Are all of my good reasons justifying it just me being too clever for my own good?

I don’t know.

It is also likely that I have undue motivation to find problems with card rebalancing. I have quirky preferences. I think it leads to too much superficial balance, largely because players demand it, and that this is quite bad, which I’ll discuss in the last section. Also, I’ll be building a game where easy rebalancing is flat out not an option because it will be counting on establishing player trust in its digital objects, and appealing to an audience that highly values what I call card ownership, because it highly values parallel types of ownership anywhere and everywhere.

When I ask myself what my true, main objection is in all this, I get back two answers.

The first is that by doing constant changes we’re wiping out value and history, but that feels like a valid choice. It will sometimes be the right thing to do. If that’s all there is.

The second is that I believe that given the power and opportunity, it will be used wrong, and be used to make the product worse. Goodhart’s Law. When I tunnel into that, and ask: What is the primary way I expect this to happen?

I get a clear answer.

Overbalance.

Category 3: Games That Rebalance Will Overbalance

3A. Cards, including many iconic cards that are core to the game, or to factions/colors/classes, get nerfed.

Games without rebalancing solve their most extreme problems with bans. As the cost of banning cards goes down via rebalancing, they use this cheaper form of banning cards more often.

Eternal and Hearthstone rely on rebalancing. In both cases, it has been used frequently to nerf constructed cards, resulting in effective bans.

In both cases, it has not been used frequently (if at all) to strengthen cards in a way that resulted in the cards gaining constructed level power and appearing frequently.

See this compilation of Hearthstone balance changes, almost all of which are cost increases or effect decreases. Hearthstone’s base set has continuously gotten crappier, as its staple cards have been taken out one at a time, to the extent that a substantial percentage of the set has now fallen victim. The rate slowed a lot when the beta ended, but the pattern continued.

This is especially bad when it targets powerful cards that represent a unique ability of a color or faction. The best versions of these abilities need to be very good cards, so the abilities will be things worth sacrificing to get. You also don’t want more cards that do the same thing at the same power level, to avoid letting players use tons of copies of the card. Red in Artifact is balanced not around all its heroes being bigger and stronger than all others, but around its best heroes being stronger than other colors’ best heroes. Hence (pre-patch) Axe and Legion Commander, then a drop off. Druid in Hearthstone used to get Innervate and Wild Growth, which balanced what other colors get, and has now mostly lost both.

Each iconic card that gets weakened doubly strengthens the case for weakening the next one. The remaining cards are comparatively stronger, and the bar to taking action has gone down in both relative and absolute terms.

3B. Cards that are already good will not get better

Artifact’s first patch weakened or transformed two of its best heroes, and strengthened five others. None of the five strengthened heroes were previously playable in constructed, and at most one of them is at all playable now.

One of its other two moves, the changes to Jasper Daggers, did create a powerful card with a unique new ability. This was good to see in principle, if a little odd (and I do in some ways dislike that you can get out from having all your heroes silenced, but we learn and adapt), and its primary purpose was to serve as an additional way to weaken one of the two injured heroes, by giving players an answer to Gust and thus to Drow Ranger.

Jasper Daggers was not an example of ‘we have a card that we’d like to be good enough, but people aren’t playing it, so we’ll push it harder.’ It was also not an example of, ‘we have a card that people are playing, but we’d like them to play it more, so we’re going to make it better’ or ‘we have a card people are playing, but we’d like to make the associated decks stronger, so we’re going to make it better.’ It was more like ‘we need this card to exist, and can’t wait until we have an expansion, so we’re going to stick that card (with aggressive costing) into the slot where we used to have Jasper Daggers.’

To preserve the flavor and connection to the original Jasper Daggers, it was allowed to keep pierce. But this was much more of a ‘print a new card out of cycle’ action than a balance change.

Eternal, unlike Hearthstone, does often strengthen cards, but like Artifact’s hero improvements, it does so on cards that are not competitive, and does not in doing so create dangerous new cards. Consider the first change log I found on Google search, patch 1.39. That was also the patch that made me say ‘all right, I’m out with (almost) no regrets.’ The constructed section takes down key longtime staples Channel the Tempest and Icaria, the Liberator, newcomer Auralian Merchant, and the what-did-I-ever-do-to-you-except-eat-your-face hidden gem Predatory Carnosaur. Then the draft portion offers various buffs and nerfs, with more buffs than nerfs, but none of those buffs are impactful on constructed play, nor were they intended to be.

So in an important sense, none of them count.

Thus:

3C. Card power levels will converge and cards will become redundant

There is a range of sensible power differentials between cards. Sometimes, this move will be in the right direction. For a time. But all slopes are slippery and everything is trying to kill you, if changes only go in one direction.
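A toy simulation makes the one-directional ratchet concrete. The card power numbers, the `tolerance` for how far a card may stand above the average before it gets nerfed, and the `decay` that lowers that bar after each nerf are all hypothetical values chosen for illustration, not data from any real game. The sketch only shows that when the strongest card keeps getting pulled down and nothing playable gets pushed up, the spread between best and worst can only shrink.

```python
# Toy sketch of nerf-only rebalancing. All numbers are made-up assumptions.

def nerf_only_patches(powers, tolerance=3.0, nerf_step=1.0, decay=0.9, max_patches=100):
    """Nerf the strongest card whenever it stands out from the average by more
    than `tolerance`. Each nerf also lowers the bar for the next one, but never
    below `nerf_step`, so the weakest card is never undercut."""
    powers = list(powers)
    spreads = [max(powers) - min(powers)]  # best-minus-worst gap after each patch
    for _ in range(max_patches):
        average = sum(powers) / len(powers)
        top = max(range(len(powers)), key=lambda i: powers[i])
        if powers[top] - average <= tolerance:
            break  # nothing stands out enough to draw a nerf
        powers[top] -= nerf_step
        tolerance = max(nerf_step, tolerance * decay)
        spreads.append(max(powers) - min(powers))
    return powers, spreads


initial_powers = [10, 9, 7, 6, 5, 5, 4, 4]  # hypothetical power levels
final_powers, spreads = nerf_only_patches(initial_powers)

print("before:", initial_powers)
print("after: ", final_powers)
print("spread after each patch:", spreads)
```

Note that the weak cards never move in this sketch. The gap closes entirely from the top, which is the pattern described above for nerf-heavy games.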

The rest of this section risks being something that belonged in Card Balance and Artifact but I realized that the previous post didn’t really justify why such balance could be a bad thing, so here we are.

Each faction/color will now have a wide variety of options to do each of its core things, within a relatively narrow power level. Decisions stop being as interesting.

I like the idea that blue gets a single copy of a broken counter, Mana Drain, then four copies of one great counter, Counterspell, then mediocre ones like Power Sink and Spell Blast. Or later, that you get Counterspell for UU, you get a conditional Mana Leak for 1U, and then if you want more counters like Dissipate you’ll have to pay three mana for them and not get much in exchange. No matter what you’re looking for, you have big tension, as there’s a marginal card that’s there to tempt you.

Or that red gets to play Lightning Bolt, then it gets to play Chain Lightning, then it gets to pay up for Fireball, or at least Incinerate. Or that Druid in Hearthstone gets Innervate and Wild Growth, but only gets two of each and no supplemental options half as good.

Magic now has a variety of counters available, mostly all the same. You spend three mana, and you get to counter a spell plus a little something extra, as many times as you want. I tested for the Pro Tour on Magic Arena with Sinister Sabotage instead of Ionize, because I didn’t feel like wasting rare wildcards on Ionize. Close enough. Red similarly has a large number of available burn spells at close power levels, none of which is a clear big reward for splashing the color like Lightning Bolt is in Old School.

In Artifact, it is important that the fourth red hero is much worse than the second one. That’s one way you have to sacrifice to get one color decks. Another is that your cards 35-40 are also going to suck, and you’re going to get less of the awesome. Take those away, and we get a lot of mono-color decks and things are much less interesting. If mono-blue gets too good in Artifact, which is easy to imagine, the risk is that they target it by doing something like pushing Annihilate to seven mana, which makes rewards to diverse colors that much worse rather than better.

Games also stop being as interesting. We want a diversity of game experiences, and having all the cards at similar power levels makes that much harder, and risks taking out one of the good sources of variance and luck. Decks should have key cards that they very much want to draw, that games then revolve around, and to do that those cards need to be big rewards.

It seems like over time, we learn what players will play and like it because they need the effect in question, and we give them that, which to me takes the joy away. The cards are less special, you don’t have that ‘good stuff’ feeling that players (and I in particular) love. That idea that you’re getting away with something and mining out the premium quality.

The countering force is new cards that restore the imbalance. As noted in the advantages section, rebalancing enables this to be more extreme. That also carries its own dangers. Colors, and the game, become focused each quarter on the ‘new hotness’ of the latest set, disenfranchising existing cards and strategies and forcing players to take up the new ones. Existing strategies need to get major help each set as the balance shifts, or they fall away, so diversity is limited to what an individual set can do. Core abilities like counters and burn can’t get such boosts all that often.

3D. Matchups become more balanced, cards get more complex and answers stop working

Brad Nelson recently wrote an article (behind the Star City Games paywall) where he celebrated that Magic was embracing cards that could play in all situations. He noted that one big advantage of this was that Arena’s best of one matches would make more sense, as sideboards would be less vital. He also noted how he hates it when sideboards are about jamming lots of hate cards in that swing entire matchups.

I agree that this is happening. I strongly disagree that this is good.

I see Magic’s lack of hate cards, and its printing lots of cards (including but not limited to its planeswalkers) that have tons of versatility and that always play well, as one of its key problems. I think that Modern is great in large part because if you want to beat any given thing, you mostly can do that quite reliably, so things adjust. It’s a little heavy handed, sure, but it gets the job done.

Most importantly, it guards us against mistakes. If something is out of hand, players can respond with the hate until it is back in hand. Whereas in Standard, we’ve seen cycle after cycle where one or two decks proved to ‘play better Magic’ than others, once the good builds were found, and suddenly entire colors couldn’t beat them no matter how much they cared.

Sideboarding that is subtle and involves small strategic shifts is super interesting, and it’s been great for Brad Nelson, since he’s perhaps the world’s expert in it. But it’s also highly dangerous, and I think it being less dramatic and obvious hurts the average player experience.

Players often complain about matchups being ‘too lopsided’ as if this reduces skill too much. In extremis it can do that, but having everything be 55-45 to me is far worse. I much prefer things more bold. Not every good card, or every good deck, should be good at every task.

In practice, I believe that more tuning will lead to calls to even out these matchups, and for things to more approach the 50-50-for-all world far too often. Which also gets us to the next problem.

3E. Decks become scripted and dictated

This is a strange result. It seems backwards.

One would expect that without careful balance, whatever is strongest would emerge as strongest, and force everyone to use it.

Instead, what we actually see is that development teams effectively do things like ‘This is the second three drop for the U/G flying deck and that deck needs a little help there to deal with the three-damage mass removal spell we used to balance out the R/G deck, so let’s move it to 3/4.’

Once the decks people want to play, slash the game wants to create, are identified, they are given tools that are fitted to answer the other fitted tools in the other decks. Cards that other supported decks find hard to answer are weakened, cards that other supported decks are strong against are not.

The idea that ‘these are the things, and they should all be roughly equal’ is a natural thing to think, in and out of games. My Google News last week contained a Reddit post on Hearthstone entitled ‘we might have a problem’ because in the last week, the author had faced one hero 30% of the time. As conclusive evidence on its own. Out of nine possible! That is a highly toxic thing to consider an issue. Even in his stats, you could see that the second most played class was a natural answer to the most played one. This is one more reason to want to tie one’s hands. It’s also one more example of unbalanced matchups keeping things in check.

The result is that strategies end up with more and more clearly correct included cards, and less room for variety and creativity. There might be a lot of these scripted decks (e.g. Eternal has a ton of at least tier 2 options because of this) but each involves a core that’s been pre-selected, then a few choices between similarly strong filler cards slash adjustments to the curve or number of removal spells or such, all of which are solid. There is relatively little need or temptation to splash additional colors or include conflicting strategies to improve card quality or shore up weaknesses.

By contrast, when decks that are unexpected take the stage, often they have to play awkward cards that don’t work right but are necessary, or fill out requirements with cards that are outright bad because the deck wasn’t intentionally rounded out in that spot. Or they don’t have everything they naturally want so they are tempted to branch out into other colors or actions rather than give up card quality or flexibility.

Often those decks are the result of someone’s heroic effort, and/or a window where what is being played becomes unbalanced and opens things up to a not-as-naturally-strong idea. I don’t want to lose that.

When I returned to the Pro Tour, I came to Standard with a Blue/Black Faerie deck. As is strangely often the case, I was a set too early, as Bitterblossom was in the next set and did not yet exist. But with the use of Pendelhaven and a willingness to play eight 1/1 high flying one drops, my team had made the deck work, and it did quite well.

Before the tournament, head developer Aaron Forsythe asked to see the deck. I showed him.

When he asked who built the deck, I told him: You did.

There were exactly enough playable Faeries to make the deck work, so you played all of them. Then you played the obviously best spells to complement that. Then you were done. I’d put the deck together, but in a real sense, Aaron dictated what it looked like.

I was wrong that day. Wizards had, with the aid of Bitterblossom, built a different and much better deck. Mine was my own creation, and I came to realize this – I was making something where there wasn’t (yet) supposed to be anything. More and more, though, this has become a rarity. Which is a shame.

One could flip that on its head, of course. By forcing cards to lock in, we might force things to be heavy handed, to make sure the intended strategies work, whereas with adjustments we can see what people do and go from there in case we miss (or overshoot). But then, in the end, it will absolutely be the game’s people choosing what the decks look like.

4. Conclusion

How aggressively and frequently should games, especially digital card games, rebalance their components?

Different games have different economic structures and different goals, and want different things. My conclusion, having looked at this from many angles, is that each game should embrace a pattern. That pattern should either strongly favor rebalancing, or strongly favor not rebalancing. Games in the first category should use this tool to move their components around in interesting ways on a continuous basis. Games in the second category should do their best to avoid changing their components once they are set, and do so at most in emergencies where the alternative is a ban.

The biggest key for the first category is to avoid the trap of scaling back everything powerful and cool, and strengthening only things that are still not very good. The other key is to not be afraid to try stuff and be inconsistent. It’s totally fine to change a casting cost from four to six to three and then back to four, if that’s doing something interesting. Try stuff. See what happens. Build the expectation that players will need to roll with the punches, and that what they’ve learned could prove unhelpful.

The biggest key for the second category is that you’re operating without a net. The benefits are hard to get. If you mess up, the mistake is forever. That makes it hard to do interesting things without taking big, real risks. Rotation helps, but only goes so far. The idea of a public rebalancing period, before permanent ownership of cards (or allowing it, but with the known associated risks) and during which cards can be changed, may be a good one – but it also might allow the same kind of Goodhart’s Law problems discussed here in through the back door.

One can also think of rebalancing cards as a type of government intervention, with all its advantages and downsides. The intervention will almost always be well-meaning, and seek to respond to the needs of the people. It has a problem it is trying to help with. But it threatens people’s rights, property, and ability to know how things work or predict the future, and risks making things too much about what interventions are chosen. People focus on the next action, what it should be, how to influence it. If you do what is popular or solves short term concerns, too often, the result ends up gradually being more oppressive in ways that can be hard to see. Interventions tend to be in one direction, the government is loath to look like an idiot or to contradict itself, and its moves become difficult to reverse. On the other hand, there are real problems that won’t solve themselves, and things that get sufficiently out of balance will stay out of balance and make people unhappy and worse off if not addressed.

Raemon (7y)

Just wanted to note, while I'd generally been enjoying this series, this post did a good job of prompting me to ask whether (both the issues in this post, and some raised elsewhere in the series) were more applicable outside of the realm of game design. Still mulling that over, not sure if it'll output anything useful yet.

In a weirder way, I also felt some sense of "this prompts me to figure out what my opinions actually are." A lot of games I enjoy in different ways and I'm sort of okay with experiencing them as a particular kind of novel art or something. I can tell what games grip me, but often that's less about my opinions and more about "did they do a good job skinner boxing". What makes a game that I deeply respect, for reasons somewhat idiosyncratic to me?

Zvi (7y)

Thanks, this is very helpful feedback. Request for more such notes from readers.

A lot of the motivation for writing it was, in fact, to figure out what my own opinions actually were.

I do think a lot of this has implications outside game design, and I was sad that I couldn't efficiently write this in a way that didn't bog it down in a lot of game-design-specific detail, which means it will be hard for those not into the detail to extract the implications unless I come back to them in another form.

ryan_b (7y)

I can see definite advantages to writing these posts and then using them as references for another post generalizing beyond game design. They will just be object-level context references instead of meta-level context references.

In fact of the two arrangements, I think the object-level references would be more useful to me.

Raemon (7y)

BTW it turns out the answer to the last question is "Minecraft", which I will now fairly confidently describe as "the best game." :P

LESSWRONG