Is it worth refactoring yyyymmdd to currentDate? I think that there are two ways to look at it.
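To make the question concrete, here is a minimal sketch of what such a rename might look like in Python. The report_filename functions and the file-name pattern are hypothetical, made up purely for illustration, and current_date is just the snake_case spelling of currentDate.

```python
from datetime import date


def report_filename_before(today: date) -> str:
    # The name describes the string's format, not which date it holds.
    yyyymmdd = today.strftime("%Y%m%d")
    return f"report-{yyyymmdd}.csv"


def report_filename_after(today: date) -> str:
    # The name describes which date it holds, not the string's format.
    current_date = today.strftime("%Y%m%d")
    return f"report-{current_date}.csv"
```

Both versions behave identically. The only thing that changes is which piece of information the reader gets from the name and which they have to infer from context.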
You can zoom in and ask yourself questions about whether such a refactor will actually have a business impact. Will it improve velocity? Reduce bugs? Sure, currentDate might be slightly more descriptive, but does it really move the needle? How long does it take to figure out that yyyymmdd refers to a date? A few seconds, maybe? Won't it be pretty obvious given the context? Shouldn't your highly paid, highly intelligent engineers be smart enough to put two and two together? Did we all just waste 30 seconds of our lives talking about this?
The other way of looking at it is to zoom out. How do you feel when you work in codebases where the variable names are slightly confusing? It slows you down, right? Oftentimes you legitimately can't put two and two together. And there are times when it leads to bugs. Right?
It's interesting how two different viewpoints, zoomed in vs. zoomed out, can produce wildly different answers to essentially the same question: do the costs of investing in code quality outweigh the benefits? When you zoom in, e.g., to a single variable name, unless the code is truly awful, it usually doesn't seem worth it. The answer is usually, "it's not that bad, developers will be able to figure it out". But when you zoom out and look at the entirety of a codebase, I think the answer is usually that working in messy codebases will have legitimate, significant impacts on things like velocity and bugs, and it's worth taking the time to do things the right way.
What's going on here? Is this a paradox? Which is the right answer? To answer those questions, let's talk about something called the planning fallacy.
The Denver International Airport opened sixteen months later than scheduled, with a total cost of $4.8 billion, over $2 billion more than expected.
- https://en.wikipedia.org/wiki/Planning_fallacy#Real-world_examples
When estimating things, people usually zoom in. "Build an airport in Denver? Well, we just have to do A, B, C, D, E and F. Each should take about six months and $500M, so overall it should be three years and $3B." The problem with this is… well… the problem is that it just never works. You always forget something. And the individual components always end up being more complicated than they seem. Just like when you think dinner will be ready in 30 minutes.
So what can you do instead? Well, how long have similarly sized airports taken to build in the past? Ten years and $10B? Hm, if so, maybe your estimate is off. Sure, your situation is different from those other situations, but you can adjust upwards or downwards using the reference class of the other airports as a starting point. Maybe that brings you from 10 to 8 or 10 to 7, but probably not 10 to 3.
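The arithmetic behind the two estimates can be sketched in a few lines of Python. The inside-view figures are the ones from the example above. The 0.8 adjustment factor is my own assumption, picked only to show the kind of modest adjustment I have in mind, and the names are hypothetical.

```python
# Inside view: add up the components you happened to think of.
components = ["A", "B", "C", "D", "E", "F"]
MONTHS_EACH = 6
COST_EACH_BILLIONS = 0.5

inside_years = len(components) * MONTHS_EACH / 12
inside_cost_billions = len(components) * COST_EACH_BILLIONS

# Outside view: start from what similar projects actually took,
# then adjust for how your situation differs.
REFERENCE_YEARS = 10
REFERENCE_COST_BILLIONS = 10
ADJUSTMENT = 0.8  # assumed: "we are somewhat better positioned than average"

outside_years = REFERENCE_YEARS * ADJUSTMENT
outside_cost_billions = REFERENCE_COST_BILLIONS * ADJUSTMENT

print(f"inside view:  {inside_years:.0f} years, ${inside_cost_billions:.0f}B")
print(f"outside view: {outside_years:.0f} years, ${outside_cost_billions:.0f}B")
```

The point is not the specific factor. It's that the outside view starts from the reference class and moves a little, whereas the inside view starts from zero and only counts what you remembered to list.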
How does this relate to code quality? Well, I think that something similar is going on. When you zoom in and take the inside view, it looks like everything will be good. But when you zoom out and take the outside view, you realize that messy codebases usually cause significant problems. Is there a good reason to believe that your codebase is a special snowflake where messiness won't cause significant problems? Probably not.
I feel like I'm being a little bit dishonest here. I don't want to hype up the outside view too much. In practice, inside view thinking also has its virtues. And it makes sense to combine inside view thinking with outside view thinking. Doing so is more of an art than a science, and something that I am definitely still developing a feel for.
I think that certain things lend themselves more naturally to inside view thinking, and others lend themselves more naturally to outside view thinking. For example, coming up with startup ideas or scientific theories are both good fits for inside view thinking, IMHO. On the other hand, code quality feels to me like something that is a great fit for the outside view. And so, that's the viewpoint that I favor when I think about whether or not it is worthwhile to invest in.
I'm aware that the currentDate versus yyyymmdd thing is only an example, but I'm not sure it's a good example because it's not obvious to me that currentDate is necessarily better.
If this thing is a string describing the current date then there are at least two separate pieces of information you might want the name to communicate. One is that it's the current date rather than some other date. The other is that it's in yyyymmdd format rather than some other format.
Whether currentDate or yyyymmdd is more informative depends on (1) which of those two things is easier to infer from context (e.g., maybe this is a piece of software that does a lot of stuff with dates in string form and they're always yyyymmdd; or maybe the only date it ever has any reason to consider is the current date) and (2) which of them is more important in the bit of code in question (e.g., if what you're doing is working out which month it is, that operation is the same whether you're dealing with today's date or something else, but it depends a lot on the format of the input).
It might actually be better in some cases to call the variable something like yyyymmdd_now or currentDate_ymd8 (the latter only makes sense if in your code there are a few different string formats in use for some hopefully-good reason (maybe you need to interoperate with multiple other bits of date-handling software), so that giving them codenames makes sense).
Ah. I didn't even notice that but that's a great point. I also think that yyyymmdd suggests no separators.