Hacker News

The problem I wrestle with is “comments lie.” Not all the time, of course, but often enough that I end up reading the code to make sure anyway. It’s those very “why?” and “how?” comments I’ve seen lie. My own have lied to me, as have my teammates’. They go in at one point, the code evolves, and the comments drift out of alignment. I’m not trying to justify having no comments. I just wish there were a better solution to this very real issue.


I have experienced that, and I have never found it anywhere near as bad as no comments and "self-documenting code".

IME it's usually mildly obvious when the comments are obsolete or out of whack, and checking them is testing one hypothesis. With no comments, there is effectively an infinite number of possible hypotheses as to what the code does.


I think the best solution is having a few tactical comments. Imo the best setup for a project is:

1. Some high-level documentation explaining how the key systems work and interact conceptually, to give context.

2. Comments for anything non-standard (e.g. we're using some weird encoding here instead of JSON because of XYZ reason)

3. Comments for the tricky extra complicated bits

For everything else, well-written code following standard conventions and good naming practices is enough.

Also choose a statically typed language, so that your function signatures will give you a useful contract to work with, and it will mostly be pretty easy to trace the intent of your code.


> Also choose a statically typed language, so that your function signatures will give you a useful contract to work with, and it will mostly be pretty easy to trace the intent of your code.

Furthering this, avoid:

- "Stringly typed programming": where everything has a static type, but it's all `String` (or `Int`, `Float`, etc.) instead of something with domain meaning. This gives very little information to the reader and makes it easy to mix up values.

- "Boolean blindness": where we query a bunch of booleans to determine what data we can extract, e.g.

    Result::hasUserId : Boolean    // whether this result contains a UserId
    Result::getUserId : UserId     // returns the UserId if hasUserId; otherwise errors

    log("Request came from " + (myResult.hasUserId ? myResult.getUserId : "anonymous guest"))

Compare this to e.g.

    Result::getUserId : Option[UserId]

    log("Request came from " + myResult.getUserId.map(_.toString).getOrElse("anonymous guest"))
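The same contrast can be sketched in Python (illustrative names; an `Optional` return type replaces the separate boolean query, so the "blind" flag check disappears):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    user_id: Optional[str] = None  # None encodes "no user id"; no separate boolean flag

    def get_user_id(self) -> Optional[str]:
        return self.user_id

def describe(result: Result) -> str:
    # The type checker forces the None case to be handled here; there is no
    # boolean to query first and then forget to check.
    uid = result.get_user_id()
    return "Request came from " + (uid if uid is not None else "anonymous guest")

print(describe(Result("alice")))  # Request came from alice
print(describe(Result()))         # Request came from anonymous guest
```

With the boolean-blind version, nothing stops a caller from invoking `getUserId` without checking `hasUserId` first; with the `Optional` version, that mistake doesn't typecheck.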


This. I regard comments as something slightly negative: you use them if you have to, but first you see whether you can write the code so you don't need them. If a comment is about something internal to my code, I can almost always get rid of it; I end up with most of my comments referring to external matters.


I had the opposite experience. I never read a comment worth reading.

Reading code was always strictly better: truthful, and often easier to parse and understand.

And with code you can actually converse: putting in breakpoints, disabling parts of it, and observing what it is actually doing and how, and inferring why.

Documentation was always dead prose, almost always outdated.

Even when it was up to date and accurate, I often understood what the person writing a comment actually meant only after I had read and understood the code it referred to.


> inferring why

To me, this is where the most valuable comments are. The more that your code is a representation of business logic, the more you have weird "whys" to fill in.

Any kind of plan that involves everyone on the team inferring the same "why" is a terrible plan in my book.


Perhaps it comes down to a preference of reading code depth-first vs. breadth-first. Reading code only is best for depth-first I guess. Suppose function A has a call graph that is 3-6 levels deep, with 3 immediate children.

If you want to read it breadth-first, you'll want some context as to what the three children do. If you want to read it depth-first, the comments are somewhat redundant, because you're going down to the source anyway.

I rarely have a desire (and even less often a strict need) to understand the whole call graph of a function. If it works fine, tests pass, and I want to understand / change one specific behaviour, I need to grok a sliver of the entire possible call graph. Comments help guide where I'm going then. If grokking the whole thing, I see how comments are at best a waste of space.


I think that if you want “what the code does”, a detailed description in the commit message is much better, since then you can at least see the snapshot with its original context.

You can even go through a particular file’s associated changes to see how it’s morphed over time.


This is an issue of bad habits. "I am just quickly going to change this tiny little thing", you say to yourself, and forget about the existing comments.

Some tips for better habits:

- whenever changing existing code, read the comments

- if you know the changes are going to be big, just delete the old comment first

- if the changes are minor but change the code's behaviour, add a "TODO: check comments" note on top of the existing comments

- try to keep comments tied to the thing they describe; comments that describe interactions between components go in the outer scope. This way there are fewer surprise places where comments need to be updated

- consider writing comments describing what you are going to do before writing the code. Besides producing the comments, this acts as an additional reality check and quite often leads to better code as well


Why can't we have comments that are tied to a hash of the block of code they refer to?

Have editor plugins that highlight if the hash no longer matches, then you can evaluate if the comment needs to be updated or not and update the hash.

This would eliminate the problems raised about working on teams and people not following procedures.
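A rough sketch of how such a check could work, assuming a made-up convention where a comment ends with a short hash stamp of the code block it describes (the stamp format and all names here are invented):

```python
import hashlib
import re

# Hypothetical convention: a comment ends with "[code: abcdef12]", naming the
# first 8 hex digits of the SHA-256 of the code block that follows it.
STAMP = re.compile(r"\[code:\s*([0-9a-f]{8})\]")

def block_hash(code: str) -> str:
    # Hash a normalized form so pure whitespace changes don't trigger alarms.
    normalized = "\n".join(line.strip() for line in code.strip().splitlines())
    return hashlib.sha256(normalized.encode()).hexdigest()[:8]

def comment_is_stale(comment: str, code: str) -> bool:
    # Stale = the comment carries a stamp, and the stamp no longer matches.
    m = STAMP.search(comment)
    return m is not None and m.group(1) != block_hash(code)

code = "total = sum(xs)\nreturn total / len(xs)"
comment = f"# Arithmetic mean; empty input is the caller's problem. [code: {block_hash(code)}]"
print(comment_is_stale(comment, code))                            # False
print(comment_is_stale(comment, "return sorted(xs)[len(xs)//2]"))  # True
```

An editor plugin would recompute the stamp on save and highlight any comment whose stamp no longer matches, prompting a human to either update the comment or re-stamp it.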


This is fine if you're the only person to ever touch this code. But if other people will modify the code you can generally not rely on them following your idea of "good habits"; therefore it's sensible to help them not shoot themselves in the foot.


> This is fine if you're the only person to ever touch this code. But if other people will modify the code you can generally not rely on them following your idea of "good habits";

Teams of two or more individuals can establish norms with only a little more difficulty than individuals, and can provide accountability to those norms better than an individual.


It should be easier to keep comments up to date in code that multiple people work on, since code review lets everyone keep each other honest.


Unless you are editing somebody else's code, any team working on code can (and in my opinion: should) agree on some common standards. If you are not taking a day to discuss this, you are saving time in the wrong place.


The solution is to just let it be. A why comment encodes the programmer's belief at the time. This is very useful.

Just because the comment is at odds with what the code is doing doesn't mean that the code is right. The code wins by force because the comment isn't executable.

Sometimes the comment is at odds with the code because the code should be the way the comment says. It's not that way because (1) someone changed it. Or (2) it never was that way; the comment is just wishful thinking.

If you want to know which of those two it is, you do the archaeology.


I'm more concerned with the fact that code lies. Everyone knows comments lie. Only some people know that code lies, and the biggest liars don't seem to get what the big deal is. "I understand the code", they'll say, and then give a several-minute explanation of how everyone else could 'easily' understand it.

Nobody wants to memorize your code. And you shouldn't want to either. You can't memorize code that other people are contributing regularly to. So either you won't know it when you come back, or you're subtly pressuring everyone to move their changes outside of your code, and leaving your code alone. Even if it has a choke hold on data flow in the system.

But that almost misses the point, which is that yes, code that doesn't look like what it actually does can cause people to miss the real problem. But code that looks like it's a potential source of the problem being investigated steals attention from the real culprit. And no amount of memorization is going to completely solve that problem. I think people mistake the notion of 'code smells' as the act of an overly fastidious mind, but the neat freak is forever asking the slob "how do you find anything in this mess?" and that problem plays out in code, but amplified. Like if you were looking for your credit card and it wasn't a matter of whether you put your wallet where it 'belongs' or if it was in your bedside table, but instead you have 9 wallets scattered around your room full of customer loyalty cards.


I saw a lovely example of this a few months ago. We had a list of ids representing completely different business cases, and code like this:

  // Because of [sane business reason] we need to remove all 89's
  something.removeall(75);

I found this because people in case 75 had been logging a never-ending stream of tickets for months about the program doing weird things. No shit, Sherlock. Talk to business: everyone agrees 75 makes no sense at all and it really, really should be 89. So before we change this back to sanity, let's check the git history:

  Date: 3 years ago
  Message: Urgent fix for major downtime. (No detail, of course.)
  Commit by: someone who left a little bit later
  Change: 1-line patch, changes 89 to 75

Asking around, nobody technical knows anything anymore; the whole team got laid off around that same time 3 years ago, and the new team basically rediscovered everything from scratch. Some business people do remember it: it was Very Bad, they don't know exactly what it was or how it got fixed, but anything that might cause it to happen again is strictly forbidden.

What would you do? I got reorganized to another team, lucky me.


This is a great example of why documentation is important, though. If that comment had documented why the value was changed to 75 and what the impact of that was, you would know how to address this situation.

In fact, even the fact that the comment is now wrong tells you that the change was ill-considered and needs to be revisited.


I have to disagree here.

Any fix should produce at least a test to guard against regressions of the bug it's trying to fix and encode the correct business logic.

The test name would then explain the why, the test variables tell the story of the what, and finally the API calls tell the story of the how.
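A sketch of that structure (the business rule, helper, and all names here are invented for illustration):

```python
# Hypothetical rule: case 89 must be excluded from billing runs for
# [sane business reason]. The test name carries the "why", the variables
# carry the "what", and the API calls carry the "how".

def remove_excluded_cases(case_ids, excluded):
    """Drop every case id that the business has excluded."""
    return [c for c in case_ids if c not in excluded]

def test_case_89_is_excluded_from_billing_to_avoid_double_invoicing():
    billing_run = [75, 89, 101]          # the "what": which cases are in play
    excluded_by_business_rule = {89}

    result = remove_excluded_cases(billing_run, excluded_by_business_rule)

    assert result == [75, 101]           # 89 gone, everything else untouched

test_case_89_is_excluded_from_billing_to_avoid_double_invoicing()
```

Had a test like this existed in the 89/75 story above, the one-line "urgent fix" would have failed it, forcing whoever changed the constant to also rewrite the test name and record the new reason.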


Most commonly, comments end up lying about the "what". Because "what" comments mostly restate boilerplate code that we unconsciously tune out, they're the first to go stale: we don't remember to change them when the "what" changes (and it's a two-place change, which always falls out of sync).

Next up is the "how", although it's not as likely as the "what" since it's not as common (as it should be - you don't need to write a "how" about every list iteration).

The least likely to be a lie is the "why", since the circumstances leading to a "why" rarely change, and when they do, you usually end up replacing the whole section.

And so, your "what" should be almost entirely in self-documenting code, your "how" should be used judiciously, and your "why" probably won't be a problem.

The more unnecessary comments you write, the greater the risk of them going stale due to maintainers glossing over them. When they're a rarity, you take more notice.


Like TFA suggests, right at the end, this is where code review shines. Keeping documentation valuable is a discipline challenge, and we all fail from time to time at keeping ourselves disciplined. If you have team agreement on the level of documentation that gives good ROI for time spent maintaining it, you can rely on your team to help catch your slips just as they can rely on you to help catch theirs.

Using something like ADRs[0] can help provide the structure the team can look for when it comes to "why" documentation. I recommend a standardised README across projects. If it's "what" comments, look for language tooling that will execute documented examples as test cases. If it's "how", then make sure there's value in it: do you have disaster recovery exercises, does your incident response team use your "how" documentation to triage issues, do stakeholders outside your development team get consulted on contextual changes in your application, etc.?

[0]: https://brunoscheufler.com/blog/2020-07-04-documenting-desig...


Good "why" documentation can sometimes be fairly localized though. ADRs are great for overarching design "whys", but are not really the best tool for highly localized "why" documentation.

For example: sometimes you may have a codebase where whenever "X" is done, there is an automatic retry mechanism of failure. But then you have one spot where there is no automatic retry. And it turns out there is a very subtle and non-obvious reason why an automatic retry here would be a problem.

The code then deserves a comment explaining what this subtle issue is, and why it means automatic retries must be avoided here. Without such a comment, the next person to touch the code may well assume the automatic retry was forgotten, add it, and cause, say, a painful data-corruption bug to return.
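A sketch of such a localized "why" comment in context (the retry helper, the toy store, and the non-idempotency reason are all invented for illustration):

```python
class DictStore:
    """Toy in-memory store standing in for a real backend."""
    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

def retrying_write(store, key, value, attempts=3):
    # Hypothetical codebase-wide convention: writes are retried on failure.
    for attempt in range(attempts):
        try:
            store.write(key, value)
            return
        except IOError:
            if attempt == attempts - 1:
                raise

def record_payment(store, payment_id, amount):
    # NOTE: deliberately NOT using retrying_write() here. The payment backend
    # is not idempotent: a retry after a timed-out-but-committed write would
    # charge the customer twice. This is not a forgotten retry; do not "fix"
    # it by adding one.
    store.write(payment_id, amount)

store = DictStore()
record_payment(store, "payment-1", 42)
```

The comment earns its place precisely because it contradicts the reader's default expectation; without it, the code looks like an oversight.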


Even outdated comments I still find useful in that they tell me how things used to work, and why they used to work that way.

Which is important context for figuring out how it works now.


It is important, though, to note that wrong documentation is significantly worse than no documentation.


I disagree with that. Sure, wrong documentation can mislead and confuse, but it can also be very helpful.

Lying comments have often helped me because it's often useful to know that a particular person at a particular point in time believed something to be true, even if it isn't true any more and even if it were never true. In a pile of spaghetti code, a good lying comment can point to the needle in the haystack where the root cause of a bug is hiding.


I have not found that to be the case. Usually it's easy to tell when documentation is outdated, and then I just use it as a reference to figure out how things work now.


Maybe not so easy. I would say every documentation page needs a "Last Modified" date.

Similarly, in a production environment, I found that when there was no time to keep the docs up to date with the latest changes, I marked them on the cover page as PRELIMINARY just to put readers on alert. And so, should a document not getting the attention it needs be marked NOT WELL MAINTAINED? I get the feeling this idea would not fly, at least not in a commercial environment.


Shouldn't code-reviews make sure that comments are relevant?


Theoretically, but humans are less reliable than a compiler at catching issues.


I guess, in an ideal world, our tools could ~pair code and comments introduced adjacent to each other in the same commit, and then ~visually surface how many commits have touched the code without touching the comment (or whether it has been orphaned).


I've always wished you could put function tests in the comments that would be evaluated at compile time (or at least let you run a localized version of just that function), and that could then WARN YOU when you've broken parity with whatever test/example you put together right there.

Yes, a unit test will show you that something has changed, but often it's the comments that don't get updated.

Hell maybe just let us flag the comments that will need to be changed if X or Y unit test fails. Just some way to better link comments to code adjustments.


If your code is versioned, it's still better to have outdated documentation with an obvious older timestamp than no documentation.


Apparently-meaningful identifiers in code can also mislead and lie, to precisely the same extent as they can inform.


By that same token, code can lie too.




