Sometimes, when people use Code Climate, they try to make changes in response to all of the issues it reports, in order to achieve straight A's and/or a 4.0 GPA. We do not recommend using Code Climate in this way. This is an attempt to articulate why.
Today, Code Climate primarily reports smells. Smells are "symptoms of your code that possibly indicate a deeper problem." They are not necessarily problems themselves. Only a human programmer can decide if a piece of code is maintainable, because it's humans who have to maintain code.
The best analogy I've heard is to use code metrics the way a doctor uses vital signs. They are not a diagnosis, but in some cases they can help make one. Other times, the most appropriate course of action is to "do nothing", even though a vital sign may be abnormal. Like a doctor: first, do no harm.
In addition to the smells themselves, Code Climate assigns an A-F rating to each class, as well as a GPA to each repo (which is simply the class ratings weighted by lines of code). We recommend interpreting the A-F ratings as follows:
- A's and B's are good.
- Treat C's as a caution flag.
- Avoid (and sometimes fix, depending on the context) D's and F's.
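To make the GPA weighting concrete, here is a hypothetical sketch of how a repo GPA could be derived from per-class ratings weighted by lines of code. The grade-point values (A=4.0 through F=0.0) and the method names are assumptions for illustration, not Code Climate's actual implementation:

```ruby
# Assumed grade-point scale for the sketch below.
GRADE_POINTS = { "A" => 4.0, "B" => 3.0, "C" => 2.0, "D" => 1.0, "F" => 0.0 }

# classes is an array of { grade:, loc: } hashes, one per class in the repo.
def repo_gpa(classes)
  total_loc = classes.sum { |c| c[:loc] }
  return 0.0 if total_loc.zero?

  # Each class's grade points are weighted by its lines of code.
  weighted = classes.sum { |c| GRADE_POINTS.fetch(c[:grade]) * c[:loc] }
  (weighted / total_loc).round(2)
end

classes = [
  { grade: "A", loc: 900 },  # many small, clean classes
  { grade: "F", loc: 100 },  # one large, smelly class
]
repo_gpa(classes) # => 3.6
```

Note how the weighting works in this sketch: a single F-rated class drags the GPA down in proportion to its size, which is why one big problem file matters more than several small ones.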
The insight here is that while individual smells are sometimes not issues that need addressing, in aggregate they are pretty good indicators. Most people feel like they'd have trouble maintaining code in classes/files that Code Climate scores as D's or F's. On the other hand, most people feel like they do not have trouble maintaining code scored as A's and B's.
For a large app that is over a year old and under active maintenance, a GPA of 3.5 or better is great.
Note: Overall Code Climate averages skew higher than that, because we host a lot of small projects (e.g. RubyGems). Smaller projects tend to be more maintainable and also have higher Code Climate GPAs.
Our main app scores a 3.2. Our "worker" scores a 3.0.
Good question. Maybe we should get rid of the B and D grades. They are primarily there because an A-F scale felt like the most understandable, and that scale includes a B between A and C.
We call this "the camel problem". As in, "the straw that broke the camel's back". Code Climate rescores the entire class every time it updates, so the size of a grade or GPA change is not connected to the size of the change made.
It is very common for bad code to accumulate through lots of small, individually justifiable changes. At some point, Code Climate throws a flag. In those cases, it is not a reflection on the particular change that was made, but an overall warning about that area of the code. We recommend taking a step back and evaluating the situation holistically in these instances.
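The camel effect above can be sketched with a toy example. Because the whole class is rescored each time, a one-line change can push an accumulated score across a grade boundary. The cost numbers and cutoffs here are entirely invented for illustration:

```ruby
# Toy grading function: the whole class's accumulated "smell cost" is
# rescored, so the grade change is not proportional to the diff size.
# The cutoffs below are made up, not Code Climate's real thresholds.
def grade(smell_cost)
  case smell_cost
  when 0...10  then "A"
  when 10...20 then "B"
  when 20...30 then "C"
  when 30...40 then "D"
  else              "F"
  end
end

grade(29) # => "C" -- many small changes accumulated to a cost of 29
grade(30) # => "D" -- one more tiny change drops the whole class a grade
```

The last straw gets blamed for the drop, but the warning is really about everything that accumulated before it.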
In cases where the algorithm can be changed to be clearly more accurate, we will do that. (Although these updates take a fair amount of time for us to roll out carefully.) An example of this is the penalization of Symbol#to_proc in Ruby. That penalty was never particularly intended, and Symbol#to_proc is now a popular Ruby idiom (one we adhere to ourselves). The penalty for it is vestigial.
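For readers unfamiliar with the idiom, the two forms below are equivalent; the shorthand is the Symbol#to_proc usage that older metric rules sometimes penalized:

```ruby
names = ["alice", "bob"]

# Explicit block form.
explicit  = names.map { |name| name.upcase }

# Symbol#to_proc shorthand: &:upcase converts the symbol into a block.
shorthand = names.map(&:upcase)

explicit == shorthand # => true
```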
Other cases are less clear. For example, Code Climate's tendency to flag long methods is too sensitive for some (and too generous for others). The core problem is that there is a tension between providing early warnings about developing maintainability issues and detecting only issues worth taking action on.
If we make Code Climate stricter, it will report more things that do not, in the judgement of humans, require action. On the other hand, if we make it less strict (for example, so it only reports BIG problems that almost certainly require action), we won't provide information until it's too late. Code Climate today can help you avoid ever introducing BIG problems because it starts to warn you early (by marking something as a C, for example).
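The strictness trade-off can be illustrated with a toy long-method check, where a single threshold controls sensitivity. The parameter name and numbers here are invented, not real Code Climate settings:

```ruby
# Toy smell detector: flag methods longer than a tunable threshold.
# "max_lines" is a made-up knob used to illustrate strict vs. lenient.
def long_method_warnings(method_lengths, max_lines: 25)
  method_lengths.select { |_name, lines| lines > max_lines }.keys
end

lengths = {
  "find_user"       => 8,
  "process_order"   => 30,
  "generate_report" => 60,
}

long_method_warnings(lengths)                # strict: early warnings, more noise
long_method_warnings(lengths, max_lines: 50) # lenient: only the worst offender
```

A strict threshold surfaces `process_order` while it is still fixable; a lenient one stays quiet until a method has already grown to `generate_report` size.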
The current system is a reflection of the balance between these priorities.
Good question. We may end up doing this. However, enabling someone to manipulate their Code Climate scores is both complex and risky.
For example, one of our most important use cases is within a team. In those contexts, you have a mix of experience levels. If one programmer were to mark an issue as "wontfix" (and the GPA went up as a result), that issue would be hidden from the rest of the team. This would impair the ability of others on the team to use Code Climate to review the code in question (because the report would have been changed to show all A's).
Also, a newly hired developer would not be able to explore Code Climate as easily to learn about the app.
Note: Interestingly, issues reported by our Rails Security Monitor are treated differently. For security issues, there is generally a deterministic answer, one that teams will agree on, as to whether an issue is real. So in those cases, we do provide a way for someone to mark something as a false positive if it is not a real vulnerability.
I really like the Q&A sections. They provide a lot of great information. However, I'm not sure how they relate to the original premise of why you shouldn't aim to have a perfect 4.0 GPA.
In that vein, I think you could speak more about how code metrics of all kinds (coverage is a common example) are not intended to be made perfect. Doing so is a waste of time with no business value, and it may actually lead to less maintainable or more brittle code.
You could also have more of a section about what to do with the D/F graded stuff. Those are the places to focus. Start from the bottom, with the biggest problems. More importantly, if you have to work in a D/F file/class on a feature, that is the time to work to improve it.
As for the Q&A, you could maybe make that into a whole distinct article. :-)