Sometimes, when people use Code Climate, they try to make changes in response to all of the issues it reports, in order to achieve straight A's and/or a 4.0 GPA. We do not recommend using Code Climate in this way. This is an attempt to articulate why.
Today, Code Climate primarily reports smells. Smells are "symptoms of your code that possibly indicate a deeper problem." They are not necessarily problems themselves. Only a human programmer can decide if a piece of code is maintainable, because it's humans who have to maintain code.
The best analogy I've heard is to use code metrics the way a doctor uses vital signs. They are not a diagnosis, but they can help make one in some cases. Other times, the most appropriate course of action is to "do nothing" even though a vital sign may be abnormal. Like a doctor: first, do no harm.
In addition to the smells themselves, Code Climate aggregates an A-F rating for each class, as well as a GPA for each repo (which is simply the ratings weighted by lines of code). We recommend interpreting the A-F ratings as follows:
- A's and B's are good.
- Treat C's as a caution flag.
- Avoid (and sometimes fix, depending on the context) D's and F's.
The insight here is that while individual smells are sometimes not issues that need addressing, in aggregate they are pretty good indicators. Most people feel like they'd have trouble maintaining code in classes/files that Code Climate scores as D's or F's. On the other hand, most people feel like they do not have trouble maintaining code scored as A's and B's.
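To make the "ratings weighted by lines of code" idea concrete, here is a minimal sketch of how such a GPA could be computed. The grade-point mapping (A = 4.0 down to F = 0.0) is an assumption based on the usual 4.0 scale, and the function name `repo_gpa` is made up for illustration; this is not Code Climate's actual implementation.

```python
# Assumed grade points on the usual 4.0 scale. The exact mapping Code Climate
# uses is not stated in the text above.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def repo_gpa(classes):
    """Return the average grade, weighted by lines of code.

    `classes` is a sequence of (grade, lines_of_code) pairs, one per class.
    """
    classes = list(classes)
    total_lines = sum(loc for _, loc in classes)
    if total_lines == 0:
        raise ValueError("cannot compute a GPA for a repo with no lines of code")
    weighted_points = sum(GRADE_POINTS[grade] * loc for grade, loc in classes)
    return weighted_points / total_lines


# One large F-rated class outweighs two smaller, better-rated classes.
example = [("A", 100), ("C", 100), ("F", 200)]
print(repo_gpa(example))
```

Because the weighting is by lines of code, a single large, poorly rated class pulls the GPA down more than several small ones do.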
For a large app that is over a year old and under active maintenance, a score of 3.5 or better is great.
Note: Overall Code Climate averages skew higher than that, because we host a lot of small projects (e.g., RubyGems). Smaller projects tend to be more maintainable and also have higher Code Climate GPAs.
Our main app scores a 3.2. Our "worker" scores a 3.0.
Good question. Maybe we should get rid of them. They are primarily there because an A-F scale felt most understandable, and it includes a B between A and C.
We call this "the camel problem". As in, "the straw that broke the camel's back". Code Climate rescores the entire class every time it updates, so the size of a grade or GPA change is not connected to the size of the change made.
It is very common for bad code to accumulate through lots of small, individually justifiable changes. At some point Code Climate throws a flag out. In those cases, it is not a reflection on the particular change that was made, but an overall warning about the area of code itself. We recommend taking a step back and evaluating the situation holistically in these instances.
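The camel problem can be illustrated with a small sketch. It assumes a class's grade comes from a running penalty total compared against fixed cutoffs; the cutoff values and the names `grade_for` and `grade_history` are hypothetical and chosen only for illustration, not taken from Code Climate.

```python
# Hypothetical cutoffs as (maximum penalty, grade). These numbers are made up
# for illustration and are not Code Climate's real thresholds.
HYPOTHETICAL_CUTOFFS = [(10, "A"), (20, "B"), (30, "C"), (40, "D")]


def grade_for(penalty):
    """Map a class's total penalty to a letter grade."""
    for limit, grade in HYPOTHETICAL_CUTOFFS:
        if penalty <= limit:
            return grade
    return "F"


def grade_history(start_penalty, deltas):
    """Rescore the whole class after each change and record the result."""
    history = []
    penalty = start_penalty
    for delta in deltas:
        penalty += delta
        history.append((penalty, grade_for(penalty)))
    return history


# Four equally small changes; only the last one moves the grade.
for penalty, grade in grade_history(14, [2, 2, 2, 2]):
    print(penalty, grade)
```

All four changes are the same size, yet only the last one changes the grade. The drop says more about the accumulated state of the class than about that final change.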
In cases where the algorithm can be changed to be clearly more accurate, we will do that. (Although these updates take a fair amount of time for us to roll out carefully.) An example of this would be the penalization of Symbol#to_proc in Ruby. That penalty was never particularly intended, and Symbol#to_proc is now a popular Ruby idiom (one we adhere to ourselves). The penalty for it is vestigial.
Other cases are less clear. For example, Code Climate's tendency to flag long methods is too sensitive for some (and too generous for others). The core problem is that there is a tension between providing early warnings about developing maintainability issues and detecting only issues worth taking action on.
If we make Code Climate more strict, it will report more things that do not, in the judgement of humans, require action to be taken. On the other hand, if we make it less strict (for example so it only reports BIG problems that almost certainly require action), we won't be providing information until it's too late. Code Climate today can help you avoid ever introducing BIG problems because it will start to warn you early (by marking something as a C, for example).
The current system is a reflection of the balance between these priorities.
Good question. We may end up doing this. However, enabling someone to manipulate their Code Climate scores is both complex and risky.
For example, one of our most important use cases is within a team, where you have a mix of experience levels. If one programmer were to mark an issue as "wontfix" (and the GPA went up as a result), that issue would be hidden from other people on the team. This would impair the ability of others on the team to use Code Climate to review the code in question (because it would have been changed to report all A's).
Also, a newly hired developer would not be able to explore Code Climate as easily to learn about the app.
Note: Interestingly, issues reported by our Rails Security Monitor are treated differently. For security issues, there is generally a deterministic answer (agreeable across teams and in general) as to whether an issue is real. So in those cases, we do provide a way for someone to mark something as a false positive if it is not a real vulnerability.
My overall impression is that this reads like a long, thoughtfully-considered excuse.
I believe Code Climate could be refined to the point where its recommendations would align very closely with human judgement. Maybe not 100% of the time but some percentage approaching 100. Maybe the upper limit is 95%, maybe it’s 92%, maybe it’s 81.375%. To me, this essentially says, “Since we’re pretty sure we’ll never reach 100%, we’re not gonna bother trying to reach that upper limit.” 😞
Let’s say a doctor measures your blood pressure to be 130/85. This reading is not indicative of hypertension (which starts around 140/90) but it does indicate prehypertension. The prescription for prehypertension is typically a change in lifestyle and diet: do more exercise, eat less salt, drink less alcohol. There’s just one problem: the sphygmomanometer was broken. Your blood pressure is actually normal (110/70). There’s no medical reason for you to do more exercise, eat less salt, or drink less alcohol (at least as far as hypertension is concerned). Following this recommendation may even cause harm (e.g. an iodine deficiency as a result of reducing salt intake). In my opinion, this analogy is closer to the current state of Code Climate’s instruments. They are broken, they are giving false positive readings, and they need to be fixed. As you said: first, do no harm.
This is crazy town (and you’re the sheriff). The whole point of the American-style grading system is to incentivize students to strive for the highest grades. What’s the point of copying this system and then telling people not to strive for a 4.0 GPA?