

@brynary
Created February 17, 2014 14:36
Why You Should Never Try For All A's on Code Climate

Sometimes, when people use Code Climate, they try to make changes in response to all of the issues it reports, in order to achieve straight A's and/or a 4.0 GPA. We do not recommend using Code Climate in this way. This is an attempt to articulate why.

Today, Code Climate primarily reports smells. Smells are "symptoms of your code that possibly indicate a deeper problem." They are not necessarily problems themselves. Only a human programmer can decide if a piece of code is maintainable, because it's humans who have to maintain code.
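As a hypothetical illustration (the check and the code are invented, not drawn from Code Climate's actual rules): a length-based check might flag the method below as a "long method" smell, yet a human reviewer could reasonably judge it easy to maintain.

```ruby
# Long enough to trip a hypothetical length check, but arguably
# perfectly maintainable: a flat, regular lookup with no hidden logic.
def http_status_name(code)
  case code
  when 200 then "OK"
  when 201 then "Created"
  when 301 then "Moved Permanently"
  when 302 then "Found"
  when 404 then "Not Found"
  when 422 then "Unprocessable Entity"
  when 500 then "Internal Server Error"
  else          "Unknown"
  end
end
```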

The best analogy I've heard is to use code metrics the way a doctor uses vital signs. They are not a diagnosis, but in some cases they can help make one. Other times, the most appropriate course of action is to "do nothing" even though a vital sign may be abnormal. Like a doctor: first, do no harm.

In addition to the smells themselves, Code Climate aggregates an A-F rating for each class, as well as a GPA for each repo (which is simply the class ratings weighted by lines of code; a sketch of this weighting follows the list below). We recommend interpreting the A-F ratings as follows:

  • A's and B's are good.
  • Treat C's as a caution flag.
  • Avoid (and sometimes fix, depending on the context) D's and F's.
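
To make the weighting concrete, here is a minimal sketch of the GPA calculation described above, assuming the usual 4.0 grade-point scale. The class names, grades, and line counts are invented for illustration; this is not Code Climate's actual algorithm.

```ruby
# Map letter grades to grade points (assumption: standard 4.0 scale).
GRADE_POINTS = { "A" => 4.0, "B" => 3.0, "C" => 2.0, "D" => 1.0, "F" => 0.0 }

# GPA = sum(points * lines) / sum(lines), i.e. ratings weighted by LOC.
def gpa(classes)
  total_lines = classes.sum { |c| c[:lines] }
  weighted    = classes.sum { |c| GRADE_POINTS[c[:grade]] * c[:lines] }
  (weighted / total_lines).round(2)
end

classes = [
  { name: "User",    grade: "A", lines: 120 },
  { name: "Billing", grade: "C", lines: 480 },
  { name: "Report",  grade: "F", lines: 200 },
]

gpa(classes) # => 1.8
```

Note how the 480-line C class and the 200-line F class drag the GPA down far more than the 120-line A class lifts it.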

The insight here is that while individual smells are sometimes not issues that need addressing, in aggregate they are pretty good indicators. Most people feel they would have trouble maintaining code in classes/files that Code Climate scores as D's or F's. On the other hand, most people feel they can maintain code scored as A's and B's without trouble.

What is a good Code Climate score?

For a large app that is over a year old and under active maintenance, a score of 3.5 or better is great.

Note: Overall Code Climate averages skew higher than that, because we host a lot of small projects (e.g. RubyGems). Smaller projects tend to be more maintainable and also have higher Code Climate GPAs.

What does Code Climate score on Code Climate?

Our main app scores a 3.2. Our "worker" scores a 3.0.

If A's and B's are both fine, why have B's at all?

Good question. Maybe we should get rid of them. They are primarily there because an A-F scale felt most understandable, and it includes a B between A and C.

Why does a small change to the code sometimes cause a grade to change?

We call this "the camel problem". As in, "the straw that broke the camel's back". Code Climate rescores the entire class every time it updates, so the size of a grade or GPA change is not connected to the size of the change made.

It is very common for bad code to accumulate through lots of small, individually justifiable changes. At some point, Code Climate raises a flag. In those cases, the flag is not a reflection on the particular change that was made, but an overall warning about the area of code itself. We recommend taking a step back and evaluating the situation holistically in these instances.
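
A toy model of the effect (the point values and grade cutoffs here are invented, not Code Climate's real scoring):

```ruby
# Imagine each class accumulates "remediation points" from its smells,
# and the letter grade is assigned by fixed cutoffs.
def grade(points)
  case points
  when 0..49    then "A"
  when 50..99   then "B"
  when 100..199 then "C"
  else               "D or F"
  end
end

grade(99)  # => "B"
grade(100) # => "C" -- one extra point, a whole grade lower
```

The one-point change that flips the grade is just the last straw; the other 99 points were already there.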

Why not improve Code Climate to report only smells I care about?

In cases where the algorithm can be changed to be clearly more accurate, we will do that. (Although these updates take a fair amount of time for us to roll out carefully.) An example is the penalization of Symbol#to_proc in Ruby. That penalty was never particularly intended, and Symbol#to_proc is now a popular Ruby idiom (one we adhere to ourselves), so the penalty is vestigial.
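
For readers unfamiliar with the idiom, the two calls below are equivalent (the example data is invented):

```ruby
require "ostruct"

users = [OpenStruct.new(name: "Ada"), OpenStruct.new(name: "Grace")]

# Explicit block form...
users.map { |user| user.name }  # => ["Ada", "Grace"]

# ...and the Symbol#to_proc shorthand, now standard Ruby style:
users.map(&:name)               # => ["Ada", "Grace"]
```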

Other cases are less clear. For example, Code Climate's tendency to flag long methods is too sensitive for some (and too generous for others). The core problem is that there is a tension between providing early warnings about developing maintainability issues and detecting only issues worth taking action on.

If we make Code Climate more strict, it will report more things that do not, in the judgement of humans, require action to be taken. On the other hand, if we make it less strict (for example so it only reports BIG problems that almost certainly require action), we won't be providing information until it's too late. Code Climate today can help you avoid ever introducing BIG problems because it will start to warn you early (by marking something as a C, for example).
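
A toy illustration of that trade-off, using invented method lengths: a stricter (lower) length threshold warns earlier but flags methods that may never need action, while a lenient one stays quiet until a method is already big.

```ruby
method_lengths = { render: 8, process_order: 22, legacy_import: 75 }

# Strict threshold: early warnings, more noise.
method_lengths.select { |_, len| len > 15 }.keys  # => [:process_order, :legacy_import]

# Lenient threshold: quiet until the problem is already BIG.
method_lengths.select { |_, len| len > 50 }.keys  # => [:legacy_import]
```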

The current system is a reflection of the balance between these priorities.

Why not let me override Code Climate in specific cases to tell it a piece of code is "as intended"?

Good question. We may end up doing this. However, enabling someone to manipulate their Code Climate scores is both complex and risky.

For example, one of our most important use cases is within a team, where you have a mix of experience levels. If one programmer were to mark an issue as "wontfix" (and the GPA went up as a result), that issue would be hidden from the rest of the team. This would impair their ability to use Code Climate to review the code in question (because it would now report all A's).

Also, a newly hired developer would not be able to explore Code Climate and learn about the app as easily.

Note: Interestingly, issues reported by our Rails Security Monitor are treated differently. For security issues, there is generally a deterministic answer, agreeable across teams, as to whether an issue is real. So in those cases, we do provide a way to mark something as a false positive if it is not a real vulnerability.


sferik commented Feb 17, 2014

@brynary Thanks for replying.

"This is certainly not how we feel about it."

I know. That’s just how it comes across. I know you guys care a lot. 😄

@etagwerker

I agree with @chrismdp on the coloring solution.


brynary commented Feb 17, 2014

Worth noting: B's are already green today. :)


sferik commented Feb 17, 2014

@brynary light green


sferik commented Feb 17, 2014

One more thought: in my opinion, the “Churn vs. Quality” scatterplot is the most useful feature on Code Climate. Basically, it tells me whether I need to refactor and where I should start (the upper right). Why is this chart tucked away on the last page? This is much more useful than the “Classes by Rating” donut chart on the home page, especially if you don’t want people to focus too much on a 4.0 GPA.


loren commented Feb 17, 2014

The article as a whole makes good sense and it's clear you have weighed the pros/cons of the grading system thoughtfully, but I think the title of the piece is causing some trouble. Perhaps it should be "Why You Should Never Try For Anything at All on Code Climate". I should try to write clear/maintainable code, but I have my bad days and CC nudges me when something smells. It's up to my team to decide what policy to create around that signal (e.g., if you commit code that lowers the GPA, make sure you have a reason). Maybe we enjoy maintaining that 30-line block of Sunspot DSL magic.

No matter what metric you assign to code (a letter grade, a smiley face, a color, a floating point number), someone will strive to raise that score on the next commit unless it's clear to them what they ought to be doing instead.

Changes I would make to this piece:

  1. Rip out Q&A and stick it somewhere else (echoing @tpitale).
  2. Adjust the rest of the piece to focus on a few healthy ways people ought to use CC in their flow, versus the one way they shouldn't be using it. Title accordingly.

@danielmorrison

I think 4.0 is a tough, impractical goal for an existing project, just as reaching 100% coverage is if you haven't been writing tests from the beginning. If you didn't have any tests, you'd be pretty happy getting up to 10%.

That said, we have a few real, production apps with scores of 3.95-4.0, and others around 3.5. The reason these are so high is that we started them with Code Climate. Once you start at 4.0, you become critical of code that takes your score down. Sometimes staying at 4.0 isn't worth it, but it often doesn't take much work.

So I'd say improving your score, at any stage, is the goal. If your codebase comes in at 2.0, that's ok, just work on getting it up toward 2.5. Even that will be hard on a big legacy codebase. But for new projects, you should have higher expectations.


brynary commented Feb 18, 2014

@loren and @danielmorrison -- Thanks for all of that feedback. Agreed on all counts.

@brycesenz

I care less about the particular rating system that Code Climate uses, and more about whether it's helping me make good decisions about how to prioritize which parts of my code to restructure/refactor. I wish I had the luxury of tackling "B" classes, but I don't, since there's almost always something more pressing.

I don't know if you're going to be able to come up with a single metric that can capture all of the reasons why code might need addressing. Honestly, I don't even know if you should. A good example is test coverage: "A" code with zero test coverage and "F" code with 100% test coverage both need addressing, but for very different reasons. A meta-metric that ranks both of these examples as C's would make the situation less clear, not more.

In my mind, there's more to be gained by adding complementary tools (e.g. UML diagrams) that broaden a developer's perspective on their code than by trying to optimize what is a useful-if-imperfect scoring system.

@lightcap

I think it's a great bit of work. Well done. I love that you're thinking so critically about the side effects and consequences of the way that people are using Code Climate.

For what it's worth, my team and I use Code Climate almost entirely to watch trends. For old projects or rescue projects, we use it to ensure consistent improvement in the metrics, while for new projects it helps us keep code quality high from the beginning. And, really, it's all about thinking critically about your code.
So, given the way we use Code Climate, I'd much rather see more emphasis on trending and less on the current snapshot of the code. Context is crucial; focusing on the trend rather than the current score may help people use the product more effectively.

Thanks for posting. (And thanks for Code Climate, too).

@leenasn

leenasn commented Feb 18, 2014

Thanks for sharing this @brynary, it's really useful. Especially the explanation of how to interpret the ratings and your recommendation on what a good Code Climate score is.

Can you also write about how to use/interpret the test coverage report? That would be really useful.
