@jrafanie
Created May 22, 2016 15:15
static analysis idea: where exactly was complexity added in the last sprint/release
I believe that code duplication, while a useful metric, is often a consequence of
code complexity. In other words, complicated code is more likely to be duplicated.
That doesn't mean simple code can't be duplicated, just that it's less likely and
less of a problem. Following this line of thinking, I'm asserting that if you limit
spikes in added complexity, you will slowly remove duplication along with it.
Therefore, it might be less useful to target duplication directly, except as a way
to measure the impact of decreased complexity on duplication.
With a large legacy codebase, it's nearly impossible to remove all complexity.
One, it takes many people many hours/days/weeks/months to do it. Two, you'd have
to stop everything else you're doing, because it's hard to make large changes all
over the place while also trying to ship features. Three, legacy codebases tend
to have areas of churn: code that is frequently changed alongside code that is
infrequently or never changed. Changing code that is rarely modified is wasted
effort, because people aren't spending much time dealing with the complexity of
those files, classes, and modules.
With this in mind, we can build tooling for ruby/javascript/etc. git projects to
improve the visibility of this problem.
First, you need to calculate the time period you care about: perhaps the range
from the prior release to the current release, or maybe sprint boundaries:
commit_range = release4_git_sha1...release5_git_sha1
We use the last release, or last N releases, because we want to eliminate the
noise of code we haven't been changing.
Take this commit_range and divide it up into batches.
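A rough sketch of those two steps, assuming the script runs inside the git
repository being analyzed (the release names and the batch size of 100 are
placeholders, not part of the original idea):

commit_range = "release4_git_sha1...release5_git_sha1" # the range from above
commits = `git rev-list --reverse #{commit_range}`.split("\n")

BATCH_SIZE = 100 # arbitrary; pick whatever keeps memory and runtime reasonable
batches = commits.each_slice(BATCH_SIZE).to_a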
Process each batch of commits:
batches.each do |batch|
  batch.commits.each do |commit|
    commit.files_changed.each do |file|
      calculate_net_lines_added_removed(file)
      increment_commit_count(file)
      # Run flog/complexity before the change: git show sha1~:/path_to_file
      # Run flog/complexity after the change:  git show sha1:/path_to_file
      calculate_net_complexity_added_removed(file)
      file.methods_changed.each do |method|
        increment_commit_count(method)
        calculate_net_complexity_added_removed(method)
      end
      file.class_modules_changed.each do |class_module|
        increment_commit_count(class_module)
        calculate_net_complexity_added_removed(class_module)
      end
      # Track the author so managers/leads can see who needs help with
      # refactoring, pairing, etc.
    end
  end
end
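The helper methods above are left undefined. As one possible shape for
calculate_net_complexity_added_removed at the file level, here is a minimal sketch
that shells out to the flog CLI on the before/after blob contents from git show;
the temp-file handling, the output parsing, and the (sha1, path) signature are
assumptions for illustration, not the gist's definition:

require "tempfile"

# Flog a file as it existed at a given revision and return its total score.
# Assumes the flog gem is installed and its CLI prints a "NN.N: flog total" line.
def flog_score(sha1, path)
  contents = `git show #{sha1}:#{path}`
  return 0.0 if contents.empty? # file doesn't exist at that revision (added/deleted)

  Tempfile.create(["flog", ".rb"]) do |tmp|
    tmp.write(contents)
    tmp.flush
    `flog #{tmp.path}`[/([\d.]+): flog total/, 1].to_f
  end
end

# Net complexity a commit added to (or removed from) one file.
def calculate_net_complexity_added_removed(sha1, path)
  flog_score(sha1, path) - flog_score("#{sha1}~", path)
end

Per-method and per-class/module numbers could come from parsing the individual
lines of the same flog output, which is glossed over here.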
Note: you'll have to deal with negative complexity and line counts for people like
me who prefer to remove lines of code.
Compile all the batch results (a small sketch of this step follows the list below). Then:
Graph net complexity added per file, method, class/module, and author.
Graph net complexity added per line added, per file, method, class/module, and author.
Graph commit counts vs. complexity added per file, method, and class/module.
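As a sketch of the compile step, assuming each batch produced a hash of per-file
stats (the structure of batch_results and the CSV output are assumptions; feed the
CSV to whatever graphing tool you prefer):

require "csv"

# Merge the per-batch hashes of per-file stats into one set of totals.
# batch_results is assumed to look like:
#   [{ "app/models/vm.rb" => { net_complexity: 12.3, net_lines: 40, commits: 5 }, ... }, ...]
def compile_results(batch_results)
  batch_results.each_with_object(Hash.new { |h, k| h[k] = Hash.new(0) }) do |batch, totals|
    batch.each do |file, stats|
      stats.each { |metric, value| totals[file][metric] += value }
    end
  end
end

# Dump the totals so the per-file/per-author graphs can be built elsewhere.
def write_report(totals, path = "complexity_report.csv")
  CSV.open(path, "w") do |csv|
    csv << %w[file net_complexity net_lines commits complexity_per_line_added]
    totals.each do |file, stats|
      per_line = stats[:net_lines].zero? ? 0 : stats[:net_complexity] / stats[:net_lines].to_f
      csv << [file, stats[:net_complexity], stats[:net_lines], stats[:commits], per_line]
    end
  end
end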
Run this at each sprint, release, etc. boundary and try to find patterns. Learn
from the results; teach and mentor where you have spikes of increased complexity.
Based on these results, you may now have new ways to measure pull requests and
detect these problems before they're merged, as sketched below.
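For the pull request case, one possibility (the threshold and the warning behavior
are assumptions, not part of the gist) is to run the same per-file calculation over
the PR's commit range, reusing the flog_score sketch above, and warn when the net
complexity added crosses a limit:

MAX_NET_COMPLEXITY = 50.0 # arbitrary; tune per team and codebase

# Warn on a pull request whose net flog complexity increase exceeds the threshold.
def check_pull_request(base_sha, head_sha)
  changed = `git diff --name-only #{base_sha}...#{head_sha}`.split("\n").grep(/\.rb\z/)
  net = changed.sum { |path| flog_score(head_sha, path) - flog_score(base_sha, path) }

  if net > MAX_NET_COMPLEXITY
    warn "This PR adds #{net.round(1)} flog points of complexity; consider refactoring before merging."
  end
  net
end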