static analysis idea: where exactly was complexity added in the last sprint/release

I have a belief that code duplication, while a useful metric, often happens
because of code complexity. In other words, complicated code is more likely
to be duplicated. That doesn't mean simple code can't be duplicated, but it's
less likely and less of a problem. Along with this thinking, I'm asserting
that if you limit spikes in complexity, you will slowly remove duplication
with it. Therefore, it might be less useful to target duplication directly,
except as a way to measure the impact of decreased complexity on duplication.

With a large legacy codebase, it's nearly impossible to remove all complexity.
One, it takes many people many hours/days/weeks/months to do it. Two, you'd
have to stop everything else you're doing, since it's hard to make large
changes all over the place while also trying to ship features. Three, legacy
codebases tend to have areas of churn: code that is frequently changed, and
other code that is infrequently or never changed. Changing code that is
infrequently modified is wasted effort because people aren't spending much
time dealing with the complexity of those files, classes, and modules.

With this in mind, we have tooling for ruby/javascript/etc. projects in git
that we can use to improve the visibility of this problem.

First, calculate the time period you care about. Perhaps the period from the
prior release to the current release, or maybe sprint boundaries:

commit_range = release4_git_sha1...release5_git_sha1

We use the last release, or the last N releases, because we want to eliminate
the noise of code we haven't been changing.
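
For example, here's a minimal sketch of collecting the commits in that range
by shelling out to git. It assumes the two release SHAs (or tags) are
reachable refs; commits_in_range is a hypothetical helper name:

require "open3"

# Lists the commits between two refs, oldest first.
def commits_in_range(from_ref, to_ref)
  out, _err, status = Open3.capture3("git", "rev-list", "--reverse", "#{from_ref}..#{to_ref}")
  status.success? ? out.split("\n") : []
end

commits = commits_in_range("release4_git_sha1", "release5_git_sha1")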

Take the commits in this range and divide them up into batches.
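
One simple way to batch, assuming the commits array from the sketch above
(the batch size of 100 is arbitrary and worth tuning per repo):

batches = commits.each_slice(100).to_a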

Process each batch of commits:

batches.each do |batch|
  batch.each do |commit|
    commit.files_changed.each do |file|
      calculate_net_lines_added_removed(file)
      increment_commit_count(file)
      # Runs flog/complexity before: git show sha1~:/path_to_file
      # Runs flog/complexity after:  git show sha1:/path_to_file
      calculate_net_complexity_added_removed(file)
      file.methods_changed.each do |method|
        increment_commit_count(method)
        calculate_net_complexity_added_removed(method)
      end
      file.class_modules_changed.each do |class_module|
        increment_commit_count(class_module)
        calculate_net_complexity_added_removed(class_module)
      end
      # Track the author so managers/leads can see who needs help
      # refactoring, pairing, etc.
    end
  end
end
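
Here's a minimal sketch of that net-complexity step for a whole file,
shelling out to git and to the flog command-line tool and parsing the
"flog total" line from its default output (helper names are hypothetical):

require "open3"
require "tempfile"

# Total flog score of a file's contents as of a given revision;
# 0.0 if the file didn't exist there (e.g. it was just added).
def flog_score_at(sha1, path)
  contents, _err, status = Open3.capture3("git", "show", "#{sha1}:#{path}")
  return 0.0 unless status.success?

  Tempfile.create(["flog", ".rb"]) do |f|
    f.write(contents)
    f.flush
    out, _err, _status = Open3.capture3("flog", f.path)
    out[/([\d.]+): flog total/, 1].to_f
  end
end

# Positive means complexity was added; negative means it was removed.
def calculate_net_complexity_added_removed(sha1, path)
  flog_score_at(sha1, path) - flog_score_at("#{sha1}~", path)
end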

Note: you'll have to deal with negative complexity and line counts for people
like me who prefer to remove lines of code.

Compile all the batch results.

Graph net complexity added per file, method, class/module, author.
Graph net complexity per line added per file, method, class/module, author.
Graph commit counts vs. complexity added per file, method, class/module.
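
Any charting tool will do. As one minimal sketch, dump the compiled per-file
results to CSV and graph from there (the shape of the results hash here is
hypothetical):

require "csv"

# results = { "lib/foo.rb" => { commits: 12, net_complexity: 34.5, net_lines: 120 } }
CSV.open("complexity_by_file.csv", "w") do |csv|
  csv << %w[file commits net_complexity net_lines complexity_per_line]
  results.each do |file, r|
    per_line = r[:net_lines].zero? ? 0.0 : r[:net_complexity].to_f / r[:net_lines]
    csv << [file, r[:commits], r[:net_complexity], r[:net_lines], per_line]
  end
end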

Run this at each sprint/release boundary and try to find patterns. Learn from
the results, and teach and mentor where you have spikes of increased
complexity. Based on these results, you may also find new ways to measure
pull requests so you can detect these problems before they're merged.