Last active
August 25, 2020 14:17
Find Duplicated Javascript code
#!/bin/sh
files=frontend/src/static/js
pmd cpd --language ecmascript --minimum-tokens 25 --files "$files" --format csv \
| sed "1s/.*/lines,tokens,occurrences,L1,F1,L2,F2,L3,F3,L4,F4/" \
| sed "s|$(pwd)/$files|.|g" \
> dups.csv
But what if the identical blocks of code are rearranged?
This will appear as, say, 10 blocks of 90+ tokens, even though you know that most of the file is copied.
The cycle is:
- Run dups
- Move some of the identical lines in the copy so that they appear in the same order as in the original
- Run dups again
- You should see longer token runs and fewer of them, for example 90 tokens => 170 tokens
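To check whether a pass through the cycle helped, it can be useful to reduce each dups.csv to two numbers: how many duplicated blocks there are and how long the longest token run is. The following is a minimal sketch, not part of the original script. The function name summarize_dups is hypothetical, and it assumes the dups.csv layout produced above: one header row, then one row per duplicated block with the token count in the second column.

```shell
#!/bin/sh
# summarize_dups (hypothetical helper): print the number of duplicated
# blocks and the longest token run found in a dups.csv file.
# Assumes row 1 is the header and column 2 holds the token count.
summarize_dups() {
  awk -F, 'NR > 1 && NF >= 3 {
             blocks++
             if ($2 + 0 > longest) longest = $2 + 0
           }
           END { printf "blocks=%d longest=%d\n", blocks, longest }' "$1"
}
```

Running `summarize_dups dups.csv` before and after moving lines gives a quick comparison: the block count should go down while the longest run goes up.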
Some annotations
pmd cpd is an open source tool that detects similar lines.
The line numbers below count from the pmd command:
Line 1: Read in all the Javascript files, look for runs of at least 25 identical tokens, and output the results as CSV
Line 2: Replace the CSV header row with column names that are easier to use in Excel
Line 3: Replace the full absolute pathname of each file with the shorter relative directory
Line 4: Output to dups.csv
dups.csv will contain a list of the identical token blocks, with the Lx and Fx columns containing the Line and File of each occurrence
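The Lx and Fx pairs can be awkward to read when a block occurs in several files. As a sketch only, the hypothetical helper below (list_occurrences is not part of the original script) prints one line per occurrence in the form "tokens: file:line". It assumes the column order described above (line, then file, starting at column 4) and that file paths contain no commas or quoted fields.

```shell
#!/bin/sh
# list_occurrences (hypothetical helper): expand each dups.csv row into
# one output line per occurrence of the duplicated block.
# Assumes columns 4.. are (line, file) pairs and paths contain no commas.
list_occurrences() {
  awk -F, 'NR > 1 {
             for (i = 4; i + 1 <= NF; i += 2)
               if ($i != "") printf "%s tokens: %s:%s\n", $2, $(i + 1), $i
           }' "$1"
}
```

For example, `list_occurrences dups.csv | sort -rn` would list the occurrences of the longest runs first.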