Skip to content

Instantly share code, notes, and snippets.

@andersonfreitas
Forked from weakish/README.markdown
Last active May 4, 2017 16:28
Show Gist options
  • Save andersonfreitas/57a96588fc009cc44f6f to your computer and use it in GitHub Desktop.
Save andersonfreitas/57a96588fc009cc44f6f to your computer and use it in GitHub Desktop.

Below is adapted from Matt Might's blog post:

3 shell scripts to improve your writing, or "My Ph.D. advisor rewrote himself in bash." .

Weasel words

Weasel words--phrases or words that sound good without conveying information--obscure precision.

I notice three kinds of weasel words in my students' writing:

  • salt and pepper words
  • beholder words
  • lazy words.

Salt and pepper words

New grad students sprinkle in salt and pepper words for seasoning. These words look and feel like technical words, but convey nothing.

My favorite salt and pepper words/phrases are various, a number of, fairly, and quite. Sentences that cut these words out become stronger.

  • Bad: It is quite difficult to find untainted samples.
  • Better: It is difficult to find untainted samples.
  • Bad: We used various methods to isolate four samples.
  • Better: We isolated four samples.

Beholder words

Beholder words are those whose meaning is a function of the reader; for example: interestingly, surprisingly, remarkably, or clearly.

Peer reviewers don't like judgments drawn for them.

  • Bad: False positives were surprisingly low.
  • Better: To our surprise, false positives were low.
  • Good: To our surprise, false positives were low (3%).

Lazy words

Students insert lazy words in order to avoid making a quantitative characterization. They give the impression that the author has not yet conducted said characterization.

These words make the science feel unfirm and unfinished.

The two worst offenders in this category are the words very and extremely. These two adverbs are never excusable in technical writing. Never.

Other offenders include several, exceedingly, many, most, few, vast.

  • Bad: There is very close match between the two semantics.
  • Better: There is a close match between the two semantics.

Adverbs

In technical writing, adverbs tend to come off as weasel words.

I'd even go so far as to say that the removal of all adverbs from any technical writing would be a net positive for my newest graduate students. (That is, new graduate students weaken a sentence when they insert adverbs more frequently than they strengthen it.)

  • Bad: We offer a completely different formulation of CFA.
  • Better: We offer a different formulation of CFA.
#!/usr/bin/env perl
# Finds duplicate adjacent words.
use strict ;
my $DupCount = 0 ;
if (!@ARGV) {
print "usage: dups <file> ...\n" ;
exit ;
}
while (1) {
my $FileName = shift @ARGV ;
# Exit code = number of duplicates found.
exit $DupCount if (!$FileName) ;
open FILE, $FileName or die $!;
my $LastWord = "" ;
my $LineNum = 0 ;
while (<FILE>) {
chomp ;
$LineNum ++ ;
my @words = split (/(\W+)/) ;
foreach my $word (@words) {
# Skip spaces:
next if $word =~ /^\s*$/ ;
# Skip punctuation:
if ($word =~ /^\W+$/) {
$LastWord = "" ;
next ;
}
# Found a dup?
if (lc($word) eq lc($LastWord)) {
print "$FileName:$LineNum $word\n" ;
$DupCount ++ ;
} # Thanks to Sean Cronin for tip on case.
# Mark this as the last word:
$LastWord = $word ;
}
}
close FILE ;
}
#!/bin/bash
irregulars="awoken|\
been|born|beat|\
become|begun|bent|\
beset|bet|bid|\
bidden|bound|bitten|\
bled|blown|broken|\
bred|brought|broadcast|\
built|burnt|burst|\
bought|cast|caught|\
chosen|clung|come|\
cost|crept|cut|\
dealt|dug|dived|\
done|drawn|dreamt|\
driven|drunk|eaten|fallen|\
fed|felt|fought|found|\
fit|fled|flung|flown|\
forbidden|forgotten|\
foregone|forgiven|\
forsaken|frozen|\
gotten|given|gone|\
ground|grown|hung|\
heard|hidden|hit|\
held|hurt|kept|knelt|\
knit|known|laid|led|\
leapt|learnt|left|\
lent|let|lain|lighted|\
lost|made|meant|met|\
misspelt|mistaken|mown|\
overcome|overdone|overtaken|\
overthrown|paid|pled|proven|\
put|quit|read|rid|ridden|\
rung|risen|run|sawn|said|\
seen|sought|sold|sent|\
set|sewn|shaken|shaven|\
shorn|shed|shone|shod|\
shot|shown|shrunk|shut|\
sung|sunk|sat|slept|\
slain|slid|slung|slit|\
smitten|sown|spoken|sped|\
spent|spilt|spun|spit|\
split|spread|sprung|stood|\
stolen|stuck|stung|stunk|\
stridden|struck|strung|\
striven|sworn|swept|\
swollen|swum|swung|taken|\
taught|torn|told|thought|\
thrived|thrown|thrust|\
trodden|understood|upheld|\
upset|woken|worn|woven|\
wed|wept|wound|won|\
withheld|withstood|wrung|\
written"
if [ "$1" = "" ]; then
echo "usage: `basename $0` <file> ..."
exit
fi
egrep -n -i --color \
"\\b(am|are|were|being|is|been|was|be)\
\\b[ ]*(\w+ed|($irregulars))\\b" $*
exit $?
#!/bin/bash
# Author: Matt Might
# URL: http://matt.might.net/articles/shell-scripts-for-passive-voice-weasel-words-duplicates/
### Find weasel words in text
weasels="many|various|very|fairly|several|extremely\
|exceedingly|quite|remarkably|few|surprisingly\
|mostly|largely|huge|tiny|((are|is) a number)\
|excellent|interestingly|significantly\
|substantially|clearly|vast|relatively|completely"
wordfile=""
# Check for an alternate weasel file
if [ -f $HOME/etc/words/weasels ]; then
wordfile="$HOME/etc/words/weasels"
fi
if [ -f $WORDSDIR/weasels ]; then
wordfile="$WORDSDIR/weasels"
fi
if [ -f words/weasels ]; then
wordfile="words/weasels"
fi
if [ ! "$wordfile" = "" ]; then
weasels="xyzabc123";
for w in `cat $wordfile`; do
weasels="$weasels|$w"
done
fi
if [ "$1" = "" ]; then
echo "usage: `basename $0` <file> ..."
exit
fi
egrep -i -n --color "\\b($weasels)\\b" $*
exit $?
# Check style:
proof:
echo "weasel words: "
sh bin/weasel *.tex
echo
echo "passive voice: "
sh bin/passive *.tex
echo
echo "duplicates: "
perl bin/dups *.tex
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment