@softwaredoug
Last active February 21, 2020 13:13
Curated list of badly stemmed plurals as a Solr / Elasticsearch synonyms file (initially gathered by Mark Harwood; see the linked issue in the file)
# English minimal stemmer: plural mis-stems, as a synonyms file
# NOT attempting to cover irregular plurals (feet/foot...)
#
# Gathered by Mark Harwood at Elastic
# https://github.com/elastic/elasticsearch/issues/42892
#
# Observation in curating these: some are both plurals and verbs
# such as 'harnesses' - Henry harnesses horses with harnesses
employees => employee
refugees => refugee
sees => see
fees => fee
degrees => degree
ties => tie
lies => lie
trees => tree
attendees => attendee
guarantees => guarantee
dies => die
agrees => agree
oversees => oversee
committees => committee
yankees => yankee
knees => knee
woes => woe
nominees => nominee
trustees => trustee
toes => toe
foes => foe
bees => bee
retirees => retiree
referees => referee
pies => pie
brees => bree
franchisees => franchisee
disagrees => disagree
honorees => honoree
rupees => rupee
detainees => detainee
devotees => devotee
frees => free
tees => tee
undergoes => undergo
trainees => trainee
licensees => licensee
entrees => entree
coffees => coffee
lees => lee
volcanoes => volcano
tornadoes => tornado
appointees => appointee
toffees => toffee
evacuees => evacuee
foresees => foresee
buffaloes => buffalo
businesses => business
buses => bus
matches => match
processes => process
losses => loss
classes => class
passes => pass
taxes => tax
launches => launch
coaches => coach
addresses => address
approaches => approach
witnesses => witness
inches => inch
boxes => box
wishes => wish
reaches => reach
dishes => dish
catches => catch
branches => branch
touches => touch
weaknesses => weakness
clashes => clash
teaches => teach
discusses => discuss
churches => church
successes => success
finishes => finish
glasses => glass
watches => watch
speeches => speech
searches => search
breaches => breach
beaches => beach
bosses => boss
masses => mass
dresses => dress
pitches => pitch
publishes => publish
illnesses => illness
sandwiches => sandwich
pushes => push
ashes => ash
crashes => crash
misses => miss
stretches => stretch
switches => switch
crosses => cross
encompasses => encompass
sunglasses => sunglass
patches => patch
stresses => stress
fixes => fix
possesses => possess
progresses => progress
expresses => express
punches => punch
lunches => lunch
actresses => actress
establishes => establish
flashes => flash
mixes => mix
kisses => kiss
riches => rich
sketches => sketch
batches => batch
bushes => bush
# ambiguous: may be a verb
rushes => rush
assesses => assess
presses => press
benches => bench
brushes => brush
parishes => parish
lashes => lash
stitches => stitch
scratches => scratch
trenches => trench
peaches => peach
marches => march
foxes => fox
washes => wash
mattresses => mattress
witches => witch
dismisses => dismiss
harnesses => harness
glitches => glitch
clutches => clutch
excesses => excess
researches => research
messes => mess
impresses => impress
diminishes => diminish
niches => niche
notches => notch
indexes => index
complexes => complex
sixes => six
sexes => sex
axes => ax
remixes => remix
exes => ex
relaxes => relax
reflexes => reflex
mailboxes => mailbox
hoaxes => hoax
inboxes => inbox
annexes => annex
waxes => wax
multiplexes => multiplex
gearboxes => gearbox
flexes => flex
faxes => fax
lunchboxes => lunchbox
duplexes => duplex
paradoxes => paradox
tuxes => tux
climaxes => climax
sandboxes => sandbox
influxes => influx
maxes => max
prefixes => prefix
coaxes => coax
toolboxes => toolbox
nixes => nix
premixes => premix
vortexes => vortex
fluxes => flux
suplexes => suplex
shoeboxes => shoebox
equinoxes => equinox
vexes => vex
hotfixes => hotfix
connexes => connex
suffixes => suffix
checkboxes => checkbox
bug fixes,bugfixes => bugfix
crucifixes => crucifix
jukeboxes => jukebox
letterboxes => letterbox
saxes => sax
subindexes => subindex
hexes => hex
perplexes => perplex
affixes => affix
pickaxes => pickax
rolexes => rolex
apexes => apex
xboxes => xbox
praxes => prax
aframaxes => aframax
cineplexes => cineplex
appendixes => appendix
flummoxes => flummox
panamaxes => panamax
boomboxes => boombox
transfixes => transfix
jinxes => jinx
textboxes => textbox
muxes => mux
shoes => shoe
heroes => hero
tomatoes => tomato
potatoes => potato
echoes => echo
superheroes => superhero
mosquitoes => mosquito
cargoes => cargo
throes => throe
zeroes => zero
vetoes => veto
canoes => canoe
mangoes => mango
dominoes => domino
faroes => faroe
negroes => negro
horseshoes => horseshoe
torpedoes => torpedo
frescoes => fresco
embargoes => embargo
backhoes => backhoe
mementoes => memento
tiptoes => tiptoe
# a sheet of floating ice
floes => floe
dingoes => dingo
commandoes,commandos => commando
snowshoes => snowshoe
avocadoes => avocado
mottoes => motto
antiheroes => antihero
siloes => silo
foregoes => forego
flamingoes => flamingo
# fruit of the blackthorn
sloes => sloe
ghettoes => ghetto
gittoes => gitto
innuendoes => innuendo
manifestoes => manifesto
haloes => halo
aloes => aloe
grottoes => grotto
ciscoes => cisco
acoes => aco
desperadoes => desperado
sheroes => shero
peccadilloes => peccadillo
erdoes => erdo
weirdoes => weirdo
supervolcanoes => supervolcano
oboes => oboe
porticoes => portico
fiascoes => fiasco
hammertoes => hammertoe
inductees => inductee
awardees => awardee
threes => three
returnees => returnee
chimpanzees => chimpanzee
grantees => grantee
interviewees => interviewee
enrollees => enrollee
invitees => invitee
escapees => escapee
pharisees => pharisee
honeybees => honeybee
absentees => absentee
burpees => burpee
amputees => amputee
divorcees => divorcee
gees => gee
lessees => lessee
emcees => emcee
pedigrees => pedigree
humvees => humvee
soirees => soiree
manatees => manatee
marquees => marquee
loanees => loanee
signees => signee
mentees => mentee
monkees => monkee
kees => kee
bumblebees => bumblebee
transferees => transferee
headaches => headache
smartwatches => smartwatch
attaches => attach
arches => arch
pouches => pouch
crutches => crutch
ditches => ditch
dispatches => dispatch
preaches => preach
couches => couch
tranches => tranche
torches => torch
bunches => bunch
enriches => enrich
backbenches => backbench
ranches => ranch
bitches => bitch
hatches => hatch
swatches => swatch
cliches => cliche
cockroaches => cockroach
crunches => crunch
porches => porch
pooches => pooch
caches => cache
mismatches => mismatch
starches => starch
latches => latch
clinches => clinch
porsches => porsche
snatches => snatch
avalanches => avalanche
hitches => hitch
perches => perch
roaches => roach
wrenches => wrench
finches => finch
pinches => pinch
fetches => fetch
leeches => leech
brunches => brunch
lurches => lurch
mustaches => mustache
relaunches => relaunch
apaches => apache
breeches => breech
brooches => brooch
slouches => slouch
wristwatches => wristwatch
winches => winch
heartaches => heartache
psyches => psyche
moustaches => moustache
haunches => haunch
hunches => hunch
blotches => blotch
beseeches => beseech
twitches => twitch
smooches => smooch
quiches => quiche
deutsches => deutsch
encroaches => encroach
entrenches => entrench
rematches => rematch
goldfinches => goldfinch
flinches => flinch
roches => roch
outreaches => outreach
beeches => beech
naches => nach
bleaches => bleach
detaches => detach
poaches => poach
birches => birch
impeaches => impeach
crouches => crouch
belches => belch
cwtches => cwtch
masterbatches => masterbatch
geocaches => geocache
cinches => cinch
stiches => stich
despatches => despatch
botches => botch
@softwaredoug (author) commented:

The table was copy-pasted from Mark's issue. I converted it with a regex find-and-replace in VS Code: find ^(\w+)\s+\d+\s(\w+) and replace with $1 => $2
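
For anyone who would rather script that conversion than run it in an editor, here is a minimal Python sketch using the same pattern. It assumes each table row is a plural, a count, and the intended stem separated by whitespace, which is inferred from the regex rather than stated in the issue. The function name and the sample rows (including the counts) are hypothetical. Unlike the editor replace, it skips rows that do not match instead of leaving them in place.

```python
import re

# Same pattern as in the comment: a word, whitespace, a number,
# one whitespace character, then a second word.
ROW = re.compile(r"^(\w+)\s+\d+\s(\w+)", re.MULTILINE)


def table_to_synonyms(table_text: str) -> str:
    """Convert rows such as 'boxes 45 box' into 'boxes => box' rules."""
    rules = [f"{m.group(1)} => {m.group(2)}" for m in ROW.finditer(table_text)]
    return "\n".join(rules)


# Hypothetical sample rows; the counts are invented for illustration.
sample = "employees 120 employee\nboxes 45 box\n"
print(table_to_synonyms(sample))
```

The output still needs a manual pass, since a mechanical conversion cannot catch rows where the stem column itself is wrong.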
