function slugify(text) {
  return text.toString().toLowerCase().trim()
    .normalize('NFD')                 // separate accent from letter
    .replace(/[\u0300-\u036f]/g, '')  // remove all separated accents
    .replace(/\s+/g, '-')             // replace spaces with -
    .replace(/&/g, '-and-')           // replace & with 'and'
    .replace(/[^\w\-]+/g, '')         // remove all non-word chars
    .replace(/--+/g, '-')             // replace multiple '-' with single '-'
}
Why replace(/\-\-+/g, '-')? Wouldn't replace(/-+/g, '-') do it?
It would, but that also means replacing every single '-' with itself. /\-\-+/g only matches runs of two or more hyphens (e.g. '--').
It doesn't really matter in most cases, but performance-wise, if your string has no repeated hyphens and only a few single hyphens after the previous replacements, the single-hyphen pattern is slower (see https://jsben.ch/7v4OT for an example with only single hyphens, and https://jsben.ch/GxYWA for an example with fewer repeated hyphens than single ones). But I guess it's negligible in almost all real-world scenarios.
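To make the difference concrete, here is a small sketch that runs both patterns on the same input. The helper names (collapseMulti, collapseAny, countMatches) and the sample string are made up for illustration and are not part of the gist. It only shows that the output is identical while the number of replacements differs; it is not a benchmark.

```javascript
// Two ways to collapse runs of hyphens; both return the same string.
function collapseMulti(text) {
  // Matches only runs of two or more hyphens.
  return text.replace(/--+/g, '-');
}

function collapseAny(text) {
  // Matches every run of one or more hyphens, including single ones.
  return text.replace(/-+/g, '-');
}

// Count how many matches a pattern finds, i.e. how many replacements happen.
function countMatches(text, pattern) {
  return (text.match(pattern) || []).length;
}

const sample = 'a-b--c---d-e';
console.log(collapseMulti(sample));        // a-b-c-d-e
console.log(collapseAny(sample));          // a-b-c-d-e
console.log(countMatches(sample, /--+/g)); // 2
console.log(countMatches(sample, /-+/g));  // 4
```

So /-+/g does two extra replacements here that swap a single '-' for an identical '-'.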
@eek
Very interesting, and makes total sense! Thanks for your explanation.
However, ESLint (in VS Code) complains about "useless escape characters". It would prefer /--+/g, which actually works fine. Is there also a reason for those backslashes? Maybe a historical one (e.g. IE)?
@rowild - Can't really remember, I've removed the escape characters now.
@eek
Cool! Thanks again!
One more thing I've noticed: NFD does not handle umlauts (ä => ae ...) or ß (ß -> ss), and neither does [\u0300-\u036f]. Both are still a bit of a riddle to me.
Is it possible to deduce which languages your script supports? I didn't test Hungarian, Turkish, or any of the Nordic languages such as Icelandic. Should they theoretically be transliterated correctly according to their locale?
I therefore added a snippet for special characters to my own script, which I found here:
https://gist.github.com/mathewbyrne/1280286#gistcomment-3753527
Is this "wrong" when using normalize?
NFD takes care of all diacritics, so anything that sits above or below the character (NFD separates them from the letter, and the [\u0300-\u036f] replacement then strips them).
It doesn't modify the base letter of the word. So Äpfel has its diacritic removed and becomes Apfel, and Türkçe becomes Turkce. It doesn't change ä to ae; it just removes the diacritics above and below the character. I've used it mainly for French and Romanian to generate URLs from titles, where the word written without diacritics is otherwise exactly the same: mămăligă is written mamaliga.
So yeah, ß doesn't get converted to anything because it doesn't have any upper or lower accents.
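A quick sketch of what that means in practice. stripDiacritics repeats the two steps from the gist. GERMAN_MAP and transliterateGerman are hypothetical names for a language-specific mapping applied before normalization (the kind of extra snippet asked about above); they are an assumption for illustration, not something the gist does.

```javascript
// Same two steps as the gist: decompose, then drop the combining marks.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Hypothetical German-specific map; must run BEFORE the accents are stripped.
const GERMAN_MAP = {
  'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
  'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',
  'ß': 'ss',
};

function transliterateGerman(text) {
  // NFC first, so input that is already decomposed still matches the map keys.
  const mapped = text.normalize('NFC').replace(/[äöüÄÖÜß]/g, (ch) => GERMAN_MAP[ch]);
  return stripDiacritics(mapped);
}

console.log(stripDiacritics('Äpfel'));      // Apfel
console.log(stripDiacritics('Türkçe'));     // Turkce
console.log(stripDiacritics('mămăligă'));   // mamaliga
console.log(stripDiacritics('Straße'));     // Straße (ß has no combining mark)
console.log(transliterateGerman('Äpfel'));  // Aepfel
console.log(transliterateGerman('Straße')); // Strasse
```

So a mapping like this isn't "wrong" alongside normalize, as long as it runs first. Once NFD and the accent removal have turned ä into a, there is nothing left for the map to match.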
It finally clicked, and I believe I now understand what NFD does! Thank you very much for your efforts, @eek! :-)
Won't work in IE, because (as expected) it doesn't support normalize(). Otherwise nice to know =)
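If IE still matters to you, one option is to feature-detect normalize() before calling it. This is only a sketch: stripAccentsSafe is a made-up helper name, and returning the text unchanged is an assumption about what a sensible fallback might be, not something the gist does.

```javascript
// Written with `var` and a plain function so it also parses in old browsers.
function stripAccentsSafe(text) {
  var str = String(text);
  // Feature-detect: String.prototype.normalize is missing in Internet Explorer.
  if (typeof str.normalize !== 'function') {
    // Assumed fallback: return the text unchanged instead of throwing.
    return str;
  }
  return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(stripAccentsSafe('mămăligă')); // mamaliga (where normalize exists)
```

Note that with this fallback the accented letters would survive until the later /[^\w\-]+/g step and be dropped entirely there, so a normalize() polyfill is probably the better route if you need consistent slugs in IE.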