function slugify(text) {
  return text.toString().toLowerCase().trim()
    .normalize('NFD')                 // separate accent from letter
    .replace(/[\u0300-\u036f]/g, '')  // remove all separated accents
    .replace(/\s+/g, '-')             // replace spaces with -
    .replace(/&/g, '-and-')           // replace & with 'and'
    .replace(/[^\w\-]+/g, '')         // remove all non-word chars
    .replace(/--+/g, '-');            // replace multiple '-' with single '-'
}
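A quick check of what the function produces may help here. This is only a sketch: the function is repeated so the snippet runs on its own, and the sample titles are made up for illustration.

```javascript
// Same pipeline as above, repeated so this snippet is self-contained.
function slugify(text) {
  return text.toString().toLowerCase().trim()
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .replace(/\s+/g, '-')
    .replace(/&/g, '-and-')
    .replace(/[^\w\-]+/g, '')
    .replace(/--+/g, '-');
}

// Illustrative inputs (not from the original gist).
console.log(slugify('  Crème Brûlée & Café  ')); // creme-brulee-and-cafe
console.log(slugify('Mămăligă'));                // mamaliga
console.log(slugify('Hello, World!'));           // hello-world

// ß has no combining accent, so NFD leaves it alone; the later
// "remove all non-word chars" step then drops it entirely.
console.log(slugify('Straße'));                  // strae
```

Note the last case: without the `u` flag, `\w` only matches ASCII letters, digits and `_`, so any letter that NFD cannot reduce to an ASCII base letter is removed rather than transliterated.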
@eek
Cool! Thanks again!
Now one more thing I observed is that NFD does not actually handle umlauts (ä => ae...) or ß (ß -> ss), and neither does [\u0300-\u036f]. Both are still quite a bit of a riddle to me.
Is it possible to deduce which "languages" your script supports? I didn't test Hungarian, Turkish, or any of the Nordic languages like Icelandic... should they theoretically be transcribed correctly according to their "locale"?
In my own script, I therefore added a snippet for special characters, which I found here:
https://gist.github.com/mathewbyrne/1280286#gistcomment-3753527
Is this "wrong" when using normalize?
NFD takes care of all diacritics, so anything that's above or below the character. It doesn't modify the actual component of the word. So Äpfel has the diacritic removed and becomes Apfel. Türkçe becomes Turkce. It doesn't change ä to ae, it just removes the diacritics that are above and below the character. I've used it mainly for French and Romanian to generate URLs from titles, places where the written word without diacritics is exactly the same: mămăligă is written mamaliga.
So yeah, ß doesn't get converted to anything because it doesn't have any upper or lower accents.
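To make this concrete, here is a small sketch of what NFD does and does not decompose, followed by one possible way to get German-style ä => ae and ß => ss. The helper names `stripMarks` and `transliterateGerman` and the `germanMap` table are hypothetical and not part of the gist; the map only covers German, and other locales would need their own rules.

```javascript
// NFD splits a precomposed letter into base letter + combining mark.
const decomposed = '\u00e4'.normalize('NFD'); // ä becomes 'a' + U+0308
console.log(decomposed.length);               // 2

// Hypothetical helper: decompose, then drop the combining marks.
const stripMarks = (s) => s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

console.log(stripMarks('Äpfel'));  // Apfel
console.log(stripMarks('Türkçe')); // Turkce
console.log(stripMarks('ß'));      // ß (no combining mark, unchanged)
console.log(stripMarks('ø'));      // ø (the stroke is part of the letter, unchanged)

// Assumption: German transliteration rules. Keys are precomposed (NFC)
// code points, so the input is normalized to NFC before the lookup.
const germanMap = { '\u00e4': 'ae', '\u00f6': 'oe', '\u00fc': 'ue', '\u00df': 'ss' };

// Hypothetical helper: apply the map BEFORE any accent stripping,
// otherwise ä has already become a and the rule can no longer match.
function transliterateGerman(s) {
  return s.toLowerCase().normalize('NFC')
    .replace(/[\u00e4\u00f6\u00fc\u00df]/g, (ch) => germanMap[ch]);
}

console.log(transliterateGerman('Äpfel'));  // aepfel
console.log(transliterateGerman('Straße')); // strasse
```

So a special-character map is not "wrong" alongside normalize; the two do different jobs, and the order matters: language-specific replacements first, generic accent stripping afterwards.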
It finally clicked, and I believe I now understand what NFD does! Thank you very much for your efforts, @eek ! :-)
@rowild - Can't really remember, I've removed the escape characters now.