A Note About Intl.Segmenter

If you use Intl.Segmenter instead of the string length property to measure the length of a string, awesome, thank you for not being Anglocentric. However, I'd like to point out a minor issue that can come up when you do this. Consider the word below, written in Malayalam, an Indic language belonging to the Dravidian language family:

പരീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീക്ഷ

Here, the character "ീ" is a mark character (it matches \p{M} in a Unicode-aware RegExp) and is repeated 79 times. Yet Intl.Segmenter, at grapheme granularity, would tell you that the word is 3 graphemes long. That is technically correct, but an attacker can exploit this behaviour to send arbitrarily long data even if your application limits the input length, because any number of marks can attach to a single grapheme. So it makes sense to cap the byte length of your input in addition to the Intl.Segmenter-based length validation.
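To make this concrete, here is a minimal sketch of validating against both limits. The names `MAX_GRAPHEMES`, `MAX_BYTES`, `graphemeLength`, `utf8ByteLength`, and `isWithinLimits` are made up for illustration, and the limit values are arbitrary assumptions; pick ones that suit your application.

```javascript
// Hypothetical limits, chosen only for illustration.
const MAX_GRAPHEMES = 280;
const MAX_BYTES = 4096;

// Passing undefined uses the runtime's default locale.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const encoder = new TextEncoder();

// Number of user-perceived characters (grapheme clusters).
function graphemeLength(text) {
  let count = 0;
  for (const _segment of segmenter.segment(text)) count++;
  return count;
}

// Number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

function isWithinLimits(text) {
  // A string's UTF-8 encoding is never shorter than its UTF-16 code unit
  // count, so this rejects oversized input before allocating anything.
  if (text.length > MAX_BYTES) return false;
  // Check the byte cap before segmenting, so a flood of marks is rejected
  // without walking the whole string grapheme by grapheme.
  if (utf8ByteLength(text) > MAX_BYTES) return false;
  return graphemeLength(text) <= MAX_GRAPHEMES;
}

// The word from above, rebuilt with escapes: the mark U+0D40 repeated 79 times.
const word = "\u0D2A\u0D30" + "\u0D40".repeat(79) + "\u0D15\u0D4D\u0D37";
console.log(word.length);           // 84 UTF-16 code units
console.log(utf8ByteLength(word));  // 252 bytes
console.log(graphemeLength(word));  // only a handful of graphemes

// The same word with 100,000 marks still has the same grapheme count,
// but the byte cap rejects it.
const flood = "\u0D2A\u0D30" + "\u0D40".repeat(100000) + "\u0D15\u0D4D\u0D37";
console.log(isWithinLimits(word));  // true
console.log(isWithinLimits(flood)); // false
```

The exact grapheme count can vary with the Unicode version your runtime's segmenter implements, which is one more reason not to rely on it as your only limit.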

FlameWolf/Using_Intl.Segmenter.md
