@FlameWolf
Created May 5, 2026 07:09
A Note About Intl.Segmenter

If you use Intl.Segmenter instead of String.prototype.length to calculate string lengths, awesome, thank you for not being Anglocentric. However, I'd like to point out a minor issue that can come up when doing this. Consider the following word, written in Malayalam, one of the languages of India, belonging to the Dravidian language family:

പരീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീീക്ഷ

Here, the character "ീ" is a mark character (\p{M} in RegExp) that is repeated 79 times. Yet Intl.Segmenter would tell you that the word is only 3 graphemes long, which is technically correct, because the repeated marks all attach to the preceding letter and form a single grapheme cluster with it. This behaviour can be exploited by an attacker to send arbitrarily long data even if your application limits input length by grapheme count. So it makes sense to always cap the byte length of your input on top of the Intl.Segmenter-based length validation.
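
The combined check could look something like the sketch below. The helper names (`countGraphemes`, `utf8ByteLength`, `isWithinLimits`) and both limits are made up for illustration, so pick values that suit your application. The word is built from escape sequences so the repeat count is explicit. The exact grapheme count may differ slightly depending on the Unicode version your JavaScript engine implements, but it stays tiny compared to the byte length.

```javascript
// Hypothetical limits, chosen only for illustration.
const MAX_GRAPHEMES = 10;
const MAX_BYTES = 64;

const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const encoder = new TextEncoder();

// Number of grapheme clusters as reported by Intl.Segmenter.
function countGraphemes(text) {
  let count = 0;
  for (const _segment of segmenter.segment(text)) {
    count++;
  }
  return count;
}

// Size of the string once encoded as UTF-8.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

// Reject on the byte cap first, so oversized input is never segmented.
function isWithinLimits(text, maxGraphemes = MAX_GRAPHEMES, maxBytes = MAX_BYTES) {
  if (utf8ByteLength(text) > maxBytes) {
    return false;
  }
  return countGraphemes(text) <= maxGraphemes;
}

// The word from above: U+0D2A U+0D30, then U+0D40 (the mark) 79 times,
// then U+0D15 U+0D4D U+0D37.
const word = "\u0D2A\u0D30" + "\u0D40".repeat(79) + "\u0D15\u0D4D\u0D37";

console.log(countGraphemes(word)); // a handful of graphemes
console.log(word.length);          // UTF-16 code units
console.log(utf8ByteLength(word)); // UTF-8 bytes
console.log(isWithinLimits(word)); // false: passes the grapheme limit, fails the byte cap
```

Checking the byte length first is a deliberate choice here: it is cheap, and it keeps the segmenter from ever having to walk a huge input.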
