@fand
Created October 8, 2025 21:46
unicode-segmentation v1.12.0 indic script segmentation test

grapheme count test

The unicode-segmentation crate fixed its grapheme segmentation logic for Indic scripts in v1.12.0, so a consonant joined to the next one by a virama now stays in a single grapheme cluster. Ref: unicode-rs/unicode-segmentation#125

unicode-segmentation v1.11.0

$ cargo update -p unicode-segmentation --precise 1.11.0
$ cargo run

नमस्ते (4 graphemes): ["न", "म", "स\u{94d}", "त\u{947}"]
नमस्ते (4 graphemes): ["न", "म", "स\u{94d}", "त\u{947}"]
नमस्ते (4 graphemes): ["न", "म", "स\u{94d}", "त\u{947}"]
র্যর্য (4 graphemes): ["র\u{9cd}", "য", "র\u{9cd}", "য"]
র্যর্য (4 graphemes): ["র\u{9cd}", "য", "র\u{9cd}", "য"]
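
The v1.11.0 boundaries are easier to read against the underlying code points. Below is a minimal sketch using only the standard library. The helper name `code_points` is made up for illustration, and the two strings are spelled with escapes taken from the clusters printed above.

```rust
/// Hypothetical helper: format every code point of `s` as `U+XXXX`.
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    // Devanagari sample, rebuilt from the clusters in the output above.
    let namaste = "\u{928}\u{92e}\u{938}\u{94d}\u{924}\u{947}";
    // Bengali sample, rebuilt the same way.
    let rya = "\u{9b0}\u{9cd}\u{9af}\u{9b0}\u{9cd}\u{9af}";

    // prints ["U+0928", "U+092E", "U+0938", "U+094D", "U+0924", "U+0947"]
    println!("{:?}", code_points(namaste));
    // prints ["U+09B0", "U+09CD", "U+09AF", "U+09B0", "U+09CD", "U+09AF"]
    println!("{:?}", code_points(rya));
}
```

In both strings a virama (U+094D in Devanagari, U+09CD in Bengali) sits between two consonants, and v1.11.0 places a boundary immediately after it.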

unicode-segmentation v1.12.0

$ cargo update -p unicode-segmentation --precise 1.12.0
$ cargo run

नमस्ते (3 graphemes): ["न", "म", "स\u{94d}त\u{947}"]
नमस्ते (3 graphemes): ["न", "म", "स\u{94d}त\u{947}"]
नमस्ते (3 graphemes): ["न", "म", "स\u{94d}त\u{947}"]
র্যর্য (2 graphemes): ["র\u{9cd}য", "র\u{9cd}য"]
র্যর্য (2 graphemes): ["র\u{9cd}য", "র\u{9cd}য"]
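
The difference between the two outputs can be reproduced with a toy post-processing step: join any cluster that ends in a virama with the cluster that follows it. The sketch below is an illustration only, not the crate's implementation. The names `is_virama` and `merge_conjuncts` are hypothetical, and the virama check covers just the two code points seen in this test. The real behaviour most likely follows the conjunct rule (GB9c) in UAX #29, which is defined over Unicode properties rather than a hard-coded list.

```rust
/// Hypothetical helper: true for the two viramas that appear in the output above.
fn is_virama(c: char) -> bool {
    matches!(c, '\u{94d}' | '\u{9cd}')
}

/// Toy approximation: join each cluster that ends in a virama with the next one.
fn merge_conjuncts(clusters: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for &cluster in clusters {
        // Does the most recently emitted cluster end in a virama?
        let joins_previous = out
            .last()
            .map_or(false, |prev| prev.chars().last().map_or(false, is_virama));
        if joins_previous {
            out.last_mut().unwrap().push_str(cluster);
        } else {
            out.push(cluster.to_string());
        }
    }
    out
}

fn main() {
    // Clusters as printed by v1.11.0.
    let devanagari = ["\u{928}", "\u{92e}", "\u{938}\u{94d}", "\u{924}\u{947}"];
    let bengali = ["\u{9b0}\u{9cd}", "\u{9af}", "\u{9b0}\u{9cd}", "\u{9af}"];

    println!("{:?}", merge_conjuncts(&devanagari)); // 3 clusters, as in v1.12.0
    println!("{:?}", merge_conjuncts(&bengali)); // 2 clusters, as in v1.12.0
}
```

Applied to the v1.11.0 clusters, this yields the same groupings that v1.12.0 prints for these two samples. It says nothing about other scripts or more complex sequences.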

[package]
name = "rust-grapheme-count-test"
version = "0.1.0"
edition = "2024"

[dependencies]
unicode-segmentation = "1.11.0"

use unicode_segmentation::UnicodeSegmentation;

fn main() {
    // Print a string's grapheme clusters and how many there are.
    let g = |s: &'static str| {
        // `true` selects extended (rather than legacy) grapheme clusters.
        let graphemes = s.graphemes(true).collect::<Vec<_>>();
        println!("{s} ({} graphemes): {:?}", graphemes.len(), graphemes);
    };

    g("नमस्ते");
    g("नमस्ते");
    g("नमस्ते");
    g("র্যর্য");
    g("র্যর্য");
}