A naive regex to match supplementary characters in the range U+10000–U+EFFFF produces nonsensical results:

    jshell> Pattern.matches("[\u10000-\uEFFFF]+", "abc")
    ==> true

The same happens with regex escapes rather than character escapes:

    jshell> Pattern.matches("[\\u10000-\\uEFFFF]+", "abc")
    ==> true

This is because a `\u` escape takes exactly four hex digits, so the first four digits are read as the character code and the final digit as a separate character:

    jshell> "\u10000".toCharArray()
    ==> char[2] { 'က', '0' }  // '\u1000', '0'

    jshell> "\uEFFFF".toCharArray()
    ==> char[2] { '', 'F' }  // '\uEFFF', 'F'

The character class therefore consists of U+1000, the range `0` to U+EFFF, and `F`, which covers every letter in `"abc"`.

According to [_Supplementary Characters in the Java Platform_](http://www.oracle.com/us/technologies/java/supplementary-142654.html), the proper way to escape supplementary characters is as a surrogate pair of UTF-16 code units. In UTF-16, [U+10000](http://www.fileformat.info/info/unicode/char/10000/index.htm) is 0xD800 0xDC00, and [U+EFFFF](http://www.fileformat.info/info/unicode/char/EFFFF/index.htm) is 0xDB7F 0xDFFF. This gives us the regex `"[\uD800\uDC00-\uDB7F\uDFFF]"`:

    jshell> Pattern.matches("[\uD800\uDC00-\uDB7F\uDFFF]", "1")
    ==> false

    jshell> Pattern.matches("[\uD800\uDC00-\uDB7F\uDFFF]", "\uD9BF\uDFFF")  // U+7FFFF
    ==> true
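
To avoid working out the surrogate pairs by hand, the same character class can be built from the code points themselves. The following is a minimal sketch, not a definitive implementation: the class `SupplementaryRange` and its helpers `utf16` and `rangePattern` are hypothetical names introduced here for illustration, and the `\x{...}` regex escape shown at the end assumes Java 7 or later.

```java
import java.util.regex.Pattern;

class SupplementaryRange {
    // Hypothetical helper: the UTF-16 encoding of a code point as a String
    // (one char for BMP code points, a surrogate pair for supplementary ones).
    static String utf16(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Hypothetical helper: a pattern matching one or more code points in [lo, hi].
    // Assumes lo and hi are supplementary code points, so neither is a regex
    // metacharacter that would need escaping inside the character class.
    static Pattern rangePattern(int lo, int hi) {
        return Pattern.compile("[" + utf16(lo) + "-" + utf16(hi) + "]+");
    }

    public static void main(String[] args) {
        // Character.toChars reproduces the surrogate pairs quoted above.
        System.out.println(utf16(0x10000).equals("\uD800\uDC00")); // true
        System.out.println(utf16(0xEFFFF).equals("\uDB7F\uDFFF")); // true

        Pattern p = rangePattern(0x10000, 0xEFFFF);
        System.out.println(p.matcher("abc").matches());            // false
        System.out.println(p.matcher("\uD9BF\uDFFF").matches());   // true (U+7FFFF)

        // Alternative, assuming Java 7+: the regex engine's own code point escape.
        Pattern q = Pattern.compile("[\\x{10000}-\\x{EFFFF}]+");
        System.out.println(q.matcher("abc").matches());            // false
        System.out.println(q.matcher("\uD9BF\uDFFF").matches());   // true
    }
}
```

Building the class from code points keeps the intended range visible in the source, whereas the hand-written surrogate form relies on the reader to decode the pairs.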