json-no-low-surrogate-error-explained.md

What does "No low surrogate in string" mean when talking about invalid JSON?

This error occurs when JSON contains a broken Unicode character encoding. Here's everything you need to understand it:

Unicode Background

Unicode assigns every character a unique number (code point). Simple characters like "A" get small numbers (U+0041), while emojis like 😀 get large numbers (U+1F600).

Unicode organizes these numbers into 17 "planes" - think of them as 17 floors in a building, each holding 65,536 characters:

Plane 0 (U+0000 to U+FFFF): Common characters - Latin, Asian scripts, basic symbols
Planes 1-16 (U+10000 to U+10FFFF): Emojis, historical scripts, rare symbols

The UTF-16 Problem

JSON uses UTF-16 encoding, which represents characters as 16-bit (2-byte) units. This works perfectly for Plane 0 characters:

"A" = \u0041 (one unit)
"€" = \u20AC (one unit)

But 16 bits can only represent 65,536 values. Characters in planes 1-16 have numbers too large to fit! The solution: use two 16-bit units working together, called a surrogate pair.

How Surrogate Pairs Work

For characters beyond Plane 0, UTF-16 splits them into two parts:

High surrogate: A value between \uD800 and \uDBFF (the first half)
Low surrogate: A value between \uDC00 and \uDFFF (the second half)

Example: The 😀 emoji (U+1F600) becomes \uD83D\uDE00:

\uD83D = high surrogate
\uDE00 = low surrogate
They MUST appear together, in this order

The "No low surrogate" Error

This error means JSON contains a high surrogate that isn't followed by a low surrogate:

// ❌ INVALID - High surrogate alone
{"text": "\uD83D"}

// ❌ INVALID - High surrogate followed by regular character
{"text": "\uD83Dabc"}  

// ❌ INVALID - High surrogate followed by another high surrogate
{"text": "\uD83D\uD83D"}

// ✅ VALID - Complete surrogate pair
{"text": "\uD83D\uDE00"}  // Renders as 😀

Common Causes

String truncation: Cutting text in the middle of an emoji

"Hello 😀".substring(0, 7)  // Might cut the emoji in half

Manual escaping: Writing escape sequences incorrectly
```
{"text": "\uD83D"}  // Forgot the second part
```
Data corruption: Network or storage issues damaging the string
Encoding conversion bugs: Errors when converting between different Unicode formats

The Fix

Validate surrogate pairs: If you see \uD800-\uDBFF, ensure it's followed by \uDC00-\uDFFF
Keep pairs intact: When processing strings, never split between a high and low surrogate
Use proper libraries: Unicode-aware functions that handle surrogates correctly
Test with emojis: They're the most common use of surrogate pairs

Quick Summary

The error "No low surrogate in string" means: "I found the first half of a two-part character (like an emoji), but the second half is missing or invalid." It's like finding an opening parenthesis "(" without its closing partner ")" - JSON knows something is incomplete and rejects it as invalid.

mkwatson/json-no-low-surrogate-error-explained.md

Select an option

No results found