@mkwatson
Created August 22, 2025 17:34

What does "No low surrogate in string" mean when talking about invalid JSON?

This error occurs when JSON contains a broken Unicode character encoding. Here's everything you need to understand it:

Unicode Background

Unicode assigns every character a unique number (code point). Simple characters like "A" get small numbers (U+0041), while emojis like 😀 get large numbers (U+1F600).

Unicode organizes these numbers into 17 "planes" - think of them as 17 floors in a building, each holding 65,536 code points:

  • Plane 0 (U+0000 to U+FFFF): Common characters - Latin, Asian scripts, basic symbols
  • Planes 1-16 (U+10000 to U+10FFFF): Emojis, historical scripts, rare symbols
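
The plane of a code point is its value divided by 65,536 (0x10000), rounded down. A minimal JavaScript sketch of that arithmetic; the helper name `planeOf` is made up for illustration:

```javascript
// Hypothetical helper: returns the Unicode plane (0-16) of a string's first code point.
function planeOf(char) {
  const codePoint = char.codePointAt(0);  // full code point, even beyond U+FFFF
  return Math.floor(codePoint / 0x10000); // each plane spans 0x10000 code points
}

console.log(planeOf("A"));         // 0 (U+0041)
console.log(planeOf("\u{1F600}")); // 1 (U+1F600, the 😀 emoji)
```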

The UTF-16 Problem

JSON text itself is normally encoded as UTF-8, but its \uXXXX escape sequences are defined in terms of UTF-16 code units, each 16 bits (2 bytes) wide. One escape is enough for any Plane 0 character:

  • "A" = \u0041 (one unit)
  • "€" = \u20AC (one unit)

But 16 bits can only represent 65,536 values. Characters in planes 1-16 have numbers too large to fit! The solution: use two 16-bit units working together, called a surrogate pair.
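
JavaScript strings are also sequences of UTF-16 code units, so the one-unit versus two-unit difference is easy to observe there. A small sketch; `codeUnits` is an illustrative helper, not a built-in:

```javascript
// Hypothetical helper: lists a string's UTF-16 code units as \uXXXX escapes.
function codeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    const hex = str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0");
    units.push("\\u" + hex);
  }
  return units;
}

console.log(codeUnits("A").join(" "));         // \u0041
console.log(codeUnits("\u20AC").join(" "));    // \u20AC
console.log(codeUnits("\u{1F600}").join(" ")); // \uD83D \uDE00
```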

How Surrogate Pairs Work

For characters beyond Plane 0, UTF-16 splits them into two parts:

  1. High surrogate: A value between \uD800 and \uDBFF (the first half)
  2. Low surrogate: A value between \uDC00 and \uDFFF (the second half)

Example: The 😀 emoji (U+1F600) becomes \uD83D\uDE00:

  • \uD83D = high surrogate
  • \uDE00 = low surrogate
  • They MUST appear together, in this order
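
The split follows the standard UTF-16 arithmetic: subtract 0x10000 from the code point, then put the top 10 bits of the result into the high surrogate and the bottom 10 bits into the low surrogate. A sketch with illustrative function names:

```javascript
// Hypothetical helper: converts a code point above U+FFFF into its surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("only U+10000..U+10FFFF are encoded as surrogate pairs");
  }
  const offset = codePoint - 0x10000;    // a 20-bit value
  const high = 0xD800 + (offset >> 10);  // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

// Hypothetical inverse: recombines a surrogate pair into a code point.
function fromSurrogatePair(high, low) {
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

const [high, low] = toSurrogatePair(0x1F600);
console.log(high.toString(16), low.toString(16));       // d83d de00
console.log(fromSurrogatePair(high, low).toString(16)); // 1f600
```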

The "No low surrogate" Error

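This error means the JSON contains a high surrogate that isn't immediately followed by a low surrogate. The JSON grammar (RFC 8259) technically permits such escapes, but the resulting string is not valid Unicode, so strict parsers reject it: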

// ❌ INVALID - High surrogate alone
{"text": "\uD83D"}

// ❌ INVALID - High surrogate followed by regular character
{"text": "\uD83Dabc"}

// ❌ INVALID - High surrogate followed by another high surrogate
{"text": "\uD83D\uD83D"}

// ✅ VALID - Complete surrogate pair
{"text": "\uD83D\uDE00"}  // Renders as 😀
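
The same check can be sketched in a few lines of JavaScript, whose strings are sequences of UTF-16 code units. The function name `findLoneSurrogate` is an assumption made for illustration; it also flags the mirror-image case of a low surrogate with no high surrogate before it:

```javascript
// Hypothetical helper: returns the index of the first unpaired surrogate, or -1 if none.
function findLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      const next = str.charCodeAt(i + 1); // NaN when past the end of the string
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // valid pair: skip over the low surrogate
      } else {
        return i; // high surrogate without a low surrogate
      }
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return i; // low surrogate without a preceding high surrogate
    }
  }
  return -1;
}

console.log(findLoneSurrogate("\uD83D"));       // 0
console.log(findLoneSurrogate("\uD83Dabc"));    // 0
console.log(findLoneSurrogate("\uD83D\uD83D")); // 0
console.log(findLoneSurrogate("\uD83D\uDE00")); // -1
```

Recent JavaScript engines also offer `String.prototype.isWellFormed()` for a yes/no version of this check.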

Common Causes

  1. String truncation: Cutting text in the middle of an emoji

    "Hello 😀".substring(0, 7)  // Cuts the emoji in half: keeps \uD83D, drops \uDE00
  2. Manual escaping: Writing escape sequences incorrectly

    {"text": "\uD83D"}  // Forgot the second part
  3. Data corruption: Network or storage issues damaging the string

  4. Encoding conversion bugs: Errors when converting between different Unicode formats
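
The truncation case is easy to reproduce and to avoid. `substring` counts UTF-16 code units, whereas iterating a string (here via `Array.from`) yields whole code points. A sketch, with `truncateCodePoints` as an illustrative name:

```javascript
// Hypothetical helper: keeps at most `max` code points, never splitting a surrogate pair.
function truncateCodePoints(str, max) {
  return Array.from(str).slice(0, max).join(""); // Array.from iterates by code point
}

const text = "Hello \uD83D\uDE00";        // "Hello 😀": 8 code units, 7 code points
const broken = text.substring(0, 7);      // ends with a lone \uD83D
const safe = truncateCodePoints(text, 6); // "Hello " with the emoji dropped whole

console.log(broken.charCodeAt(6).toString(16)); // d83d
console.log(JSON.stringify(safe));              // "Hello "
```

Truncating by code point can still split emoji that are built from several code points, such as flags, but it cannot produce an unpaired surrogate.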

The Fix

  • Validate surrogate pairs: If you see \uD800-\uDBFF, ensure it's followed by \uDC00-\uDFFF
  • Keep pairs intact: When processing strings, never split between a high and low surrogate
  • Use proper libraries: Unicode-aware functions that handle surrogates correctly
  • Test with emojis: They're the most common use of surrogate pairs
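
When the data cannot be fixed at its source, one option is to replace unpaired surrogates with U+FFFD (the replacement character) before serializing. This is a sketch of that one policy, not the only reasonable one; `replaceLoneSurrogates` is an illustrative name:

```javascript
// Matches a high surrogate with no low surrogate after it, or a low surrogate
// with no high surrogate before it. No `u` flag, so the regex works on
// individual UTF-16 code units.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical helper: replaces every unpaired surrogate with U+FFFD.
function replaceLoneSurrogates(str) {
  return str.replace(LONE_SURROGATE, "\uFFFD");
}

console.log(replaceLoneSurrogates("\uD83Dabc") === "\uFFFDabc");                 // true
console.log(replaceLoneSurrogates("ok \uD83D\uDE00") === "ok \uD83D\uDE00");     // true
```

Recent JavaScript engines ship `String.prototype.toWellFormed()`, which performs the same replacement.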

Quick Summary

The error "No low surrogate in string" means: "I found the first half of a two-part character (like an emoji), but the second half is missing or invalid." It's like finding an opening parenthesis "(" without its closing partner ")": the parser can tell something is incomplete and rejects the input as invalid.
