@jason-s
Created May 3, 2026 23:28
miller-rabin-testing-questions

[Claude 4.6 Sonnet, May 1 2026]

[Jason] For 2048-bit odd numbers, what fraction of them are multiples of at least one of the first 256 prime numbers?

[Claude] This is a classic inclusion-exclusion / sieve problem. Let me work through it carefully.

Setup

We want: among all 2048-bit odd numbers, what fraction are divisible by at least one prime in {3, 5, 7, 11, ..., p₂₅₇} where p₂₅₇ = 1621?

(We exclude 2 since we're already restricting to odd numbers, so we take the 256 odd primes from p₂ = 3 up to p₂₅₇ = 1621.)

Key Insight

For an odd prime $p$, the probability that a "random" odd number is divisible by $p$ is exactly $1/p$: among any $2p$ consecutive integers there are exactly $p$ odd numbers and exactly one odd multiple of $p$ (since $p$ is odd, its multiples alternate odd and even), so the fraction of odd numbers divisible by $p$ is $1/p$.
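A trivial numeric check of this, with $p = 7$ chosen arbitrarily:

p = 7
window = range(1, 2 * p + 1)                       # any 2p consecutive integers
odds = [x for x in window if x % 2 == 1]           # p of them are odd
odd_multiples = [x for x in odds if x % p == 0]    # exactly one is an odd multiple of p
print(len(odd_multiples), "/", len(odds))          # 1 / 7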

Inclusion-Exclusion via Complement

The fraction not divisible by any of the first 256 odd primes is:

$$\prod_{i=2}^{257} \left(1 - \frac{1}{p_i}\right) = \prod_{i=2}^{257} \frac{p_i - 1}{p_i}$$

where $p_2 = 3, p_3 = 5, \ldots, p_{257} = 1621$.

This works because divisibility by distinct primes is independent over uniform integers (by CRT).

So the fraction divisible by at least one is:

$$F = 1 - \prod_{i=2}^{257} \left(1 - \frac{1}{p_i}\right)$$

Computation

Note: primerange(3, 1620) yields only the 255 odd primes up to 1619; to get 256 odd primes the range must extend through 1621, i.e. primerange(3, 1622).
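A minimal sketch of the computation (assuming sympy; double precision is more than enough here):

from sympy import primerange

primes = list(primerange(3, 1622))        # the 256 odd primes 3, 5, ..., 1621
assert len(primes) == 256

surviving = 1.0
for p in primes:
    surviving *= 1 - 1 / p                # fraction of odd numbers not divisible by p

print(f"no small prime factor:      {surviving:.3%}")      # ≈ 15.113%
print(f"at least one small factor:  {1 - surviving:.3%}")  # ≈ 84.887%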

Result

$$\boxed{ F \approx 84.887 \% }$$

Summary

| Quantity | Value |
|---|---|
| Primes used | 3, 5, 7, …, 1621 (the 2nd through 257th primes) |
| Fraction with no small prime factor | ≈ 15.113% |
| Fraction with at least one small prime factor | ≈ 84.887% |

Why the 2048-bit part doesn't matter

The 2048-bit constraint (i.e., the number lives in $[2^{2047}, 2^{2048})$ ) is essentially irrelevant. Since all our primes are tiny (≤ 1621) compared to a 2048-bit number, the divisibility conditions are uniform and independent by the Chinese Remainder Theorem — the same fraction applies regardless of the bit-length.

Practical relevance

This is exactly the calculation done by prime generation algorithms (like those in RSA key generation). Before doing an expensive Miller-Rabin primality test, candidates are trial-divided against small primes. This shows that ~85% of random odd numbers are eliminated by just checking the first 256 primes — a huge speedup!

[Jason] What if I check the first 4096 primes? How many odd numbers, on average, are not divisible by any of these prime numbers?

[Claude] Great follow-up! Note the question shifts from fraction to count — so the 2048-bit constraint now matters.

Results

The first 4096 odd primes go up to 38,891.

Fraction surviving

About 10.62% of 2048-bit odd numbers have no factor among the first 4096 odd primes — down from ~15.1% surviving with 256 primes; the extra sieving eliminates another ~4.5 percentage points.

Expected count of survivors

The 2048-bit odd numbers span $[2^{2047},\ 2^{2048}-1]$, and there are exactly $2^{2046}$ of them. The expected number with no small prime factor is:

$$2^{2046} \times \prod_{i=2}^{4097}\left(1 - \frac{1}{p_i}\right) \;\approx\; 2^{2046} \times 0.10621 \;\approx\; 2^{2042.76}$$

That's an astronomically large number — roughly $2^{2042.76}$, or a number with about 615 decimal digits. Even after aggressive sieving, the sheer size of the 2048-bit space means an enormous count of survivors remain, most of which are composite (and would be caught by Miller-Rabin). This is why trial division alone can never replace a proper primality test for large numbers.
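The corresponding sketch for 4096 odd primes (assuming sympy; nextprime just walks the primes, so we don't need to hard-code the upper bound):

from math import log2
from sympy import nextprime

p, surviving = 2, 1.0
for _ in range(4096):                  # the first 4096 odd primes
    p = nextprime(p)
    surviving *= 1 - 1 / p

print(f"largest prime used: {p}")
print(f"fraction surviving: {surviving:.4%}")                  # ≈ 10.62%
print(f"expected survivors: 2^{2046 + log2(surviving):.2f}")   # ≈ 2^2042.76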

[Jason] When I said "how many" I meant what fraction, not the absolute count

[Claude] Fair enough! Here's the clean answer:

≈ 10.621% of 2048-bit odd numbers survive — i.e., are not divisible by any of the first 4096 odd primes (3 through 38,891).

For comparison:

| Primes checked | Largest prime | Fraction surviving |
|---|---|---|
| 256 | 1,621 | 15.113% |
| 4096 | 38,891 | 10.621% |

And as noted before, the 2048-bit constraint is irrelevant to this fraction — it's the same for any sufficiently large odd numbers.

[Jason] Miller-Rabin has at least 3/4 witnesses for composite numbers. But many composite numbers have a larger fraction of witnesses. How can I detect whether a given number has properties that would make the number of witnesses close to this 3/4 lower bound?

[Claude] Great question. The 3/4 bound is tight, and the numbers that come closest to it are Carmichael numbers (or more precisely, their generalization in the Miller-Rabin context).

Why some composites are "hard" for Miller-Rabin

For Miller-Rabin with base $a$, write $n-1 = 2^s \cdot d$ with $d$ odd. A base $a$ is a strong liar (non-witness) for composite $n$ if:

  • $a^d \equiv 1 \pmod{n}$, or
  • $a^{2^r d} \equiv -1 \pmod{n}$ for some $0 \le r < s$

The composites with the most strong liars — closest to the 3/4 bound — are called strong pseudoprimes to many bases, and they have a specific structure.
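For concreteness, here is a minimal checker that transcribes these two conditions directly (standard library only; is_strong_liar is just an illustrative name, not part of the discussion above):

def is_strong_liar(a, n):
    """True if base a is a strong liar (non-witness) for odd n > 2."""
    d, s = n - 1, 0
    while d % 2 == 0:            # write n - 1 = 2^s * d with d odd
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:     # a^d ≡ 1, or a^(2^0 d) ≡ -1
        return True
    for _ in range(s - 1):       # a^(2^r d) ≡ -1 for some 0 < r < s
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return False

# 2047 = 23 * 89 is the smallest strong pseudoprime to base 2:
print(is_strong_liar(2, 2047))   # True
print(is_strong_liar(3, 2047))   # False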

The key property: Carmichael-like structure

The numbers achieving close to 1/4 non-witnesses (i.e., only about 3/4 witnesses) are characterized by:

$n$ is a product of a few distinct odd primes $p_1, p_2, \ldots, p_k$ where, for each $p_i$, the odd part of $p_i - 1$ divides the odd part of $n - 1$, and the 2-adic valuations $v_2(p_i - 1)$ all agree.

More precisely, write $n - 1 = 2^s \cdot d$ with $d$ odd. By the CRT, a base is a strong liar exactly when the liar condition holds modulo every prime factor simultaneously, at the same squaring step $r$; the count of such bases is maximized when the factors $p_i - 1$ are as compatible as possible with $n - 1$ and with each other — see the exact formula below.

Practical detection

Here's what to check:

1. Compute $\lambda(n)$ (Carmichael function)

If $n = \prod p_i^{e_i}$, then $\lambda(n) = \text{lcm}(\lambda(p_i^{e_i}))$. A composite $n$ with $\lambda(n) \mid n-1$ is a Carmichael number — these fool the Fermat test for every base coprime to $n$, and they are the natural candidates for being hard for Miller-Rabin too.
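A small helper for this check — a sketch that computes $\lambda(n)$ from the factorization, using $\lambda(p^e) = p^{e-1}(p-1)$ for odd primes (which is all we need for odd $n$):

from math import lcm
from sympy import factorint

def carmichael_lambda(n):
    """Carmichael function λ(n) for odd n > 1, via its factorization."""
    result = 1
    for p, e in factorint(n).items():
        result = lcm(result, p**(e - 1) * (p - 1))   # λ(p^e) for odd p
    return result

n = 561                                   # 3 * 11 * 17, the smallest Carmichael number
lam = carmichael_lambda(n)
print(lam, (n - 1) % lam == 0)            # 80 True  → 561 is Carmichael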

2. Check the 2-adic structure

Write $n - 1 = 2^s d$ and, for each prime factor $p \mid n$, write $p - 1 = 2^{t_p} \cdot d_p$ with $d_p$ odd. The fraction of strong liars is large when:

$$t_p = s \quad \text{for all primes } p \mid n$$

i.e., all of the $p-1$ share the same 2-adic valuation as $n-1$ (and hence as each other). This maximizes the overlap between the liar conditions at the different prime factors.

3. The exact formula for the number of strong liars

Write $n - 1 = 2^s d$ with $d$ odd, and for each distinct prime $p_i \mid n$ write $p_i - 1 = 2^{t_i} d_i$ with $d_i$ odd. If $n$ has $k$ distinct prime factors and $\nu = \min_i t_i$, the number of strong liars is given by Monier's formula:

$$|SL(n)| = \left(1 + \frac{2^{k\nu} - 1}{2^k - 1}\right) \prod_{i=1}^{k} \gcd(d,\ d_i)$$

Each factor $\gcd(d, d_i)$ is as large as possible when the odd part of $p_i - 1$ divides the odd part of $n - 1$, and the ratio $|SL(n)|/\varphi(n)$ is maximized when the $t_i$ are all equal — exactly the 2-adic structure flagged above.
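As a quick sanity check (not part of the original analysis), the formula can be compared against a brute-force count on the smallest Carmichael number, 561 = 3 · 11 · 17; both should report 10 strong liars:

from math import gcd

def split_power_of_two(m):
    """Return (v, q) with m = 2**v * q and q odd."""
    v = 0
    while m % 2 == 0:
        m //= 2
        v += 1
    return v, m

def count_strong_liars(n):
    """Brute-force count of strong liars for odd composite n (small n only)."""
    s, d = split_power_of_two(n - 1)
    count = 0
    for a in range(1, n):
        if gcd(a, n) != 1:
            continue
        x = pow(a, d, n)
        if x in (1, n - 1):                 # a^d ≡ ±1 (mod n)
            count += 1
            continue
        for _ in range(s - 1):              # a^(2^r d) ≡ -1 for some 0 < r < s
            x = pow(x, 2, n)
            if x == n - 1:
                count += 1
                break
    return count

def monier_count(n, prime_factors):
    """Monier's formula for squarefree n with the given distinct prime factors."""
    _, d = split_power_of_two(n - 1)
    k = len(prime_factors)
    t_vals, prod_gcd = [], 1
    for p in prime_factors:
        t, dp = split_power_of_two(p - 1)
        t_vals.append(t)
        prod_gcd *= gcd(d, dp)
    nu = min(t_vals)
    return (1 + (2**(k * nu) - 1) // (2**k - 1)) * prod_gcd

print(count_strong_liars(561), monier_count(561, [3, 11, 17]))   # 10 10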

Concrete algorithm to assess "hardness"

from sympy import factorint, isprime
from math import gcd as mgcd

def miller_rabin_hardness(n):
    """Report how close an odd composite n comes to the 1/4 strong-liar bound (requires factoring n)."""
    if isprime(n):
        return None  # prime: Miller-Rabin never errs on primes
    factors = factorint(n)
    if len(factors) == 1:
        return None  # prime power: very few strong liars, MR works well

    # Write n-1 = 2^s * d
    nm1 = n - 1
    s = 0
    while nm1 % 2 == 0:
        nm1 //= 2
        s += 1
    d = nm1

    primes = list(factors.keys())

    # For each prime factor, get 2-adic valuation of p-1
    t_values = []
    for p in primes:
        pm1 = p - 1
        t = 0
        while pm1 % 2 == 0:
            pm1 //= 2
            t += 1
        t_values.append(t)

    # Red flag: all t_p equal s (maximizes strong liars)
    all_equal_s = all(t == s for t in t_values)

    # Approximate liar count: the product of gcd(n-1, p_i-1) is the exact
    # Fermat-liar count for squarefree n; strong liars are a subset of these
    # (see Monier's formula above).
    liar_count_approx = 1
    for p in primes:
        liar_count_approx *= mgcd(n - 1, p - 1)

    fraction = liar_count_approx / n

    return {
        's': s,
        't_per_factor': list(zip(primes, t_values)),
        'all_t_equal_s': all_equal_s,
        'liar_fraction_approx': fraction,
        'is_carmichael': all(e == 1 for e in factors.values())
                         and all((n - 1) % (p - 1) == 0 for p in primes)
    }

Summary of warning signs

| Property | Why it's a red flag |
|---|---|
| $n$ is Carmichael ($\lambda(n) \mid n-1$) | Maximum number of Fermat liars; MR still works, but with more strong liars |
| All $p_i - 1$ have the same 2-adic valuation as $n-1$ | Maximizes the strong-liar count |
| $n$ has few, large prime factors | Fewer factors mean a larger liar count per factor |
| $\gcd(n-1, p_i - 1)$ is large for all $p_i$ | Directly inflates the liar count |

The worst cases for Miller-Rabin are products of a few primes whose $p - 1$ values divide into $n - 1$ and share the same 2-adic valuation — e.g. semiprimes $p \cdot q$ with $q = 2p - 1$ (so $p-1 \mid q-1$), or Carmichael numbers with three prime factors all $\equiv 3 \pmod 4$ — these come closest to the 1/4 strong-liar bound. Running Miller-Rabin with random bases rather than fixed ones is the standard mitigation, since an adversary can't craft a number that fools many random bases simultaneously with non-negligible probability.

[Jason] Could you change this function to not depend on knowing the factors of n?

[Claude] Good point — in practice you'd only run this analysis on numbers that look like they might be prime, so you won't have the factorization. Here's the refactored version:

from math import gcd, isqrt

def miller_rabin_hardness(n):
    """
    Assess how close n might come to the 1/4 strong-liar bound for Miller-Rabin
    (equivalently, the 3/4 witness bound), without requiring prior factorization.
    """
    if n < 2 or n % 2 == 0:
        raise ValueError("n must be odd and > 1")

    # Write n-1 = 2^s * d
    nm1 = n - 1
    s, d = 0, nm1
    while d % 2 == 0:
        d //= 2
        s += 1

    # Trial-divide to find small prime factors
    small_primes = []
    rem = n
    for p in range(3, min(isqrt(n) + 1, 100_000), 2):
        if rem % p == 0:
            small_primes.append(p)
            while rem % p == 0:
                rem //= p
        if rem == 1:
            break

    if rem == n:
        # Trial division found nothing: n is prime, or all of its factors are large
        return {"verdict": "no small factors found — likely prime or large semiprime"}
    if rem > 1:
        small_primes.append(rem)  # unfactored cofactor (possibly composite itself)

    # 2-adic valuation of each p-1
    def v2(x):
        t = 0
        while x % 2 == 0:
            x //= 2
            t += 1
        return t

    t_values = [(p, v2(p - 1)) for p in small_primes]

    # Key red flags
    all_t_equal_s = all(t == s for _, t in t_values)
    gcd_nm1 = 1
    for p in small_primes:
        gcd_nm1 *= gcd(n - 1, p - 1)
    liar_fraction_approx = gcd_nm1 / n

    is_carmichael_candidate = all((n - 1) % (p - 1) == 0 for p in small_primes)

    return {
        's (2-adic valuation of n-1)': s,
        't per small factor': t_values,
        'all t_p == s': all_t_equal_s,
        'liar fraction (approx, small factors only)': liar_fraction_approx,
        'carmichael candidate': is_carmichael_candidate,
        'unfactored residue': rem if rem > 1 else None,
    }

The main changes:

  • Trial division up to a bound (here 100,000) replaces factorint. You find whatever small prime factors you can, and track the unfactored residue.
  • The liar fraction is now a lower bound — if there's an unfactored residue, there may be more prime factors contributing additional liars you haven't accounted for.
  • The Carmichael check is similarly only over known factors, so it's a necessary but not sufficient condition.

The tradeoff is that if n is a semiprime with two large factors (e.g. an RSA modulus), trial division won't find either, and the function will tell you so honestly rather than giving a misleading result. In that case you'd need Pollard's rho or similar to go further.
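A quick usage sketch of the function above (8911 = 7 · 19 · 67 and 3215031751 = 151 · 751 · 28351 are well-known Carmichael numbers whose factors fall below the trial-division bound, so the red flags show up; a 2048-bit RSA modulus would instead get the "no small factors" verdict):

for n in (8911, 3215031751):
    print(n, miller_rabin_hardness(n))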

[Jason] Would this actually speed up Miller-Rabin in practice, to dynamically determine the number of iterations?

[Claude] No, not really — and there are a few reasons why:

The analysis itself is expensive. Trial division up to 100,000 is slower than just running several Miller-Rabin rounds, each of which is a single modular exponentiation. You'd be spending more time on the diagnostic than you'd save.

You'd only ever reduce iterations, not increase them. The point of adaptive iteration counts would be to do fewer rounds on "easy" composites. But easy composites are already caught in 1–2 rounds with overwhelming probability. The hard cases — Carmichael-like numbers near the 3/4 bound — are exactly the ones where your analysis would say "use more rounds," giving you no savings.

The 3/4 bound is very conservative in practice. Random composites have a strong liar fraction much closer to 0 than 3/4. The standard recommendation of 40 rounds (or even 20) already gives astronomically small error probabilities for random inputs. The near-3/4 cases are so rare and so specifically structured that they're not worth optimizing for.

The real-world use case points the other way. In cryptographic prime generation, you're running Miller-Rabin on numbers that have already survived trial division against thousands of small primes. That sieving step is cheap per candidate and eliminates ~90% of them before Miller-Rabin even runs — which is the right way to combine the two techniques, rather than using trial division to tune iteration counts.

Where this analysis is useful is not in speeding up Miller-Rabin, but in understanding and testing implementations — e.g. constructing adversarial inputs to verify that a primality-testing library correctly uses random bases rather than fixed ones, or in academic analysis of the test's worst-case behavior.

[Jason] Is there an optimal number of trial divisions to use, before running Miller-Rabin? Sieving appears to have diminishing returns.

[Claude] Yes, and there's a clean way to think about the optimum.

The tradeoff

Each trial division step costs roughly $O(\log n)$ (one division), and eliminates a candidate with probability $1/p$ for prime $p$. Each Miller-Rabin round costs $O(\log^2 n)$ (one modular exponentiation). So trial division is cheaper than MR by a factor of roughly $\log n$ — but with diminishing returns as primes get larger.

The optimal cutoff

You should keep trial dividing as long as the expected cost of the next division is less than the expected cost it saves in Miller-Rabin rounds. The expected saving from testing prime $p$ is:

$$\frac{1}{p} \times C_{\text{MR}}$$

since with probability $1/p$ you eliminate the candidate and avoid running MR entirely. The cost of the trial division is $C_{\text{div}}$. So continue while:

$$\frac{C_{\text{MR}}}{p} > C_{\text{div}} \implies p < \frac{C_{\text{MR}}}{C_{\text{div}}} \approx \log n$$

So the optimal trial division bound is roughly $\log n$ — for a 2048-bit number, that's around 2048. This gives a cutoff around the 309th prime (2039, the largest prime below 2048).

This is surprisingly small! The intuition is that once $p > \log n$, each division costs more in expectation than it saves.

Caveats

In practice the constant factors shift this somewhat:

  • Batch trial division (precomputing a product of many small primes and taking a single GCD with $n$) is much cheaper per prime than individual divisions, which shifts the optimal cutoff higher (see the sketch after this list).
  • Memory and precomputation of a prime table is essentially free if you're generating many candidates, which again pushes the cutoff up.
  • The first few primes are so cheap to check that the theoretical cutoff doesn't matter much — checking all primes up to a few thousand is practically free compared to even one MR round.
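A minimal sketch of the batched idea from the first bullet (the 10,000 bound is an arbitrary illustrative choice; assumes sympy for the prime list):

from math import gcd, prod
from sympy import primerange

# Precompute once: the product of all odd primes below the sieving bound.
SMALL_PRIME_PRODUCT = prod(int(p) for p in primerange(3, 10_000))

def has_small_odd_factor(n):
    """True if odd n is divisible by some odd prime below the bound."""
    return gcd(n, SMALL_PRIME_PRODUCT) != 1

One gcd of a 2048-bit candidate against this fixed product replaces thousands of individual trial divisions.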

What implementations actually do

Libraries like GMP and OpenSSL use cutoffs in the range of a few hundred to a few thousand, chosen empirically. The theoretical optimum and the practical sweet spot agree reasonably well — there's not much to be gained beyond ~2000–3000 for 2048-bit numbers, and the curve is fairly flat near the optimum so the exact choice doesn't matter much.

So the short answer: roughly $\log_2 n$ as a prime bound, but the returns are so flat near the optimum that anywhere in the range of a few hundred to a few thousand primes gives nearly identical performance for 2048-bit numbers.

[Jason] "checking all primes up to a few thousand is practically free compared to even one MR round." -- why is this true?

[Claude] Good challenge — I overstated it. Let me be precise.

For a 2048-bit number $n$:

  • One division by a small prime $p$ costs roughly $O(k)$ word operations where $k = 2048/64 = 32$ words — you're dividing a big number by a tiny one.
  • One Miller-Rabin round requires a modular exponentiation: ~ 1.5 × 2048 modular squarings/multiplications (by square-and-multiply), each of which is a $k \times k$ word multiplication — so roughly $3000 \times 32^2 \approx 3,000,000$ word operations.

So one MR round costs roughly 100,000× more than one trial division. That means you can do ~100,000 trial divisions for the cost of one MR round — which covers all primes up to around 1.3 million, well past where diminishing returns set in.
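As a rough sanity check of these word-operation estimates (illustrative constants only; it ignores Montgomery reduction and other real-world details):

bits = 2048
k = bits // 64                               # 2048-bit operand in 64-bit words
trial_division_cost = k                      # big-by-small division: ~k word ops
mr_round_cost = (3 * bits // 2) * k * k      # ~1.5*bits modular multiplications, ~k^2 each
print(mr_round_cost // trial_division_cost)  # ≈ 98,000 trial divisions per MR round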

So the claim is actually true, it's just that my reasoning for why was lazy. The key isn't that division is cheap in absolute terms — it's that modular exponentiation over 2048-bit numbers is extraordinarily expensive because multiplication cost scales as $O(k^2)$ in the number of words, while division by a small prime stays $O(k)$.

This also means my earlier estimate of the optimal cutoff — $\log n \approx 2048$ — was too low, since it implicitly assumed the cost ratio was $\log n$ when it's actually closer to $(\log n)^2$, which puts the theoretical optimum much higher, in better agreement with what real implementations do.
