Skip to content

Instantly share code, notes, and snippets.

@maqp
Last active January 4, 2021 06:38
Show Gist options
  • Save maqp/0f5351a71d33a2ebc6799b8b54764b41 to your computer and use it in GitHub Desktop.
Save maqp/0f5351a71d33a2ebc6799b8b54764b41 to your computer and use it in GitHub Desktop.
Pythagorean Factorization
#!/usr/bin/env python3.7
# -*- coding: utf-8 -*-
import math
import multiprocessing
import random
import time
from multiprocessing import Queue, Process
from typing import Tuple
import matplotlib.pylab as plt
def pythagorean_factorization(semiprime: int) -> Tuple[int, int, int]:
"""
Robert Grant's Pythagorean Factorization Algorithm
Source: instagram.com/p/CJMhkTjnBDj/
Claim:
Prime Factor-Based Encryptions (both RSA and ECC which is used by Bitcoin wallets as the wallet p2pk (p2pkh)
addresses) is now rendered useless BEFORE the advent of powerful Quantum Computers due to a Right Triangle-based
CONSTANT time mathematical solution.
Algorithm description:
Pythagorean Factorization Formula: (x+r)*(2B + (x+r)) = Side A^2 (which must
be a # Perfect Square Value). Where:
B = (Public Key)^.5
x = a whole number
r = the Δ difference between (Public Key)^.5 and the next rounded up integer value.
After finding the Perfect Square Value for Side A, use the Pythagorean Theorem (A^2 + B^2 = C^2) to solve for C.
Then solve the Prime factors (Private Keys) by the following:
Prime1 = C - A and
Prime2 = C + A.
Misc info:
I used our Crown Sterling random number generator to test the new Pythagorean Calculator via Microsoft Excel; We
were able to both immediately and successfully factor 1,000 Public Key values of various lengths/sizes (see
video for examples). Global Security Consultancy firm Deloitte recently published that over 25% of ALL Bitcoin
(approximately US$90B) is presently at risk of theft due to Quantum Computers which are now pre-empted by the
above mathematical formula.
"Proof" by picture:
https://preview.redd.it/fqhf1mqd0b761.jpg?width=598&format=pjpg&auto=webp&s=3ab97efb7bb1d4be8ce07c9020384fdbed169e2b
"""
# Lets initialize the counter for steps first. This will help us to calculate the efficiency of the algorithm.
steps = 0
# According the the description, the very next step is supposedly
# "(x+r)*(2B + (x+r)) = Side A^2 (which must be a Perfect Square Value)"
# but we can't perform the equation because we don't yet know what the values B, x and r are.
# Things start to clear out in the next step when we get definition for B:
# "B = (Public Key)^.5"
# Raising something to the power of 0.5 == ½, is the same as taking the square root of it.
steps += 1
side_B = math.sqrt(semiprime)
# We're then given the definition for x
# "x = a whole number".
# Okay, this is slightly weird in that there's nothing about the value of x mentioned. We'll ignore that for now,
# as e.g. initialization of variable before setting a value for it would just add an extra step.
# The next step is
# "r = the Δ difference between (Public Key)^.5 and the next rounded up integer value."
# As said above, "(Public Key)^.5" = B, so we'll round up the value B first to get the next integer:
steps += 1
next_integer = math.ceil(side_B)
# And now we can find r with substraction
steps += 1
r = next_integer - side_B
# The next step is
# "After finding the Perfect Square Value for Side A, use the Pythagorean Theorem (A^2 + B^2 = C^2) to solve for C."
# So we are supposed to find an integer value for side A. So what would that A need to be?
# We know A^2 needs to be a perfect square, which basically means A needs to be an integer, because the square of an
# integer is a perfect square, always. Duh. Furthermore, the number needs to be non-negative.
# To get to A we can't really use Grant's algorithm
# A = sqrt( (x+r) * (2 * side_B + (x+r)) )
# because do that, we need to know three things
# * side_B (we do)
# * r (we do)
# * x (we DON'T)
# The reason we don't know x is that would require us to know side_C, since
# x = side_C - side_B
# and again, we don't know side_C because to know side_C, we need know side_A, since
# side_C = √( side_A² + side_B² )
# Sound's like a catch-22? More on that later.
# To find out the solution we'll notice Grant's demonstration video on his Excel spreadsheet has as a running number
# "Ops", that the algorithm defines as X. To explain the connection between the variables: The algorithm that
# produces A in the Excel spreadsheet is visible in 2:33, and is
# SIDE A = ROUND(((( D5 + $C$5) * ((2*$C$3) + ($C$5 + D5)))^0.5),4) which matches the proof-by-picture algorithm
# A = √( ( x + r) * ( 2* B + ( r + x)) in Grant's Instagram post
#
# The D in D5 refers to the D column that's named "Ops".
# Similarly, the $C$5 refrences the value in the specific cell, defined on the video as
# "next_integer - √(public_key)". This is clearly r, as it was just defined above: "r = next_integer - side_B".
# Note:
# The Excel spreadsheet looks a lot like a LUT and initially it fooled me. But it turns out the entire sheet is
# actually a table of values as it iterates over value for X. The reason it fooled me, and the reason that makes (at
# least this particular) Excel spreadsheet a foolish tool to factor semiprimes with, is it doesn't automatically
# stop when it finds a match. Instead, as seen on video at e.g. 2:33, is the fact the algorithm continues to work
# after row 5, printing all the possible invalid prime candidate values etc., for as far as the spreadsheet cells
# contain instructions to update the value when the Public Key value at $A$3 is changed.
# In instagram post Grant was asked by the audience "how is x determined", to which Grant replied "x is a whole
# number". Given that Grant refuses to give a better explanation, and given that x is clearly iterated from 0 as far
# as needed, we'll implement this by iterating x. But because we are producing the algorithm in good faith, we will
# not implement the algorithm in a way that will continue picking invalid prime candidates even after the correct
# prime candidates have been found. This ensures Grant's algorithm has a fair chance against other algorithms.
# The reason we improve Grant's algorithms efficiency this way is a) it might not have been possible to implement it
# differently in Excel and b) Were the reasoning for the claim "the algorithn algorithm runs in constan time" the
# fact it takes fixed time to fill the entire excel spreadsheet, even when that's wast of CPU cycles, it wouldn't be
# valid from the perspective of algorithm design, as that would limit the maximum size of the public key to the
# largest semi-prime in the excel spreadsheet, which is visible in 6:32
# of the video:
# Unfortunately for Grant, the last value 53019257441871 is not a semiprime, as it's the product of 3, 7, 7,
# 179, 277, and 7274171. The next 9 last values are also not semi-primes, but the 11. to last, 53019184700161 is.
#
# But since a) this 47-bit semi-prime is not nearly large enough to compare to a modern 2048-bit RSA key, and because
# b) Grant claims the number of steps for the algorithm remain the same irrespective of key size, and
# c) Grant claims the algorithm can defeat modern key sizes,
# it's clear that the algorithm should not have an upper limit that's much smaller than what is claimed.
# That's just not how algorithms are designed.
# As per above, no better method has been provided, so we'll initialize
# the variable x as 0 as that's where Grant's Excel starts its "Ops" value.
steps += 1
x_candidate = 0
while True:
steps += 1 # While loop check
# Since we know r already, we can produce x+r
steps += 1
x_plus_r = x_candidate + r
# We can then produce a candidate for A using Grant's formula
# side_A = round(math.sqrt(x_plus_r * ((2 * side_B) + x_plus_r)))
# But we'll split it to pieces to get more accurate reading on the number of steps
steps += 1
two_times_B = 2 * side_B
steps += 1
two_times_B_plus_x_plus_r = two_times_B + x_plus_r
steps += 1
x_plus_r_times_two_times_B_plus_x_plus_r = x_plus_r * two_times_B_plus_x_plus_r
# We'll be fair and assume calculating square root only takes one step.
steps += 1
sqrt_x_plus_r_times_two_times_B_plus_x_plus_r = math.sqrt(x_plus_r_times_two_times_B_plus_x_plus_r)
steps += 1
side_A = round(sqrt_x_plus_r_times_two_times_B_plus_x_plus_r)
# Finally, we can use Pythagorean Theorem (A² + B² = C²) to solve C.
# Again, we'll split the process into steps.
steps += 1
A_squared = side_A ** 2
steps += 1
B_squared = side_B ** 2
steps += 1
sum_A_B_sq = A_squared + B_squared
steps += 1
sqrt_sum = math.sqrt(sum_A_B_sq)
steps += 1
side_C = round(sqrt_sum) # As Grant relies on decimal values, he hast to accept some numbers need to be rounded.
# Finally, we can produce the prime candidates...
steps += 1
side_C_sub_side_A = side_C - side_A
steps += 1
side_C_sum_side_A = side_C + side_A
steps += 1
prime_p_candidate = round(side_C_sub_side_A)
steps += 1
prime_q_candidate = round(side_C_sum_side_A)
# ...and see if they produce the semi-prime.
steps += 1
purp_semi_prime = prime_p_candidate * prime_q_candidate
steps += 1
match = purp_semi_prime == semiprime
steps += 1
if match:
steps += 1 # For return
return prime_p_candidate, prime_q_candidate, steps
# If we reach this point, the value for x was incorrect, so we increase it by 1 and restart the loop.
steps += 1
x_candidate += 1
def trial_division(semiprime: int) -> Tuple[int, int, int]:
"""Trial division (brute force) from https://en.wikipedia.org/wiki/Trial_division#Method"""
steps = 0
steps += 1
a = []
steps += 1
semiprime_mod_2 = semiprime % 2
steps += 1
semiprime_mod_2_is_zero = semiprime_mod_2 == 0
while semiprime_mod_2_is_zero:
steps += 1 # While evaluated True
steps += 1
a.append(2)
steps += 1
semiprime = semiprime / 2
steps += 1
n_mod_2 = semiprime % 2
steps += 1
semiprime_mod_2_is_zero = n_mod_2 == 0
steps += 1 # While evaluated False
steps += 1
f = 3
steps += 1
f_squared = f*f
steps += 1
f_squared_smaller_or_equal_to_semiprime = f_squared <= semiprime
while f_squared_smaller_or_equal_to_semiprime:
steps += 1 # While evaluated True
steps += 1
n_mod_f = semiprime % f
n_mod_f_equals_zero = n_mod_f == 0
steps += 1
if n_mod_f_equals_zero:
steps += 1
a.append(f)
steps += 1
semiprime = semiprime // f
else:
steps += 1
f += 2
steps += 1
f_squared = f * f
steps += 1
f_squared_smaller_or_equal_to_semiprime = f_squared <= semiprime
steps += 1 # While evaluated False
steps += 1
semiprime_not_equal_to_1 = semiprime != 1
steps += 1
if semiprime_not_equal_to_1:
steps += 1
a.append(semiprime)
steps += 1
prime1, prime2 = a
steps += 1
return prime1, prime2, steps
# Challenges
def generate_random_prime_smaller_than(n) -> int:
"""Returns a random prime smaller than n.
Based on
https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n/3035188#3035188
"""
sys_random = random.SystemRandom()
sieve = [True] * n
for i in range(3, int(n ** 0.5) + 1, 2):
if sieve[i]:
sieve[i * i::2 * i] = [False] * ((n - i * i - 1) // (2 * i) + 1)
list_of_primes = [2] + [i for i in range(3, n, 2) if sieve[i]]
return sys_random.choice(list_of_primes)
def generate_prime_between(min_, max_):
"""Create prime between min and max numbers."""
while True:
prime = generate_random_prime_smaller_than(max_)
if prime < min_:
if debug:
print(f"generate_prime_between(): Generated too small prime {prime} (should have been >={min_})")
continue
if debug:
print(f"generate_prime_between(): Generated prime {prime}")
return prime
def create_challenge(semiprime_bit_strength,
list_of_previous_challenges,
repeated_challenges_before_exit):
"""Create a challenge for a round."""
prime_bit_strength = semiprime_bit_strength // 2
prime_upper_limit = 2 ** prime_bit_strength
prime_lower_limit = prime_upper_limit // 2
while True:
repeats = 0
prime1 = generate_prime_between(prime_lower_limit, prime_upper_limit)
prime2 = generate_prime_between(prime_lower_limit, prime_upper_limit)
semiprime_challenge = prime1 * prime2
if semiprime_challenge in list_of_previous_challenges:
repeats += 1
if repeats >= repeated_challenges_before_exit:
print(f"Generated too many ({repeats}) consecutive identical games.\n"
"RNG is probably running out of challenges. Try larger primes\n"
"or wider range of allowed primes. Exiting.")
exit(1)
else:
break
return semiprime_challenge, prime1, prime2
def little_league():
"""\
In the little league, the contestants play a number of games to compare
semiprime factoring speed from the perspective of "number of steps".
Each contestant (algorithm) is given a random semiprime to factor.
Each algorithm will then return their prime factors and the number
of steps it took to produce them. The game host will then compare
the two to determine which algorithm was more efficient.
"""
# Game management
list_of_previous_challenges = []
repeated_challenges_before_exit = 10
# Rules
semiprime_bit_strength = 32
number_of_rounds = 100
# Evaluation
efficiency_list = []
show_rounds = True
round_number = 0
while round_number < number_of_rounds:
round_number += 1
semiprime_challenge, prime1, prime2 = create_challenge(semiprime_bit_strength,
list_of_previous_challenges,
repeated_challenges_before_exit)
# Our first contestant is the Crown Sterling Pythagorean Factoring
# Method reproduced in good faith to be as efficient as possible:
# https://instagram.com/p/CJMhkTjnBDj/
cs_factor1, cs_factor2, cs_steps = pythagorean_factorization(semiprime_challenge)
# Our second contestant is the trial division, the most laborious (read: least
# efficient, the same as brute force) method for finding prime factors.
trial_factor1, trial_factor2, trial_steps = trial_division(semiprime_challenge)
# Both contestants have now provided their purported prime factors and
# the number of steps they had to go through in order to produce them.
# We first verify that the factors are correct.
if cs_factor1 * cs_factor2 != semiprime_challenge:
print(f"Oh no! Crown Sterling failed factoring {semiprime_challenge}! Competition ends!")
exit(1)
if trial_factor1 * trial_factor2 != semiprime_challenge:
print(f"Oh no! Trial division failed factoring {semiprime_challenge}! Competition ends!")
exit(1)
# Both algorithms were correct! Time to see which one was more efficient during this round.
# We will represent the the efficiency of Crown Sterling method with a decimal value:
# 0.5 means Crown Sterling is twice as fast
# 2.0 means Crown Sterling twice as slow
cs_efficiency = cs_steps / trial_steps
# Well then add the result to a list for later evaluation of average score.
efficiency_list.append(cs_efficiency)
# We can observe per-round performance here:
if show_rounds:
challenge_whitespace = (13 - len(str(round_number))) * ' '
print(f"Challenge {round_number}:{challenge_whitespace} Factor {semiprime_challenge} | Solution: {prime1} * {prime2} (not shown to contestants)")
print(f"Trial division: {trial_factor1} * {trial_factor2} = {semiprime_challenge} -- Steps: {trial_steps}")
print(f"Crown Sterling: {cs_factor1 } * {cs_factor2 } = {semiprime_challenge} -- Steps: {cs_steps }")
print('--------------------------------------------------------------------------------------------------')
# We have now played all rounds of the game!
# Time to calculate the average efficiency of Crown Sterling method compared to brute force method.
avg_cs_efficiency = sum(efficiency_list) / len(efficiency_list)
# Depending on which one wins, we'll describe it a bit differently, because
# it's hard to understand "x was 0.5 times faster than y" but
# it's easy to understand "x was 2.0 times slower than y" (both mean the same thing).
result = ( f"{ avg_cs_efficiency:.2f} times slower" if avg_cs_efficiency > 1
else f"{(1/avg_cs_efficiency):.2f} times faster")
print(f"After {number_of_rounds} games played with {semiprime_bit_strength}-bit RSA public keys, on average, the Crown Sterling\n"
f"Pythagorean Factoring Method was {result} than trial division (brute force).")
def big_league():
"""
In the big league the competing algorithms play a number of games to
compare average semiprime factoring speed in number of steps, as a
function of linearly increased key size.
Each contestant (algorithm) is given a random semiprime to factor.
Each algorithm will then return their prime factors and the number
of steps it took to produce them. Based on the results, the game host
will display a graph that allows estimation of the algorithm's time
complexity class.
"""
# Game management
list_of_previous_challenges = []
repeated_challenges_before_exit = 10
# Rules
rounds_per_thread_per_key = 5
smallest_key_size = 6
largest_key_size = 51
# Evaluation
efficiency_dict_cs = {}
efficiency_dict_td = {}
show_rounds = False
def thread(queue: Queue, thread_id: int) -> None:
for key_size in range(smallest_key_size, largest_key_size+1): # +1 due to exlusive range.
round_number = 0
per_keysize_efficiency_list_cs = []
per_keysize_efficiency_list_td = []
while round_number < rounds_per_thread_per_key:
print(f"Thread {str(thread_id).zfill(2)}: Starting game {round_number} for key size {key_size}")
round_number += 1
semiprime_challenge, prime1, prime2 = create_challenge(key_size,
list_of_previous_challenges,
repeated_challenges_before_exit)
# Our first contestant is the Crown Sterling Pythagorean Factoring
# Method reproduced in good faith to be as efficient as possible:
# https://instagram.com/p/CJMhkTjnBDj/
cs_factor1, cs_factor2, cs_steps = pythagorean_factorization(semiprime_challenge)
# Our second contestant is the trial division, the most laborious (read: least
# efficient, the same as brute force) method for finding prime factors.
td_factor1, td_factor2, td_steps = trial_division(semiprime_challenge)
# Both contestants have now provided their purported prime factors and
# the number of steps they had to go through in order to produce them.
# # We first verify that the factors are correct.
if cs_factor1 * cs_factor2 != semiprime_challenge:
print(f"Oh no! Crown Sterling failed factoring {semiprime_challenge}! Competition ends!")
exit(1)
if td_factor1 * td_factor2 != semiprime_challenge:
print(f"Oh no! Trial division failed factoring {semiprime_challenge}! Competition ends!")
exit(1)
# The steps from the round are added to list.
per_keysize_efficiency_list_cs.append(cs_steps)
per_keysize_efficiency_list_td.append(td_steps)
# We can observe per-round performance here:
if show_rounds:
challenge_whitespace = (13 - len(str(round_number))) * ' '
print(f"Challenge {round_number}:{challenge_whitespace} Factor {key_size}-bit {semiprime_challenge} | Solution: {prime1} * {prime2} (not shown to contestants)")
print(f"Crown Sterling: {cs_factor1} * {cs_factor2} = {semiprime_challenge} -- Steps: {cs_steps}")
print(f"Trial division: {td_factor1} * {td_factor2} = {semiprime_challenge} -- Steps: {td_steps}")
print('--------------------------------------------------------------------------------------------------')
# We have now played all rounds for a given keysize!
# Time to calculate the average number of steps for both contestants
avg_cs_steps = sum(per_keysize_efficiency_list_cs) / len(per_keysize_efficiency_list_cs)
avg_td_steps = sum(per_keysize_efficiency_list_td) / len(per_keysize_efficiency_list_td)
# We'll finally add the average number of steps to a dictionary for a given key
efficiency_dict_cs[key_size] = avg_cs_steps
efficiency_dict_td[key_size] = avg_td_steps
queue.put((efficiency_dict_cs, efficiency_dict_td))
# We'll make use of the modern CPUs' capability to handle multiple threads at the same time. This will improve
# accuracy by multiplying the number of samples from which average number of steps will be calculated.
no_threads = multiprocessing.cpu_count()
# We'll initialize a list of queues that will be used to return values from the worker threads.
queue_list = []
for _ in range(no_threads):
queue_list.append(Queue())
# Here we'll initialize the worker threads
thread_list = []
for i in range(no_threads):
thread_list.append(Process(target=thread, args=(queue_list[i], i)))
# Here we'll start each worker
for i, t in enumerate(thread_list):
print(f"Starting Thread {i}")
t.start()
# Here we wait until each worker reports it has results available
while True:
time.sleep(0.1)
empty_queue = False
for q in queue_list:
if q.qsize() == 0:
empty_queue = True
if not empty_queue:
break
# Here we collect results from each worker
td_dict_list = []
cs_dict_list = []
for q in queue_list:
cs_d, td_d = q.get()
cs_dict_list.append(cs_d)
td_dict_list.append(td_d)
# Here we parse the results from each worker and combine each workers results for specific key sizes
cs_averages_dict = {}
for d in cs_dict_list:
for keysize in d:
if keysize not in cs_averages_dict.keys():
cs_averages_dict[keysize] = [d[keysize]]
else:
cs_averages_dict[keysize].append(d[keysize])
td_averages_dict = {}
for d in td_dict_list:
for keysize in d:
if keysize not in td_averages_dict.keys():
td_averages_dict[keysize] = [d[keysize]]
else:
td_averages_dict[keysize].append(d[keysize])
# Here we calculate the average number of steps for each key size
cs_final_dict = {}
for keysize in cs_averages_dict.keys():
cs_avg_steps = sum(cs_averages_dict[keysize]) / len(cs_averages_dict.keys())
cs_final_dict[keysize] = round(cs_avg_steps)
td_final_dict = {}
for keysize in td_averages_dict.keys():
td_avg_steps = sum(td_averages_dict[keysize]) / len(td_averages_dict.keys())
td_final_dict[keysize] = round(td_avg_steps)
print("Raw data: Pythagorean Factorization")
print(cs_final_dict)
print("Raw data: Trial division")
print(td_final_dict)
print('-----------')
# Here we map the number of steps of both contestants as a function of key size to understand the efficiency.
lists_cs = cs_final_dict.items()
lists_td = td_final_dict.items()
x_cs, y_cs = zip(*lists_cs)
x_td, y_td = zip(*lists_td)
fig, ax1 = plt.subplots()
color = 'tab:red'
ax1.set_xlabel('Public key size (bits)')
ax1.set_ylabel('Trial Division', color=color)
ax1.plot(x_td, y_td, color=color)
ax1.tick_params(axis='y', labelcolor=color)
ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
color = 'tab:blue'
ax2.set_ylabel('Pythagorean Factoring', color=color) # we already handled the x-label with ax1
ax2.plot(x_td, y_cs, color=color)
ax2.tick_params(axis='y', labelcolor=color)
fig.tight_layout() # otherwise the right y-label is slightly clipped
plt.show()
if __name__ == '__main__':
debug = False
# We'll start with the little league where we just compare average number of steps with some key size. If the
# contestant loses to brute force even in the little league, it makes no sense to even consider the big league.
little_league()
# In the little league we'll notice that Pythagorean Factoring method is 10 times faster than brute force!
# So this is clearly an improvement over the previous reciprocal factoring by Crown Sterling, as it's 60 times
# faster.
#
# Now, normally, a 10x improvement wouldn't be of ANY significance, but Grant has finally made a claim about the time
# complexity of his algorithm, which is remarkably, CONSTANT TIME. What this means that irrespective of key size,
# the algorithm should always take the same amount of steps. To understand the significance of this claim consider
# the following:
# * Brute force runs in exponential time and is effectively useless.
# * GNFS (best classical algorithm) runs in in subexponential time which is remarkable but not good enough.
# * Shor's algorithm runs in polynomial time which is fast enough to break RSA even with modern key lengths
# but it requires a large universal quantum computer to run.
# * Grant's algorithm is claimed to run in constant time which is better than Shor, and he claims the algorithm
# runs on a classical (i.e. non-quantum) computer.
#
# Now this I've got to see so it's time to welcome Grant's new algorithm to the big league!
#
# The rules of the big league
# ---------------------------
# If you look at the following graph https://entcheva.github.io/images/big-o.jpg
# you can see visual representations for the complexity classes. To give you an idea of efficiency classes,
# * Brute force is the line that says O(n^2)
# * GNFS would be between O(n^2) and O(n) but is not visible in this graph
# * Shor's algorithm is O(n)
# * According to Grant's claim, his algorithm will produce a horizontal line O(1), as he claims it runs in
# constant time.
#
# Even though we are not using large keys, we can expect the number of steps in Grant's algorithm to remain flat.
# If instead of of being flat, Grant's curve closely resembles the O(n), (the straight line) we still declare him
# as the winner, as that would still mean he has produced the equivalent of Shor's algorithm that runs on classical
# computers.
#
# To make the game slightly more interesting, Grant's algorithm will compete, again, against the brute force O(n^2)
# algorithm trial division. But since this is the big league, this time the handicap of incremental performance
# differences is dropped. In other words we're not interested in the tiny differences like n-times performance
# improvement that result in the OFFSET of the graph. What matters is the SHAPE of the efficiency curve. That's
# precisely why https://entcheva.github.io/images/big-o.jpg does not present any numbers.
#
# Thus, if the shape of the efficiency curve of Grant's algorithm resembles that of the exponential graph O(n^2)
# created by the useless brute force algorithm (trial division), Grant has unarguably lost.
#
# The overview of the game
# ------------------------
# The example graph will be produced on a Ryzen 1700, a 16 thread computer. Each thread will run 5 games with each
# key size. The set of semi-rpime key sizes will range from 6 to 51 bit keys, with 1-bit increments. That means
# 45 different key sizes in total, and 16*6 = 80 games with each key size. The small increments of key sizes and
# several games to produce averages will help produce a smooth graph, as it will remove statistical anomalies where
# the x-value randomly happens to be small. The wide range of keys means that the largest key is roughly
# 35,000,000,000,000 times larger than the smallest key. This should be adaquate enough range to determine which
# time complexity graph the algorithm produces. I.e. it will detect if the curve remains flat, or if it resembles
# that of the brute force, or if it's a straight line but not horizontal.
#
# So, ladies and gentlemen, give it up for the stars of our show. Time to drop the curtain!
big_league()
# ...and we have the results: https://i.imgur.com/wC1JT8A.png
#
# In Red we have the Trial Division, and in Blue we have
# Crown Sterling Pythagorean Factoring...
# Btw here's the raw data used to parse the graph:
# Raw data: Pythagorean Factorization
# {6: 9, 7: 10, 8: 16, 9: 16, 10: 26, 11: 26, 12: 50, 13: 48, 14: 92, 15: 87, 16: 175, 17: 181, 18: 359, 19: 359,
# 20: 721, 21: 731, 22: 1460, 23: 1390, 24: 2835, 25: 2860, 26: 5721, 27: 5714, 28: 11095, 29: 11588, 30: 22137,
# 31: 23035, 32: 46723, 33: 44952, 34: 89526, 35: 90671, 36: 175547, 37: 187461, 38: 363402, 39: 356902, 40: 732015,
# 41: 751871, 42: 1472577, 43: 1477059, 44: 2927451, 45: 2903642, 46: 5823890, 47: 5864920, 48: 11702947,
# 49: 11714605, 50: 24080335, 51: 23078546}
# Raw data: Trial division
# {6: 8, 7: 8, 8: 8, 9: 8, 10: 9, 11: 9, 12: 9, 13: 9, 14: 11, 15: 12, 16: 22, 17: 17, 18: 34, 19: 32, 20: 63, 21: 54,
# 22: 75, 23: 95, 24: 225, 25: 181, 26: 310, 27: 367, 28: 743, 29: 770, 30: 1589, 31: 1743, 32: 2942, 33: 2800,
# 34: 6395, 35: 6721, 36: 15501, 37: 14566, 38: 24784, 39: 22912, 40: 58420, 41: 50024, 42: 87215, 43: 98439,
# 44: 202480, 45: 125813, 46: 452392, 47: 432204, 48: 795242, 49: 786979, 50: 1569718, 51: 1600621}
# -----------
#
# And it's a... tie! That is, a tie between the algorithms. Meaning the graph shows Grant's algortihm runs obviously
# in the same exponential time complexity as brute force, and while it appears to be roughly 10..100 times faster
# than brute force, in terms of offset, such speedups are again, ignored in the Big O notation. I repeat, Grant's
# algorithm runs in O(n^2) time, and is thus exactly as bad as brute force.
# But what about Grant's proof? Clearly everything lines up on paper! Also, everything was really fast on Excel!
# --------------------------------------------------------------------------------------------------------------
#
# Not quite. Firstly, Excel is indeed ~instantanously fast with such small semi-primes. But so is your browser.
# Try it! The largest prime visible in Grant's program is visible at 6:32 of the video. It's 53019184700161,
# (i.e. the 11th from down).
#
# (You can also try the greater values at the bottom, but you'll find they're not actually even proper semi-primes.
# For example, the last purported value 53019257441871 is actually the product of 3, 7, 7, 179, 277 and
# 7274171. Grant calls them quasi-primes and RSA doesn't use quasi-primes but semi-primes that are always product of
# exactly two primes.)
#
# To factor the largest semi-prime in the file, copy it (53019184700161) to https://www.alpertron.com.ar/ECM.HTM and
# press "Factor". On my computer, factoring that semiprime took less than 0.1 seconds. So much for "pretty large"
# as claimed by Grant.
#
# As for Grant's description of the algorithm, you need to take a careful look at the structure, especially the way
# our programmatic implementation had to be started. You'll notice that while Grant did start his explanation from
# the public key, he immediately threw us the "(x+r)*(2B + (x+r)) = Side A^2 (which must be a Perfect Square Value)".
# That was weird, we were missing x, r, and B. So clearly it wasn't thought out thoroughly.
# We then get definition for B, then for x, and then for r. B would follow from √(public key), and r from ceil(B)-B.
# But X would not follow from anything. It just is.
#
# He then basically tells us to use the √((x+r)*(2B + (x+r))) to get A, but again, he completely ignores x.
# Then he jumps into how arrive to C from A and B:
# "After finding the Perfect Square Value for Side A, use the Pythagorean Theorem (A² + B² = C²) to solve for C."
#
# So the way to obtain x is never defined.
#
# So when Grant says "After finding A", the only variable that's missing from the equation for A is x, so he's
# actually saying, "After finding x". Finding x is the hard part. We had to resort into iterating with the program.
# And as the program iterated over x, it showed how the number of steps to find x increases exponentially
# when the public key size grows linearly. So Grant's textual explanation skips over the part that makes it
# exponential, so the explanation doesn't give any hint to how to implement the hard part.
# But the iterating over X is really fast! Surely it's constant time because the solution was on row 2 of Excel!
# --------------------------------------------------------------------------------------------------------------
#
# Interestingly, in the textual aspect Grant advices to just use X and find A, which invites into thinking about
# a solver algorithm from the perspective of iterating over A. But having watched the video Grant does something
# completely different himself. He appears to go to look for x. It wasn't immediately clear why Grant wanted to
# iterate over X when it forces the use of his solver formula for A, which on average adds extra operations.
#
# But it turns out, because Grant uses the "Crown Sterling™ CrownEngine™ RNG", he ends up creating random primes.
# Because the public keys demonstrated are very small (between 13 and 47 bits). the size of x is limited by the
# small size of the semi-prime alone. But much more importantly, using random primes sometimes leads to a situation
# the two prime factors are very close to one another. And when that happens, x ends up being very small. One easy
# way to see why this is, is to look at Grant's formula for finding A.
#
# But first, recall how Grant's algoritm finds primes:
# 1. once A is obtained for B, x and r, and
# 2. once C is obtained from A and B,
# 3. primes are obtained with
# prime1 = C-A and
# prime2 = C+A,
#
# You'll see both primes are "A" away from C (and C is the middlepoint between the two primes).
# So if A is smaller, that means the primes are closer to one another.
#
# You should now look at Grant's formula for generating A from x and r.
#
# First we notice that A is calculated with √((x + r) * (2 * side_B + (x + r))),
#
# That means A gets smaller when √((x + r) * (2 * side_B + (x + r))) gets smaller
#
# sqrt(some_value) gets smaller when some_value gets smaller, e.g. √(16) = 4, but √(9) = 3 Therefore
#
# √( (x+r) * (2 * side_B + (x+r)) ) gets smaller when (x+r) * (2 * side_B + (x+r)) gets smaler. And
#
# (x+r) * (2 * side_B + (x+r)) gets smaller when x+r gets smaller (or when 2B+x+r gets smaller). Finally
#
# (x+r) gets smaller when x gets smaller and 2B+x+r gets smaller when x gets smaller.
#
# So to recap: Sometimes randomly generated primes are close to another. In those cases x will be small.
# In such case, if factors are searched for by iterating over x, this method can find p and q fast.
#
# So what does it mean? It means that if Grant wants to make RSA seem really weak, he can cherry-pick semi-primes
# that have prime factors that are close to one another. That cherry-picked semi-rime is then trivial to decrypt
# due to small x. On the video Grant appears to be picking them from the list at random but the thing is,
# the ENTIRE list of semiprimes consists of primes that are relatively cose to one another, so picking
# any semi-prime from the list will be fast to factor.
# Prove x isn't usually small!
# ----------------------------
# That's exactly what the program above did. It plotted average value of x as a function of key size. The graph
# clearly shows average x grows exponentially when key size grows linearly.
# Oh, so, so what about the drawing? Everything was trivially easy to prove broken there!
# --------------------------------------------------------------------------------------
#
# When you look at the drawing, what you'll see is two important things:
#
# 1) The proof uses circular logic: it basically has no beginning or an end. This is because the hard part,
# finding x is already done. To make you forget that, Grant is just presenting the following tautology
# (logical statement that's always true):
# x+r = C-B
# <=> A = √((x+r) * (2B + (x+r)))
# <=> C = √(A²+B²)
#
# If you want proof it's a tautology
#
# A = √((x + r) * (2B + (x+r))) | ()²
# A² = (x + r) * (2B + (x+r)) | (x+r) <=> w
# A² = w * (2B + w ) |
# A² = w * (B+B + w ) |
# A² = wB + wB + w² | + (B²-B²)
# B²-B²+A² = B² + wB + wB + w² - B² | (a+b)(a+b) = a² + 2ab + b²
# A² = (B+w)(B+w) - B² | "x+r = C-B" <=> "w = C-B" <=> "w+B = C"
# A² = C² - B² | + B²
# A² + B² = C² | √()
# √(A² + B²)= C |
#
# 2) Like said, since X and the values that depend on it (C and A) were already written on the paper, everything
# fits together perfectly and all values appear to have existed there, always. What the reader will
# instinctively assume is, the values would be there magically beforehand every time, regardless of what
# public key needs decrypting. X is the missing variable but it feels small, maybe it's 2 every time, who
# knows. Surely it's a small value when it was small in every example. But again, finding X is the hard
# part. And the nasty fact is, X stops being small real quick when the prime factors p and q get further apart.
# I still don't believe you, this is a ground breaking attack!
# ------------------------------------------------------------
#
# Well, not at all. It turns out the factorization method Grant demonstrated, is actually really well
# known, and there already exists much more efficient method called Fermat's factorization method:
# https://en.wikipedia.org/wiki/Fermat%27s_factorization_method
#
# Thus, the risks of having the prime factors p and q too close to one another is extremely well known. In fact
#
# 1. E.g. on page 47 of https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-56Br2.pdf, NIST sets the
# minimum distance between p and q to 2^(N/2-100). How large is that, for RSA-2048 that's commonly used, that's
# at least 2^(2048/2-100) = 2^(1024-100) = 2^924, which is a ming-bogglingly large number. Neither Grant's, nor
# Fermat's factorization algorithm stands a chance with keys this large.
#
# 2. It's also not the case the recommendation exists in a vacuum on some NIST specification. Here's the
# piece of code in OpenSSL that checks p and q fullfill that particular NIST requirement:
# https://github.com/openssl/openssl/blob/master/crypto/rsa/rsa_sp800_56b_check.c#L251
#
# So how efficient is Fermat's algorithm actually? It turns out it runs in O(n^0.5) time as opposed to
# Grant's O(n^2) time. Grant's algorithm takes in the order of (p+q) / 2 - √(p*q) operations. So both run in
# exponential time, but Grant's is a lot worse.
#
# So, Grant's algorithm has the same time-complexity as trial division, that is, brute force. This makes the
# algorithm completely useless when attacking modern key sizes. E.g. GNFS algorithm is much more efficient
# algorithm as it runs in sub-exponential time. But even GNFS has so far only factored an 829-bit RSA key. The
# 2048-bit # RSA keys are out of reach for modern super computers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment