Leetcode 283. "Move Zeroes" Solutions 2/3 Analysis
I don't think Solution 2 is always better than Solution 3.
First of all, it doesn't matter unless n is large, since both are O(n).
For both Solutions 2 and 3, you can skip operations when there are non-zeros at the beginning of the array by doing the following:
For Solution 2, you can skip moving when `lastNonZeroFoundAt == i`
For Solution 3, you can skip swapping when `lastNonZeroFoundAt == cur`
For Solution 2, even if you use a memset for the second loop, that is still O(n) because it has to write each of the trailing elements to 0.
For Solution 3, we can say that a swap is a 3-cost operation since it does 3 writes, including the write to the temporary variable.
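For reference, here is a minimal C++ sketch of the two approaches with the skip optimizations above folded in. This is my own reconstruction, assuming the solutions follow the usual structure of the LeetCode write-up (the names `lastNonZeroFoundAt`, `i`, and `cur` come from there); it is not the exact code being analyzed.

```cpp
#include <utility>
#include <vector>
using std::vector;

// Solution 2 sketch: copy non-zeros forward, then zero-fill the tail.
// The inner `if` skips the redundant self-assignment when
// lastNonZeroFoundAt == i (leading non-zeros are already in place).
void moveZeroesFill(vector<int>& nums) {
    int lastNonZeroFoundAt = 0;
    for (int i = 0; i < (int)nums.size(); i++) {
        if (nums[i] != 0) {
            if (lastNonZeroFoundAt != i) {
                nums[lastNonZeroFoundAt] = nums[i];
            }
            lastNonZeroFoundAt++;
        }
    }
    // Second loop: write the remaining C positions to 0 (memset-style).
    for (int i = lastNonZeroFoundAt; i < (int)nums.size(); i++) {
        nums[i] = 0;
    }
}

// Solution 3 sketch: single pass that swaps each non-zero into place.
// The inner `if` skips the redundant swap when lastNonZeroFoundAt == cur.
void moveZeroesSwap(vector<int>& nums) {
    int lastNonZeroFoundAt = 0;
    for (int cur = 0; cur < (int)nums.size(); cur++) {
        if (nums[cur] != 0) {
            if (lastNonZeroFoundAt != cur) {
                std::swap(nums[lastNonZeroFoundAt], nums[cur]);
            }
            lastNonZeroFoundAt++;
        }
    }
}
```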
---
Let's calculate the cost for the general case.
Assumptions:
1. Each loop iteration costs 1
2. Reading is 0 cost
3. Comparisons are 0 cost
4. Incrementing `lastNonZeroFoundAt` is 0 cost since both solutions increment it by the same amount (i.e. the number of non-zeros) anyway
5. Writing is 1 cost
6. Swapping costs 3 because it consists of 3 writes
Let n be the number of elements in the array.
Let C be the number of zeros.
Let k be the number of non-zeros in the incorrect location.
To explain k:
[1, 2, 3] - k = 0 because there are no zeros
[0, 1, 2, 3] - k = 3 because 1, 2, 3 need to be moved to indices 0, 1, 2 respectively
[0, 0, 1, 2, 3] - k = 3 because 1, 2, 3 need to be moved to indices 0, 1, 2 respectively
[0, 1, 2, 0, 3] - k = 3 because 1, 2, 3 need to be moved to indices 0, 1, 2 respectively
Notice that another way to think about k is "the number of non-zeros with at
least one zero before it". This is because if there is at least one zero
before a non-zero, it is out of place. More than one zero before a non-zero
does not change anything; it only puts more zeros at the end. This
knowledge will come in handy later.
---
Cost of Solution 2 for the general case:
For the first loop, we have:
Cost(first loop) = Cost(first loop iterations) + Cost(first loop moves, i.e. writes)
Since k of the elements need to be moved:
Cost(first loop) = Cost(n iterations) + Cost(k moves)
Cost(first loop) = n + k
For the second loop (memset), we have:
Cost(second loop) = Cost(second loop iterations) + Cost(second loop writes)
Cost(second loop) = Cost(C iterations) + Cost(C writes)
Cost(second loop) = C + C
Cost(second loop) = 2*C
So in total,
Cost(Solution 2) = Cost(first loop) + Cost(second loop)
Cost(Solution 2) = n + k + 2*C
Cost of Solution 3 for the general case:
Cost(Solution 3) = Cost(loop iterations) + Cost(loop swaps)
Cost(Solution 3) = Cost(n iterations) + Cost(k swaps)
Since a swap costs 3,
Cost(Solution 3) = n + 3*k
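As a quick sanity check (my own example, not from the original write-up), take the array [0, 1, 2, 0, 3] from above, where n = 5, k = 3, and C = 2:
Cost(Solution 2) = n + k + 2*C = 5 + 3 + 2*2 = 12
Cost(Solution 3) = n + 3*k = 5 + 3*3 = 14
So for this particular array Solution 2 comes out ahead, which is consistent with the k <= C condition derived next (here k > C).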
---
So let's see when Solution 3 is better than or equal to Solution 2:
Cost(Solution 2) >= Cost(Solution 3)
n + k + 2*C >= n + 3*k
2*C >= 2*k
C >= k
k <= C
So this means that Cost(Solution 3) <= Cost(Solution 2) when k <= C.
k <= C means the number of non-zeros in the incorrect location is <= the number of zeros.
It would take O(n) to determine k or C.
For C, you do an O(n) loop to count the number of zeros.
For k, you do an O(n) loop to find the first zero and then count the number of non-zeros found after it.
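For concreteness, here is a minimal sketch of counting both C and k in a single O(n) pass, using the "non-zero with at least one zero before it" characterization of k. The helper is my own, hypothetical illustration; it is not part of the original solutions.

```cpp
#include <cstddef>
#include <vector>

// Counts C (zeros) and k (non-zeros that have at least one zero before them)
// in one pass over the array. Hypothetical helper for illustration only.
struct Counts {
    std::size_t zeros = 0;      // C
    std::size_t misplaced = 0;  // k
};

Counts countZerosAndMisplaced(const std::vector<int>& nums) {
    Counts counts;
    for (int x : nums) {
        if (x == 0) {
            counts.zeros++;
        } else if (counts.zeros > 0) {
            // A non-zero with at least one zero before it is out of place.
            counts.misplaced++;
        }
    }
    return counts;
}
```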
Counting would tell you the optimal solution to use, but it might not be worth
doing an O(n) count, so let's do some probability instead.
---
So now let's do some probability and calculate what we expect k or C to be
given the following input:
1. We generate an array of length n.
2. Each element is generated independently with a probability p of being a non-zero, else zero.
What are the expected values of C and k? That is, E[C] and E[k]?
If E[k] <= E[C], probability would suggest Solution 3 is the better choice.
---
E[C]:
Let C_i be a random variable for whether the element at index i is a zero.
Because of Linearity of Expectation,
E[C] = E[C_0] + E[C_1] + ... + E[C_n-1]
There is a (1 - p) chance to generate a zero.
Each element contributes 1 zero with probability (1 - p) and 0 otherwise, so
E[C_i] = (1 - p) * 1
E[C] = (1 - p) * 1 + (1 - p) * 1 + ... } n times
E[C] = (1 - p) * n
---
E[k]:
This one is a little trickier.
Again, we can use Linearity of Expectation.
Let k_i be a random variable for whether the element at index i is a non-zero in the incorrect location.
As mentioned above, another way to think about this is
"a non-zero with at least one zero before it".
E[k] = E[k_0] + E[k_1] + ... + E[k_n-1]
There is a p chance to generate a non-zero.
Each element can contribute at most 1 incorrect non-zero, but what is the
probability of an incorrect non-zero?
The probability of an incorrect non-zero is the probability of generating a
non-zero with at least one zero before it.
Let P mean "the probability of"
P(having at least one zero) = (1 - P(having all non-zeros))
P(incorrect non-zero at index i)
= P(non-zero at index i with at least one zero before it)
= P(non-zero at index i AND having at least one zero before index i)
Since those two are independent events, they can be factored:
= P(non-zero at index i) * P(having at least one zero before index i)
Since indexing starts at 0, there are i elements before index i, and we can plug into the formulas above:
= P(non-zero) * P(having at least one zero in i elements)
= P(non-zero) * (1 - P(having i non-zeros))
= p * (1 - p**i)
Since each element can contribute at most 1 incorrect non-zero,
E[k_i] = (p * (1 - p**i)) * 1
E[k_i] = p * (1 - p**i)
E[k_i] = p - p**(i + 1)
E[k] = E[k_0] + E[k_1] + ... + E[k_n-1]
E[k] = (p - p**(0 + 1)) + (p - p**(1 + 1)) + ... + (p - p**(n - 1 + 1))
E[k] = (p - p**1) + (p - p**2) + ... + (p - p**n)
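As a side note (my own addition, not in the original analysis), the p**i terms form a geometric series, so for p != 1 there is a closed form that is handy for plugging in numbers:
E[k] = n*p - (p**1 + p**2 + ... + p**n)
E[k] = n*p - p * (1 - p**n) / (1 - p)
The derivations below keep the summed form; the closed form is only a convenience for evaluating E[k] numerically.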
---
I took some time to analyze this with graphs, so I will just state and prove a couple of propositions.
p is a probability, so it is in the range [0, 1].
Proposition: "If 0 <= p <= 0.5, then E[k] <= E[C]"
Let's prove it by induction on n.
Proof:
Let P(n), where n is the number of elements in the array, be the statement that E[k] <= E[C].
Base Case:
Let's special-case n=0, since the formula for E[k] above doesn't work for n=0.
When n=0,
E[C] = (1 - p) * 0 = 0, or you can think of it as 0 because there are no zeros in an empty array
E[k] = 0, because there are no incorrect non-zeros in an empty array
So for n=0, E[k] <= E[C].
Let's start the induction from n=1.
When n=1,
E[C] = (1 - p) * 1 = 1 - p
E[k] = (p - p**1) = 0
So for n=1, E[k] <= E[C].
Inductive step:
Assume that P(x) is true for some arbitrary x >= 1.
We have to prove P(x + 1) is true.
P(x + 1) means
E[k for x + 1 elements] <= E[C for x + 1 elements]
(p - p**1) + (p - p**2) + ... + (p - p**x) + (p - p**(x + 1)) <= (1 - p) * (x + 1)
(p - p**1) + (p - p**2) + ... + (p - p**x) + (p - p**(x + 1)) <= (1 - p) * x + (1 - p)
The first chunk of each side is just P(x); I will put them in square brackets:
[ (p - p**1) + (p - p**2) + ... + (p - p**x) ] + (p - p**(x + 1)) <= [ (1 - p) * x ] + (1 - p)
The part in the square brackets,
[ (p - p**1) + (p - p**2) + ... + (p - p**x) ] <= [ (1 - p) * x ]
(p - p**1) + (p - p**2) + ... + (p - p**x) <= (1 - p) * x
E[k for x elements] <= E[C for x elements]
is just P(x), which we have assumed to be true.
How can we use P(x) to prove P(x + 1)?
We can use this fact:
If a <= b and c <= d,
then a + c <= b + d.
In our case,
a = E[k for x elements] = (p - p**1) + (p - p**2) + ... + (p - p**x) = left terms inside square brackets
b = E[C for x elements] = (1 - p) * x = right terms inside square brackets
c = (p - p**(x + 1)) = left term outside square brackets
d = (1 - p) = right term outside square brackets
This leaves us with having to prove the following in order to prove P(x + 1).
Note that this is just the expected value of the next term added when adding
one more element to the array.
(p - p**(x + 1)) <= (1 - p)
p - p**(x + 1) <= 1 - p
2*p <= 1 + p**(x + 1)
Since 0 <= p <= 0.5, the greatest 2*p can be is 1, and 1 + p**(x + 1) is always at least 1,
so 2*p <= 1 <= 1 + p**(x + 1).
This means that the c <= d inequality is true and P(x + 1) is true.
Thus, P(n) is true for any integer n >= 0.
In other words:
"If 0 <= p <= 0.5 and n is an integer >= 0, then E[k] <= E[C]"
This means that for 0 <= p <= 0.5, Solution 3 is expected to be at least as good.
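As a quick numeric check at the boundary (my own example), take p = 0.5 and n = 10 and plug into the formulas above:
E[C] = (1 - 0.5) * 10 = 5
E[k] = (0.5 - 0.5**1) + (0.5 - 0.5**2) + ... + (0.5 - 0.5**10)
     = 10*0.5 - (0.5**1 + 0.5**2 + ... + 0.5**10)
     = 5 - (1 - 0.5**10)
     ~= 4.001
So even at the cutoff p = 0.5, E[k] <= E[C], as the proposition says.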
---
How about 0.5 < p <= 1?
Assumptions:
1. n is very large. This is an okay assumption since we only care about
optimizing Solution 2/3 when n is large.
Proposition: "If 0.5 < p <= 1, then E[k] >= E[C]"
We will special-case p=1 first.
E[C] = (1 - p) * n
= (1 - 1) * n
= (0) * n
= 0
E[k] = (p - p**1) + (p - p**2) + ... + (p - p**n)
= (1 - 1**1) + (1 - 1**2) + ... + (1 - 1**n)
= (1 - 1) + (1 - 1) + ... + (1 - 1)
= (0) + (0) + ... + (0)
= 0
When p=1, the array is filled with non-zeros. No work is done and E[k] = E[C],
but E[k] >= E[C] still holds true.
Now let's exclude p=1 and handle the rest, where 0.5 < p < 1.
For this proposition, I will not do a formal proof. I will only analyze E[k] and E[C]
and show that E[k] grows faster than E[C] for large n.
Let's start with E[C]:
E[C] = (1 - p) * n
Since 0.5 < p < 1, we know that 0 < (1 - p) < 0.5.
As n grows large, we keep adding more terms that satisfy 0 < (1 - p) < 0.5.
How about E[k]?
E[k] = (p - p**1) + (p - p**2) + ... + (p - p**n)
Keep in mind 0.5 < p < 1.
The earlier terms such as (p - p**1) or (p - p**2) will be pretty small, because
the exponent is small, so p**i is still close to p and most of the term is subtracted away.
However, for later terms, and when n is large, (p - p**i) can be approximated as just p,
since p**i ~= 0 when i is large. This is because 0.5 < p < 1, so each additional factor of p shrinks p**i toward 0.
When n is large, there will be many terms that can be approximated as just p.
As n grows large, we keep adding more terms that are approximately p, with 0.5 < p < 1.
Compare the statements above about E[C] and E[k].
For large n, E[C] keeps adding terms in the range 0 < (1 - p) < 0.5,
while E[k] keeps adding terms in the range 0.5 < p < 1.
The added E[k] terms are larger than the E[C] terms.
This means the proposition holds true for large n, since E[k] grows faster than E[C] as you increase n.
Note that for low n, it's possible for the opposite to be true. E[C] might be
greater. You can graph it out on a calculator to see it visually. But for low n, optimization is unnecessary.
This means that for large n and 0.5 < p <= 1, Solution 2 is expected to be better.
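A quick numeric check (my own example) using the closed form from earlier, with p = 0.9 and n = 1000:
E[C] = (1 - 0.9) * 1000 = 100
E[k] = 1000 * 0.9 - 0.9 * (1 - 0.9**1000) / (1 - 0.9) ~= 900 - 9 = 891
E[k] is far larger than E[C], which matches the conclusion that Solution 2 is the better choice here.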
---
So what have we learned?
Use Solution 3 for 0 <= p <= 0.5
Use Solution 2 for 0.5 < p <= 1 (for large n)
---
How do I apply this to the general case?
In the general case, there are a lot of unknowns. If there are some knowns, you
can use the above findings to determine the optimal algorithm.
For the general case, you are given a large n-size array. You don't know the p
value used to generate the array. You don't know the value of C or k.
It's not worth determining C or k exactly because that would take O(n).
You can approximate p by sampling a few values and taking the fraction of
non-zeros you see.
How many values should you sample? We can use Chebyshev's inequality:
See Chapter 8.1 of
https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf
and 5a of
https://inst.eecs.berkeley.edu/~ee126/fa19/hw/hw5-sol.pdf
P(|X - u| >= E) <= Var(X) / E**2, where u is the mean of X
Let each X_i be a Bernoulli random variable (a coin flip) for one sampled element.
There is a p chance for the coin to flip heads (non-zero element)
and a (1 - p) chance for tails (zero element).
In the derivation below, n is the number of samples, not the size of the original array.
S_n = X_0 + X_1 + ... + X_n-1
P(|S_n / n - p| >= E) <= Var(S_n / n) / E**2
P(|S_n / n - p| >= E) <= Var(S_n) / (n**2 * E**2)
Since the X_i are independent,
P(|S_n / n - p| >= E) <= n * Var(X_i) / (n**2 * E**2)
P(|S_n / n - p| >= E) <= Var(X_i) / (n * E**2)
Var(X_i) = E[X_i**2] - E[X_i]**2
= [ (1 - p) * 0**2 + p * 1**2 ] - [ (1 - p) * 0 + p * 1 ] ** 2
= [ p * 1**2 ] - [ p * 1 ] ** 2
= [ p ] - [ p ] ** 2
= p - p**2
= p * (1 - p)
P(|S_n / n - p| >= E) <= Var(X_i) / (n * E**2)
P(|S_n / n - p| >= E) <= p * (1 - p) / (n * E**2)
We set d = p * (1 - p) / (n * E**2)
n * E**2 = p * (1 - p) / d
n = p * (1 - p) / (d * E**2)
We can use this formula to determine a good value for n, the sample size.
E is the error threshold of the estimated probability vs the actual p:
|S_n / n - p| >= E
d is an upper bound on the probability of the estimate breaching the error threshold E:
P(|S_n / n - p| >= E) <= d
Formula:
n = p * (1 - p) / (d * E**2)
To be more concrete, since we don't know p, we can upper-bound p * (1 - p) by its maximum, which occurs at p = 0.5:
n = 0.5 * (0.5) / (d * E**2)
n = 1 / (4 * d * E**2)
We can set E = 0.1 and d = 0.1:
n = 1 / (4 * 0.1 * 0.1**2)
n = 250 elements to sample
For large arrays, 250 elements isn't much. You can adjust E and d depending on how accurate you need to be.
Keep in mind that the cutoff for deciding whether to use Solution 2 or 3 is at p = 0.5,
so E should be small enough to distinguish the cutoff, but large enough that the sample size isn't too large.
Set d depending on how confident you want to be in your estimate.
If the sample size gets too large, you can also cap it at 10% of the array size.
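Putting it together, here is a minimal sketch of estimating p from a small sample and dispatching to Solution 2 or 3. This is my own illustration; the fixed 250-element sample size, the RNG choice, and the 10% cap are assumptions following the discussion above, and it reuses the hypothetical `moveZeroesFill` / `moveZeroesSwap` functions from the earlier sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Forward declarations of the two sketched solutions from above.
void moveZeroesFill(std::vector<int>& nums);  // Solution 2 sketch
void moveZeroesSwap(std::vector<int>& nums);  // Solution 3 sketch

void moveZeroesDispatch(std::vector<int>& nums) {
    // Sample size from the Chebyshev bound with E = 0.1, d = 0.1,
    // capped at 10% of the array size as suggested above.
    std::size_t sampleSize = std::min<std::size_t>(250, nums.size() / 10);
    if (sampleSize == 0) {
        moveZeroesSwap(nums);  // Tiny array: the choice doesn't matter.
        return;
    }

    // Estimate p as the fraction of non-zeros among randomly sampled elements.
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<std::size_t> pick(0, nums.size() - 1);
    std::size_t nonZeros = 0;
    for (std::size_t i = 0; i < sampleSize; i++) {
        if (nums[pick(rng)] != 0) {
            nonZeros++;
        }
    }
    double pEstimate = static_cast<double>(nonZeros) / sampleSize;

    // Cutoff at p = 0.5: mostly non-zeros favors Solution 2, otherwise Solution 3.
    if (pEstimate > 0.5) {
        moveZeroesFill(nums);
    } else {
        moveZeroesSwap(nums);
    }
}
```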
WOW, such an in-depth analysis of the two approaches. Thanks!!