Last active
April 6, 2016 17:48
-
-
Save krishnanraman/6346307 to your computer and use it in GitHub Desktop.
Trivia: Generate primes between 1 and 100 million with Scalding Map-reduce ( well, just Map :) thanks @squarecog
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#include <stdio.h> | |
#include <stdlib.h> | |
int main(int argc, char* argv) { | |
FILE * fp; | |
int i = 0; | |
fp = fopen ("numbers", "w+"); | |
for(i = 1;i<100000000;i++) { | |
fprintf(fp, "%d\n", i); | |
} | |
fclose(fp); | |
return(0); | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import com.twitter.scalding._ | |
class PrimesMapReduce(args:Args) extends Job(args) { | |
val input = Tsv("numbers", 'number) | |
val output = Tsv("primes") | |
// sixTest : Every prime >3 can be expressed as 6n-1 or 6n+1. Handle the primes 2 or 3 separately | |
def isPrime(x:Int) = { | |
x match { | |
case 1 => false | |
case twoOrThree if (twoOrThree ==2 || twoOrThree == 3) => true | |
case sixTest if ((sixTest+1)%6 == 0 || (sixTest-1)%6 == 0) => (3 to math.sqrt(sixTest).toInt by 2).filter(d => sixTest%d==0).size == 0 | |
case _ => false | |
} | |
} | |
input | |
.filter('number) { | |
x:Int => isPrime(x) | |
} | |
.write(output) | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Trivia problem: Generate primes between 1 and 100 million with Scalding | |
Input: Numbers from 1 to 100 million, 1 per line - use a simple C program for this | |
Output: Filter input for primes using the filter function on pipes in Scalding | |
Execution time: 13:11:00 to 13:25:54 = 15 minutes in hdfs-local mode | |
Part files: 27 | |
$ ls | |
part-00000 part-00002 part-00004 part-00006 part-00008 part-00010 part-00012 part-00014 part-00016 part-00018 part-00020 part-00022 part-00024 part-00026 part-00001 part-00003 part-00005 part-00007 part-00009 part-00011 part-00013 part-00015 part-00017 part-00019 part-00021 part-00023 part-00025 | |
$ tail part-00026 | |
99999787 | |
99999821 | |
99999827 | |
99999839 | |
99999847 | |
99999931 | |
99999941 | |
99999959 | |
99999971 | |
99999989 | |
$ more part-00000 | |
2 | |
3 | |
5 | |
7 | |
11 | |
13 | |
17 | |
19 | |
23 | |
29 | |
31 | |
Verify: Is 99999989 a prime ? http://primes.utm.edu/curios/page.php/99999989.html | |
How many primes between 1 and 100 million ? | |
$ wc -l part* | |
305004 part-00000 | |
267880 part-00001 | |
240917 part-00002 | |
226206 part-00003 | |
223190 part-00004 | |
220649 part-00005 | |
218586 part-00006 | |
217030 part-00007 | |
215116 part-00008 | |
214170 part-00009 | |
212899 part-00010 | |
211929 part-00011 | |
210716 part-00012 | |
209982 part-00013 | |
209057 part-00014 | |
208377 part-00015 | |
207515 part-00016 | |
207115 part-00017 | |
206321 part-00018 | |
205839 part-00019 | |
204994 part-00020 | |
204775 part-00021 | |
204086 part-00022 | |
203646 part-00023 | |
203174 part-00024 | |
202923 part-00025 | |
99359 part-00026 | |
5761455 total | |
Only 5.76 million primes between 1 to 100 million. ( you could compute this with a Reduce step - Exercise for the motivated student with too much time on his hands ) |
wilf - No, 121 will not be on my list, nor will any other 6n-1 composite. Notice I use trial division upon all 6n-1 composites ( (3 to math.sqrt(sixTest).toInt by 2).filter(d => sixTest%d==0).size == 0 )
Only the primes will survive that.
In fact, lets look at the part file.
$ more part-00000
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97
101
103
107
109
113
127
131
137
139
No 121.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
You're aware, of course, that although this generates all of the primes between 1 and 100 million, it also generates lots of non-primes? For example, 121 (= 11*11) is not prime, but will be on your list ((121-1)%6 == 0). Perhaps you should clarify?