Created
December 28, 2011 18:31
-
-
Save phaniram/1529047 to your computer and use it in GitHub Desktop.
Gild.com puzzle - Topfive
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import java.io.File; | |
import java.io.FileNotFoundException; | |
import java.io.FileReader; | |
import java.util.Comparator; | |
import java.util.HashMap; | |
import java.util.Map; | |
import java.util.Scanner; | |
import java.util.TreeMap; | |
class Topfive { | |
private HashMap<String, Integer> hashmap = new HashMap<String, Integer>(); | |
public HashMap<String, Integer> getHashmap() { | |
return hashmap; | |
} | |
public void putWord(String main) { | |
Integer frequency = getHashmap().get(main); | |
if (frequency == null) { | |
frequency = 0; | |
} | |
hashmap.put(main, frequency + 1); | |
} | |
public TreeMap<String, Integer> process(File fFile) | |
throws FileNotFoundException { | |
Scanner scanner = new Scanner(new FileReader(fFile)); | |
while (scanner.hasNextLine()) { | |
String wordp = scanner.nextLine(); | |
int j = wordp.indexOf("query=") + 6; | |
int k = wordp.length() - 1; | |
String fut = wordp.substring(j, k).trim(); | |
this.putWord(fut); | |
} | |
scanner.close(); | |
ValueComparator bvc = new ValueComparator(getHashmap()); | |
TreeMap<String, Integer> sorted_map = new TreeMap<String, Integer>(bvc); | |
sorted_map.putAll(getHashmap()); | |
return sorted_map; | |
} | |
public static void main(String[] args) { | |
if (args.length == 0) { | |
File fFile = new File("D:\\a.in"); | |
Topfive topfive = new Topfive(); | |
try { | |
TreeMap<String, Integer> sorted_map = topfive.process(fFile); | |
int count = 0; | |
for (String key : sorted_map.keySet()) { | |
System.out.println(key); | |
count++; | |
if (count >= 5) { | |
break; | |
} | |
} | |
} catch (FileNotFoundException e) { | |
e.printStackTrace(); | |
} | |
} | |
} | |
} | |
class ValueComparator implements Comparator<Object> { | |
private Map<String, Integer> base; | |
public ValueComparator(Map<String, Integer> base) { | |
this.base = base; | |
} | |
@Override | |
public int compare(Object first_obj, Object second_obj) { | |
int ret = -1; | |
if (base.get(first_obj) < base.get(second_obj)) { | |
ret = 1; | |
} | |
return ret; | |
} | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Puzzle Description
You are working for a state of the art search engine. Today you are attending a (very long and boring) meeting with the head of marketing, Tom, your boss Alice, and the head of sysadmin, Bob. You are almost sleeping when something interrupts your nap.
Tom: "...and this is how we will render the top five queries searched by our users"
Alice: "It seems so easy! Can you complete it by this evening?"
Silence. Then you wake up and realize they are talking to you!
You: "Well ehm ehy ehrr...Can you repeat that please? I have just missed the last part of ...what?"
Tom: "Sure! Every day users type any kind of queries in our search engine. We want to find the top five queries entered by our users every day and print them on the website"
Alice: "And they are printed in red, in a small square and yadda yadda bla bla bla in Times New Roman and yadda yadda ..."
You: "Ok got it. Bob, do we have this info logged anywhere?"
Bob: "Yea! Every search is logged in a giant file. Each log line has the date and time of the search, the IP of the user and the query they have made."
Alice: "Super! You have everything you need! Can you write this program by this evening?"
Write a program that takes a single argument on the command line. This argument is the name of the log file you have to open and parse. Your program should then find the top five queries (the 5 queries typed most frequently, ordered by frequency) and print them one after the other, separated by a newline.
As a brilliant engineer, you are also aware that the log file can be several Terabyte long, so your program must be very fast but efficient managing resources (memory, harddrive, etc.).
Input Specifications
The log file is a (very, very long) sequence of lines. Each line is separated by a newline ("\n"). Each line starts with the date and time of the log, followed by the string "new query". After that, the IP of the user and the query string are logged. The exact format of each line is:
[Date and Time] new query: [ip=x.x.x.x, query=a string representing the query]
You can assume a case sensitive comparison when comparing two queries (vMs is different from VMS).
An example of an input file is:
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=196.33.110.95, query=picturesque liar]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=251.185.82.103, query=fight nor fly]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=51.219.214.69, query=24 hours]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=42.101.132.104, query=love life]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=98.90.106.214, query=wrinkled]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=79.100.211.163, query=a million dollars]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=105.121.152.64, query=love life]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=17.191.160.203, query=even plan]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=116.210.228.97, query=you attempt things]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=83.93.145.66, query=a million dollars]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=122.45.71.88, query=many hardships]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=136.156.220.145, query=many hardships]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=66.202.6.18, query=married]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=22.215.0.91, query=head]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=170.25.57.90, query=many hardships]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=226.74.206.246, query=this year]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=41.44.186.38, query=secret shame]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=39.78.225.119, query=present]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=247.42.239.229, query=apple pie]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=91.244.8.38, query=many hardships]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=59.78.46.82, query=elephant]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=206.120.101.237, query=careful]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=182.159.177.152, query=peace]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=154.173.3.162, query=many hardships]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=126.32.5.196, query=apple pie]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=50.19.240.85, query=good afternoon]
[Fri Jan 07 17:46:30 CET 2011] new query: [ip=94.120.161.97, query=seven levels]
Output Specifications
Your program should find the top five queries entered by the users and print them all, separated by a newline.
To give you an example of the output printed by your program:
love life
many hardships
artificial intelligence
apple pie
a million dollars