Created August 13, 2015 17:00
=begin
@date 23 July 2015
@author: [email protected]
Given a CSV with two columns that share some values,
this script compares the two columns, finds the values that appear in both,
and writes them to a text file (duplicates.txt), one per line.
Run it as `ruby csv_2_list.rb name_of_the_csv_file_.csv`
=end
require 'csv'

filename = ARGV.first
# Exit with a usage message if no CSV file was given.
abort("Usage: ruby csv_2_list.rb name_of_the_csv_file_.csv") if filename.nil?

# Two arrays: one for the first column, one for the second.
list_one = []
list_two = []

# Iterate through the CSV rows, putting column one in list_one and column two in list_two.
CSV.foreach(filename) do |r|
  list_one << r[0]
  # The second column may be shorter than the first, so skip empty cells.
  list_two << r[1] unless r[1].nil?
end

puts "### Identifying Duplicates ###"
# The built-in Array#& method returns a new array containing only the values present in both arrays.
# It compares elements by hash, so it stays fast on very long lists; each value appears once in the result.
duplicates = list_one & list_two

# Print the number of duplicates to the screen.
puts duplicates.size

# Write the duplicates to a file, one per line.
File.open("duplicates.txt", 'w') do |file|
  duplicates.each do |d|
    # puts writes the item followed by a newline.
    file.puts(d)
  end
end
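
To see what the `&` step returns without preparing a file, here is a minimal sketch that runs the same comparison on an in-memory CSV string. The helper name `common_values` and the sample rows are made up for illustration and are not part of the original script. It assumes Ruby 2.3 or later, which the `<<~` heredoc needs.

```ruby
require 'csv'

# Hypothetical helper (not in the original script): returns the values that
# appear in both the first and the second column of the given CSV text.
def common_values(csv_text)
  rows = CSV.parse(csv_text)
  # compact drops the nil entries produced by empty or missing cells.
  list_one = rows.map { |r| r[0] }.compact
  list_two = rows.map { |r| r[1] }.compact
  # Array#& keeps each shared value once, in the order it appears in list_one.
  list_one & list_two
end

# Made-up sample data: the second column is shorter than the first,
# and "apple" occurs twice in the first column.
sample = <<~ROWS
  apple,cherry
  banana,apple
  cherry,kiwi
  apple,
  date
ROWS

p common_values(sample)
# prints ["apple", "cherry"]
```

The result lists "apple" only once even though it occurs twice in the first column, and the order follows the first column rather than the second. Values are compared exactly, so "Apple" and "apple " would not match "apple".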