@TravisL12
Created November 12, 2013 07:00
I used this to split every entry in a CSV into individual words and count them, to find the places where I made purchases most often, such as a certain gas station, or how many times I used an ATM.
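require 'csv'

class String
  # Capitalize each word, leaving the separators between words untouched.
  def titleize
    split(/(\W)/).map(&:capitalize).join
  end
end

# Strip digits and punctuation, then split the text into capitalized words.
def separate_word(word)
  word.gsub(/[0-9\-\/\\\*\#()&'.]/, "").titleize.split(" ")
end

# Keep words seen at least `min` times and at least `length` characters long,
# sorted by descending count.
def sort_word_limits(word, min, length)
  word.select { |k, v| v >= min && k.length >= length }.sort_by { |k, v| v }.reverse
end

# Length of the longest word, used to right-align the output.
def word_size(word)
  word.keys.map { |n| n.length }.max
end

file_name = ARGV[0]
# Read every cell of the CSV; compact drops empty cells, which CSV returns as nil.
inputfile = CSV.read(file_name).flatten.compact.map { |cell| separate_word(cell) }

min_word_length = 4
min_count = 15

# Count how often each word appears across all cells.
word_count = Hash.new(0)
inputfile.each do |items|
  items.each { |item| word_count[item] += 1 }
end

big_word = word_size(word_count)
final_count = sort_word_limits(word_count, min_count, min_word_length)
final_count.each { |k, v| puts "#{k.rjust(big_word)} #{v}" }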
######### Example input from CSV #########
# Tremors Riverside Ca
# Shell Oil 61635192007 Cabazon Ca
# 2995 Iowa Ave. Stater 114Riverside Ca 3319
# ATM Withdrawal - 01/19 3060046 5797 North Victglobal Cashighland Ca 3319
# Non-Wells Fargo ATM Transaction Fee
# UCR Bookstore Riverside Ca
# Circle Bar Santamonica Ca
# ATM Withdrawal - 01/25 Scad4057 *Corona-01 B Of A Corona Ca 3319
# 1294 Universityarco Payporiverside Ca 3319
# Non-Wells Fargo ATM Transaction Fee
# Exxonmobil34 07865660 Riversid Ca
# 2650 N Main St L And L Mariverside Ca 3319
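To see what the script does without a CSV file, here is a minimal sketch that runs the same steps on a few of the sample rows above, held in memory. The helper names `tokenize` and `common_words` and the constant `STRIP_CHARS` are made up for this illustration and are not part of the gist, and the thresholds are lowered (count of 2, length of 3) so that such a small sample produces output.

```ruby
# Same character filter as the gist's separate_word.
STRIP_CHARS = /[0-9\-\/\\\*\#()&'.]/

# Hypothetical helper: strip noise characters, capitalize each word,
# then split on whitespace (mirrors separate_word without patching String).
def tokenize(description)
  description.gsub(STRIP_CHARS, "").split(/(\W)/).map(&:capitalize).join.split(" ")
end

# Hypothetical helper: count words across all rows, keep those that meet
# both limits, and sort by descending count.
def common_words(rows, min_count:, min_length:)
  counts = Hash.new(0)
  rows.each { |row| tokenize(row).each { |word| counts[word] += 1 } }
  counts.select { |word, count| count >= min_count && word.length >= min_length }
        .sort_by { |_word, count| -count }
end

# A few rows copied from the example input above.
sample_rows = [
  "Tremors Riverside Ca",
  "UCR Bookstore Riverside Ca",
  "ATM Withdrawal - 01/25 Scad4057 *Corona-01 B Of A Corona Ca 3319",
  "Non-Wells Fargo ATM Transaction Fee",
]

common_words(sample_rows, min_count: 2, min_length: 3).each do |word, count|
  puts "#{word} #{count}"
end
```

With these rows, "Riverside", "Corona", and "Atm" each appear twice and pass both limits, while "Ca" appears three times but is dropped for being shorter than three characters. Note that capitalizing turns "ATM" into "Atm", and that words the bank ran together (such as "Mariverside") are counted as separate words.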