@arkiver
Created October 7, 2013 11:12
Common stop words (and characters)
STOP_WORDS = ["the", "be", "and", "of", "a", "in", "to", "have", "to", "it", "I", "that", "for", "you", "he", "with", "on", "do", "say", "this", "they", "at", "but", "we", "his", "from", "that", "not", "n't", "n't", "by", "she", "or", "as", "what", "go", "their", "can", "who", "get", "if", "would", "her", "all", "my", "make", "about", "know", "will", "as", "up", "one", "time", "there", "year", "so", "think", "when", "which", "them", "some", "me", "people", "take", "out", "into", "just", "see", "him", "your", "come", "could", "now", "than", "like", "other", "how", "then", "its", "our", "two", "more", "these", "want", "way", "look", "first", "also", "new", "because", "day", "more", "use", "no", "man", "find", "here", "thing", "give", "many", "well", "only", "those", "tell", "one", "very", "her", "even", "back", "any", "good", "woman", "through", "us", "life", "child", "there", "work", "down", "may", "after", "should", "call", "world", "over", "school", "still", "try", "in", "as", "last", "ask", "need", "too", "feel", "three", "when", "state", "never", "become", "between", "high", "really", "something", "most", "another", "much", "family", "own", "out", "leave", "put", "old", "while", "mean", "on", "keep", "student", "why", "let", "great", "same", "big", "group", "begin", "seem", "country", "help", "talk", "where", "turn", "problem", "every", "start", "hand", "might", "American", "show", "part", "about", "against", "place", "over", "such", "again", "few", "case", "most", "week", "company", "where", "system", "each", "right", "program", "hear", "so", "question", "during", "work", "play", "government", "run", "small", "number", "off", "always", "move", "like", "night", "live", "Mr", "point", "believe", "hold", "today", "bring", "happen", "next", "without", "before", "large", "all", "using", "best", "tags", "web", "development", "news", "trends", "applications", "practices", "tips", "tricks", "tag", "vs", "once", "handier", "easy", 'front', 'rear', 'end', 'design', 
'myth', 'buster', 'activity', 'cool', 'ideas', 'study', 'blog', 'picture', 'is', 'an', 'included', 'process'].uniq.freeze # uniq drops repeated entries such as "to", "that", "n't", and "as"
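# Single-character punctuation marks. The backslash is written as \\ because a lone "\ " inside %w escapes the following space.
PUNCTUATION_MARKS = %w(; , : " > < | \\ - + = _ ! @ # $ % ^ & * ( ) ~ ` ' { } [ ] . ? /).freeze
# Digits "0" through "9" as strings.
NUM = ('0'..'9').to_a.freeze
# Lowercase letters "a" through "z".
ALPHA = ('a'..'z').to_a.freeze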
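
The gist does not show how these constants are used. A rough sketch of one plausible use, pulling keywords out of a title, might look like the following. The helper `extract_keywords` and the `SAMPLE_*` constants are hypothetical and not part of the original gist; the small samples stand in for the full lists so that the snippet runs on its own.

```ruby
require 'set'

# Hypothetical stand-ins for STOP_WORDS and PUNCTUATION_MARKS (assumption:
# the real lists would be passed in instead). A Set gives fast lookups.
SAMPLE_STOP_WORDS = Set.new(%w[the be and of a in to is an]).freeze
SAMPLE_PUNCTUATION = %w(; , : . ? !).freeze

# Hypothetical helper: replaces each punctuation mark with a space,
# lowercases the text, splits on whitespace, and drops stop words.
def extract_keywords(text, stop_words: SAMPLE_STOP_WORDS, punctuation: SAMPLE_PUNCTUATION)
  cleaned = punctuation.reduce(text) { |acc, mark| acc.gsub(mark, ' ') }
  cleaned.downcase.split.reject { |word| stop_words.include?(word) }
end

p extract_keywords('The state of the Web: an easy guide.')
```

With the sample lists this prints `["state", "web", "easy", "guide"]`. With the full list from the gist, "state" and "easy" would be filtered out too. Because the sketch lowercases its input, capitalized entries such as "I", "American", and "Mr" would need to be lowercased in the list (or the `downcase` step dropped) to match.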