@dnch
Created January 18, 2012 00:50
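# Given a big (~25MB) text file, I want to extract
# and process lines that are prefixed with a certain
# string. In the actual case, known_prefixes is
# approximately 50 elements long, with potential for growth.
known_prefixes = %w(aa ab ax fy fx)

# Build one compiled Regexp rather than a String (String =~ String
# raises a TypeError). Regexp.quote escapes any metacharacters in a
# prefix, and the (?:...) group makes ^ and the trailing space apply
# to every alternative, not just the first and last one.
regex = /^(?:#{known_prefixes.map { |p| Regexp.quote(p) }.join("|")}) /

# IO.foreach needs a do...end block; it yields one line at a time.
# big_ass_text_file and process are placeholders for the real path and handler.
IO.foreach(big_ass_text_file) do |row|
  process(row) if row =~ regex
end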
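Without a group around the alternation, ^ anchors only the first prefix and the trailing space applies only to the last one, so lines can match that have none of the prefixes at the start. A short sketch with invented sample lines showing the difference:

```ruby
prefixes = %w(aa ab fx)

# Ungrouped: parsed as /^aa|ab|fx /, so ^ binds only to "aa"
# and the trailing space binds only to "fx".
ungrouped = Regexp.new("^#{prefixes.join('|')} ")

# Grouped: every alternative must start the line and be followed by a space.
grouped = /^(?:#{prefixes.map { |s| Regexp.quote(s) }.join('|')}) /

# Invented sample lines.
lines = ["aa data\n", "aax data\n", "zz ab data\n", "ab data\n"]

p lines.grep(ungrouped) # => ["aa data\n", "aax data\n", "zz ab data\n", "ab data\n"]
p lines.grep(grouped)   # => ["aa data\n", "ab data\n"]
```

The ungrouped pattern accepts all four lines: "aax" needs no space after "aa", and "zz ab data" contains "ab" somewhere in the middle.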
dnch (author) commented Jan 18, 2012

IO.foreach streams through the file, which is exactly what I want: the information I'm interested in sits in a short section within the first 5-8% of the text file (the actual problem has some additional checking / breaking logic that I haven't included).
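Because IO.foreach yields one line at a time, a break out of its block stops reading, and the rest of the file is never touched. A minimal sketch of that early exit, assuming the interesting section ends at a line matching a hypothetical END_MARKER (the gist's real checking logic isn't shown) and using a temporary file for the sample data:

```ruby
require "tempfile"

PREFIX_RE = /^(?:aa|ab|fx) /
# Hypothetical marker for the end of the interesting section.
END_MARKER = /^END_SECTION/

# Collects matching lines, stopping as soon as END_MARKER is seen.
# Breaking out of the IO.foreach block means no further lines are read.
def extract_prefixed(path)
  matches = []
  IO.foreach(path) do |row|
    break if row =~ END_MARKER
    matches << row.chomp if row =~ PREFIX_RE
  end
  matches
end

Tempfile.create("sample") do |f|
  f.write("header\naa first\nzz skip\nfx second\nEND_SECTION\nab never read\n")
  f.flush
  p extract_prefixed(f.path) # => ["aa first", "fx second"]
end
```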

I guess compiling the regexp is my best option at this point. Nice tip on Regexp.quote, though; that would have killed me later.
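Regexp.quote (an alias of Regexp.escape) backslash-escapes regex metacharacters, so a prefix that happens to contain ".", "+" or "|" is matched literally instead of being read as pattern syntax. A small sketch with invented prefixes:

```ruby
# Invented prefixes containing regex metacharacters.
prefixes = ["a.b", "c+"]

# Unescaped: "." matches any character and "+" means "one or more".
raw = /^(?:#{prefixes.join('|')}) /

# Escaped: each prefix is matched literally.
escaped = /^(?:#{prefixes.map { |s| Regexp.quote(s) }.join('|')}) /

p raw.match?("axb data")     # => true
p escaped.match?("axb data") # => false
p escaped.match?("a.b data") # => true
p Regexp.quote("c+")         # => "c\\+"

# Regexp.union escapes plain string arguments the same way.
p Regexp.union(prefixes).match?("axb") # => false
```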

jimsynz commented Jan 18, 2012

I guess the point I'm trying to illustrate is that passing a block to String#scan might be faster than testing each line with if row =~ regex. It's worth benchmarking.
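A rough way to test that claim with Ruby's standard benchmark library, comparing a per-line =~ against a single String#scan with a block over the whole text. The synthetic data (one matching line in every 20) is an assumption, and timings will vary with Ruby version and real input:

```ruby
require "benchmark"

PREFIXES = %w(aa ab ax fy fx)
LINE_RE  = /^(?:#{PREFIXES.map { |s| Regexp.quote(s) }.join('|')}) /
# Same prefixes, but consuming the rest of the line so scan yields whole lines.
SCAN_RE  = /^(?:#{PREFIXES.map { |s| Regexp.quote(s) }.join('|')}) .*$/

# Synthetic data: every 20th line starts with a known prefix.
text  = Array.new(200_000) { |i| i % 20 == 0 ? "aa row #{i}" : "zz row #{i}" }.join("\n")
lines = text.lines

per_line = 0
scanned  = 0

Benchmark.bm(10) do |x|
  x.report("=~ lines") { lines.each { |row| per_line += 1 if row =~ LINE_RE } }
  x.report("scan")     { text.scan(SCAN_RE) { scanned += 1 } }
end

# Sanity check: both approaches should find the same number of lines.
puts per_line == scanned
```

Note that scan works on one string, so the file would have to be read whole first (e.g. with File.read), which gives up the early break that IO.foreach allows when the useful lines sit near the start of the file.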
