Created January 18, 2012 00:50
        
  
    
    
  
  
    
| # Given a big (~25MB) text file, I want to extract | |
| # and process lines that are prefixed with a certain | |
| # string. In the actual case, known_prefixes is | |
| # approximately 50 elements long with potential for growth | |
| known_prefixes = %w(aa ab ax fy fx) | |
| # Group the alternation so ^ and the trailing space apply to every prefix | |
| regex = /^(?:#{known_prefixes.join("|")}) / | |
| IO.foreach(big_ass_text_file) do |row| | |
|   if row =~ regex | |
|     process(row) | |
|   end | |
| end | |
I guess the point I'm trying to illustrate is that passing a block into String#scan might be faster than checking each row with if row =~ regex - it's worth benchmarking.
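One way to test that claim is a quick comparison with Ruby's Benchmark module. This is only a rough sketch: the sample text is generated in memory as a stand-in for the real file, and the helper names match_per_line and match_with_scan are made up for illustration.

```ruby
require "benchmark"

# Hypothetical sample data standing in for the real ~25MB file.
known_prefixes = %w(aa ab ax fy fx)
# Regexp.union escapes each prefix; .* extends the match to the end of the line.
regex = /^(?:#{Regexp.union(known_prefixes).source}) .*$/
text = %w(aa zz fx qq).cycle.first(20_000)
         .each_with_index.map { |prefix, i| "#{prefix} line #{i}" }
         .join("\n") + "\n"

# Approach 1: test each line with =~
def match_per_line(text, regex)
  hits = []
  text.each_line { |row| hits << row.chomp if row =~ regex }
  hits
end

# Approach 2: a single String#scan call, collecting matches in a block
def match_with_scan(text, regex)
  hits = []
  text.scan(regex) { |line| hits << line }
  hits
end

Benchmark.bm(12) do |x|
  x.report("each_line =~") { match_per_line(text, regex) }
  x.report("scan block")   { match_with_scan(text, regex) }
end
```

Note that String#scan needs the whole text in memory as one string, whereas the line-by-line version can work on a stream, so the timing is only part of the trade-off.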
  
  
            
IO.foreach streams through the file, which is exactly how I want it—the bits of information I'm interested in are located in only a short section within the first 5-8% of the text file (there's some additional checking / breaking logic in the actual problem that I haven't included).
I guess compiling the regexp is my best option at this point—nice tip on the Regexp.quote, though—that would have killed me later.
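For reference, a minimal sketch of what that could look like: the regexp is built once, with each prefix escaped, and rows are streamed with an early exit. The names build_prefix_regex, each_prefixed_row and the stop_marker parameter are hypothetical, and the "END" marker only stands in for the real checking / breaking logic, which isn't shown in the gist.

```ruby
require "stringio"

# Regexp.union escapes each string (the same escaping Regexp.escape /
# Regexp.quote applies), so a prefix like "a.b" is matched literally.
# Interpolating the union keeps the alternation grouped.
def build_prefix_regex(prefixes)
  /\A#{Regexp.union(prefixes)} /
end

# Streams rows from any IO-like object and yields the ones that start
# with a known prefix. stop_marker is a hypothetical stand-in for the
# real break condition.
def each_prefixed_row(io, regex, stop_marker: nil)
  io.each_line do |row|
    break if stop_marker && row.start_with?(stop_marker)
    yield row if row =~ regex
  end
end

# Small in-memory sample in place of the real file.
sample = StringIO.new("aa first\nzz skipped\nfx second\nEND\nab never reached\n")
regex = build_prefix_regex(%w(aa ab ax fy fx))

rows = []
each_prefixed_row(sample, regex, stop_marker: "END") { |row| rows << row.chomp }
puts rows
```

With a real file, the same helper should work inside File.open(path) { |f| ... }, since it only relies on each_line.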