@jt
Created August 22, 2011 18:41
Regular expression notes
- any character in a pattern matches itself literally, except the special characters
^ $ ? . / \ [ ] { } ( ) + * | - the special characters; escape them with \ if you don't want them to be special
/.../ - regexp literal in ruby, an instance of class Regexp
Common Patterns (I authored)
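/[\w+\.~-]+@[\w~-]+\.[\w\.]+/ - loosely match emails (the author cites RFC 3986 section 3.3; email address syntax itself is defined in RFC 5322, which this pattern does not fully cover)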
/\+?(\d)?[-.\s]?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})/ - match phone numbers, https://gist.github.com/1009331 (inside [] a | is a literal pipe, so the separators are listed without it)
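A minimal sketch of how the two patterns above might be used from Ruby. The constant names EMAIL and PHONE and the sample strings are made up for illustration, and the patterns are written here with a literal \. between the domain parts and plain [-.\s] separator classes, which is an assumption about the intent:

```ruby
# hypothetical constant names, not part of the original notes
EMAIL = /[\w+.~-]+@[\w~-]+\.[\w.]+/
PHONE = /\+?(\d)?[-.\s]?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})/

# String#[] with a regexp returns the matched text, or nil
found_email = "write to jo.doe+tag@example.co.uk now"[EMAIL]

# captures are: country digit, area code, exchange, line number
full  = PHONE.match("+1 (555) 123-4567").captures
short = PHONE.match("555.123.4567").captures  # optional first group is nil
```

Both patterns are loose matchers for pulling candidates out of text, not validators.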
Strategies
foo(?!.*foo) - negative lookahead, match a foo that has no other foo after it. use it to find the last match in a string (add /m if the string spans several lines, since . stops at newlines).
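A quick sketch of the last-match strategy in Ruby. The sample string is made up; String#rindex is shown only as a cross-check for the offset:

```ruby
text = "foo bar foo baz foo"

# foo(?!.*foo) only matches a foo with no later foo on the same line
last = text.match(/foo(?!.*foo)/)
last_index = last.begin(0)      # offset where the final foo starts

# String#rindex finds the same offset without a lookahead
same_index = text.rindex("foo")
```

The lookahead version is mainly useful when the thing being searched for is itself a pattern rather than a fixed string.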
Anchors
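^ - start of line
\A - start of string
$ - end of line
\Z - end of string, or just before a final newline (\z is the strict end of string)
\b - word boundary, a zero-width position between a word character and a non-word character
\B - any position that is not a word boundary
\< - start of word (GNU grep / Vim syntax, not supported by ruby; use \b)
\> - end of word (GNU grep / Vim syntax, not supported by ruby; use \b)
/^apple/.match 'pear apple' # no match, ^ requires apple at the beginning of a line
\A - like ^ but only matches at the start of the whole string; /\Aapple/ and /^apple/ differ when the string contains newlines
/apple$/.match 'apple pear' # no match, $ requires apple at the end of a line
\Z - like $ but only matches at the end of the whole string; /apple\Z/ and /apple$/ differ when the string contains newlines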
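To see how the line anchors and string anchors differ in Ruby, here is a small sketch against a made-up two-line string:

```ruby
text = "pear\napple\n"

line_start   = text =~ /^apple/    # 5, ^ matches right after the newline
string_start = text =~ /\Aapple/   # nil, \A only matches at offset 0
line_end     = text =~ /pear$/     # 0, $ matches just before the first newline
loose_end    = text =~ /apple\Z/   # 5, \Z tolerates one trailing newline
strict_end   = text =~ /apple\z/   # nil, \z is the very end of the string
```

When validating whole strings, \A and \z are usually the safer pair, since ^ and $ can match in the middle of multi-line input.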
Character Classes
[] - character class, a quasi-wildcard, matches only characters specified
[abc] - match a single character, a, b, or c
[^abc] - match a single character except for a, b, or c
[a-zA-Z] - match single character in the range a-z or A-Z
. - any character except newline (use the /m modifier to include newlines)
\cx - control character (e.g. \cM is carriage return)
\s - any whitespace character
\S - any non-whitespace character
\d - any digit, shorthand for [0-9]
\D - any non-digit
\w - any word character, shorthand for [a-zA-Z0-9_]
\W - any non-word character
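\xhh - character with hexadecimal code hh (e.g. \x41 is A)
\X - extended grapheme cluster (ruby 2.0+)
\nnn - character with octal code nnn (e.g. \101 is A)
Also Note: most special characters become literal inside a character class, so escaping them there is optional
(e.g. [.] and [\.] both match a literal period); only \ ] ^ and - keep a special meaning inside a class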
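A small sketch of the character classes above, run against a made-up sample string with String#scan:

```ruby
sample = "a.b-c_9 Z"

dots       = sample.scan(/[.]/)      # a dot inside [] is a literal period
escaped    = sample.scan(/[\.]/)     # escaping it changes nothing
word_chars = sample.scan(/\w/)       # letters, digits, underscore
digits     = sample.scan(/\d/)
spaces     = sample.scan(/\s/)
not_abc    = sample.scan(/[^abc]/)   # everything except a, b, c
```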
Quantifiers
a? - zero or one a, ? marks the previous character or group as optional
a* - zero or more of a
a+ - one or more of a
a{3} - exactly 3 of a
a{3,} - 3 or more of a
a{3,6} - 3 to 6 of a
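The quantifiers can be checked quickly with String#[]. A sketch using a made-up run of eight a characters; note that quantifiers are greedy and take as many as they are allowed:

```ruby
run = "a" * 8                      # "aaaaaaaa"

exactly_three = run[/a{3}/]        # "aaa"
three_or_more = run[/a{3,}/]       # all eight
three_to_six  = run[/a{3,6}/]      # six, the greedy maximum
too_short     = "aa"[/a{3}/]       # nil, fewer than three
star_empty    = "b"[/a*/]          # "", zero a's is still a match
plus_none     = "b"[/a+/]          # nil, + needs at least one
optional      = "color colour".scan(/colou?r/)
```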
Groups
(a|b) - a or b
(...) - contents are captured
(?:...) - passive (non-capturing) group. gain the benefits of using parens but without having to capture its match.
\n - backreference to the nth group/subpattern (e.g. \1)
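A short sketch of capturing, non-capturing, and back-referenced groups in Ruby. The sample strings are made up:

```ruby
# \1 must repeat whatever the first group captured
doubled = "hello"[/(\w)\1/]                    # "ll"

# (?:...) groups the alternation without creating a capture,
# so the name is capture 1 rather than capture 2
m = /(?:Mr|Ms)\. (\w+)/.match("Ms. Lovelace")
name     = m[1]
captures = m.captures

# with a plain (...) the title becomes capture 1 instead
with_title = /(Mr|Ms)\. (\w+)/.match("Ms. Lovelace").captures
```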
Ruby Matching
- there are 2 components to a ruby regexp, the pattern and the modifiers. modifiers are optional, example
/something/i # something is the pattern, i is the modifier
- every match operation either succeeds or fails; a failed match always returns nil
"an interesting ruby string".match(/ruby/) # returns a MatchData object
"test this".match(/banana/) # returns nil
/ruby/.match("an interesting ruby string") # returns a MatchData object
"test this" =~ /this/ # returns 5, the index where the match begins
/this/ =~ "test this" # same result, 5
- a MatchData object is truthy, making it useful in conditionals
- class MatchData also stores information about the match
"before after before".scan(/before/) - returns an array of all matches, if the pattern contains captures, you'll get an array of arrays
"before after before".split(/before/) - returns an array of everything except the matches
MatchData, example methods:
match = /(e)(ject)ed/.match 'ejected'
match.string # "ejected", the string we matched against
match[0] # "ejected", the entire part of the string matched
match[1] # "e", first capture group
match[2] # "ject", second capture group
match.captures[0] # "e", first capture group
match.captures[1] # "ject", second capture group
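A sketch pulling the matching notes together: using the nil-or-MatchData result in a conditional, plus scan and split. The method name summarize_match and the sample strings are made up for illustration:

```ruby
# hypothetical helper: match returns nil on failure, so it works in an if
def summarize_match(text)
  if (m = /(r|d)(eject)ed/.match(text))
    "prefix=#{m[1]} stem=#{m[2]} whole=#{m[0]}"
  else
    "no match"
  end
end

hit  = summarize_match("was rejected")
miss = summarize_match("accepted")

words  = "before after before".scan(/before/)    # flat array of matches
pairs  = "a1 b2".scan(/(\w)(\d)/)                # array of capture arrays
pieces = "before after before".split(/before/)   # trailing empty piece is dropped
```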
Modifiers
/i - case insensitive
/m - makes wildcard, . , match newlines
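/x - ignore whitespace and # comments in pattern
/o - perform #{...} substitutions only once
/s - in perl, treat string as single line; ruby uses /m for that, and ruby's /s instead sets the regexp encoding to Windows-31J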
/[rd]ejected/imxo - chain multiple modifiers
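A brief sketch of the /i, /m, and /x modifiers in Ruby. The variable names and sample strings are made up:

```ruby
insensitive = "RUBY" =~ /ruby/i        # 0, case is ignored

without_m = "a\nb" =~ /a.b/            # nil, . stops at the newline
with_m    = "a\nb" =~ /a.b/m           # 0, . now matches the newline

# /x lets the pattern be spread out and commented
phone_tail = /
  \d{3}   # exchange
  -       # separator
  \d{4}   # line number
/x
verbose = "555-1234" =~ phone_tail     # 0
```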
Substitution
"after it all".gsub(/after/, "before") # "before it all"
"after it all".gsub(/after/, "before \\0") # "before after it all", \\0 reinserts the entire match; \\1, \\2, ... reinsert capture groups
Special Chars
\ - escape char
\n - newline
\r - carriage return
\t - tab
\v - vertical tab
\f - form feed
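A short sketch tying the substitution and special character notes together: collapsing whitespace escapes with gsub, reordering captures, and the block form. The sample strings are made up:

```ruby
messy = "name:\tAda\r\nrole:\tengineer"

# collapse any run of tab, CR, LF, VT, or FF into a single space
flat = messy.gsub(/[\t\r\n\v\f]+/, " ")

# reorder captures; single quotes keep \1 and \2 from being unescaped
swapped = "name: Ada".gsub(/(\w+): (\w+)/, '\2: \1')

# block form: the block receives each match and returns its replacement
shouted = "after it all".gsub(/after/) { |m| m.upcase }
```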