Created August 22, 2011 18:41 (gist jt/1163142)
Regular expression notes
- any character you use will literally match itself, except for the special characters below
^ $ ? . / \ [ ] { } ( ) | + * - the special characters; escape one with \ (e.g. \. or \+) to match it literally (/ only needs escaping inside a // literal)
// - Ruby regexp literal, creates an instance of the Regexp class
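A minimal sketch of escaping in practice (the sample string is illustrative; `match?` assumes Ruby 2.4 or later): `Regexp.escape` backslash-escapes every special character so that a string can be matched literally.

```ruby
# Regexp.escape turns special characters into escaped literals
raw = "1+1=2?"
escaped = Regexp.escape(raw)
puts escaped # 1\+1=2\?

literal = Regexp.new(escaped)
unescaped = Regexp.new(raw)

# The escaped pattern matches the text itself...
puts literal.match?("is 1+1=2?")   # true
# ...while the raw pattern treats + and ? as quantifiers
puts unescaped.match?("is 1+1=2?") # false
```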
Common Patterns (I authored)
/[\w+\.~-]+@[\w~-]+\.[\w\.]+/ - match emails; a loose match whose character set is drawn from RFC 3986 section 3.3 (URI path characters), not full address validation
/\+?(\d)?[-.\s]?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})/ - match phone numbers, https://gist.github.com/1009331
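A quick sanity check of both patterns, written as a sketch: here the dot before the domain suffix is escaped (an unescaped `.` would match any character) and `|` is left out of the separator classes (inside `[...]` it is a literal pipe, not alternation). The constant names `EMAIL` and `PHONE` and all sample inputs are illustrative; `match?` assumes Ruby 2.4+.

```ruby
# Illustrative constants holding the loose email and phone patterns
EMAIL = /[\w+.~-]+@[\w~-]+\.[\w.]+/
PHONE = /\+?(\d)?[-.\s]?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})/

puts EMAIL.match?("jane.doe+news@example.com") # true
puts EMAIL.match?("jane at example dot com")   # false

# Captures: country code, area code, exchange, line number
p PHONE.match("+1 (555) 123-4567").captures # ["1", "555", "123", "4567"]
p PHONE.match("555.123.4567").captures      # [nil, "555", "123", "4567"]

# Pipes are not accepted as separators
puts PHONE.match?("555|123|4567") # false
```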
Strategies
foo(?!.*foo) - negative lookahead: matches the foo that has no other foo after it, which is a way to find the last match in a string (add /m if the string spans several lines)
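Checking the lookahead trick on an illustrative string, with `String#rindex` as a cross-check. The second half shows why the /m caveat matters: without /m, `.*` cannot cross a newline, so the lookahead only looks ahead within the current line.

```ruby
text = "foo bar foo baz"

# The lookahead rejects any foo that is followed (anywhere later) by another foo
last = text =~ /foo(?!.*foo)/
puts last # 8

# Same position as String#rindex, which searches from the end
puts text.rindex("foo") # 8

# Without /m, . stops at newlines, so the lookahead only sees the current line
multi = "foo\nfoo"
puts(multi =~ /foo(?!.*foo)/)  # 0
puts(multi =~ /foo(?!.*foo)/m) # 4
```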
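Anchors
^ - start of line
\A - start of string
$ - end of line
\Z - end of string, or just before a final newline (\z matches only the very end)
\b - word boundary, the zero-width position between a word character and a non-word character (or the start/end of the string)
\B - any position that is not a word boundary
\< - start of word (GNU grep/vim syntax; not supported in Ruby, use \b)
\> - end of word (GNU grep/vim syntax; not supported in Ruby, use \b)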
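/^apple/.match 'pear apple' # nil: ^ needs apple at the start of a line, with nothing before it
\A - like ^ but only at the start of the whole string; /\Aapple/ and /^apple/ behave the same only on single-line strings, since ^ also matches after every newline
/apple$/.match 'apple pear' # nil: $ needs apple at the end of a line, with nothing after it
\Z - like $ but only at the end of the whole string; /apple\Z/ and /apple$/ behave the same only on single-line strings, since $ also matches before every newline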
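The difference between the line anchors and the string anchors only shows up on multi-line input. This sketch uses an illustrative two-line string and also uses `\z`, Ruby's anchor for the absolute end of the string (unlike `\Z`, it does not allow a trailing newline).

```ruby
text = "pear\napple\n"

# ^ and $ work per line; \A, \Z and \z only at the ends of the whole string
p(text =~ /^apple/)  # 5, apple starts the second line
p(text =~ /\Aapple/) # nil, the string starts with pear
p(text =~ /pear$/)   # 0, pear ends the first line
p(text =~ /pear\Z/)  # nil, pear is not at the end of the string
p(text =~ /apple\Z/) # 5, \Z allows a trailing newline
p(text =~ /apple\z/) # nil, \z does not

# \b is a zero-width word boundary
p("pineapple apple" =~ /\bapple/) # 10, skips the apple inside pineapple
```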
Character Classes
[] - character class, a quasi-wildcard that matches exactly one of the characters specified
[abc] - match a single character, a, b, or c
[^abc] - match a single character except for a, b, or c
[a-zA-Z] - match a single character in the range a-z or A-Z
. - any character except a newline (add /m to include newlines)
\cX - control character X, e.g. \cJ is a newline (Ctrl-J)
\s - any whitespace character
\S - any non-whitespace character
\d - any digit, shorthand for [0-9]
\D - any non-digit
\w - any word character, shorthand for [a-zA-Z0-9_]
\W - any non-word character
\xhh - character with hexadecimal code hh, e.g. \x41 is A
\X - an extended grapheme cluster (one user-perceived character, possibly several code points)
\nnn - character with octal code nnn, e.g. \101 is A
Also Note: most special characters become literal inside a character class, so [.] and [\.] both match a period
(escaping is harmless). The exceptions are ^ (negates when first), - (forms a range), ] (closes the class) and \ (escapes).
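A few of the class shorthands and escapes, checked against Ruby's behavior (sample strings are illustrative; `match?` assumes Ruby 2.4+). Note that `\w` covers every ASCII letter, not just hex digits, and that `\101` is read as an octal escape here because the pattern has no 101st group.

```ruby
# \w covers letters, digits and underscore, not just hex digits
p "g_9-!".scan(/\w/)     # ["g", "_", "9"]

# . skips newlines unless /m is given
p "a\nb".match?(/a.b/)   # false
p "a\nb".match?(/a.b/m)  # true

# Inside a class, an escaped and an unescaped period both mean a literal period
p "a.b!c".scan(/[.]/)    # ["."]
p "a.b!c".scan(/[\.]/)   # ["."]

# Hex, octal and control escapes
p("A" =~ /\x41/)         # 0
p("A" =~ /\101/)         # 0
p("\n" =~ /\cJ/)         # 0
p "tab\there".scan(/\s/) # ["\t"]
```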
Quantifiers
a? - zero or one a; ? marks the previous character as optional
a* - zero or more of a
a+ - one or more of a
a{3} - exactly 3 of a
a{3,} - 3 or more of a
a{3,6} - 3 to 6 of a
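Quantifiers in action, as a small sketch on illustrative strings. `String#[]` with a regexp returns the first matched substring, or nil when there is no match.

```ruby
p "color colour".scan(/colou?r/) # ["color", "colour"]
p "xyz"[/a*/]                    # "", zero a's is still a match
p "xyz"[/a+/]                    # nil, + needs at least one a
p "aaaa"[/a{3}/]                 # "aaa"
p "aa"[/a{3,}/]                  # nil, fewer than 3
p "aaaaaaaa"[/a{3,6}/]           # "aaaaaa", quantifiers are greedy
```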
Groups
(a|b) - a or b
(...) - contents are captured
(?:...) - passive (non-capturing) group; gain the benefits of using parens without capturing the match
\n - backreference to the nth captured group, e.g. \1
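Alternation, capturing groups, passive `(?:...)` groups and backreferences, sketched on illustrative strings.

```ruby
# Alternation
animals = "cat dog bird".scan(/cat|dog/)
p animals          # ["cat", "dog"]

# Capturing groups are numbered from the left
date = /(\d{4})-(\d{2})/.match("2011-08-22")
p date.captures    # ["2011", "08"]

# (?:ab)+ repeats the pair without capturing it, so only (c) is captured
passive = /(?:ab)+(c)/.match("ababc")
p passive.captures # ["c"]

# \1 must match the same text that group 1 matched
doubled_at = "hello" =~ /(l)\1/
p doubled_at       # 2
```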
Ruby Matching
- there are 2 components to a ruby regexp, the pattern and the modifiers; modifiers are optional, for example
/something/i # something is the pattern, i is the modifier
- every match operation either succeeds or fails; a failed match always returns nil
"an interesting ruby string".match(/ruby/) # returns a MatchData object
"test this".match(/banana/) # returns nil
/ruby/.match("an interesting ruby string") # returns a MatchData object
"test this" =~ /this/ # returns 5, the index where the match begins
/this/ =~ "test this" # also returns 5
- a MatchData object is truthy (everything except nil and false is), making it useful in conditionals
- a MatchData object also stores information about the match
"before after before".scan(/before/) - returns an array of all matches; if the pattern contains captures, you get an array of arrays of the captured groups
"before after before".split(/before/) - returns an array of the pieces between the matches, here ["", " after "]
MatchData, example methods:
match = /([rd])(ejected)/.match 'rejected'
match.string # "rejected", the string we matched against
match[0] # "rejected", the entire part of the string matched
match[1] # "r", the first capture group
match[2] # "ejected", the second capture group
match.captures[0] # "r", the first capture group
match.captures[1] # "ejected", the second capture group
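The matching methods and MatchData accessors from the last two sections, exercised on illustrative strings. The MatchData part uses a pattern with two capture groups, since `match[1]` and `match[2]` are nil when a pattern captures nothing; `pre_match` and `begin` are further standard MatchData methods.

```ruby
md = "an interesting ruby string".match(/ruby/)
p md.class                    # MatchData
p md[0]                       # "ruby"
p "test this".match(/banana/) # nil
p("test this" =~ /this/)      # 5

# A MatchData is truthy, so it can drive a conditional directly
puts "found ruby" if "an interesting ruby string".match(/ruby/)

# scan returns every match, or arrays of captures when the pattern has groups
p "before after before".scan(/before/)  # ["before", "before"]
p "a1 b2".scan(/(\w)(\d)/)              # [["a", "1"], ["b", "2"]]

# split returns the pieces between matches (trailing empty strings are dropped)
p "before after before".split(/before/) # ["", " after "]

# MatchData with two capture groups
m = /([rd])(ejected)/.match("he was dejected")
p m.string    # "he was dejected"
p m[0]        # "dejected"
p m[1]        # "d"
p m[2]        # "ejected"
p m.captures  # ["d", "ejected"]
p m.pre_match # "he was "
p m.begin(0)  # 7
```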
Modifiers
/i - case insensitive
/m - makes the wildcard . match newlines
/x - ignore whitespace (and allow # comments) in the pattern
/o - perform #{...} interpolation only once
/s - in Perl this is single-line mode (. matches newlines), which Ruby spells /m; in Ruby, /s instead sets the pattern's encoding to Windows-31J
/[rd]ejected/imxo - chain multiple modifiers
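The main modifiers, sketched in Ruby. `LOCAL_NUMBER` and `only_once` are hypothetical names for illustration; the `only_once` method shows that a literal marked /o is interpolated the first time it is evaluated and then reused unchanged. `match?` assumes Ruby 2.4+.

```ruby
p(/ruby/i.match?("RUBY")) # true
p(/a.b/.match?("a\nb"))   # false
p(/a.b/m.match?("a\nb"))  # true

# Hypothetical pattern: /x ignores whitespace in the pattern and allows # comments
LOCAL_NUMBER = /
  \d{3}  # first three digits
  -      # separator
  \d{4}  # last four digits
/x
p LOCAL_NUMBER.match?("555-1234") # true

# Hypothetical helper: /o interpolates once, later calls reuse the first pattern
def only_once(s)
  /#{s}/o
end
p only_once("a").source # "a"
p only_once("b").source # still "a"
```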
Substitution
"after it all".gsub(/after/, "before") # "before it all"
"after it all".gsub(/after/, "before \\0") # "before after it all"; \\0 reinserts the entire match, while \\1, \\2, ... reinsert capture groups
Special Chars
\ - escape char
\n - newline
\r - carriage return
\t - tab
\v - vertical tab
\f - form feed
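Substitution and the escape characters together, as a sketch on illustrative strings. In a double-quoted replacement the backslash must be doubled (`"\\0"`); a single-quoted replacement such as `'\1'` needs no doubling. The block form of `gsub` is shown as an alternative for when the replacement has to be computed.

```ruby
# \\0 is the whole match, \\1, \\2, ... are capture groups
p "after it all".gsub(/after/, "before \\0")       # "before after it all"
# Single quotes avoid doubling the backslashes: '\3' is backslash-three
p "2011-08-22".sub(/(\d+)-(\d+)-(\d+)/, '\3/\2/\1') # "22/08/2011"
# A block receives each match and returns its replacement
doubled = "a1b2".gsub(/\d/) { |d| (d.to_i * 2).to_s }
p doubled                                          # "a2b4"

# Special characters in patterns
log = "first\tline\r\nsecond\tline"
p log.split(/\r\n/)       # ["first\tline", "second\tline"]
p log.scan(/\t/).length   # 2
p "C:\\temp".match?(/\\/) # true, \\ in a pattern matches one backslash
```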