@resultakak
Forked from ericrasch/RegEx Snippets.md
Created November 21, 2016 14:06
RegEx Snippets


Extract URLs

Find all links

Works pretty well in capturing the full URL when using this in a search (like in Sublime Text 2).

  • RegEx: (https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/?
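As a quick check outside an editor, here is a minimal Python sketch of the same search. The helper name extract_urls and the sample text are made up for illustration; the pattern is copied unchanged. One thing it surfaces: {2,3} only admits two- or three-letter TLDs, so a URL on a longer TLD such as .info gets cut short.

```python
import re

# Pattern copied as-is from the snippet; Python's re accepts the escaped ":" and "/".
URL_RE = re.compile(r"(https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/?")

def extract_urls(text):
    """Return the full text of every URL match found in text (hypothetical helper)."""
    return [m.group(0) for m in URL_RE.finditer(text)]

sample = "See http://www.yourwebsite.org/degrees/ and ftp://files.yourwebsite.org/a.zip now"
print(extract_urls(sample))
# ['http://www.yourwebsite.org/degrees/', 'ftp://files.yourwebsite.org/a.zip']

# A four-letter TLD is truncated because of {2,3}.
print(extract_urls("http://example.info/x"))
# ['http://example.inf']
```

Since \S* runs to the next whitespace, trailing punctuation such as a comma or closing parenthesis ends up inside the match, so it is worth eyeballing the results.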

Find all links within a specific subfolder structure and replace with alt subfolders

The following will capture the URL in a SQL dump, stopping at the closing quotation mark even when it is backslash-escaped (\").

  • URL pattern: http://www.yourwebsite.org/calculator/degrees/sociology
  • RegEx: (https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/calculator/degrees/([^\\"]+)
  • Replacement: http://www.yourwebsite.org/degrees/$3/
  • Result: http://www.yourwebsite.org/degrees/sociology/
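To try the find/replace outside an editor, here is a rough Python sketch. The dump string is an invented stand-in for one row of a SQL dump, and Python spells the third capture group \3 rather than $3; otherwise the pattern and replacement are the ones listed above.

```python
import re

# Assumed sample: one fragment of a SQL dump with backslash-escaped quotes.
dump = r'<a href=\"http://www.yourwebsite.org/calculator/degrees/sociology\">Sociology</a>'

pattern = re.compile(
    r'(https?|ftps?)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?'
    r'/calculator/degrees/([^\\"]+)'
)

# Group 3 holds the degree name; it stops at the first backslash or quote.
rewritten = pattern.sub(r'http://www.yourwebsite.org/degrees/\3/', dump)
print(rewritten)
# <a href=\"http://www.yourwebsite.org/degrees/sociology/\">Sociology</a>
```

Note that the replacement hard-codes the scheme and host, so every matched URL is rewritten to http://www.yourwebsite.org regardless of what it started with.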

Find all links without trailing slashes

This will find all hrefs that do not end in a trailing slash. NOTE: it will also detect links in your <head> and ones that end in .html that purposefully do not have a trailing slash, so be careful when performing a find/replace.

  • URL pattern: href="http://www.yourwebsite.org/calculator/degrees/financial"
  • RegEx: href="(\S)+[^/]"
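A small Python sketch of the search, using an invented three-line HTML sample, shows both the intended hit and the kind of false positive the note warns about (a stylesheet link in the head).

```python
import re

# Pattern copied from the snippet above.
NO_SLASH_RE = re.compile(r'href="(\S)+[^/]"')

# Assumed sample markup for illustration only.
html = (
    '<a href="http://www.yourwebsite.org/calculator/degrees/financial">A</a>\n'
    '<a href="http://www.yourwebsite.org/degrees/">B</a>\n'
    '<link href="style.css">\n'
)

hits = [m.group(0) for m in NO_SLASH_RE.finditer(html)]
for hit in hits:
    print(hit)
# href="http://www.yourwebsite.org/calculator/degrees/financial"
# href="style.css"
```

The second link is skipped because it already ends in a slash; the stylesheet is reported even though it should not get one.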

Find all id="*" attributes

This will find all id attributes quoted with either ' or ".

  • RegEx: id=("|')[^"']*("|')
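Here is a minimal Python sketch of the search on an invented HTML fragment. It writes the middle character class as [^"'], so only the two quote characters end the value; inside square brackets, parentheses and pipes would otherwise be treated as literal characters to exclude.

```python
import re

# The value is any run of characters that are not quotes.
ID_RE = re.compile(r"""id=("|')[^"']*("|')""")

# Assumed sample markup for illustration only.
html = """<div id="main"><span id='note'>x</span><p class="lead">y</p></div>"""

ids = [m.group(0) for m in ID_RE.finditer(html)]
print(ids)
# ['id="main"', "id='note'"]
```

The pattern has no word boundary in front of id, so attributes such as data-id="..." would also be picked up.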

Find all <a href=""></a> anchor tags

  • RegEx: <(?:\s?)[aA].*?href=[\'\"](?<link>.*?)[\'\"].*?>(?<text>.*?)<(?:\s?)\/(?:\s?)[aA](?:\s?)>
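The named groups link and text can be pulled out programmatically. This is a sketch in Python, which spells named groups (?P<name>...) instead of (?<name>...). It also assumes a lazy text group (.*?) so that two anchors on the same line are not merged into one match; the sample markup is invented.

```python
import re

# Same shape as the snippet above, adapted to Python's named-group syntax.
ANCHOR_RE = re.compile(
    r"""<\s?[aA].*?href=['"](?P<link>.*?)['"].*?>(?P<text>.*?)<\s?/\s?[aA]\s?>"""
)

# Assumed sample: two anchors on one line, one with each quote style.
html = '<p><a href="/degrees/">Degrees</a> and <a class="x" href=\'/about/\'>About</a></p>'

pairs = [(m.group("link"), m.group("text")) for m in ANCHOR_RE.finditer(html)]
print(pairs)
# [('/degrees/', 'Degrees'), ('/about/', 'About')]
```

Because . does not match newlines by default, anchors whose text spans several lines are missed unless the DOTALL flag is added.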

Replace Subdomain URLs

Convert from http://old.yourwebsite.org/whatever/ to http://new.yourwebsite.com/whatever/

EXPLAINED:

  • since the pattern will contain literal forward slashes for the URL (e.g. "scheme://domain/path"), we're delimiting the pattern with pipe chars to avoid having to backslash-escape each forward slash
  • just in case we have a mix of http & https URLs, we'll match both with "https?", which means match "http" followed by zero or one "s" chars
  • we're backslash-escaping the dots in the domain, since in regex syntax an unescaped dot normally means "any single character other than a newline"
  • we're capturing everything before and after the "old" subdomain with parentheses in the pattern, then joining them around "new" in the replacement using backreferences
  • we're making the entire pattern match case-insensitively by adding an "i" flag after the closing pattern delimiter

$content = preg_replace('|(https?://)old(\.yourwebsite\.)org|i', '$1new$2com', $content);
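For comparison, here is a rough Python equivalent of the conversion, assuming the goal stated above (old.yourwebsite.org becomes new.yourwebsite.com). The content string is invented, and Python uses \1 and \2 where PHP uses $1 and $2.

```python
import re

# Assumed sample text with one lowercase and one uppercase URL.
content = 'Visit http://old.yourwebsite.org/whatever/ or HTTPS://OLD.YOURWEBSITE.ORG/x'

# Group 1 is the scheme, group 2 is ".yourwebsite." between subdomain and TLD.
SUBDOMAIN_RE = re.compile(r'(https?://)old(\.yourwebsite\.)org', re.IGNORECASE)

converted = SUBDOMAIN_RE.sub(r'\1new\2com', content)
print(converted)
# Visit http://new.yourwebsite.com/whatever/ or HTTPS://new.YOURWEBSITE.com/x
```

The captured parts keep their original case, while the literal "new" and "com" are inserted exactly as written in the replacement.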


Rewrite subfolder with keywords

#RewriteRule ^calculator/degrees(?:/([\w-]+?)(?:-in.+)?)?/?$ /degrees/$1/ [L,R=301]
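The leading # comments the rule out in an Apache config, so it has to be removed before the rule does anything. To see what the pattern captures before enabling it, here is a small Python sketch that only imitates the substitution. The function name rewrite is hypothetical, the paths are written without a leading slash (as they are matched in a per-directory .htaccess context), and none of this reproduces mod_rewrite itself.

```python
import re

# Same pattern as the RewriteRule; group 1 is the degree name without any "-in..." suffix.
RULE_RE = re.compile(r'^calculator/degrees(?:/([\w-]+?)(?:-in.+)?)?/?$')

def rewrite(path):
    """Return the redirect target for path, or None if the pattern does not match."""
    m = RULE_RE.match(path)
    if m is None:
        return None
    # An unmatched group is substituted as an empty string in this sketch.
    return "/degrees/{}/".format(m.group(1) or "")

print(rewrite("calculator/degrees/sociology-in-texas"))  # /degrees/sociology/
print(rewrite("calculator/degrees/sociology/"))          # /degrees/sociology/
print(rewrite("degrees/sociology"))                      # None
```

The lazy [\w-]+? is what lets the optional "-in..." suffix be stripped; a greedy version would swallow the suffix into the captured name.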


Misc.
