Gist dwgill/6736928, created September 28, 2013 00:10
This is an attempt at a regular expression that should match the headers and footers included at the beginning and end of plain-text (.txt) ebooks hosted on Project Gutenberg (http://www.gutenberg.org/). The intent is that any text matching this regex can be safely replaced with an empty string before computationally processing the…
((\b(Project Gutenberg's)|(The Project Gutenberg EBook))[\s\S]+?\*{3}[\s\S]+?\*{3}(\s+?[pP]roduced by.+))|((\b[Ee][Nn][Dd] of.+?[Pp]roject [Gg]utenberg[\s\S]+?)?\*{3}\s[eE][nN][dD].+?\*{3}[\s\S]*)
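A minimal Python sketch of the intended use: compile the pattern and replace every match with an empty string via `re.sub`. It rests on a few assumptions. Python's `re` module is used as the regex engine, because the gist does not name a flavor. The "End of" prefix is written as `\b[Ee][Nn][Dd]`, which is an assumed reading of the character classes `[/bEe][nD][dD]` in the originally posted version. `SAMPLE` is made-up text in the older Gutenberg header/footer style. The names `strip_gutenberg_boilerplate` and `SAMPLE` are hypothetical.

```python
import re

# Pattern from the gist. ASSUMPTION: the "End of" prefix is written as
# \b[Ee][Nn][Dd] (assumed intent of [/bEe][nD][dD] in the original post).
GUTENBERG_BOILERPLATE = re.compile(
    # Header: title line, the "*** START ... ***" marker, and the "Produced by" line.
    r"((\b(Project Gutenberg's)|(The Project Gutenberg EBook))[\s\S]+?\*{3}[\s\S]+?\*{3}(\s+?[pP]roduced by.+))"
    # Footer: optional "End of ... Project Gutenberg" line, the "*** END ... ***"
    # marker, and everything after it up to the end of the file.
    r"|((\b[Ee][Nn][Dd] of.+?[Pp]roject [Gg]utenberg[\s\S]+?)?\*{3}\s[eE][nN][dD].+?\*{3}[\s\S]*)"
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Hypothetical helper: replace every header/footer match with ''."""
    return GUTENBERG_BOILERPLATE.sub("", text)


# Hypothetical sample in the older Gutenberg style (not a real ebook).
SAMPLE = (
    "The Project Gutenberg EBook of Example Book, by A. Author\n"
    "\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE BOOK ***\n"
    "\n"
    "Produced by Some Volunteer\n"
    "\n"
    "Chapter 1\n"
    "It was a dark and stormy night.\n"
    "\n"
    "End of the Project Gutenberg EBook of Example Book, by A. Author\n"
    "\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE BOOK ***\n"
    "\n"
    "This file should be named example.txt\n"
)

if __name__ == "__main__":
    # Only the body text (plus surrounding blank lines) should remain.
    print(strip_gutenberg_boilerplate(SAMPLE).strip())
```

Because `.` does not match newlines by default, `.+` after "Produced by" stops at the end of that line. The `[\s\S]` classes let the lazy quantifiers span several lines. Note that the header branch requires a "Produced by" line, so headers without one are left untouched. Real Gutenberg files vary, so checking the output on a few books before bulk processing seems prudent.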