Created
September 5, 2022 05:04
Routine that makes chapters out of books from Project Gutenberg
| import re | |
| # book_dir (the folder holding the book text files) must already be defined | |
| book_id = '\\174.txt' # Which book are we processing? | |
| book_name = book_dir + book_id # Location of the book | |
| # Open the book and read its full text into a string | |
| with open(book_name, "r", encoding="utf8") as f: | |
|     book = f.read() | |
| # Use regex to split the book wherever the text "CHAPTER " appears | |
| chapters = re.split("CHAPTER ", book) | |
| # Remove the first 20 pieces (indices 0-19) since they are just fluff | |
| del chapters[0:20] | |
| # Write each remaining piece to its own file, numbered from 1 | |
| for i, chapter in enumerate(chapters, start=1): | |
|     with open("{}.txt".format(i), "w", encoding="utf8") as writeBook: # Make a new file | |
|         writeBook.write(chapter) # Write the current chapter content | |
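
Splitting on the bare string "CHAPTER " cuts the text wherever that word appears, including any table of contents, which is why a fixed number of leading pieces has to be deleted by hand. The sketch below shows one possible alternative, not the gist's own method: it splits only on lines that consist entirely of a heading such as `CHAPTER I.` or `CHAPTER 12`. The names `CHAPTER_HEADING`, `split_chapters`, `write_chapters`, and the `skip` parameter are hypothetical, and the heading pattern is an assumption that may need adjusting for a given book.

```python
import re
from pathlib import Path

# Hypothetical pattern (an assumption, not from the gist): a line holding only
# "CHAPTER" followed by a Roman or Arabic numeral and an optional period.
CHAPTER_HEADING = re.compile(r"^CHAPTER[ \t]+[IVXLCDM0-9]+\.?[ \t]*$", re.MULTILINE)


def split_chapters(text, skip=0):
    """Return the text that follows each heading line, as a list of strings.

    The piece before the first heading (front matter) is always dropped.
    `skip` drops that many further leading pieces, e.g. table-of-contents
    entries that happen to match the heading pattern.
    """
    pieces = CHAPTER_HEADING.split(text)
    return [piece.strip() for piece in pieces[1 + skip:]]


def write_chapters(chapters, out_dir):
    """Write each chapter to out_dir/1.txt, out_dir/2.txt, and so on."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for i, chapter in enumerate(chapters, start=1):
        (out_path / "{}.txt".format(i)).write_text(chapter, encoding="utf8")
```

With this approach a mention of the word CHAPTER inside a sentence does not start a new file, and the number of pieces to discard can be passed as `skip` instead of being hard-coded in a `del` statement.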