Created
March 31, 2024 20:12
KiaraGrouwstra/670ff90bfafc6a4ed337478c6e6bb5f2
docx to markdown
intent | replace regex | with | |
---|---|---|---|
debold headers | # +\*\*(.*)\*\* | # $1 | |
de-underline links | \[<u>(.*)</u>\] | [$1] | |
de-underline links | \[<u>([\n\s\S]*?)</u>\] | [$1] | |
remove empty comments | <!-- -->\n+ | ||
ditch quote blocks | ^(\s*)> ? | $1 | |
upgrade bolded lines to h2 | (\[?)\*\*(.*)\*\* | ## $1$2 | |
normalize quotes | ‘ | ' | |
normalize quotes | ’ | ' | |
normalize quotes | “ | " | |
normalize quotes | ” | " | |
clear trailing whitespace | [ ]+$ | ||
ditch empty list items | \d\.$\n+ | ||
de-hardcode list item numbers | ^\d\. | 1. | |
clean out empty headers | ^#+\s+\n+ | ||
separate bold markers out to their own lines | ^\*\* | <strong>\n | |
separate bold markers out to their own lines | \*\*$ | \n</strong> | |
infer line breaks from punctuation | ^(1\. [^#\d].*?[\.;,\?:]) | $1\n | |
infer line breaks from punctuation | ^(\s*)([^#\d].*?[\.;,\?:]) | $1$2\n$1 | |
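
The rules above read like an ordered list of editor-style find/replace passes. As a rough sketch of how a few of them could be scripted, the Python below applies five rows in table order. It assumes the `$1` syntax means "first capture group" and that `^` and `$` are meant to match at every line, hence `re.MULTILINE`. The names `RULES`, `to_python_repl` and `clean` are hypothetical and not part of the gist.

```python
import re

# A handful of (pattern, replacement) pairs copied from the table, in table order.
# Replacements keep the table's "$1" group syntax and are converted below.
RULES = [
    (r"# +\*\*(.*)\*\*", "# $1"),           # debold headers
    (r"\[<u>([\n\s\S]*?)</u>\]", "[$1]"),   # de-underline links (non-greedy row)
    (r"^(\s*)> ?", "$1"),                   # ditch quote blocks
    (r"[‘’]", "'"),                         # normalize single quotes (two rows merged)
    (r"^\d\.", "1."),                       # de-hardcode list item numbers
]


def to_python_repl(repl: str) -> str:
    """Convert "$1"-style group references to Python's "\\g<1>" form."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", repl)


def clean(text: str) -> str:
    """Apply every rule once, in order, treating ^ and $ as per-line anchors."""
    for pattern, repl in RULES:
        text = re.sub(pattern, to_python_repl(repl), text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    sample = "# **Title**\n> see [<u>link</u>](x) ‘now’\n3. item\n"
    print(clean(sample))
```

Order matters here: the header rule has to run before any rule that touches `**`, which is presumably why the table lists it first. Rows whose replacement contains `\n` would need that sequence turned into a real newline before being passed to `re.sub`; the sketch leaves those rows out.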