Created May 5, 2013 17:10
Convert dumb quotes to smart quotes in Python
import re


def dumb_to_smart_quotes(string):
    """Takes a string and returns it with dumb quotes, single and double,
    replaced by smart quotes. Accounts for the possibility of HTML tags
    within the string."""
    # Find dumb double quotes coming directly after letters, digits, or
    # punctuation, and replace them with right double quotes.
    string = re.sub(r'([a-zA-Z0-9.,?!;:\'\"])"', r'\1”', string)
    # Find any remaining dumb double quotes and replace them with
    # left double quotes.
    string = string.replace('"', '“')
    # Reverse: find any SMART quotes that have been (mistakenly) placed
    # around HTML attributes (following =) and replace them with dumb quotes.
    string = re.sub(r'=“(.*?)”', r'="\1"', string)
    # Follow the same process with dumb/smart single quotes.
    string = re.sub(r"([a-zA-Z0-9.,?!;:\"\'])'", r'\1’', string)
    string = string.replace("'", '‘')
    string = re.sub(r'=‘(.*?)’', r"='\1'", string)
    return string
Oops. If there is a slash at the end of a URL, the quotes get mixed up within the HTML tag.
<a href="http://url.com/" title="something">text</a>
becomes
<a href="http://url.com/“ title=“something">text</a>
Adding a slash after the colon in the character class of both search patterns (the double-quote and the single-quote re.sub calls), e.g. r'([a-zA-Z0-9.,?!;:/\'\"])"',
seems to work. (But it probably breaks something else.)
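Here is a minimal sketch of that change so it can be tried directly. The function name `dumb_to_smart_quotes_with_slash` is made up for illustration. The only difference from the gist is the `/` added to both character classes. One thing it likely does break: an opening quote that directly follows a slash in ordinary prose would now be treated as a closing quote.

```python
import re


# Hypothetical variant of the gist's function: "/" is added to both
# character classes so a quote right after a trailing URL slash is
# treated as a closing quote.
def dumb_to_smart_quotes_with_slash(string):
    # Double quotes after a letter, digit, punctuation mark, or slash
    # become right double quotes; the rest become left double quotes.
    string = re.sub(r'([a-zA-Z0-9.,?!;:/\'\"])"', r'\1”', string)
    string = string.replace('"', '“')
    # Restore dumb quotes around HTML attribute values (following =).
    string = re.sub(r'=“(.*?)”', r'="\1"', string)
    # Same process for single quotes.
    string = re.sub(r"([a-zA-Z0-9.,?!;:/\"\'])'", r'\1’', string)
    string = string.replace("'", '‘')
    string = re.sub(r'=‘(.*?)’', r"='\1'", string)
    return string


html = '<a href="http://url.com/" title="something">text</a>'
print(dumb_to_smart_quotes_with_slash(html))
print(dumb_to_smart_quotes_with_slash('"Hello," she said.'))
```

With this variant the tag from the example above should come back unchanged, while quotes in plain prose are still converted.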
I wrote a method to do the opposite, ignoring anything in an HTML tag: https://gist.github.com/dmdeluca/9cffec2edad3d9282dea534692f5b702
Might not be perfect, but it works.
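For readers who want the general idea without following the link, here is a rough sketch of the reverse direction. It is not the code from the linked gist. The names `smart_to_dumb_quotes`, `SMART_TO_DUMB`, and `TAG_PATTERN` are made up, and the tag matcher is deliberately naive (it assumes no `>` appears inside an attribute value).

```python
import re

# Map each smart quote to its plain ASCII counterpart.
SMART_TO_DUMB = str.maketrans({'“': '"', '”': '"', '‘': "'", '’': "'"})

# Naive tag matcher (an assumption): "<", then anything up to the next ">".
TAG_PATTERN = re.compile(r'(<[^>]*>)')


def smart_to_dumb_quotes(string):
    # Splitting on a pattern with one capturing group keeps the tags in
    # the result: even indices are text between tags, odd indices are tags.
    parts = TAG_PATTERN.split(string)
    for i in range(0, len(parts), 2):
        # Only the text between tags is converted; tags stay untouched.
        parts[i] = parts[i].translate(SMART_TO_DUMB)
    return ''.join(parts)


print(smart_to_dumb_quotes('<a title="x">“Hi,” she said.</a>'))
```

Because only the text between tags is translated, anything inside `<...>` is left exactly as it was, including any smart quotes that ended up there by mistake.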
I think this gist and filename are incorrectly named. Could you please change it to read "Convert smart quotes to dumb quotes" @davidtheclark?