Created December 10, 2019 16:20
Efficient way to remove docstrings in Python source code
import ast

import astor  # read more at https://astor.readthedocs.io/en/latest/

with open('source.py') as source_file:
    parsed = ast.parse(source_file.read())

for node in ast.walk(parsed):
    # let's work only on function & class definitions
    if not isinstance(node, (ast.FunctionDef, ast.ClassDef, ast.AsyncFunctionDef)):
        continue

    if not len(node.body):
        continue

    if not isinstance(node.body[0], ast.Expr):
        continue

    # ast.Str was removed in Python 3.12; since 3.8 string literals parse as ast.Constant
    if not isinstance(node.body[0].value, ast.Constant) or not isinstance(node.body[0].value.value, str):
        continue

    # Uncomment the lines below to print what is being removed and where
    # print(node)
    # print(node.body[0].value.value)

    node.body = node.body[1:]

print('***** Processed source code output ******\n=========================================')
print(astor.to_source(parsed))
> python clean.py
***** Processed source code output ******
=========================================
"""
Mycopyright (c)
"""
from abc import d
class MyClass(MotherClass):
    def __init__(self, my_param):
        self.my_param = my_param
def test_fctn():
    def _wrapped(omg):
        pass
    return True
def test_fctn():
    some_string = """
    Some Docstring
    """
    return some_string
"""
Mycopyright (c)
"""
from abc import d
class MyClass(MotherClass):
    """
    Some;
    Multi-
    Line Docstring:
    """
    def __init__(self, my_param):
        """Docstring"""
        self.my_param = my_param
def test_fctn():
    """
    Some Docstring
    """
    def _wrapped(omg):
        "some extra docstring"
        pass
    return True
def test_fctn():
    some_string = """
    Some Docstring
    """
    return some_string
In case a method or class has only a docstring as its body,
add these two lines so that the resulting code is still valid:
...
    node.body = node.body[1:]
    # add a "pass" statement here
    if len(node.body) < 1:
        node.body.append(ast.Pass())
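To see why the extra `pass` matters, here is a small self-contained check. It is only a sketch: it assumes Python 3.9+ and uses the standard library's `ast.unparse` in place of `astor.to_source`, and the `compiles` helper and the `only_doc` function are made up for the demonstration.

```python
import ast


def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        compile(source, "<stripped>", "exec")
        return True
    except SyntaxError:
        return False


# A function whose entire body is its docstring.
tree = ast.parse('def only_doc():\n    """Nothing but a docstring."""\n')
func = tree.body[0]

func.body = func.body[1:]        # removing the docstring leaves the body empty
without_pass = ast.unparse(tree)

if len(func.body) < 1:           # the two-line fix from the comment above
    func.body.append(ast.Pass())
with_pass = ast.unparse(tree)

print(compiles(without_pass), compiles(with_pass))
```

The first result shows that the stripped function no longer compiles; the second shows that appending `ast.Pass()` restores a valid body.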
I needed to remove module-level docstrings as well; this was easy to do:
if not isinstance(node, (ast.FunctionDef, ast.ClassDef, ast.AsyncFunctionDef)):
becomes
if not isinstance(node, (ast.FunctionDef, ast.ClassDef, ast.AsyncFunctionDef, ast.Module)):
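One side effect worth knowing: in the sample file, the leading `Mycopyright (c)` string is the module docstring, so with `ast.Module` in the tuple it gets removed too. The sketch below makes that switchable. It is an illustration rather than the gist's code: `strip_docstrings`, `include_module` and `SAMPLE` are invented names, and it assumes Python 3.9+ so that `ast.unparse` can stand in for astor.

```python
import ast

FUNCS_AND_CLASSES = (ast.FunctionDef, ast.ClassDef, ast.AsyncFunctionDef)


def strip_docstrings(source: str, include_module: bool = False) -> str:
    """Remove docstrings; optionally treat the module docstring the same way."""
    targets = FUNCS_AND_CLASSES + ((ast.Module,) if include_module else ())
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, targets):
            continue
        # get_docstring returns None when the first statement is not a string
        if ast.get_docstring(node, clean=False) is None:
            continue
        node.body = node.body[1:]
        # An empty module is valid, but an empty def/class body is not.
        if not node.body and not isinstance(node, ast.Module):
            node.body.append(ast.Pass())
    return ast.unparse(tree)


SAMPLE = (
    '"""\nMycopyright (c)\n"""\n'
    'import os\n\n'
    'def f():\n    """Doc."""\n    return os.sep\n'
)

print(strip_docstrings(SAMPLE))                       # header string kept
print(strip_docstrings(SAMPLE, include_module=True))  # header string removed
```

If the header is a license notice that has to stay, leaving `ast.Module` out (the default here) is the safer choice.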
Is it possible to modify the code above so that only docstrings are removed, but not comments?
Or only comments starting with '# TODO'? I need to clean my code data for a code-to-text model.
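A note on the comments question: `ast.parse` discards comments, so anything regenerated from the tree contains none of them. One possible way to keep them is to use the AST only to locate the docstrings and then cut those spans out of the original text. The sketch below assumes Python 3.8+ (for `end_lineno` and `end_col_offset`); `strip_docstrings_keep_comments` is a hypothetical name, and a docstring that shares its line with another statement via `;` is deliberately left untouched.

```python
import ast

DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def strip_docstrings_keep_comments(source: str) -> str:
    """Remove docstrings by editing the source text, leaving comments alone."""
    lines = source.split("\n")
    spans = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, DOCSTRING_OWNERS) or not node.body:
            continue
        first = node.body[0]
        if not (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            continue
        # A def/class whose body is only a docstring needs a placeholder.
        needs_pass = len(node.body) == 1 and not isinstance(node, ast.Module)
        spans.append((first.lineno, first.col_offset,
                      first.end_lineno, first.end_col_offset, needs_pass))

    # Edit bottom-up so earlier line numbers stay valid.
    for lineno, col, end_lineno, end_col, needs_pass in sorted(spans, reverse=True):
        # Column offsets are UTF-8 byte offsets, hence the encode/decode.
        before = lines[lineno - 1].encode("utf-8")[:col].decode("utf-8")
        after = lines[end_lineno - 1].encode("utf-8")[end_col:].decode("utf-8")
        if after.lstrip().startswith(";"):
            continue  # shares its line with another statement; left as-is
        new_line = before + ("pass" if needs_pass else "") + after
        if new_line.strip():
            lines[lineno - 1:end_lineno] = [new_line]
        else:
            del lines[lineno - 1:end_lineno]
    return "\n".join(lines)


example = 'def f():\n    """Doc."""\n    # TODO: keep me\n    return 1  # trailing\n'
print(strip_docstrings_keep_comments(example))
```

Because the rest of the text is never regenerated, the original formatting and every comment, including the `# TODO` ones, survive unchanged.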
Related to https://stackoverflow.com/questions/59270042/efficent-way-to-remove-docstring-with-regex/59271643