Created January 21, 2020 21:05
Read text in any language from a .docx file with Python
import docxpy

# Input file: a Word document (.docx)
file = 'text.docx'

# Extract the document text as a Unicode string
text = docxpy.process(file)
print(text)

# Save the extracted text to a UTF-8 encoded text file
with open('Input.txt', 'w', encoding='utf-8') as output_txt:
    output_txt.write(text)
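
If installing a third-party package is not an option, the same kind of extraction can be approximated with the standard library alone. The sketch below is only an illustration and is not how docxpy is implemented: the function name `extract_docx_text` is hypothetical, and it assumes the text of interest sits in the main document part (`word/document.xml`) inside `w:t` elements. It ignores headers, footers, footnotes, tabs, and hyperlinks, and text inside text boxes may be handled imperfectly.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(path):
    """Return the body text of a .docx file, one paragraph per line.

    Hypothetical helper for illustration; not part of docxpy.
    """
    # A .docx file is a ZIP archive; the body lives in word/document.xml
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # Each <w:t> element holds one run of text inside the paragraph
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling `extract_docx_text('text.docx')` returns a Unicode string, so non-Latin scripts survive as long as the result is written out with an explicit `encoding='utf-8'`.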