Last active
April 5, 2021 12:04
Batch file to convert HTML files to Word docx with Pandoc
:: This batch file converts HTML files in a folder to docx.
:: It requires Pandoc and a list of the files to convert,
:: saved as a file named file-list, with one filename per line
:: and no spaces in any filename.
::
:: Don't show these commands to the user
@ECHO off
:: Set the title of the window
TITLE Convert html to docx
:: Enable delayed expansion so that !File! is re-evaluated
:: on each pass through the rename loop below.
Setlocal enabledelayedexpansion
:: What're we doing?
ECHO Converting to .docx...
:: Loop through the list of files in file-list
:: and convert each one from .html to .docx.
:: We end up with the same filenames,
:: with .docx extensions appended (e.g. page.html.docx).
FOR /F "tokens=*" %%F IN (file-list) DO (
    pandoc "%%F" -f html -t docx -s -o "%%F.docx"
)
:: What are we doing next?
ECHO Fixing file extensions...
:: What are we finding and replacing?
SET find=.html
SET replace=
:: Loop through all .docx files and remove the .html
:: from the filenames pandoc created.
FOR %%# in (.\*.docx) DO (
    Set "File=%%~nx#"
    Ren "%%#" "!File:%find%=%replace%!"
)
:: Whassup?
ECHO Done.
:: Let the user exit deliberately:
:: keep prompting until they press return on an empty line.
:exit
SET exit=
SET /p exit=Hit return to exit...
IF "%exit%"=="" GOTO :eof
GOTO exit
But I ran into one more issue: my HTML file was ANSI-encoded, not UTF-8. Once I changed it to UTF-8, the conversion worked. But I have many HTML files that are ANSI-encoded rather than UTF-8. Any idea how to convert ANSI-encoded HTML files to Word?
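
One possible approach, offered as a sketch rather than a tested fix: Pandoc expects UTF-8 input, so the ANSI files can be re-encoded in bulk before the batch file runs. Batch has no straightforward built-in way to do this, so the sketch below uses Python. It assumes that "ANSI" here means Windows-1252 (the usual case on Western European Windows installs; adjust `source_encoding` if yours differs). The function names `reencode_to_utf8` and `reencode_folder` and the `utf8` output folder are made up for illustration.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Decode src using source_encoding and write the same text to dst as UTF-8.

    Bytes are read and written directly so line endings are left untouched.
    Raises UnicodeDecodeError if src contains bytes that are not valid in
    source_encoding.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


def reencode_folder(folder, out_dir, source_encoding="cp1252"):
    """Write a UTF-8 copy of every .html file in folder into out_dir.

    Assumption: all files in folder share the same source encoding.
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(folder).glob("*.html")):
        dst = out_dir / src.name
        reencode_to_utf8(src, dst, source_encoding)
        written.append(dst)
    return written
```

Calling `reencode_folder(".", "utf8")` from the folder that holds the HTML files would leave the originals untouched and put UTF-8 copies in a `utf8` subfolder. Running the batch file from that subfolder (with its own file-list) should then work the same way it did for the file that was converted by hand.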