@sulaya86
Last active May 29, 2019 05:48
The purpose is to compare two lists of files (A and B) and find the files that appear in one list but not the other: files in A that are missing from B, and vice versa. This is useful, for example, to confirm that files were downloaded or reloaded successfully and that no data is missing.
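The comparison comes down to two set differences. A − B gives the files that are missing from B, and B − A gives the extras. The sketch below runs on in-memory lists. It assumes one file name per entry and treats surrounding whitespace and blank entries as noise. The function name `compare_file_lists` is hypothetical and is not part of the original script.

```python
def compare_file_lists(list_a, list_b):
    """Return (only_in_a, only_in_b) as sorted lists of names.

    Names are stripped of surrounding whitespace (including trailing
    newlines) and blank entries are ignored, so "a.csv\n" and "a.csv"
    count as the same file.
    """
    set_a = {name.strip() for name in list_a if name.strip()}
    set_b = {name.strip() for name in list_b if name.strip()}
    # A - B: expected but not found; B - A: found but not expected.
    return sorted(set_a - set_b), sorted(set_b - set_a)


if __name__ == "__main__":
    expected = ["report_01.csv\n", "report_02.csv\n", "report_03.csv"]
    downloaded = ["report_01.csv\n", "report_03.csv\n", "notes.txt\n"]
    missing, extra = compare_file_lists(expected, downloaded)
    print("missing:", missing)  # missing: ['report_02.csv']
    print("extra:", extra)      # extra: ['notes.txt']
```

Sets make each membership check roughly constant time, but they also drop duplicates and ordering. The results are sorted so the output is stable between runs.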
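from pathlib import Path


def print_diff_files():
    '''
    Compare two file lists line by line and write the differences:
    names in files_to_find.txt but not in filelist.txt go to
    files_not_found.txt; names in filelist.txt but not in
    files_to_find.txt go to extra_files_found.txt.
    '''
    # All input and output files live next to this script.
    # Path works on Windows and other systems alike.
    _local_dir = Path(__file__).resolve().parent
    files_to_find = _local_dir / 'files_to_find.txt'
    downloaded_files = _local_dir / 'filelist.txt'
    not_found_files = _local_dir / 'files_not_found.txt'
    extra_files = _local_dir / 'extra_files_found.txt'

    # Strip whitespace so a missing trailing newline on the last line
    # does not cause a false mismatch; skip blank lines.
    with open(files_to_find, 'r') as file1:
        expected = {line.strip() for line in file1 if line.strip()}
    with open(downloaded_files, 'r') as file2:
        downloaded = {line.strip() for line in file2 if line.strip()}

    diff = expected - downloaded    # listed but not downloaded
    extras = downloaded - expected  # downloaded but not listed

    with open(not_found_files, 'w') as file_out:
        for line in sorted(diff):
            file_out.write(line + '\n')
    with open(extra_files, 'w') as file_out:
        for line in sorted(extras):
            file_out.write(line + '\n')


if __name__ == '__main__':
    print_diff_files()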
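If the only goal is to confirm that the expected files landed in a download folder, the second list does not need to be a separate text file. It can be read straight from the folder. The sketch below works under that assumption. `check_downloads` and its parameters are hypothetical names. Only regular files directly inside `download_dir` are considered, and the expected-list file should live outside that folder, or it will be reported as an extra.

```python
import tempfile
from pathlib import Path


def check_downloads(expected_list, download_dir):
    """Compare names in expected_list (one per line) with files in download_dir.

    Returns (missing, extra) as sorted lists of file names.
    """
    with open(expected_list, "r", encoding="utf-8") as fh:
        expected = {line.strip() for line in fh if line.strip()}
    # Only regular files directly inside download_dir; subfolders are skipped.
    present = {p.name for p in Path(download_dir).iterdir() if p.is_file()}
    return sorted(expected - present), sorted(present - expected)


if __name__ == "__main__":
    # Build a throwaway folder to show the result.
    with tempfile.TemporaryDirectory() as tmp:
        downloads = Path(tmp) / "downloads"
        downloads.mkdir()
        for name in ("a.csv", "c.csv"):
            (downloads / name).write_text("data")
        expected = Path(tmp) / "files_to_find.txt"
        expected.write_text("a.csv\nb.csv\n")
        missing, extra = check_downloads(expected, downloads)
        print(missing, extra)  # ['b.csv'] ['c.csv']
```

This skips the step of generating `filelist.txt` by hand. It compares only file names, not sizes or contents, so a truncated download would still count as present.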