@tripulse
Last active June 24, 2020 04:34
A small script that strips from filenames every character outside the 7-bit ASCII range (code points 0-127). It is useful when filenames must be restricted to English letters, digits, and punctuation.
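#!/usr/bin/env python3
import glob
import os
from argparse import ArgumentParser

_parser = ArgumentParser(
    "Strip28Bit",
    description="Strips non-ASCII (multi-byte Unicode) characters"
                " from filenames, leaving 7-bit ASCII names",
    allow_abbrev=True
)


def strip_non8bit(s):
    # Keep only characters whose code point fits in 7 bits (0-127).
    return ''.join(filter(lambda c: ord(c) < 128, str(s)))


_parser.add_argument(
    "-r", action="store_true",  # plain flag; type=bool treats any non-empty value as True
    default=False,
    help="Let '**' in the pattern match nested folders recursively",
    dest="isRecursive"
)
_parser.add_argument(
    "pattern", type=str,
    help="Glob pattern selecting the files to rename"
)

_args = _parser.parse_args()
files = glob.glob(_args.pattern, recursive=_args.isRecursive)

if len(files) == 0:
    print("No files found by the expression provided.")

# Strip the characters that do not fit into the 7-bit
# (max=127) ASCII range from each file's name.
for file in files:
    # Rewrite only the final path component; parent folders keep their names.
    folder, name = os.path.split(file)
    target = os.path.join(folder, strip_non8bit(name))
    # Skip names that are already ASCII, would become empty, or would collide.
    if target == file or not os.path.basename(target) or os.path.exists(target):
        continue
    os.rename(file, target)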
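
The stripping rule is easiest to judge on a concrete name. The sketch below isolates it as a standalone function and compares it with a hypothetical variant that is not part of the script: `fold_to_ascii` first decomposes accented letters with `unicodedata.normalize("NFKD", ...)` so that the base letter survives the strip. Both function names and the sample filename are assumptions made for illustration.

```python
import unicodedata


def strip_non_ascii(name: str) -> str:
    """Drop every character whose code point is 128 or higher."""
    return "".join(c for c in name if ord(c) < 128)


def fold_to_ascii(name: str) -> str:
    """Hypothetical alternative: decompose accented letters, then strip.

    NFKD splits a letter such as 'é' into 'e' plus a combining accent;
    the accent is non-ASCII and is removed, the base letter is kept.
    """
    return strip_non_ascii(unicodedata.normalize("NFKD", name))


sample = "café_日本語.txt"
print(strip_non_ascii(sample))  # caf_.txt
print(fold_to_ascii(sample))    # cafe_.txt
```

Plain stripping is lossy for accented Latin text, while the NFKD variant keeps names more readable. Neither can preserve scripts such as Japanese, so a name made up entirely of such characters shrinks to its extension.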