@magnetikonline
Last active March 3, 2025 23:19
Python function - test if given file is considered binary.

Python function - is file binary?

A function that determines whether a given file is binary.

The test is based on the following algorithm (similar to the one behind Perl's -B file test):

  • Empty files are considered text.
  • If not empty, read up to 512 bytes as a buffer. The file is binary if either:
    • A null byte is encountered.
    • 30% or more of the buffer consists of "non text" characters.
  • Otherwise, file is text.
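
The rules above can be sketched as a single function over a bytes buffer. This is a minimal illustration, not the gist's own code: the names looks_binary and TEXT_BYTES are hypothetical, and the comparison uses >= to match the class further down.

```python
# Printable ASCII plus backspace, form feed, newline, carriage return and tab.
# TEXT_BYTES and looks_binary are illustrative names, not part of the gist.
TEXT_BYTES = bytes(range(32, 127)) + b"\b\f\n\r\t"


def looks_binary(buffer, threshold=0.3):
    """Classify a bytes buffer, e.g. the first 512 bytes of a file."""
    if not buffer:
        return False  # empty input is considered text
    if b"\x00" in buffer:
        return True  # any null byte means binary
    # Delete every text byte; whatever remains counts as "non text".
    non_text = len(buffer.translate(None, TEXT_BYTES))
    return non_text / len(buffer) >= threshold


print(looks_binary(b""))                        # False: empty
print(looks_binary(b"hello\n"))                 # False: all text bytes
print(looks_binary(b"a\x00b"))                  # True: null byte
print(looks_binary(b"\x80\x81\x82\x83abcdef"))  # True: 4 of 10 bytes are non text
print(looks_binary(b"\x80abcdefghi"))           # False: 1 of 10 bytes is non text
```

Working on a buffer rather than a path keeps the decision logic separate from file I/O, so each rule can be checked with a literal byte string.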

Reference

#!/usr/bin/env python3


class IsFileBinary:
    READ_BYTES = 512
    CHAR_THRESHOLD = 0.3
    # printable ASCII plus backspace, form feed, newline, carriage return, tab
    TEXT_CHARACTERS = bytes(range(32, 127)) + b"\b\f\n\r\t"

    def test(self, file_path):
        # read chunk of file in binary mode, so no text decoding is attempted
        with open(file_path, "rb") as fh:
            file_data = fh.read(IsFileBinary.READ_BYTES)

        # store chunk length read
        data_length = len(file_data)
        if not data_length:
            # empty files considered text
            return False

        if b"\x00" in file_data:
            # file containing null bytes is binary
            return True

        # remove all text characters from file chunk, get remaining length
        binary_length = len(file_data.translate(None, IsFileBinary.TEXT_CHARACTERS))

        # if proportion of binary characters reaches threshold, binary file
        return (binary_length / data_length) >= IsFileBinary.CHAR_THRESHOLD


def main():
    is_file_binary = IsFileBinary()

    print("Is binary file: {0}".format(is_file_binary.test("./first")))
    print("Is binary file: {0}".format(is_file_binary.test("./second")))
    print("Is binary file: {0}".format(is_file_binary.test("./third")))


if __name__ == "__main__":
    main()
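
The script expects three files named first, second and third, which the gist does not include. A minimal sketch for generating stand-ins follows; the file contents and the helper name write_sample_files are assumptions chosen to exercise each rule (empty, plain text, null bytes), not the author's test data.

```python
import tempfile
from pathlib import Path


def write_sample_files(directory):
    """Write three small sample files and return their paths keyed by name."""
    # Contents are assumed for illustration; the gist does not specify them.
    samples = {
        "first": b"",                                      # empty file
        "second": b"plain ASCII text\nsecond line\n",      # text bytes only
        "third": b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR",   # contains null bytes
    }
    paths = {}
    for name, content in samples.items():
        path = Path(directory) / name
        path.write_bytes(content)
        paths[name] = path
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, path in write_sample_files(tmp).items():
            print(name, path.stat().st_size, "bytes")
```

To try the script against these files, pass "." instead of a temporary directory so that ./first, ./second and ./third exist next to it.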
@magnetikonline
Author

If all you know is OOP, what you do every day is adjusting your problem to a predetermined solution.

one of the strangest comments I've read for a while. 😄
