@alonstern
Last active April 13, 2020 08:01
Iterates over every binary in the dataset
import glob
import os

import tqdm
from elftools.elf.elffile import ELFFile


def _preprocess_data(self, root_directory):
    files_data = []
    files_tags = []
    # Iterate over every binary in the dataset.
    for binary_path in tqdm.tqdm(glob.glob(os.path.join(root_directory, "*", "binary", "*"))):
        # Keep the file open while parsing: ELFFile reads from the stream on demand.
        with open(binary_path, "rb") as binary_file:
            binary_elf = ELFFile(binary_file)
            # Extract the code from the binary.
            data = self._generate_data(binary_elf)
            # Extract the tag of each byte in the binary code (1 if it is the start of a function, 0 otherwise).
            tags = self._generate_tags(binary_elf)
            files_data.append(data)
            files_tags.append(tags)
    return files_data, files_tags
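
The gist does not show `_generate_tags`, so the following is only a minimal sketch of the tagging scheme described in the comment above (1 if a byte is the start of a function, 0 otherwise). The helper name `make_function_start_tags` and its parameters are hypothetical; it assumes the load address and size of the code section and the function start addresses are already known, and it does not read an ELF file itself.

```python
def make_function_start_tags(section_addr, section_size, function_addrs):
    """Return one tag per code byte: 1 at a function start, 0 elsewhere.

    Hypothetical helper, not taken from the gist.
    """
    tags = [0] * section_size
    for addr in function_addrs:
        offset = addr - section_addr
        # Skip addresses that fall outside the code section.
        if 0 <= offset < section_size:
            tags[offset] = 1
    return tags


# Made-up example: an 8-byte code section loaded at 0x1000,
# with functions starting at 0x1000 and 0x1005 (0x2000 is out of range).
example_tags = make_function_start_tags(0x1000, 8, [0x1000, 0x1005, 0x2000])
print(example_tags)  # [1, 0, 0, 0, 0, 1, 0, 0]
```

Each tag list has the same length as the code bytes it describes, so `files_data[i]` and `files_tags[i]` can be paired byte by byte.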