@alonstern
Last active April 14, 2020 06:21
Split the data into training and test sets
```python
import argparse

from torch.utils import data

argument_parser = argparse.ArgumentParser()
argument_parser.add_argument("dataset_path", help="Path to the directory with the binaries for the dataset "
                                                  "(e.g. ~/security.ece.cmu.edu/byteweight/elf_32)")
args = argument_parser.parse_args()

kernel_size = 20
# Pad by kernel_size - 1 so that the CNN output has the same size as the tags.
# FunctionIdentificationDataset is defined elsewhere; it is not part of this snippet.
dataset = FunctionIdentificationDataset(args.dataset_path, block_size=1000, padding_size=kernel_size - 1)

# Hold out 10% of the dataset for testing and train on the remaining 90%.
train_size = int(len(dataset) * 0.9)
test_size = len(dataset) - train_size
train_dataset, test_dataset = data.random_split(dataset, [train_size, test_size])
```
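The padding choice comes down to length arithmetic. A convolution with stride 1, no dilation, and no padding of its own maps an input of length L to L - kernel_size + 1 outputs. PyTorch's `nn.Conv1d` uses `padding=0` by default, so it behaves this way unless configured otherwise. The pure-Python sketch below checks this arithmetic for the values used above. It assumes each 1000-byte block reaches the CNN with `kernel_size - 1` padding bytes appended. `FunctionIdentificationDataset` is not shown, so exactly how it places the padding is an assumption. `conv1d_valid` and `split_sizes` are hypothetical helpers written for illustration, and the second one simply repeats the 90/10 size computation from the snippet.

```python
def conv1d_valid(signal, kernel):
    """Naive 1D 'valid' convolution (cross-correlation, as CNN layers compute it).

    Stride 1, no dilation, no implicit padding: len(signal) - len(kernel) + 1 outputs.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def split_sizes(n, train_fraction=0.9):
    """Train/test sizes computed the same way as in the snippet above."""
    train_size = int(n * train_fraction)
    return train_size, n - train_size


block_size = 1000
kernel_size = 20
padding_size = kernel_size - 1

# Hypothetical input: one block of block_size values plus padding_size padding values.
padded_block = [0.0] * (block_size + padding_size)
kernel = [1.0] * kernel_size

outputs = conv1d_valid(padded_block, kernel)
print(len(outputs))  # 1000: one output (and so one tag) per byte in the block

print(split_sizes(1000))  # (900, 100)
```

Without the extra `kernel_size - 1` values, the same convolution would produce only `block_size - kernel_size + 1` outputs (981 here). That would leave the last 19 tags with no prediction to compare against.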