split the data
import argparse

# `data` is assumed to be torch.utils.data, which provides random_split
from torch.utils import data

# FunctionIdentificationDataset is defined elsewhere in this project

argument_parser = argparse.ArgumentParser()
argument_parser.add_argument("dataset_path", help="Path to the directory with the binaries for the dataset "
                                                  "(e.g. ~/security.ece.cmu.edu/byteweight/elf_32)")
args = argument_parser.parse_args()

kernel_size = 20
# Pad each block by kernel_size - 1 so the CNN output has the same length as the tags
dataset = FunctionIdentificationDataset(args.dataset_path, block_size=1000, padding_size=kernel_size - 1)

# Use 90% of the blocks for training and hold out the remaining 10% for testing
train_size = int(len(dataset) * 0.9)
test_size = len(dataset) - train_size
train_dataset, test_dataset = data.random_split(dataset, [train_size, test_size])
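
The padding comment and the 90/10 split above both come down to simple arithmetic, which the following sketch checks in plain Python. It assumes a stride-1, undilated 1D convolution and assumes that `padding_size` is the total number of padding elements added to each block; the snippet itself confirms neither. The helper names `conv1d_output_length` and `split_sizes` are hypothetical and are not part of the original code.

```python
from typing import Tuple


def conv1d_output_length(input_length: int, kernel_size: int, total_padding: int) -> int:
    """Output length of a 1D convolution, assuming stride 1 and no dilation."""
    return input_length + total_padding - kernel_size + 1


def split_sizes(dataset_length: int, train_fraction: float = 0.9) -> Tuple[int, int]:
    """Mirror the snippet's arithmetic: floor the train size, give the rest to test."""
    train_size = int(dataset_length * train_fraction)
    return train_size, dataset_length - train_size


block_size = 1000
kernel_size = 20

# With kernel_size - 1 padding elements in total, the output is as long as the block
output_length = conv1d_output_length(block_size, kernel_size, kernel_size - 1)

# Without padding, the output would be kernel_size - 1 elements shorter
unpadded_length = conv1d_output_length(block_size, kernel_size, 0)

# Hypothetical dataset of 1234 blocks; the two sizes always sum to the dataset length
train_size, test_size = split_sizes(1234)
```

Computing `test_size` as the remainder, rather than as `int(len(dataset) * 0.1)`, guarantees that the two sizes sum to `len(dataset)`, which `random_split` requires when it is given integer lengths.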