# load clean descriptions into memory
def load_clean_descriptions(filename, dataset):
    # load document
    doc = load_doc(filename)
    descriptions = dict()
    for line in doc.split('\n'):
        # split line by white space
        tokens = line.split()
        # split id from description
        image_id, image_desc = tokens[0], tokens[1:]
# load doc into memory
def load_doc(filename):
    # open the file as read only; the with block closes it automatically
    with open(filename, 'r') as file:
        # read all text
        text = file.read()
    return text
2252123185_487f21e336 bunch on people are seated in stadium
2252123185_487f21e336 crowded stadium is full of people watching an event
2252123185_487f21e336 crowd of people fill up packed stadium
2252123185_487f21e336 crowd sitting in an indoor stadium
2252123185_487f21e336 stadium full of people watch game
...
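Each line of the saved file starts with an image id, followed by the words of one description, so several lines can share the same id. As a minimal sketch of how such lines might be grouped per image, the helper below builds a dictionary from id to a list of descriptions. The name `group_by_image_id` is hypothetical and does not appear in the original listings; the sample text reuses two of the lines shown above.

```python
# Hypothetical helper (not part of the original listings): group lines of the
# form "<image_id> <word> <word> ..." into a dict of id -> list of descriptions.
def group_by_image_id(text):
    grouped = {}
    for line in text.split('\n'):
        tokens = line.split()
        # skip blank lines and lines that have an id but no description
        if len(tokens) < 2:
            continue
        image_id, words = tokens[0], tokens[1:]
        # re-join the words so each description is stored as one string
        grouped.setdefault(image_id, []).append(' '.join(words))
    return grouped


sample = (
    "2252123185_487f21e336 crowd sitting in an indoor stadium\n"
    "2252123185_487f21e336 stadium full of people watch game\n"
)
grouped = group_by_image_id(sample)
for image_id, desc_list in grouped.items():
    print(image_id, len(desc_list))
```

Using `setdefault` keeps the loop short: the list for an id is created the first time that id is seen and extended on every later line.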
# save descriptions to file, one per line
def save_descriptions(descriptions, filename):
    lines = list()
    for key, desc_list in descriptions.items():
        for desc in desc_list:
            lines.append(key + ' ' + desc)
    data = '\n'.join(lines)
    # the with block closes the file even if the write fails
    with open(filename, 'w') as file:
        file.write(data)
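To see what the saved file looks like, the sketch below writes a small dictionary to a temporary file and reads it back. The dictionary contents (`img1`, `img2` and their descriptions) are made up for illustration, and the function is repeated here only so the example runs on its own.

```python
import os
import tempfile


# Same logic as save_descriptions above, repeated so this example is self-contained.
def save_descriptions(descriptions, filename):
    lines = []
    for key, desc_list in descriptions.items():
        for desc in desc_list:
            # one line per description: "<image_id> <description>"
            lines.append(key + ' ' + desc)
    with open(filename, 'w') as file:
        file.write('\n'.join(lines))


# Made-up data for illustration only.
descriptions = {
    'img1': ['dog runs', 'dog plays'],
    'img2': ['cat sleeps'],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'descriptions.txt')
    save_descriptions(descriptions, path)
    with open(path) as file:
        saved_text = file.read()

print(saved_text)
```

An image with several descriptions produces several lines that share the same id, which matches the layout of the sample file shown earlier.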
# convert the loaded descriptions into a vocabulary of words
def to_vocabulary(descriptions):
    # build a set of all unique words across the descriptions
    all_desc = set()
    for key in descriptions.keys():
        for d in descriptions[key]:
            all_desc.update(d.split())
    return all_desc

# summarize vocabulary
vocabulary = to_vocabulary(descriptions)
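As a quick check of what the vocabulary contains, the sketch below runs the same idea on two of the sample descriptions. The function is repeated so the example is self-contained, and the one-entry dictionary is an assumption made for illustration.

```python
# Same idea as to_vocabulary above, repeated so this example runs on its own.
def to_vocabulary(descriptions):
    vocabulary = set()
    for desc_list in descriptions.values():
        for desc in desc_list:
            # a set keeps each word once, however often it occurs
            vocabulary.update(desc.split())
    return vocabulary


# Illustrative input built from two of the sample lines.
descriptions = {
    '2252123185_487f21e336': [
        'crowd sitting in an indoor stadium',
        'stadium full of people watch game',
    ],
}

vocabulary = to_vocabulary(descriptions)
print('Vocabulary size: %d' % len(vocabulary))
```

The word `stadium` appears in both descriptions but is counted once, because the vocabulary is a set of unique words rather than a list of all tokens.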
import string

def clean_descriptions(descriptions):
    # prepare translation table for removing punctuation
    table = str.maketrans('', '', string.punctuation)
    for key, desc_list in descriptions.items():
        for i in range(len(desc_list)):
            desc = desc_list[i]
            # tokenize
            desc = desc.split()
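The listing above stops after the tokenize step, so the remaining cleaning steps are not shown here. As a minimal sketch of what the translation table does, the example below applies it to the tokens of a made-up sentence. The sentence and the variable name `stripped` are assumptions for illustration, not part of the original function.

```python
import string

# Table that maps every ASCII punctuation character to None (i.e. deletes it).
table = str.maketrans('', '', string.punctuation)

# Made-up description for illustration only.
desc = "A crowd, sitting in an indoor stadium."

# tokenize on white space, then strip punctuation from each token
tokens = desc.split()
stripped = [word.translate(table) for word in tokens]

print(stripped)
```

Passing two empty strings and a third argument to `str.maketrans` builds a table that only deletes characters, which is why `translate` removes the comma and the full stop without changing any letters.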
# extract descriptions for images
def load_descriptions(doc):
    mapping = dict()
    # process lines
    for line in doc.split('\n'):
        # split line by white space
        tokens = line.split()
        # skip lines that lack an image id followed by at least one word
        if len(tokens) < 2:
            continue
        # take the first token as the image id, the rest as the description