@juan-fdz-hawa
Created June 28, 2022 01:13
Extracts the vendor and product portions of each CPE 2.3 name in the NVD CPE dictionary (cpe-dictionary_v2.3.xml)
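import json
import xml.etree.ElementTree as et
from collections import defaultdict

path = "cpe-dictionary_v2.3.xml"
# Tag of the element whose "name" attribute holds the CPE 2.3 string.
CPE23_ITEM = '{http://scap.nist.gov/schema/cpe-extension/2.3}cpe23-item'

# Stream the file instead of loading it whole; the dictionary is large.
context = et.iterparse(path, events=("start", "end"))
context = iter(context)
# The first event is the start of the root element; keep a handle on it
# so that already-processed children can be released.
ev, root = next(context)

# Maps each vendor to the set of its product names.
vendors = defaultdict(set)
for ev, el in context:
    if ev == 'start' and el.tag == CPE23_ITEM:
        # Name layout: cpe:2.3:<part>:<vendor>:<product>:<version>:...
        parts = el.attrib['name'].split(':')
        # Strip the backslashes used to escape special characters.
        vendor = parts[3].replace('\\', '')
        product = parts[4].replace('\\', '')
        vendors[vendor].add(product)
    # Drop parsed children of the root to keep memory usage bounded.
    root.clear()

# Sets are not JSON serializable, so convert them to lists first.
for k in vendors:
    vendors[k] = list(vendors[k])

with open('extract.json', 'w') as f:
    json.dump(vendors, f)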
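
One caveat with splitting the name on every `:` is that, assuming the `name` attribute follows the CPE 2.3 formatted-string binding, a literal colon inside a value is written as `\:`, so a plain `split(':')` would cut such a vendor or product in two. The sketch below shows one possible way to split only on unescaped colons while dropping the escaping backslashes. The helper name `split_cpe23` is hypothetical and is not part of the original script.

```python
def split_cpe23(name):
    """Split a CPE 2.3 formatted string on unescaped colons.

    Hypothetical helper: assumes a backslash escapes the character that
    follows it, and returns the fields with the escaping removed.
    """
    fields = []
    current = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "\\" and i + 1 < len(name):
            # Keep the escaped character, drop the backslash itself.
            current.append(name[i + 1])
            i += 2
        elif ch == ":":
            # An unescaped colon ends the current field.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields
```

In the loop above, `parts = split_cpe23(el.attrib['name'])` could stand in for the `split(':')` call, after which `parts[3]` and `parts[4]` would already be unescaped. Unlike the blanket `replace('\\', '')`, this keeps a backslash that is itself escaped.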