@juan-fdz-hawa
Created June 28, 2022 01:13
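Extracts the vendor and product fields from every CPE 2.3 name in the NVD CPE dictionary (cpe-dictionary_v2.3.xml) and writes them to extract.json as a vendor-to-products map.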
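import json
import xml.etree.ElementTree as et
from collections import defaultdict

path = "cpe-dictionary_v2.3.xml"
# Element whose "name" attribute holds the CPE 2.3 formatted string.
CPE23_ITEM = '{http://scap.nist.gov/schema/cpe-extension/2.3}cpe23-item'

# Stream the file instead of loading it whole; the dictionary is large.
context = iter(et.iterparse(path, events=("start", "end")))
ev, root = next(context)  # the first event is the start of the root element

vendors = defaultdict(set)
for ev, el in context:
    # Attributes are already available on the "start" event.
    if ev == 'start' and el.tag == CPE23_ITEM:
        # cpe:2.3:<part>:<vendor>:<product>:... so vendor is field 3, product is field 4.
        # Caveat: a plain split(':') also splits on escaped colons ("\:") inside a field.
        parts = el.attrib['name'].split(':')
        vendor = parts[3].replace('\\', '')
        product = parts[4].replace('\\', '')
        vendors[vendor].add(product)
        # Drop the children already attached to the root to keep memory bounded.
        root.clear()

# Sets are not JSON-serializable, so write each vendor's products as a sorted list.
with open('extract.json', 'w') as f:
    json.dump({k: sorted(v) for k, v in vendors.items()}, f)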
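One caveat with the script's `split(':')`: CPE 2.3 formatted strings escape a literal colon inside a field as `\:`, so a product name containing one would be cut short. Below is a minimal sketch of an escape-aware splitter. The helper name `split_cpe23` and the sample CPE name are made up for illustration and are not part of the original gist.

```python
def split_cpe23(name):
    """Split a CPE 2.3 formatted string on unescaped colons.

    Hypothetical helper, not part of the original script. Backslash
    escapes are resolved: the backslash is dropped and the character
    after it is kept literally.
    """
    fields, current, i = [], [], 0
    while i < len(name):
        ch = name[i]
        if ch == '\\' and i + 1 < len(name):
            # Escaped character: keep it, drop the backslash.
            current.append(name[i + 1])
            i += 2
        elif ch == ':':
            # Unescaped colon: field boundary.
            fields.append(''.join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append(''.join(current))
    return fields


# Made-up CPE name whose product field contains an escaped colon.
sample = r'cpe:2.3:a:example_vendor:foo\:bar:1.0:*:*:*:*:*:*:*'
parts = split_cpe23(sample)
vendor, product = parts[3], parts[4]
print(vendor, product)
```

Swapping this in for `el.attrib['name'].split(':')` would also make the two `.replace('\\', '')` calls unnecessary, since the helper already removes the escaping backslashes.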