@adulau
Last active March 11, 2024 02:14
Product Name to CPE naming

CPE mapping with the product or software name

Problem

IdentifyingNumber : {D307B5CF-D1F0-48A4-8DA3-54765F535208}
Name              : SQL Server 2012 SQL Data Quality Common
Vendor            : Microsoft Corporation
Version           : 11.2.5058.0
Caption           : SQL Server 2012 SQL Data Quality Common

Extracting a CPE from this kind of loosely structured text is a challenge: which parts are the vendor, the product, and the version?

Standards

There are many standards for identifying software packages, but the reality is tough. CPE names are used in the NVD but are not available in package managers or operating systems. SWID tags then appeared, intended to replace CPE and do it better. But the initiative is stuck, and there is still no reliable way to map software to machine-parseable names.

Idea of implementation

Reverse index

All words are lower-cased.

The original dataset is mapped in the following way.

  • Each word from a CPE name is indexed with the set of types it appears as. w:<word> -> {vendor, product}
  • Each version is mapped to the set of words it appears with. v:<version> -> {word, word}
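
The mapping above can be sketched with plain in-memory sets. This is only a minimal illustration under several assumptions: the function names (parse_cpe23, build_index) are hypothetical, the sample CPE strings are illustrative rather than taken from the NVD, the whole vendor and product fields are treated as single words, and splitting on ":" ignores escaped colons that the CPE 2.3 formatted string allows.

```python
from collections import defaultdict

# Illustrative sample entries (assumption: not copied from the NVD dictionary).
SAMPLE_CPES = [
    "cpe:2.3:a:microsoft:sql_server:2012:*:*:*:*:*:*:*",
    "cpe:2.3:a:microsoft:sql_server:2014:*:*:*:*:*:*:*",
]

def parse_cpe23(cpe):
    """Return (vendor, product, version) from a CPE 2.3 formatted string.

    Layout: cpe:2.3:<part>:<vendor>:<product>:<version>:...
    Simplification: escaped colons inside a field are not handled.
    """
    fields = cpe.lower().split(":")
    return fields[3], fields[4], fields[5]

def build_index(cpes):
    """Build the reverse index: w:<word> -> types, v:<version> -> words."""
    index = defaultdict(set)
    for cpe in cpes:
        vendor, product, version = parse_cpe23(cpe)
        index["w:" + vendor].add("vendor")
        index["w:" + product].add("product")
        # "*" (any) and "-" (not applicable) carry no version information.
        if version not in ("*", "-"):
            index["v:" + version].update((vendor, product))
    return index

index = build_index(SAMPLE_CPES)
print(sorted(index["w:microsoft"]))  # ['vendor']
print(sorted(index["v:2012"]))       # ['microsoft', 'sql_server']
```

The same key layout would map directly onto a key-value store with set types, but that choice is left open here.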

Search Algorithm

  • The input is lower-cased.
  • The sentence is split on whitespace.
  • Each word is looked up in w:<word>.
  • If a vendor word is found, the remaining words are concatenated with a space, a dash, or an underscore.
  • Each concatenated word is looked up in w:<word>.
  • If a product is found, the remaining words are looked up as versions in v:<version>.
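
The search steps above could look roughly like the following sketch. It is one possible reading, not a reference implementation: the names (guess_cpe, find_product, SAMPLE_INDEX) are hypothetical, the index is a small hand-written dictionary in the w:/v: layout, "concatenated" is interpreted as joining contiguous runs of the remaining words (longest run first), and a version is accepted only if its set contains the product that was found.

```python
SEPARATORS = (" ", "-", "_")

# Hand-written sample index (assumption: illustrative data only).
SAMPLE_INDEX = {
    "w:microsoft": {"vendor"},
    "w:sql_server": {"product"},
    "v:2012": {"microsoft", "sql_server"},
}

def find_product(words, index):
    """Join contiguous runs of words with each separator and look them up.

    Returns (product, leftover_words), or (None, words) if nothing matches.
    """
    for size in range(len(words), 0, -1):          # longest runs first
        for start in range(len(words) - size + 1):
            for sep in SEPARATORS:
                candidate = sep.join(words[start:start + size])
                if "product" in index.get("w:" + candidate, ()):
                    return candidate, words[:start] + words[start + size:]
    return None, words

def guess_cpe(text, index):
    """Return (vendor, product, version); None if no vendor word is found."""
    words = text.lower().split()
    # Look up each word and keep the first one known as a vendor.
    vendor_pos = next(
        (i for i, w in enumerate(words) if "vendor" in index.get("w:" + w, ())),
        None,
    )
    if vendor_pos is None:
        return None
    vendor = words[vendor_pos]
    rest = words[:vendor_pos] + words[vendor_pos + 1:]
    product, leftover = find_product(rest, index)
    if product is None:
        return vendor, None, None
    # The remaining words are searched as versions of the product.
    version = next(
        (w for w in leftover if product in index.get("v:" + w, ())), None
    )
    return vendor, product, version

print(guess_cpe("Microsoft SQL Server 2012", SAMPLE_INDEX))
# ('microsoft', 'sql_server', '2012')
```

Trying the longest runs first favours the most specific product name when several candidates would match; a real dataset would probably need ranking on top of this.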