IdentifyingNumber : {D307B5CF-D1F0-48A4-8DA3-54765F535208}
Name : SQL Server 2012 SQL Data Quality Common
Vendor : Microsoft Corporation
Version : 11.2.5058.0
Caption : SQL Server 2012 SQL Data Quality Common
Extracting a CPE from such unstructured text is a challenge: which parts are the vendor, the product, and the version?
There are many standards for identifying software packages, but the reality is tough. CPE names are used in the NVD but are not available in package managers or operating systems. SWID tags then appeared, meant to replace CPE and do it better. But the initiative is stuck, and there is still no way to map software to machine-parseable names.
Words are lower-case.
The original dataset is indexed in the following way.
- Each word from a CPE is indexed with the set of types (vendor, product) it occurs as.
    w:<word>
    ->{vendor, product}
- Each version is indexed with the set of words that occur with that version.
    v:<version>
    ->{word, word}
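The two index families above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `build_index`, the in-memory dictionary, and the input format (already split `(vendor, product, version)` triples rather than raw CPE strings) are hypothetical and not taken from the project.

```python
from collections import defaultdict


def build_index(entries):
    """Build the w:/v: index from (vendor, product, version) triples.

    Assumption: the CPE names were already split into their three fields.
    """
    index = defaultdict(set)
    for vendor, product, version in entries:
        # Words are lower-case in the index.
        vendor, product, version = vendor.lower(), product.lower(), version.lower()
        # w:<word> -> set of types the word occurs as
        index["w:" + vendor].add("vendor")
        index["w:" + product].add("product")
        # v:<version> -> set of words that occur with this version
        index["v:" + version].update({vendor, product})
    return dict(index)


# Hypothetical sample entries, for illustration only.
sample_index = build_index([
    ("Microsoft", "sql_server", "2012"),
    ("Microsoft", "windows", "10"),
])
```

With these sample entries, `sample_index["w:microsoft"]` holds `{"vendor"}` and `sample_index["v:2012"]` holds `{"microsoft", "sql_server"}`.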
An input string is then searched as follows.
- The input is lower-cased.
- The sentence is split on white-space.
- Each word is searched in
    w:<word>
- If a vendor word is found, the remaining words are concatenated using a space, dash, or underscore as separator.
- Each concatenated word is searched in
    w:<word>
- If a product is found, the remaining words are searched as versions.
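The search steps can be sketched as below. Several details are assumptions made for the example and are not stated in the text: the function name `guess_cpe`, taking the first vendor hit, joining only adjacent words, trying the longest runs first, and accepting a version only when the found product is in its `v:` set. The small `demo_index` is hand-written sample data.

```python
def guess_cpe(text, index):
    """Return a (vendor, product, version) guess; missing parts are None."""
    words = text.lower().split()  # lower-case, then split on white-space

    # Look up each word in w:<word>; keep the first vendor hit (assumption).
    vendor_pos = next(
        (i for i, w in enumerate(words) if "vendor" in index.get("w:" + w, ())),
        None,
    )
    if vendor_pos is None:
        return None, None, None
    vendor = words[vendor_pos]
    rest = words[:vendor_pos] + words[vendor_pos + 1:]

    # Join runs of adjacent remaining words with space, dash, underscore,
    # longest runs first (assumption), and look each one up in w:<word>.
    for size in range(len(rest), 0, -1):
        for start in range(len(rest) - size + 1):
            for sep in (" ", "-", "_"):
                candidate = sep.join(rest[start:start + size])
                if "product" not in index.get("w:" + candidate, ()):
                    continue
                # Search the words that are left as versions in v:<version>.
                leftover = rest[:start] + rest[start + size:]
                version = next(
                    (w for w in leftover if candidate in index.get("v:" + w, ())),
                    None,
                )
                return vendor, candidate, version
    return vendor, None, None


# Hand-written sample index, for illustration only.
demo_index = {
    "w:microsoft": {"vendor"},
    "w:sql_server": {"product"},
    "v:2012": {"microsoft", "sql_server"},
}
result = guess_cpe(
    "Microsoft Corporation SQL Server 2012 SQL Data Quality Common", demo_index
)
print(result)  # ('microsoft', 'sql_server', '2012')
```

Here the input is the Vendor and Name fields of the record above joined into one string. Trying every run with three separators is quadratic in the number of words, which seems acceptable for short package names.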