Objective: create an algorithm that identifies the version of a remote Magento 2 (M2) install by examining at most 5 URIs (the limit is there so as not to overload the remote server).
Background: I created a somewhat optimal set of fingerprints for Magento 1. However, M2 has fewer unique characteristics. I suspect that combining multiple fingerprints will yield better results. But how do I establish the optimal set of fingerprint combinations?
Use a list of 234 static files that have different checksums for different M2 versions.
Inspection of the corpus shows, for example, that fetching lib/web/css/docs/actions-toolbar.html
will reduce the possibilities to:
"EE 2.0.13",
"CE 2.0.13",
"CE 2.0.14",
"EE 2.0.14"
Then, examining lib/web/jquery.js
will eliminate either *.13 or *.14.
Then, examining pub/errors/enterprise/images/favicon.ico
will eliminate either CE or EE.
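The narrowing-down in this example can be sketched as repeated set intersection. This is only an illustration under assumptions: the `CORPUS` layout (path -> checksum -> versions), the placeholder checksums such as "aaa", the "missing" marker for a file that is absent, and the helper name `narrow` are all hypothetical and not taken from the real dataset.

```python
from typing import Dict, Set

# Hypothetical toy corpus: path -> checksum -> versions that ship that checksum.
# The checksums are placeholders, not real Magento hashes, and "missing" is an
# assumed marker for a file that does not exist on that edition.
CORPUS: Dict[str, Dict[str, Set[str]]] = {
    "lib/web/css/docs/actions-toolbar.html": {
        "aaa": {"CE 2.0.13", "EE 2.0.13", "CE 2.0.14", "EE 2.0.14"},
        "bbb": {"CE 2.1.7", "EE 2.1.7"},
    },
    "lib/web/jquery.js": {
        "ccc": {"CE 2.0.13", "EE 2.0.13"},
        "ddd": {"CE 2.0.14", "EE 2.0.14", "CE 2.1.7", "EE 2.1.7"},
    },
    "pub/errors/enterprise/images/favicon.ico": {
        "eee": {"EE 2.0.13", "EE 2.0.14", "EE 2.1.7"},
        "missing": {"CE 2.0.13", "CE 2.0.14", "CE 2.1.7"},
    },
}


def narrow(candidates: Set[str], path: str, observed: str) -> Set[str]:
    """Keep only the candidates whose known checksum for `path` equals `observed`.

    An unknown path or checksum yields an empty set, i.e. "no known version".
    """
    matching = CORPUS.get(path, {}).get(observed, set())
    return candidates & matching


# Start from every version in the toy corpus and apply the three observations.
candidates = set().union(*(vs for sums in CORPUS.values() for vs in sums.values()))
candidates = narrow(candidates, "lib/web/css/docs/actions-toolbar.html", "aaa")
candidates = narrow(candidates, "lib/web/jquery.js", "ddd")
candidates = narrow(candidates, "pub/errors/enterprise/images/favicon.ico", "eee")
```

With these placeholder observations, each call removes the versions that contradict the fetched checksum, so three requests are enough to leave a single candidate in the toy corpus.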
- Given that all 234 files are examined, can this dataset be used to identify each unique M2 version? If not, which versions are indistinguishable?
- Create an algorithm that produces the most accurate results, by examining up to 234 files.
- Create an algorithm that identifies as many versions as possible, by examining only 5 files. How many versions can you identify?
- Create an algorithm that will identify the version for as many installs as possible, by examining only 5 files. Assume that installs (in the wild) are distributed as follows: the newest EE and CE 2.1.7 each have a 15% share, and the other 41 versions split the remaining 70% equally (70% / 41 ≈ 1.7% each). Given this distribution, what percentage of installs should you be able to identify with 5 requests?
These are all 43 applicable versions:
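One possible starting point for these questions is sketched below. It is a greedy heuristic, not a proven optimum, and it rests on assumptions: the corpus layout (path -> version -> checksum), the "missing" marker for absent files, and the function names `split`, `indistinguishable`, `wild_weights` and `identified_share` are all hypothetical. `indistinguishable` addresses the first question by grouping versions with identical checksums across every file. `identified_share` adaptively picks, at each step, the file whose split minimizes the sum of squared group weights, and returns the total share of installs that end up as a single candidate within the request budget.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set

# Assumed layout: path -> version -> checksum. A version absent from a
# path's mapping is treated as "file missing" (for example an HTTP 404).
Corpus = Dict[str, Dict[str, str]]


def split(candidates: Iterable[str], path: str, corpus: Corpus) -> List[FrozenSet[str]]:
    """Group candidates by the checksum they would return for `path`."""
    groups = defaultdict(set)
    for version in candidates:
        groups[corpus[path].get(version, "missing")].add(version)
    return [frozenset(g) for g in groups.values()]


def indistinguishable(versions: Iterable[str], corpus: Corpus) -> List[Set[str]]:
    """Return groups of versions that share checksums on every file."""
    groups = defaultdict(set)
    for version in versions:
        signature = tuple(corpus[p].get(version, "missing") for p in sorted(corpus))
        groups[signature].add(version)
    return [g for g in groups.values() if len(g) > 1]


def wild_weights(versions: Iterable[str],
                 newest=("CE 2.1.7", "EE 2.1.7")) -> Dict[str, float]:
    """15% for each newest version, the remaining 70% spread evenly.

    Assumes both `newest` versions are in `versions` and others exist too.
    """
    rest = [v for v in versions if v not in newest]
    weights = {v: 0.70 / len(rest) for v in rest}
    weights.update({v: 0.15 for v in newest})
    return weights


def identified_share(candidates: Iterable[str], corpus: Corpus,
                     weights: Dict[str, float], budget: int) -> float:
    """Greedy adaptive lookup: weight of versions pinned down within `budget` requests."""
    candidates = frozenset(candidates)
    if len(candidates) == 1:
        return sum(weights[v] for v in candidates)
    if budget == 0 or not candidates:
        return 0.0

    def cost(path: str) -> float:
        # Sum of squared group weights: lower means a more even, finer split.
        return sum(sum(weights[v] for v in g) ** 2
                   for g in split(candidates, path, corpus))

    best = min(sorted(corpus), key=cost)
    groups = split(candidates, best, corpus)
    if len(groups) == 1:  # no file separates the remaining candidates
        return 0.0
    return sum(identified_share(g, corpus, weights, budget - 1) for g in groups)
```

Calling `identified_share(versions, corpus, wild_weights(versions), 5)` on the real 234-file corpus would give the share reached by this particular heuristic. Counting identified versions instead of installs only needs uniform weights. An exhaustive search over file choices could do better than the greedy pick, at a much higher computational cost.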
CE 2.0.0
CE 2.0.10
CE 2.0.11
CE 2.0.12
CE 2.0.13
CE 2.0.14
CE 2.0.1
CE 2.0.2
CE 2.0.3
CE 2.0.4
CE 2.0.5
CE 2.0.6
CE 2.0.7
CE 2.0.8
CE 2.0.9
CE 2.1.0
CE 2.1.1
CE 2.1.2
CE 2.1.3
CE 2.1.4
CE 2.1.5
CE 2.1.6
CE 2.1.7
EE 2.0.10
EE 2.0.11
EE 2.0.12
EE 2.0.13
EE 2.0.14
EE 2.0.2
EE 2.0.4
EE 2.0.5
EE 2.0.6
EE 2.0.7
EE 2.0.8
EE 2.0.9
EE 2.1.0
EE 2.1.1
EE 2.1.2
EE 2.1.3
EE 2.1.4
EE 2.1.5
EE 2.1.6
EE 2.1.7