tl;dr:
- mostly, hosts ignore the mss sent
- most of the interesting behaviour comes from the extremes: mss=1 and mss=1440
- setting mss=(an atypical value, say 1275 or 1411) ought to spot hosts that simply reflect MSS values
- setting mss=1024 apparently will spot a bunch of (Microsoft?) hosts returning MSS=956
- passively, mss=1220, 1360, 1410 may help identify CDN/non-CDN nodes inside yahoo/edgecast, google, facebook respectively
Variation is less than I initially thought; just enough for me to notice it in the first place I guess:
When MSS=1 on the SYN:
- 82% of the scans (2018-04-30, tcp port 443) return MSS=1440
- 95% of the scans return typical values, 1220 <= MSS <= 1440
- 3% are MSS<1220
- 2% are MSS>1440
When MSS=1440 on the SYN, using the full set from above:
- 85% of those hosts return MSS=1440
- 98% of the scans return typical values, 1220 <= MSS <= 1440
- 0.2% are MSS<1220
- 2% are MSS>1440
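A minimal sketch of how shares like these could be computed from a list of returned MSS values. The 1220 and 1440 bounds are the "typical" range used above; the function name and the input format (a plain list of integers) are assumptions for illustration, not the tooling used for these scans.

```python
from collections import Counter

# Bounds of the "typical" range used in the lists above.
TYPICAL_MIN = 1220
TYPICAL_MAX = 1440


def bucket_shares(mss_values):
    """Return the percentage of responses below, inside, and above the typical range."""
    counts = Counter()
    for mss in mss_values:
        if mss < TYPICAL_MIN:
            counts["below"] += 1
        elif mss > TYPICAL_MAX:
            counts["above"] += 1
        else:
            counts["typical"] += 1

    total = sum(counts.values())
    if total == 0:
        # No responses at all: report zero for every bucket.
        return {"below": 0.0, "typical": 0.0, "above": 0.0}
    return {
        name: 100.0 * counts[name] / total
        for name in ("below", "typical", "above")
    }
```

Running this once per probe value (MSS=1, MSS=1440, and so on) gives one row of percentages per probe, which makes the two lists above directly comparable.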
CDF plots of different MSS values returned from active scans (not zmap, just me hacking around) are in [1]. These plots accumulate fast, hence the log scale.
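For reference, an empirical CDF over returned MSS values can be sketched as below. This is only an illustration of the calculation, not the script behind [1]; the function name and the list-of-integers input are assumptions.

```python
def empirical_cdf(values):
    """Return sorted (mss, cumulative_fraction) points, one per distinct MSS value."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, mss in enumerate(ordered, start=1):
        if points and points[-1][0] == mss:
            # Same MSS as the previous sample: raise that point's fraction.
            points[-1] = (mss, i / n)
        else:
            points.append((mss, i / n))
    return points
```

With most of the mass concentrated on a handful of values, the curve jumps almost straight to 1.0 at those values, which is why a log scale helps make the tails visible.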
Inside the range of legal/typical values (1220:1440), the range of responses looks like [2].
Interesting observations from the whole dataset:
- MSS=1 seems to trigger an MSS=1432 in ~3000 extra responses compared to the other probe values
- MSS=1 triggers an extra ~700 responses with MSS=516
- MSS=1024 triggers an MSS=956 in response from a half-percent of hosts scanned, many from within Microsoft [3]
- there are 100-150 hosts that respond with MSS=536 regardless of the MSS sent; for example, hosts from these /64s [8]
There exists a floating 2-3% of hosts in these scans that mostly reflect the MSS value in the SYN. Note: I have not tried atypical large values, only small values. But in these cases, a common pattern seems to be:
- SYN+MSS=1440 gets SYN+ACK+MSS=1440 in return
- SYN+MSS=1220 gets SYN+ACK+MSS=1220 in return
- SYN+MSS=536 gets SYN+ACK+MSS=536 in return
- SYN+MSS=1 gets SYN+ACK+MSS=216 in return
I expect they'll also echo very unusual values, so they could be easy to spot.
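A rough sketch of how per-host observations could be sorted into "ignores the probe" versus "reflects the probe". The function name, the labels, and the "more than half echoed" threshold are all assumptions picked for illustration; the threshold exists only so that a host matching the pattern above (three values echoed, MSS=1 answered with 216) still counts as a reflector.

```python
def classify_host(observations):
    """Classify one host from a dict of {MSS sent in SYN: MSS returned in SYN+ACK}."""
    if len(observations) < 2:
        # A single probe cannot distinguish a reflector from a fixed value.
        return "unknown"

    if len(set(observations.values())) == 1:
        # Same answer whatever was sent: the host ignores the probe's MSS.
        return "fixed"

    echoed = sum(1 for sent, got in observations.items() if sent == got)
    if echoed == len(observations):
        return "reflects"
    if echoed * 2 > len(observations):
        # Hypothetical threshold: more than half of the probes were echoed.
        return "mostly-reflects"
    return "varies"
```

For example, `classify_host({1440: 1440, 1220: 1220, 536: 536, 1: 216})` gives `"mostly-reflects"`, while a host answering 1440 to every probe gives `"fixed"`. Probing with atypical values such as 1275 or 1411 makes an accidental match much less likely.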
Broader behaviour is shown in [4]; circle size is log-proportional.
Regarding other common values, I've been interested in what (other) CDNs are doing: [5], from early 2017 I think. Fastly have since fixed their MSS.
Addresses scanned in ASN15169 (Google) return 1360 and 1440 in the 2018-04-30 sweep. 1360 is their content delivery network. 1440 is ... everything else? Possibly even their CDN on paths they know are good. I haven't looked at address structure, but I imagine there may be distinct structure between those two sets. Of the networks that return MSS=1360, Google is the most visible in the 2018-04-30 results [6].
MSS=1410 is more widespread (seen across 574 ASNs) [7]. Facebook is in there, but they don't stand out.
Finally: not all addresses are stable even against the same MSS value. In the 2018-04-30 scan alone, the same address sometimes returns two different MSS values. An MSS of 8921 followed by a saner value the second time around seems common from HE and Amazon [9].
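One way such unstable addresses could be pulled out of scan results is sketched below. The record layout `(address, mss_sent, mss_returned)` and the function name are assumptions; the real scan output may look quite different.

```python
from collections import defaultdict


def unstable_addresses(records):
    """Find (address, mss_sent) pairs that produced more than one returned MSS.

    records: iterable of (address, mss_sent, mss_returned) tuples (assumed layout).
    """
    seen = defaultdict(set)
    for address, mss_sent, mss_returned in records:
        seen[(address, mss_sent)].add(mss_returned)

    # Keep only the pairs where the answers disagree.
    return {
        key: sorted(values)
        for key, values in seen.items()
        if len(values) > 1
    }
```

With records such as `("2001:db8::1", 1440, 8921)` followed by `("2001:db8::1", 1440, 1440)` (a documentation address, not a real host), the result maps `("2001:db8::1", 1440)` to `[1440, 8921]`.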
- [1] http://sg-pub.ripe.net/sds/misc/mss-cdf-loglog.pdf
- [2] http://sg-pub.ripe.net/sds/misc/mss-legal-vals-logy.pdf
- [3] https://gist.github.com/sdstrowes/04a685f0c8ea3452658c89d0aff1f204
- [4] http://sg-pub.ripe.net/sds/misc/mssvals-scatter.png
- [5] http://sg-pub.ripe.net/sds/misc/cdn-dist.pdf
- [6] https://gist.github.com/sdstrowes/e15de281dee1e02fa14149ea19124e5e#file-1360
- [7] https://gist.github.com/sdstrowes/e15de281dee1e02fa14149ea19124e5e#file-1410
- [8] https://gist.github.com/sdstrowes/91e715ca85d46bb358a1f387fe2a452f
- [9] https://gist.github.com/sdstrowes/d2c6adb7d584d44eec36ca8691553d50