I've never fully understood how NAT traversal works, so I went on a deep dive today.
Run whatismyip.py on a public server. This plays the role of a STUN server.
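What a STUN-like "what is my IP" server does fits in a few lines. This is a minimal sketch. It assumes a made-up plain-text protocol, not the real STUN wire format (RFC 8489): the server answers every UDP datagram with b"<ip>:<port>", the source address it observed, which for a client behind a NAT is the NAT's public mapping. The names serve_whatismyip and ask_public_addr are hypothetical and not taken from whatismyip.py.

```python
import socket
from typing import Optional, Tuple


def serve_whatismyip(sock: socket.socket, max_requests: Optional[int] = None) -> None:
    """Reply to each UDP datagram with the sender's address as this server sees it.

    Assumed plain-text stand-in for STUN: the reply body is b"<ip>:<port>".
    Runs forever unless max_requests is given.
    """
    handled = 0
    while max_requests is None or handled < max_requests:
        _payload, addr = sock.recvfrom(2048)
        # addr is the NAT's outer (public) mapping, which the client
        # cannot observe from behind the NAT.
        sock.sendto(f"{addr[0]}:{addr[1]}".encode(), addr)
        handled += 1


def ask_public_addr(sock: socket.socket, server: Tuple[str, int],
                    timeout: float = 2.0) -> Tuple[str, int]:
    """Send one probe from `sock` and parse the b"<ip>:<port>" reply."""
    sock.settimeout(timeout)
    sock.sendto(b"?", server)
    reply, _ = sock.recvfrom(2048)
    ip, port = reply.decode().rsplit(":", 1)
    return ip, int(port)
```

To run it, bind a UDP socket on the public server (for example to ("0.0.0.0", 3478), the conventional STUN port) and call serve_whatismyip(sock). ask_public_addr takes the caller's socket instead of opening its own because the NAT mapping belongs to that socket. The reported port is only useful if the same socket is then used for hole punching.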
Run easynat.py on a laptop behind a home router.
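You could also run hardnat.py on a laptop behind a home router, but not behind the same router as easynat.py. That would require "hairpinning" (the NAT taking a packet from one inside host addressed to its own public address and looping it back in to another inside host), which is almost always broken.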
You could also run it on a public server; it should work there too. But to really push it, put it in a hostile environment, like a phone on cellular data. Cellular data is almost always carrier-grade NATed, sometimes several layers deep.
If you can get a p2p connection between a cellphone and a laptop you're doing pretty well.
This implements the birthday-paradox port scan described by Tailscale: one side opens many sockets behind its NAT, and the other side sends to many ports without knowing which external ports the NAT assigned, hoping to hit one of them. I found I had to bump it from their recommended 256 sockets and 2048 probes to 256 sockets and 3072 probes to make it really reliable.
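Why those numbers work can be checked with a little arithmetic. This is a minimal sketch under an idealized model, which is an assumption and not something the scripts necessarily match: the NAT picks the k open ports uniformly at random out of N = 65535, and the prober tries m distinct ports, also at random. The chance that every probe misses is then C(N-k, m) / C(N, m). The function name hit_probability is hypothetical.

```python
def hit_probability(open_ports: int, probes: int, port_space: int = 65535) -> float:
    """Chance that at least one probe lands on an open NAT mapping.

    Idealized model (assumed): `open_ports` distinct external ports chosen
    uniformly at random, `probes` distinct target ports chosen uniformly at
    random. P(miss) = C(N-k, m) / C(N, m), computed as a running product
    of (N-k-i) / (N-i) for i in 0..m-1 to avoid huge binomials.
    """
    miss = 1.0
    for i in range(probes):
        miss *= (port_space - open_ports - i) / (port_space - i)
    return 1.0 - miss


# Compare Tailscale's recommended probe count with the bumped one.
for m in (174, 1024, 2048, 3072):
    print(m, f"{hit_probability(256, m):.4%}")
```

With k = 256, this comes out to roughly 50% at 174 probes, about 98% at 1024, and better than 99.9% at 2048. Going to 3072 cuts the miss rate by about two more orders of magnitude. Real NATs with narrower port ranges, lost packets, or expiring mappings presumably do worse than this model.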
It's more reliable to launch easynat before hardnat.
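There's an asymmetry: easynat is meant to run behind a "cone" NAT, one with endpoint-independent mapping. In that kind of NAT, an inside socket keeps the same outside port no matter which destination it talks to, so the port the STUN server reports is also the one the peer should aim at. That isn't always available. RFC 4787 requires UDP NATs to behave this way, but not all of them do. I wonder if the asymmetry is necessary. Maybe both sides should open many ports, and each port should scan many ports.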
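One way to tell which kind of NAT a machine is behind is to send from a single socket to two different servers and compare the ports they report. This sketch makes the same assumption of a plain-text echo server that replies with b"<ip>:<port>". Identical ports suggest endpoint-independent ("cone") mapping. Different ports mean the NAT allocates a new mapping per destination, which is the hard case. The function names are hypothetical, and the two servers need distinct addresses so the NAT actually sees two destinations.

```python
import socket
from typing import List, Sequence, Tuple


def observed_ports(servers: Sequence[Tuple[str, int]], timeout: float = 2.0) -> List[int]:
    """From ONE local socket, ask each echo server which port it saw us on.

    Assumes each server replies with b"<ip>:<port>" (plain-text stand-in,
    not real STUN).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    ports = []
    try:
        for server in servers:
            sock.sendto(b"?", server)  # first send implicitly binds the socket
            reply, _ = sock.recvfrom(2048)
            ports.append(int(reply.decode().rsplit(":", 1)[1]))
    finally:
        sock.close()
    return ports


def looks_endpoint_independent(ports: Sequence[int]) -> bool:
    # Same external port toward every destination => endpoint-independent
    # ("cone") mapping; differing ports => endpoint-dependent ("symmetric").
    return len(set(ports)) == 1
```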
I also wonder if the middle server could help more; perhaps it could send start/stop commands to better synchronize the two sides.
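Here is one possible shape for that coordination. It uses a hypothetical protocol invented for illustration. Each peer sends one datagram to the middle server to check in. Once both have arrived, the server sends each peer b"GO <other_ip>:<other_port>" back to back, so both start punching within about one round trip of each other. A stop command could be added the same way. The name rendezvous is hypothetical.

```python
import socket
from typing import Dict, List, Tuple

Addr = Tuple[str, int]


def rendezvous(sock: socket.socket) -> Dict[Addr, Addr]:
    """Wait for two peers to check in, then tell both to start at once.

    Hypothetical protocol: any datagram registers the sender; the server
    then replies to both peers with b"GO <other_ip>:<other_port>".
    Returns a mapping from each peer's observed address to the other's.
    """
    peers: List[Addr] = []
    while len(peers) < 2:
        _payload, addr = sock.recvfrom(2048)
        if addr not in peers:  # ignore retransmits from the same peer
            peers.append(addr)
    a, b = peers
    # Send the two GO messages back to back so the peers start close together.
    sock.sendto(f"GO {b[0]}:{b[1]}".encode(), a)
    sock.sendto(f"GO {a[0]}:{a[1]}".encode(), b)
    return {a: b, b: a}
```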
Refs: