How NAT Traversal Works - Nikola's Digital Garden

#readwise

# How NAT Traversal Works

![[50eafc1638d93b9637dcee0d55967a8fa09e05c7-1700x800.png|rw-book-cover]]

## Metadata

- Author: [[Tailscale]]
- Full Title: How NAT Traversal Works
- URL: https://tailscale.com/blog/how-nat-traversal-works/

## Highlights

**Let’s say you’re making your own protocol and that you want NAT traversal. You need two things.**

**First, the protocol should be based on UDP.** You *can* do NAT traversal with TCP, but it adds another layer of complexity to an already quite complex problem, and may even require kernel customizations depending on how deep you want to go. We’re going to focus on UDP for the rest of this article.

**Second, you need direct control over the network socket that’s sending and receiving network packets.** As a rule, you can’t take an existing network library and make it traverse NATs, because you have to send and receive extra packets that aren’t part of the “main” protocol you’re trying to speak. Some protocols tightly integrate the NAT traversal with the rest (e.g. WebRTC). But if you’re building your own, it’s helpful to think of NAT traversal as a separate entity that shares a socket with your main protocol. Both run in parallel, one enabling the other.

---

**Direct socket access may be tough depending on your situation. One workaround is to run a local proxy. Your protocol speaks to this proxy, and the proxy does both NAT traversal and relaying of your packets to the peer.** This layer of indirection lets you benefit from NAT traversal without altering your original program. ([View Highlight](https://read.readwise.io/read/01hff4vd1m88mpy6bsecdmga8b))

---

With prerequisites out of the way, let’s go through NAT traversal from first principles. Our goal is to get UDP packets flowing bidirectionally between two devices, so that our other protocol (WireGuard, QUIC, WebRTC, …) can do something cool. There are two obstacles to having this Just Work: stateful firewalls and NAT devices.
([View Highlight](https://read.readwise.io/read/01hff4w4vf9zm7rvwe74vrt9ra))

---

## Figuring out firewalls

Connections and “direction” are a figment of the protocol designer’s imagination. On the wire, every connection ends up being bidirectional; it’s all individual packets flying back and forth. How does the firewall know what’s inbound and what’s outbound? That’s where the stateful part comes in. **Stateful firewalls remember what packets they’ve seen in the past and can use that knowledge when deciding what to do with new packets that show up.**

![[2b44043b374217d6fa2d3a138b77c171df05bdf2-1600x740.png]]

**For UDP, the rule is very simple: the firewall allows an inbound UDP packet if it previously saw a matching outbound packet.** For example, if our laptop firewall sees a UDP packet leaving the laptop from `2.2.2.2:1234` to `7.7.7.7:5678`, it’ll make a note that incoming packets from `7.7.7.7:5678` to `2.2.2.2:1234` are also fine. The trusted side of the world clearly intended to communicate with `7.7.7.7:5678`, so we should let them talk back.

...

**Our only constraint is that the machine that’s *behind* the firewall must be the one initiating all connections. Nothing can talk to it, unless it talks first.**

![[e4277f239f9dbd1344451fc7d5e7da067447fd4d-2100x788.png]]

This is fine, but not very interesting: we’ve reinvented client/server communication, where the server makes itself easily reachable to clients. In the VPN world, this leads to a hub-and-spoke topology: the hub has no firewalls blocking access to it and the firewalled spokes connect to the hub.

![[2bbe8d0fa6f3a66b71f0aec01bf831a3cc1a8a65-2210x1082.png]]

**The problems start when two of our “clients” want to talk directly. Now the firewalls are facing each other. According to the rule we established above, this means both sides must go first, but also that neither can go first, because the other side has to go first!**

![[928409c960e0b0bcd53560edf80a934b24eaec11-1740x620.png]]

---

### Finessing finicky firewalls

**The trick is to carefully read the rule we established for our stateful firewalls. For UDP, the rule is: packets must flow out before packets can flow back in.**

**However, nothing says the packets must be *related* to each other beyond the IPs and ports lining up correctly. As long as *some* packet flowed outwards with the right source and destination, any packet that *looks like* a response will be allowed back in, even if the other side never received your packet!**

**So, to traverse these multiple stateful firewalls, we need to share some information to get underway: the peers have to know in advance the `ip:port` their counterpart is using.** One approach is to statically configure each peer by hand, but this approach doesn’t scale very far. To move beyond that, we built a [coordination server](https://tailscale.com/blog/how-tailscale-works#the-control-plane-key-exchange-and-coordination) to keep the `ip:port` information synchronized in a flexible, secure manner.

**Then, the peers start sending UDP packets to each other.** They must expect some of these packets to get lost, so they can’t carry any precious information unless you’re prepared to retransmit them. This is generally true of UDP, but especially true here. We’re *going* to lose some packets in this process.

Our laptop and workstation are now listening on fixed ports, so that they both know exactly what `ip:port` to talk to. Let’s take a look at what happens.

![[d7310815a3e9f715b549b2043f60b94b04b42b6d-1740x680.png]]

The laptop’s first packet, from `2.2.2.2:1234` to `7.7.7.7:5678`, goes through the Windows Defender firewall and out to the internet.
The corporate firewall on the other end blocks the packet, since it has no record of `7.7.7.7:5678` ever talking to `2.2.2.2:1234`. However, Windows Defender now remembers that it should expect and allow responses from `7.7.7.7:5678` to `2.2.2.2:1234`.

![[9cfd8653ec918a72d6909e5603d65c2ca4b6e5c9-1740x680.png]]

Next, the workstation’s first packet from `7.7.7.7:5678` to `2.2.2.2:1234` goes through the corporate firewall and across the internet. When it arrives at the laptop, Windows Defender thinks “ah, a response to that outbound request I saw”, and lets the packet through! Additionally, the corporate firewall now remembers that it should expect responses from `2.2.2.2:1234` to `7.7.7.7:5678`, and that those packets are also okay.

Encouraged by the receipt of a packet from the workstation, the laptop sends another packet back. It goes through the Windows Defender firewall, through the corporate firewall (because it’s a “response” to a previously sent packet), and arrives at the workstation.

![[7189e9a0b2caf65998dc2e50e84e8f4a35e73bbb-1740x680.png]]

Success! We’ve established two-way communication through a pair of firewalls that, at first glance, would have prevented it. ([View Highlight](https://read.readwise.io/read/01hfgmz81cb01sk68j7w2bptcv))

---

### Creative connectivity caveats

**Stateful firewalls have limited memory, meaning that we need periodic communication to keep connections alive. If no packets are seen for a while (a common value for UDP is 30 seconds), the firewall forgets about the session, and we have to start over. To avoid this, we use a timer and must either send packets regularly to reset the timers, or have some out-of-band way of restarting the connection on demand.** ([View Highlight](https://read.readwise.io/read/01hff5fr7027am66gf3bykfckq))

---

## The nature of NATs

**A NAT device is anything that does any kind of Network Address Translation, i.e. altering the source or destination IP address or port.** However, when talking about connectivity problems and NAT traversal, all the problems come from Source NAT, or SNAT for short. ([View Highlight](https://read.readwise.io/read/01hff5k1wrbh0vcebbj4vvf1a6))

---

**The most common use of SNAT is to connect many devices to the internet, using fewer IP addresses than the number of devices. In the case of consumer-grade routers, we map all devices onto a single public-facing IP address. This is desirable because it turns out that there are way more devices in the world that want internet access, than IP addresses to give them ... NATs let us have many devices sharing a single IP address, so despite the global shortage of IPv4 addresses, we can scale the internet further with the addresses at hand.**

---

### Navigating a NATty network

Let’s look at what happens when your laptop is connected to your home Wi-Fi and talks to a server on the internet.

![[6eb196d17f6a5fc312a5ed657f6f5a2a47213cf3-2000x760.png]]

Your laptop sends UDP packets from `192.168.0.20:1234` to `7.7.7.7:5678`. This is exactly the same as if the laptop had a public IP. But that won’t work on the internet: `192.168.0.20` is a private IP address, which appears on many different people’s private networks. The internet won’t know how to get responses back to us.

Enter the home router. The laptop’s packets flow through the home router on their way to the internet, and the router sees that this is a new session that it’s never seen before.

It knows that `192.168.0.20` won’t fly on the internet, but it can work around that: it picks some unused UDP port on its own public IP address (we’ll use `2.2.2.2:4242`) and creates a *NAT mapping* that establishes an equivalence: `192.168.0.20:1234` on the LAN side is the same as `2.2.2.2:4242` on the internet side.

From now on, whenever it sees packets that match that mapping, it will rewrite the IPs and ports in the packet appropriately.
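As a minimal sketch of the mapping table just described (not how any particular router implements it), the toy `SourceNat` class below allocates a public port the first time it sees a LAN `ip:port` and rewrites packets in both directions. The class name, the sequential port allocation starting at 4242, and the tuple-based packet representation are illustrative assumptions. Real devices also expire mappings and apply firewall rules, both of which are ignored here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


@dataclass
class SourceNat:
    """Toy SNAT: one public IP, one public port per LAN-side ip:port."""

    public_ip: str
    next_port: int = 4242  # hypothetical starting port, chosen to match the example
    lan_to_wan: Dict[Endpoint, Endpoint] = field(default_factory=dict)
    wan_to_lan: Dict[Endpoint, Endpoint] = field(default_factory=dict)

    def outbound(self, src: Endpoint, dst: Endpoint) -> Tuple[Endpoint, Endpoint]:
        """Rewrite the source of a LAN -> internet packet, creating a mapping if needed."""
        if src not in self.lan_to_wan:
            wan = (self.public_ip, self.next_port)
            self.next_port += 1
            self.lan_to_wan[src] = wan
            self.wan_to_lan[wan] = src
        return self.lan_to_wan[src], dst

    def inbound(self, src: Endpoint, dst: Endpoint) -> Optional[Tuple[Endpoint, Endpoint]]:
        """Rewrite the destination of an internet -> LAN packet, or return None to drop it."""
        lan = self.wan_to_lan.get(dst)
        if lan is None:
            return None  # no mapping for this public ip:port
        return src, lan


nat = SourceNat("2.2.2.2")
laptop, server = ("192.168.0.20", 1234), ("7.7.7.7", 5678)
print(nat.outbound(laptop, server))            # packet as the server sees it
print(nat.inbound(server, ("2.2.2.2", 4242)))  # reply as the laptop sees it
```

Keeping two dictionaries makes the lookup cheap in both directions, which mirrors the "equivalence" wording above: each side of the mapping can be translated to the other.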
![[953404277d143f8e1ece8df72697208593faccb0-2080x640.png]]

Resuming our packet’s journey: the home router applies the NAT mapping it just created, and sends the packet onwards to the internet. Only now, the packet is from `2.2.2.2:4242`, not `192.168.0.20:1234`. It goes on to the server, which is none the wiser. It’s communicating with `2.2.2.2:4242`, like in our previous examples sans NAT.

Responses from the server flow back the other way as you’d expect, with the home router rewriting `2.2.2.2:4242` back to `192.168.0.20:1234`. The laptop is *also* none the wiser: from its perspective, the internet magically figured out what to do with its private IP address.

Our example here was with a home router, but the same principle applies on corporate networks. The usual difference there is that the NAT layer consists of multiple machines (for high availability or capacity reasons), and they can have more than one public IP address, so that they have more public `ip:port` combinations to choose from and can sustain more active clients at once.

![[cdd34c97cc748ad3a478656650dc5c3f6091dc12-2300x1076.png|Multiple NATs on a single layer allow for higher availability or capacity, but function the same as a single NAT.]]

Multiple NATs on a single layer allow for higher availability or capacity, but function the same as a single NAT. ([View Highlight](https://read.readwise.io/read/01hfgn3xtw8kcvmvsw3jb7v4jq))

---

### A study in STUN

**We now have a problem that looks like our earlier scenario with the stateful firewalls, but with NAT devices**:

![[20d2f883c81f9771bc15fcde173334b46f7beabb-2180x620.png]]

**Our problem is that our two peers don’t know what the `ip:port` of their peer is. Worse, strictly speaking there is *no* `ip:port` until the other peer sends packets, since NAT mappings only get created when outbound traffic towards the internet requires it. We’re back to our stateful firewall problem, only worse: both sides have to speak first, but neither side knows to whom to speak, and can’t know until the other side speaks first.** ([View Highlight](https://read.readwise.io/read/01hfgn836h8je9yv7cyygxsgmk))

---

**STUN is both a set of studies of the detailed behavior of NAT devices, and a protocol that aids in NAT traversal. The main thing we care about for now is the network protocol.**

**STUN relies on a simple observation: when you talk to a server on the internet from a NATed client, the server sees the public `ip:port` that your NAT device created for you, not your LAN `ip:port`. So, the server can *tell* you what `ip:port` it saw. That way, you know what traffic from your LAN `ip:port` looks like on the internet, you can tell your peers about that mapping, and now they know where to send packets! We’re back to our “simple” case of firewall traversal.** ^25ohkh

**That’s fundamentally all that the STUN protocol is: your machine sends a “what’s my endpoint from your point of view?” request to a STUN server, and the server replies with “here’s the `ip:port` that I saw your UDP packet coming from.”** ^3zksy4

![[b48af5689535a521c586d6f535bac4f6a95d62b7-1840x976.png]]

Incidentally, this is why we said in the introduction that, if you want to implement this yourself, the NAT traversal logic and your main protocol have to share a network socket. Each socket gets a different mapping on the NAT device, so in order to discover your public `ip:port`, you have to send and receive STUN packets from the socket that you intend to use for communication, otherwise you’ll get a useless answer.

---

### How this helps

Given STUN as a tool, it seems like we’re close to done. Each machine can do STUN to discover the public-facing `ip:port` for its local socket, tell its peers what that is, everyone does the firewall traversal stuff, and we’re all set… Right?

This’ll work in some cases, but not others.
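To make the “what’s my endpoint from your point of view?” exchange concrete, here is a hedged sketch of a bare-bones STUN client. It assumes the Binding Request and `XOR-MAPPED-ADDRESS` layout from RFC 5389, handles IPv4 only, and does no retries or authentication. The function names are made up for this note, and the server address is left to the caller rather than assumed. Note that `discover_public_endpoint` takes an existing socket, because the answer is only useful for the socket that asked.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442  # fixed constant defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request() -> Tuple[bytes, bytes]:
    """Return (packet, transaction_id) for a Binding Request with no attributes."""
    txid = os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + txid, txid


def parse_binding_response(data: bytes, txid: bytes) -> Optional[Tuple[str, int]]:
    """Extract the IPv4 (ip, port) from XOR-MAPPED-ADDRESS, or None if absent or invalid."""
    if len(data) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or data[8:20] != txid:
        return None
    pos, end = 20, min(len(data), 20 + length)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            # Port and address are XORed with the magic cookie on the wire.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def discover_public_endpoint(sock: socket.socket, server: Tuple[str, int],
                             timeout: float = 2.0) -> Optional[Tuple[str, int]]:
    """Ask `server` which ip:port it sees for `sock`. Single attempt, no retries."""
    packet, txid = build_binding_request()
    sock.settimeout(timeout)
    sock.sendto(packet, server)
    try:
        data, _ = sock.recvfrom(2048)
    except socket.timeout:
        return None
    return parse_binding_response(data, txid)
```

In a real program the same `sock` would also carry the main protocol, so incoming packets would need to be demultiplexed rather than read with a blocking `recvfrom` as done here.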
**Generally speaking, this’ll work with most home routers, and will fail with some corporate NAT gateways. The probability of failure increases the more the NAT device’s brochure mentions that it’s a security device.**[^1] (NATs do not enhance security in any meaningful way, but that’s a rant for another time.) ^dzpvms

[^1]: pfSense is one such NAT, but it is configurable. See [[Configuring Outbound NAT on pfSense]]

**The problem is an assumption we made earlier: when the STUN server told us that we’re `2.2.2.2:4242` from its perspective, we assumed that meant that we’re `2.2.2.2:4242` from the entire internet’s perspective, and that therefore anyone can reach us by talking to `2.2.2.2:4242`.** ^a5ax4b

As it turns out, that’s not always true.

**Some NAT devices behave exactly in line with our assumptions. Their stateful firewall component still wants to see packets flowing in the right order, but we can reliably figure out the correct `ip:port` to give to our peer and do our simultaneous transmission trick to get through. Those NATs are great, and our combination of STUN and the simultaneous packet sending will work fine with those. ... Other NAT devices are more difficult, and create a completely different NAT mapping for every different destination that you talk to. On such a device, if we use the same socket to send to `5.5.5.5:1234` and `7.7.7.7:2345`, we’ll end up with two different ports on `2.2.2.2`, one for each destination. If you use the wrong port to talk back, you don’t get through.** ^9lcu5z

![[c9edd473a0702412836a0f0efa1024b2df60a22e-2000x1076.png]]

---

### Naming our NATs

Now that we’ve discovered that not all NAT devices behave in the same way, we should talk terminology. If you’ve done anything related to NAT traversal before, you might have heard of “Full Cone”, “Restricted Cone”, “Port-Restricted Cone” and “Symmetric” NATs. These are terms that come from early research into NAT traversal.

That terminology is honestly quite confusing.
I always look up what a Restricted Cone NAT is supposed to be. Empirically, I’m not alone in this, because most of the internet calls “easy” NATs Full Cone, when these days they’re much more likely to be Port-Restricted Cone.

More recent research and RFCs have come up with a much better taxonomy. First of all, they recognize that there are many more varying dimensions of behavior than the single “cone” dimension of earlier research, so focusing on the cone-ness of your NAT isn’t necessarily helpful. Second, they came up with words that more plainly convey what the NAT is doing.

**The “easy” and “hard” NATs above differ in a single dimension: whether or not their NAT mappings depend on what the destination is. [RFC 4787](https://tools.ietf.org/html/rfc4787) calls the easy variant “Endpoint-Independent Mapping” (EIM for short), and the hard variant “Endpoint-Dependent Mapping” (EDM for short).** There’s a subcategory of EDM that specifies whether the mapping varies only on the destination IP, or on both the destination IP and port. For NAT traversal, the distinction doesn’t matter. Both kinds of EDM NATs are equally bad news for us. ^y4c3an

In the grand tradition of naming things being hard, endpoint-independent NATs still depend on an endpoint: each *source* `ip:port` gets a different mapping, because otherwise your packets would get mixed up with someone else’s packets, and that would be chaos. Strictly speaking, we should say “Destination Endpoint Independent Mapping” (DEIM?), but that’s a mouthful, and since “Source Endpoint Independent Mapping” would be another way to say “broken”, we don’t specify. Endpoint always means “Destination Endpoint.”

You might be wondering how 2 kinds of endpoint dependence map into 4 kinds of cone-ness. The answer is that cone-ness encompasses two orthogonal dimensions of NAT behavior. One is NAT mapping behavior, which we looked at above, and the other is stateful firewall behavior.
Like NAT mapping behavior, the firewalls can be Endpoint-Independent or a couple of variants of Endpoint-Dependent. If you throw all of these into a matrix, you can reconstruct the cone-ness of a NAT from its more fundamental properties: ([View Highlight](https://read.readwise.io/read/01jjd07wr9axgbqwy74nya4x6t))

---

| | Endpoint-Independent NAT mapping | Endpoint-Dependent NAT mapping (all types) |
| ------------------------------------------- | -------------------------------- | ------------------------------------------ |
| Endpoint-Independent firewall | Full Cone NAT | N/A* |
| Endpoint-Dependent firewall (dest. IP only) | Restricted Cone NAT | N/A* |
| Endpoint-Dependent firewall (dest. IP+port) | Port-Restricted Cone NAT | Symmetric NAT |

Combinations marked with * can theoretically exist, but don't show up in the wild.

**Once broken down like this, we can see that cone-ness isn’t terribly useful to us. The major distinction we care about is Symmetric versus anything else; in other words, we care about whether a NAT device is EIM or EDM.** ([View Highlight](https://read.readwise.io/read/01hfgsk9bscf65ks2w5kndm23k))

- Note: The table above covers firewalls and NAT. The columns represent how NAT behaves while rows represent how the firewall behaves.

---

**While it’s neat to know exactly how your firewall behaves, we don’t care from the point of view of writing NAT traversal code. Our simultaneous transmission trick will get through all three variants of firewalls.** In the wild we’re overwhelmingly dealing only with IP-and-port endpoint-dependent firewalls. So, for practical code, we can simplify the table down to:

| | Endpoint-Independent NAT mapping | Endpoint-Dependent NAT mapping (dest. IP only) |
| --------------- | -------------------------------- | ---------------------------------------------- |
| Firewall is yes | Easy NAT | Hard NAT |

([View Highlight](https://read.readwise.io/read/01hfgst0c4jdbhay7rv0kbkxbm))

---

## NAT notes for nerds

### The benefits of birthdays

Let’s revisit our problem with hard NATs. The key issue is that the side with the easy NAT doesn’t know what `ip:port` to send to on the hard side. But it *must* send to the right `ip:port` in order to open up its firewall to return traffic. What can we do about that?

![[647364b5f593aafded475c9018f5a299f9893104-2000x760.png]]

Well, we know *some* `ip:port` for the hard side, because we ran STUN. Let’s assume that the IP address we got is correct. That’s not *necessarily* true, but let’s run with the assumption for now. As it turns out, it’s mostly safe to assume this. (If you’re curious why, see REQ-2 in [RFC 4787](https://tools.ietf.org/html/rfc4787).)

If the IP address is correct, our only unknown is the port. There’s 65,535 possibilities… Could we try all of them? At 100 packets/sec, that’s a worst case of 10 minutes to find the right one. It’s better than nothing, but not great. And it *really* looks like a port scan (because in fairness, it is), which may anger network intrusion detection software.

We can do much better than that, with the help of the [birthday paradox](https://en.wikipedia.org/wiki/Birthday_problem). Rather than open 1 port on the hard side and have the easy side try 65,535 possibilities, let’s open, say, 256 ports on the hard side (by having 256 sockets sending to the easy side’s `ip:port`), and have the easy side probe target ports at random.

I’ll spare you the detailed math, but you can check out the dinky [python calculator](https://github.com/danderson/nat-birthday-paradox) I made while working it out.
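For a rough feel of the numbers without opening the linked calculator, here is a small sketch. It is not the author’s calculator; it assumes the easy side probes distinct ports chosen uniformly at random, that the hard side’s open ports are spread uniformly over 65,535 possibilities, and that no packets are lost. The function name and defaults are made up for illustration.

```python
def success_probability(probes: int, open_ports: int = 256, port_space: int = 65535) -> float:
    """Chance that at least one of `probes` distinct random ports hits one of `open_ports`."""
    miss = 1.0
    for i in range(probes):
        remaining = port_space - i  # ports not yet probed
        if remaining <= open_ports:
            return 1.0  # only open ports are left, so the next probe must hit
        # Given every earlier probe missed, this probe misses with this probability.
        miss *= 1.0 - open_ports / remaining
    return 1.0 - miss


for n in (174, 256, 1024, 2048):
    print(f"{n:5d} probes -> {success_probability(n):.1%}")
```

Multiplying the per-probe miss probabilities treats this as sampling without replacement, which is the “two sets of distinct elements” framing rather than the classic single-set birthday problem.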
The calculation is a very slight variant on the “classic” birthday paradox, because it’s looking at collisions between two sets containing distinct elements, rather than collisions within a single set. Fortunately, the difference works out slightly in our favor! Here’s the chances of a collision of open ports (i.e. successful communication), as the number of random probes from the easy side increases, and assuming 256 ports on the hard side:

| Number of random probes | Chance of success |
| ----------------------- | ----------------- |
| 174 | 50% |
| 256 | 64% |
| 1024 | 98% |
| 2048 | 99.9% |

If we stick with a fairly modest probing rate of 100 ports/sec, half the time we’ll get through in under 2 seconds. And even if we get unlucky, 20 seconds in we’re virtually guaranteed to have found a way in, after probing less than 4% of the total search space.

That’s great! With this additional trick, one hard NAT in the path is an annoying speedbump, but we can manage. What about two?

![[82546090404cf5ab862b7f1ed541ee13a34d2d5a-2000x760.png]]

We can try to apply the same trick, but now the search is much harder: each random destination port we probe through a hard NAT also results in a random *source* port. That means we’re now looking for a collision on a `{source port, destination port}` pair, rather than just the destination port.

Again I’ll spare you the calculations, but after 20 seconds in the same regime as the previous setup (256 probes from one side, 2048 from the other), our chance of success is… 0.01%.

This shouldn’t be surprising if you’ve studied the birthday paradox before. The birthday paradox lets us convert `N` “effort” into something on the order of `sqrt(N)`. But we squared the size of the search space, so even the reduced amount of effort is still a lot more effort. To hit a 99.9% chance of success, we need each side to send 170,000 probes. At 100 packets/sec, that’s 28 minutes of trying before we can communicate.
50% of the time we’ll succeed after “only” 54,000 packets, but that’s still 9 minutes of waiting around with no connection. ([View Highlight](https://read.readwise.io/read/01hfgtjq9vnk4s63sfgp4vcnm9))

---

## Concluding our connectivity chat

Here’s a parting “TL;DR” recap:

For robust NAT traversal, you need the following ingredients:

- A UDP-based protocol to augment
- Direct access to a socket in your program
- A communication side channel with your peers
- A couple of STUN servers
- A network of fallback relays (optional, but highly recommended)

Then, you need to:

- Enumerate all the `ip:ports` for your socket on your directly connected interfaces
- Query STUN servers to discover WAN `ip:ports` and the “difficulty” of your NAT, if any
- Try using the port mapping protocols to find more WAN `ip:ports`
- Check for NAT64 and discover a WAN `ip:port` through that as well, if applicable
- Exchange all those `ip:ports` with your peer through your side channel, along with some cryptographic keys to secure everything.
- Begin communicating with your peer through fallback relays (optional, for quick connection establishment)
- Probe all of your peer’s `ip:ports` for connectivity and if necessary/desired, also execute birthday attacks to get through harder NATs
- As you discover connectivity paths that are better than the one you’re currently using, transparently upgrade away from the previous paths.
- If the active path stops working, downgrade as needed to maintain connectivity.
- Make sure everything is encrypted and authenticated end-to-end.

([View Highlight](https://read.readwise.io/read/01hfgwk0sbkssjcxj1vcykyrgf))

---
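The probing step in the recap only works because of the stateful firewall rule from “Finessing finicky firewalls”. As a minimal sketch of that rule and the three-packet exchange between the laptop and the workstation, the toy model below tracks which `(local, remote)` pairs each firewall has seen leave. The class and function names are made up for illustration, and the model ignores NAT, packet loss, and the idle timeout that makes real firewalls forget sessions.

```python
from typing import Set, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class StatefulFirewall:
    """Allows an inbound UDP packet only if a matching outbound packet was seen first."""

    def __init__(self) -> None:
        self.seen: Set[Tuple[Endpoint, Endpoint]] = set()  # (local, remote) pairs

    def outbound(self, src: Endpoint, dst: Endpoint) -> None:
        """Record that the protected machine sent a packet from src to dst."""
        self.seen.add((src, dst))

    def allows_inbound(self, src: Endpoint, dst: Endpoint) -> bool:
        """An inbound packet passes if it looks like a reply to a recorded outbound one."""
        return (dst, src) in self.seen


def send(src: Endpoint, dst: Endpoint,
         src_fw: StatefulFirewall, dst_fw: StatefulFirewall) -> bool:
    """Send one packet through both firewalls; return True if it is delivered."""
    src_fw.outbound(src, dst)  # the sender's firewall notes the flow even if delivery fails
    return dst_fw.allows_inbound(src, dst)


laptop, workstation = ("2.2.2.2", 1234), ("7.7.7.7", 5678)
defender, corporate = StatefulFirewall(), StatefulFirewall()

first = send(laptop, workstation, defender, corporate)   # blocked by the corporate firewall
second = send(workstation, laptop, corporate, defender)  # looks like a reply, so it gets in
third = send(laptop, workstation, defender, corporate)   # now both firewalls have state
print(first, second, third)  # False True True
```

The first packet is lost by design: its only job is to create state in the sender’s own firewall, which is why the probes should not carry anything that cannot be retransmitted.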