Main Content

I hate using things I don’t understand.

To understand something, really, you have to build it. That’s what I was thinking when I tried to build a sewing machine out of old printers. And I stand by it! The only way I really got to grips with UART, SPI, I2C and so on was by bit-banging them.

The internet. Having built many websites and online games, I’m pretty sure I understand how the higher-level protocols of the internet work. But the lowest I’ve ever had to program with is sockets – opening a raw UDP or TCP data stream. TCP, especially, is a mysterious protocol that can deliver a continuous stream of bytes, guaranteed, without you having to ever understand it.

So. It is time to understand the internet.

I think there’s no single aspect of the internet that’s especially complicated, the scary part is just the number of acronyms. There must be hundreds of them for all the different bits of the dozens and dozens of protocols. Since explaining things is often a good way to spot holes in your own knowledge, I’ll start by trying to summarize the TCP/IP stack.

Layers
The protocols that make up the internet are split up into them. The combination of all these layers forms a stack.
PHY & data link
The lowest layer in the stack is the physical layer, or PHY. This is the actual method of transferring zeros and ones from one device to the next. In the case of the ethernet port on my laptop, this is a cat5 cable with twisted copper wires carrying a manchester-encoded differential signal. Other physical interfaces include 2.4GHz wireless (for wifi) and fibre-optic links. Physical layer. Simple.
The second-lowest layer in the stack is the Data Link (or “network”) layer. In the case of ethernet, this is the ethernet packet format. The distinction is that 10Mbit, 100Mbit and Gigabit ethernet all use the same data link layer, but have different physical layers.

All devices with an ethernet (or wifi) interface have a hard-coded MAC address. This 48-bit number is programmed into the chip at the factory, and is supposed to be unique in the world. Every ethernet message has a “Source” and a “Destination” MAC address in its header.

Rambling aside: The name “ethernet” doesn’t really make sense any more. The old style of ethernet used a coaxial cable for its physical layer. All the computers in a network would be wired to this same bit of coaxial cable, and if any one of them sent a signal onto the coax, all the others would receive it. The cable was like a virtual “ether” that signals would traverse. You can immediately see the problem with this set-up: what if two devices wanted to talk at once? They had to wait until the line was clear, and then start sending in the hope that no-one else would interrupt. If two messages collided, the rule was to stop transmitting, and wait a random amount of time before starting again.

Each device that hears the message would look at the destination address, and if it wasn’t its own MAC address, it would throw the packet away. Nowadays when you use ethernet, you’re almost certainly plugging your computer into a network switch via a full duplex line, so collisions are impossible. The switch does the job of looking at where messages are headed, and sending them there.”

Link to article