MSS - Maximum Segment Size
RTT - Round-Trip Time
ICMP - Internet Control Message Protocol
Application
Transport
Network
Link
Physical
Tier 1 Content Providers IXPs
Tier 2
Tier 3
Clients
Circuit switching | Packet switching |
---|---|
resources reserved for duration of communication | resources not reserved |
messages may have to queue | |
e.g. telephone lines | e.g. Internet |
Frequency-division multiplexing (FTM) | Time-division multiplexing (TDM) |
---|---|
the frequency spectrum is divided among connections | time divided in frames |
e.g. for telephone 4kHz band | each frame divided in slots for each connection |
e.g. radio | simpler setup |
wasteful during silent period | could lose packets |
Key differences
- Address size: 32 bit v4, 128 bit v6
- v6 - fixed length 40 bit header
- v6 - no fragmentation allowed
- v6 - checksum removed to reduce processing
- v6 - includes ICMPv6
Transition
- Dual-stack: IPv6 routers replace IPv4 routers but support protocol and translate packages to older routers
- Tunneling: IPv6 datagram carried as payload among IPv4 routers
-
Application architectures
Client-server
P2P
Hybrid -
Application requirements
Data loss
Timing
Throughput
Security -
Transport protocols
TCP
UDP -
Sockets
- Web and HTTP - over TCP
- HTTP messages
request
response - Cookies - unique ID generated by server (cookie locally, entry in DB remotely)
authorization
shopping carts
recommendations
user session - Web cache
acts both as client and server
typically at ISP
+ reduce response time for client
+ reduce traffic
+ enables poor content providers to deliver effectively
- HTTP messages
- FTP :21 - over TCP
- Server opens second TCP connection when file transfer initiated
- Command/response
commands: ASCI text
response: status code and phrase
- Server opens second TCP connection when file transfer initiated
- Email :25 - SMTP over TCP
- Components:
user agents
mail servers
SMTP protocol (between mail servers) - Direct transfer: sending to receiving server
handshaking
transfer
closure - Command/response
commands: ASCI text
response: status code and phrase
- Components:
- DNS
-
Decentralized
-
Services
hostname to IP translation
hostname aliasing
mail server aliasing
load distribution - sets of IPs for one canonical nameRoot DNS Servers / | \ com org edu ... | e.g. yahoo.com DNS servers ...
-
13 root DNS servers worldwide
-
Types of DNS servers
- Top-level Domain (TLD) servers
responsible for com, org, net, edu, aero, jobs, museums and all country domains
"Network Solutions" maintains .com TLD
"Educause" maintains .edu TLD - Authoritative DNS servers
organization's DNS servers - mapping for own servers
can be maintained by organization or service provider - Local namer server
does not strictly belong to hierarchy
each ISP has one (a.k.a. DNS)
acts as proxy
- Top-level Domain (TLD) servers
-
-
Demultiplexing at rcv host
delivering received segments to correct socket -
Multiplexing at send host
gathering data from multiple sockets, enveloping data with headers (used for demux) -
Connectionless demultiplexing
- each datagram has source and destination IP address
- each datagram carries one transport-layer segment
- each segment has source and destination port number
-
Connection-oriented demultiplexing
- TCP socket identified by 4-tuple
source IP address, port number
destination IP address, port number - Server may support many simultaneous TCP sockets
web servers have sockets for each connecting client
- TCP socket identified by 4-tuple
-
UDP: User Datagram Protocol
- Why UDP?
no connection establishment (removes delay)
simple (no states)
small segment header
no congestion control (UDP can blast data full speed) - Uses
Media - loss tolerant, rate sensitive
DNS
SNMP - Could add reliability at application level
UDP checksum
- Why UDP?
- Stop-and-Wait
- Go-back-N
- Selective repeat
window size * 2 < ACK #s
-
End-to-end congestion control
Network layer does not provide any support for congestion control
End systems must infer congestions as no explicit feedback is returned by network
TCP has to employ end-to-end congestion control -
Network assisted congestion control
Router sends package indicating congestion or specifying transfer rates
Data feedback:
Direct network feedback - router indicates to client,
Network feedback via receiver.Example: ATM ABR congestion control
-
TCP congestion control
TCP controls its data-rate based on the perceived congestion on the network
Self-clocking: TCP adjusts window size based on received ACKs
Congestion control principles- A lost packet implies congestion and send rate should be reduced
- When an ACK arrives for a previously unacknowledged packet, window could be increased
- Bandwidth probing - increase rate until packet is lost - thus finding optimal send rate
TCP congestion control algorithm
- Slow start
Start at 1MSS and increase window by 1 for each previously unacknowledged packet
Effectively grow window exponentially (since +1 holds for each MSS)
Stop on a loss event and set sstresh to cwnd/2 and cwnd to 1
Stop when cwnd >= sstresh and go to congestion avoidance mode
Stop if three duplicate ACKs and go to fast recovery - Congestion avoidance
Grow linearly by 1MSS - Fast recovery
Grow when double ACK is received (thus recovering from lost packs?)
Additive-increase, multiplicative-decrease (AIMD)
Increase by one when ACK, half when loss
Fairness
UDB does not throttle thus may suffocate TCP
Nothing can prevent application from parallel TCP (opening many connections to server)
-
Datagram network - network-layer connectionless service (e.g. Internet)
no call setup at network level
packets forwarded using destination host address (packets between same source-dest may take different paths)
routers don't keep state about end-to-end connections (no network-level concept of "connection")
Longest prefix matching - in table use regex to forward ranges of addresses to link interface -
Virtual Circuit - network-layer connection service (e.g. ATM)
each packet carries VC id (not destination host address)
every router on source-dest path maintains "state" for each connection
VC number can be changed on each link (new VC number comes from forwarding table)
A VC consists of:- path from src to dest
- VC numbers, one number for each link along path
- entries in forwarding tables in routers along path
Internet (datagram) | ATM (VC) |
---|---|
data exchange among computers | evolved from telephony |
"smart" end systems (control, error recovery) | human conversion (strict timing, reliability, guaranteed service) |
many link types (uniform service difficult) | "dumb" end systems (e.g. telephones) - complexity inside network |
- 32 bit IP address
- 20 bytes IP overhead + 20 bytes TCP overhead
- Three types of switching fabrics:
- memory
switching under direct control of CPU
limited by memory bandwidth (2 bus crossings per datagram) - bus
limited by bus bandwidth
32 Gbps by Cisco 5600: sufficient speed for enterprise - crossbar (mutliplexer)
60 Gbps by Cisco 12000: f*ck yeh
- memory
- Links have MTU (max. transfer size). Some links fragment datagrams and destination reassembles them
When fragmenting, length = data.length + 20 bytes overhead => 4000b = 1480b + 1480b + 1040b, where length = (1500, 1500, 1060)
Offset = data.length/8 = (0, 185, 370) - IPv4 addressing
- Subnets: high order bits, device don't need router to reach each other
- CIDR (Classless InterDomain Routing)
- ICMP addressing ()
- Links State (LS): Dijkstra's algorithm
Given the cost between neighbors, traverse the net and find optimal path - Distance Vector: Dx(y) = min{Cx(v)+Dv(y)}
nodes exchange tables and adjust values
Metric | Link State (LS) | Distance Vector (DV) |
---|---|---|
Msg complexity | O(nE), n nodes, E links | Msgs between neighbors only |
Spd of convergence | O(n^2), may oscillate | varies, count-to-infinity problem |
Robustness (rtr malfunction) | node advertises incorrect costs | DV node can advertise incorrect path cost |
each node computes its own table | error propagate through net |
- Hierarchical algorithm: aggregate routers into regions, "autonomous systems (ASs)"
routers in same region run same protocol, "intra-AS protocol"
gateway router - at edge of AS, has link to router in another AS
Greedy search
-
Routing Information Protocol (RIP): distance vector algorithm
Distance metric: number of hops
Routing table for router X:Destination subnet: A
Next router: Y
# hops to dest: n
If no advertisement within 180sec, remove router and destroy links using router. Send updated table to neighbors
RIP routing managed by application-level process called route-d
advertisement sent using UDP, periodically repeated
- Open Shortest Path First (OSPF)
publicly available
uses Link State algorithm
advertisements disseminated to entire AS (via flooding)
carried in OSPF messages directly over IP, rather than TCP or UDP
Advantages over RIP:
hierarchical OSFP in large domains
multiple same-cost paths allowed (only one in RIP)
for each link, multiple metrics (e.g. best effort, real-time)
integrated uni- and multi-cast support (MOSPF)
security - all OSPF messages authenticated
Hierarchy
local area: LS advertisements backbone
area border routers: "summarize" distances to nets in local area, advertise to other area border routers backbone routers: run OSPF routing limited to backbone boundary routers: connect to other AS's
- Interior Gateway Routing Protocol (IGRP) - Cisco proprietary
the de facto inter-domain routing algorithm (the one that glues the Internet together)
BGP provides each AS means to:
eBGP: obtain subnet reachability information from neighboring ASs
iBGP: propagate reachability to all AS-internal routers
obtain "good" routes to other networks based on reachability information and other policies
Why different Intra- and Inter-AS routing?
-
Policy:
- Inter-AS domain: admin wants control over how its traffic is routed, who routes through its net
- Intra-AS: single admin, so no policy decisions needed
-
Scale:
- Hierarchical routing saves table size, reduced update traffic
-
Performance:
- Inter-AS: policy may dominate over performance
- Intra-AS: can focus on performance
-
-
Parity checking
Single bit parity
Two dimensional bit parity: detect and correct single bit error -
Internet checksum: detect errors in transmitted packet
Cyclic Redundancy Check (CRC)
Packet - <D,R>
Generator - G; G.length = r+1
R s.t. <D,R> divisible by G
Receiver divides <D,R> by G, if non-zero reminder - error!
-
Link types:
point-to-point - dial-up, Ethernet
broadcast - old-fashioned Ethernet, Wi-Fi, etc. -
Multiple Access Protocols
single shared broadcast channel
two or more simultaneous transmissions by nodes
distributed algorithm that determines how nodes share channel, i.e. determine when nodes can transmit
communication about channel sharing must use channel itself -
Ideal Multiple Access Protocol
- efficient - channel speed - R bps, 1 node, send at rate R
- fair - M nodes send at R/M
- fully decentralized - no special node to coordinate transmission, no sync of clocks, slots
- simple
-
MAC protocols
- Channel Partitioning:
divide channel into smaller pieces (time slots, frequency, code)
allocate piece to node for exclusive use
a priory decision assuming the works case in terms of #users and uniform use by all usersTime Division Multiple Access (TDMA)
access to channels in "rounds"
each gets fixed length sloth
unused slots stay idleFrequency Division Multiple Access (FDMA)
channel spectrum divided into freq bands
each station assigned fixed band
unused transmission time in freq band goes idle- Random Access
channel not divided, collisions allowed
"recover" from collisionsSlotted ALOHA
Assumptions Operation all frames same size when node obtains frame, transfers in next slot time divided into equal slots no collision: send in next slot nodes start to transmit only at slot beginning collision: node retransmits in each cons. slot until nodes are synchronized if >=2 nodes transm. in same slot, all detect collision Pros Cons single active node transmit at full speed collisions, wasting slots decentralized: only slots in nodes need be in sync idle slots simple nodes may detect collision in less than time to transm clock sync Efficiency - long-run fraction of successful slots, many nodes, many frames to send
NB: At best channel used 37% if the time!Pure (unslotted) ALOHA
simpler
no synchronization
when frame first arrives - transmit immediately
NB: At best channel used 18% if the time! Even worse!Carrier Sense Multiple Access (CSMA)
listen before transmit
if channel sensed idle - transmit frame
else - deffer transmissionCSMA\CD (Collision Detection)
collision detected within short time
colliding transmissions aborted, reducing channel wastage
CD easy with wired LAN (signal strength), difficult in wireless LAN- "Taking Turns"
nodes take turns, nodes with more to send may take longer turns
"adaptive round-robin"Polling
Master node invites Slave nodes to transmit in turns
typically used with dumb slaves
Concerns: polling overhead, latency, single point of failure (master)Token Passing
control token passed from one node to another sequentially
Concerns: token overhead, latency, single point of failure (token) - Channel Partitioning:
Many different Ethernet standards
Common MAC protocol and frame format
Different speed: 1 Mbps, 10Mbps, 100Mbps, 1Gbps, 10Gbps
different physical layer media: fiber, cable
Star topology: switch in center, each client runs on separate line
Ethernet Frame
| Preamble | Dest. Addr. | Source Addr. | Type | Data | CRC |
Preamble used to sync sender and receiver clock rates
Manchester Encoding - each bit has voltage transition, allows for clock sync
Connectionless: no handshake between sending and receiving NICs
unreliable: no ACKs or NACKs. Gaps are possible - to be filled if using TCP
Ethernet's MAC protocol: unslotted CSMA/CD
-
Hubs - physical-layer repeaters
bits coming in one ling go out all other links at same rate
all nodes can collide with one another
no frame buffering
no CSMA/CD, host NICs detect collisions -
Switches - link-layer device - smarten than hubs, take active role
store and forward Ethernet frames
examine incoming frame's MAC address, selectively forward to one-or-more links, use CSMA/CD
transparent - hosts are unaware of presence of switches
plug-and-play, self-learning - switches do not need to be configuredallow multiple simultaneous transmissions
hosts have dedicated direct connection to switch
switch buffers packets
full duplexlearning of MAC addresses:
- record link associated with sending host
- index switch table using MAC dest address
- if entry found - send or update
else - flood (forward on all but receiving interface)
-
Switches vs. Routers
Both store-and-forward devices:
Routers | Switches |
---|---|
network-level | link-level |
maintain routing tables | maintain switch tables |
implement routing algorithms | implement filtering and learning algorithms |
- VLANs
Port-based VLAN: switch port grouped by software so that single physical switch operates as multiple virtual switches.
traffic isolation: frames to/from some port range can only reach these ports. Can also define VLAN based on MAC addresses of endpoints.
dynamic membership: ports can be dynamically assigned among VLANs.
forwarding between VLANs: done via routing (just as with separate switches)