test patch D49540
patch link: D49540
VM environment with a 1GE switch
single path testbed
Virtual machines (VMs) acting as TCP traffic senders (using iperf3) are hosted by bhyve on a physical box (Beelink SER5 AMD Mini PC) running FreeBSD 14.2-RELEASE. Another box of the same type runs the Ubuntu Linux desktop version. The two boxes are connected through a 5-port Gigabit Ethernet switch (TP-Link TL-SG105), which has 1Mb (125KB) of shared packet buffer memory.
functionality showcase
This test shows the general TCP congestion window (cwnd) growth pattern before and after the patch, and compares it with the cwnd growth pattern of the Linux kernel.
- additional 40ms latency is added/emulated at the receiver (Ubuntu Linux, via traffic shaping). The minimum bandwidth-delay product (BDP) is 1000Mbps x 40ms == 5 MBytes, so the sender and receiver are configured with 10MB buffers.
cc@Linux:~ % sudo tc qdisc add dev enp1s0 root netem delay 40ms
cc@Linux:~ % tc qdisc show dev enp1s0
qdisc netem 8001: root refcnt 2 limit 1000 delay 40ms
cc@Linux:~ %

root@n1fbsd:~ # ping -c 4 -S n1fbsd Linux
PING Linux (192.168.50.46) from n1fbsdvm: 56 data bytes
64 bytes from 192.168.50.46: icmp_seq=0 ttl=64 time=41.053 ms
64 bytes from 192.168.50.46: icmp_seq=1 ttl=64 time=41.005 ms
64 bytes from 192.168.50.46: icmp_seq=2 ttl=64 time=41.011 ms
64 bytes from 192.168.50.46: icmp_seq=3 ttl=64 time=41.168 ms

--- Linux ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 41.005/41.059/41.168/0.066 ms
root@n1fbsd:~ #

root@n1ubuntu24:~ # ping -c 4 -I 192.168.50.161 Linux
PING Linux (192.168.50.46) from 192.168.50.161 : 56(84) bytes of data.
64 bytes from Linux (192.168.50.46): icmp_seq=1 ttl=64 time=40.9 ms
64 bytes from Linux (192.168.50.46): icmp_seq=2 ttl=64 time=40.9 ms
64 bytes from Linux (192.168.50.46): icmp_seq=3 ttl=64 time=41.0 ms
64 bytes from Linux (192.168.50.46): icmp_seq=4 ttl=64 time=41.1 ms

--- Linux ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3061ms
rtt min/avg/max/mdev = 40.905/40.981/41.108/0.078 ms
root@n1ubuntu24:~ #

root@n1fbsd:~ # sysctl -f /etc/sysctl.conf
net.inet.tcp.hostcache.enable: 0 -> 0
kern.ipc.maxsockbuf: 10485760 -> 10485760
net.inet.tcp.sendbuf_max: 10485760 -> 10485760
net.inet.tcp.recvbuf_max: 10485760 -> 10485760
root@n1fbsd:~ #

root@n1ubuntu24:~ # sysctl -p
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
net.ipv4.tcp_rmem = 4096 131072 10485760
net.ipv4.tcp_wmem = 4096 16384 10485760
net.ipv4.tcp_no_metrics_save = 1
root@n1ubuntu24:~ #

cc@Linux:~ % sudo sysctl -p
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
net.ipv4.tcp_rmem = 4096 131072 10485760
net.ipv4.tcp_wmem = 4096 16384 10485760
net.ipv4.tcp_no_metrics_save = 1
cc@Linux:~ %
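- as a quick check of the buffer sizing, a minimal sketch that recomputes the BDP and the 2x-BDP socket buffer used above (assuming the 1000 Mbps link rate and the 40 ms emulated RTT):

#!/bin/sh
# sketch: compute BDP for a 1000 Mbps x 40 ms path and a 2x-BDP socket buffer
RATE_BPS=1000000000        # 1000 Mbps link rate
RTT_MS=40                  # emulated round-trip time in milliseconds
BDP_BYTES=$(( RATE_BPS / 8 * RTT_MS / 1000 ))   # 5,000,000 bytes ~= 5 MB
BUF_BYTES=$(( BDP_BYTES * 2 ))                  # ~10 MB, close to the 10485760 (10 MiB) sysctl values above
echo "BDP          = ${BDP_BYTES} bytes"
echo "2xBDP buffer = ${BUF_BYTES} bytes"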
- info for switching to the FreeBSD RACK TCP stack
root@n1fbsd:~ # kldstat
Id Refs Address                Size Name
 1    5 0xffffffff80200000  1f75ca0 kernel
 2    1 0xffffffff82810000    368d8 tcp_rack.ko
 3    1 0xffffffff82847000     f0f0 tcphpts.ko
root@n1fbsd:~ # sysctl net.inet.tcp.functions_default=rack
net.inet.tcp.functions_default: freebsd -> rack
root@n1fbsd:~ #
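- to make the RACK stack selection survive a reboot, a minimal sketch of the loader.conf/sysctl.conf entries (standard FreeBSD locations; run as root on the sender VM):

# sketch: persist the RACK TCP stack selection across reboots
echo 'tcp_rack_load="YES"' >> /boot/loader.conf                   # load tcp_rack.ko (pulls in tcphpts.ko) at boot
echo 'net.inet.tcp.functions_default=rack' >> /etc/sysctl.conf    # default new connections to the RACK stack
# or switch at runtime, as shown above:
kldload tcp_rack
sysctl net.inet.tcp.functions_default=rack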
FreeBSD VM sender info | FreeBSD 15.0-CURRENT (GENERIC) #0 main-6f6c07813b38: Fri Mar 21 2025
Linux VM sender info | Ubuntu server 24.04.2 LTS (GNU/Linux 6.8.0-55-generic x86_64)
Linux receiver info | Ubuntu desktop 24.04.2 LTS (GNU/Linux 6.11.0-19-generic x86_64)
- iperf3 command
receiver: iperf3 -s -p 5201 --affinity 1
sender:   iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5201 -l 1M -t 300 -i 1 -f m -VC ${name}
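- the cwnd growth figures below can be reproduced from the sender-side iperf3 interval report, which includes a Cwnd column when TCP_INFO is available; a minimal awk sketch (iperf3_sender.log is a hypothetical capture of the sender's "-i 1" output, and field positions can vary slightly between iperf3 versions):

#!/bin/sh
# sketch: pull per-interval cwnd samples out of the iperf3 client log for plotting
LOG=iperf3_sender.log                      # hypothetical capture of the sender output
awk '/bits\/sec/ && NF >= 11 {
        # $3 = interval (e.g. 12.00-13.00), $10/$11 = Cwnd value and unit
        split($3, t, "-")
        print t[2], $10, $11
     }' "$LOG" > cwnd_trace.txt             # columns: time(s) cwnd unit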
CUBIC growth pattern before/after patch in FreeBSD default stack:
CUBIC growth pattern before/after patch in FreeBSD RACK stack:
For reference, the CUBIC in Linux kernel growth pattern is also attached:
single flow 300 seconds performance with 1Gbps x 40ms BDP
TCP CC | version | iperf3 single flow 300s average throughput | comment
CUBIC in FreeBSD default stack | before patch | 715 Mbits/sec | base
CUBIC in FreeBSD default stack | after patch | 693 Mbits/sec | a benign -3.1% performance loss (possibly caused by the RTO between 60s and 70s), but cwnd growth has finer granularity and smaller deltas (grows more smoothly)
CUBIC in FreeBSD RACK stack | before patch | 738 Mbits/sec | base
CUBIC in FreeBSD RACK stack | after patch | 730 Mbits/sec | negligible -1.1% performance loss
CUBIC in Linux | kernel 6.8.0 | 725 Mbits/sec | reference
CUBIC cwnd and throughput before patch in FreeBSD default stack:
CUBIC cwnd and throughput after patch in FreeBSD default stack:
CUBIC cwnd and throughput before patch in FreeBSD RACK stack:
CUBIC cwnd and throughput after patch in FreeBSD RACK stack:
CUBIC cwnd and throughput in Linux stack:
loss resilience in LAN test (no additional latency added)
Similar to testD46046, this configuration tests TCP CUBIC in the VM traffic sender with a 1%, 2%, 3%, or 4% incoming packet drop rate at the Linux receiver. No additional latency is added.
root@n1fbsd:~ # ping -c 4 -S n1fbsd Linux
PING Linux (192.168.50.46) from n1fbsdvm: 56 data bytes
64 bytes from 192.168.50.46: icmp_seq=0 ttl=64 time=0.499 ms
64 bytes from 192.168.50.46: icmp_seq=1 ttl=64 time=0.797 ms
64 bytes from 192.168.50.46: icmp_seq=2 ttl=64 time=0.860 ms
64 bytes from 192.168.50.46: icmp_seq=3 ttl=64 time=0.871 ms

--- Linux ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.499/0.757/0.871/0.151 ms
root@n1fbsd:~ #

root@n1ubuntu24:~ # ping -c 4 -I 192.168.50.161 Linux
PING Linux (192.168.50.46) from 192.168.50.161 : 56(84) bytes of data.
64 bytes from Linux (192.168.50.46): icmp_seq=1 ttl=64 time=0.528 ms
64 bytes from Linux (192.168.50.46): icmp_seq=2 ttl=64 time=0.583 ms
64 bytes from Linux (192.168.50.46): icmp_seq=3 ttl=64 time=0.750 ms
64 bytes from Linux (192.168.50.46): icmp_seq=4 ttl=64 time=0.590 ms

--- Linux ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3165ms
rtt min/avg/max/mdev = 0.528/0.612/0.750/0.082 ms
root@n1ubuntu24:~ #

## list of different packet loss rates
cc@Linux:~ % sudo iptables -A INPUT -p tcp --dport 5201 -m statistic --mode nth --every 100 --packet 0 -j DROP
cc@Linux:~ % sudo iptables -A INPUT -p tcp --dport 5201 -m statistic --mode nth --every 50 --packet 0 -j DROP
cc@Linux:~ % sudo iptables -A INPUT -p tcp --dport 5201 -m statistic --mode nth --every 33 --packet 0 -j DROP
cc@Linux:~ % sudo iptables -A INPUT -p tcp --dport 5201 -m statistic --mode nth --every 25 --packet 0 -j DROP
...
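- the --every N values above approximate the target loss rates (every 100th packet ≈ 1%, every 50th ≈ 2%, every 33rd ≈ 3%, every 25th ≈ 4%); a minimal sketch for installing, verifying, and removing one of these rules between runs:

#!/bin/sh
# sketch: drop roughly 2% of incoming iperf3 packets on the Linux receiver
sudo iptables -A INPUT -p tcp --dport 5201 \
     -m statistic --mode nth --every 50 --packet 0 -j DROP
# show per-rule packet counters to confirm the drop behavior during a run
sudo iptables -L INPUT -v -n --line-numbers
# remove the rule again before the next test case
sudo iptables -D INPUT -p tcp --dport 5201 \
     -m statistic --mode nth --every 50 --packet 0 -j DROP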
- iperf3 command
cc@Linux:~ % iperf3 -s -p 5201 --affinity 1
root@n1fbsd:~ # iperf3 -B n1fbsd --cport 54321 -c Linux -p 5201 -l 1M -t 30 -i 1 -f m -VC cubic
- iperf3 single flow 30s average throughput under different packet loss rates at the Linux receiver

TCP CC | version | 1% loss rate | 2% loss rate | 3% loss rate | 4% loss rate | comment
CUBIC in FreeBSD default stack | before patch | 854 Mbits/sec | 331 Mbits/sec | 53.1 Mbits/sec | 22.9 Mbits/sec |
CUBIC in FreeBSD default stack | after patch | 814 Mbits/sec | 150 Mbits/sec | 45.1 Mbits/sec | 24.2 Mbits/sec |
NewReno in FreeBSD default stack | 15-CURRENT | 814 Mbits/sec | 110 Mbits/sec | 40.4 Mbits/sec | 21.5 Mbits/sec |
CUBIC in FreeBSD RACK stack | before patch | 552 Mbits/sec | 353 Mbits/sec | 185 Mbits/sec | 130 Mbits/sec |
CUBIC in FreeBSD RACK stack | after patch | 515 Mbits/sec | 337 Mbits/sec | 181 Mbits/sec | 127 Mbits/sec |
NewReno in FreeBSD RACK stack | 15-CURRENT | 462 Mbits/sec | 281 Mbits/sec | 163 Mbits/sec | 118 Mbits/sec |
CUBIC in Linux | kernel 6.8.0 | 665 Mbits/sec | 645 Mbits/sec | 550 Mbits/sec | 554 Mbits/sec | reference
NewReno in Linux | kernel 6.8.0 | 924 Mbits/sec | 862 Mbits/sec | 740 Mbits/sec | 599 Mbits/sec | reference
tri-point topology testbed
Now there are two virtual machines (VMs) acting as traffic senders at the same time. They are hosted by bhyve in two separate physical boxes (Beelink SER5 AMD Mini PCs). The traffic receiver is the same Linux box as before, keeping the tri-point topology simple.
The three physical boxes are connected through a 5-port Gigabit Ethernet switch (TP-Link TL-SG105). The switch has 1Mb (0.125MB) of shared packet buffer memory, which is just 2.5% of the 5 MBytes BDP (1000Mbps x 40ms).
- additional 40ms latency is added/emulated at the receiver, and the senders/receiver have 2.5MB buffers (50% BDP)
- iperf3 commands
iperf3 -s -p 5201
iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5201 -l 1M -t 200 -i 1 -f m -VC ${name}
iperf3 -s -p 5202
iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5202 -l 1M -t 200 -i 1 -f m -VC ${name}
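- to make the two flows compete from (approximately) the same starting time, a launch script along the following lines can be used; the ssh-based coordination and the log file names are assumptions, while the host names (n1fbsd, n2fbsd, Linux) and iperf3 options follow the setup above:

#!/bin/sh
# sketch: start two iperf3 senders at (nearly) the same time against the same receiver
# receiver side, started beforehand:  iperf3 -s -p 5201 &   and   iperf3 -s -p 5202 &
ssh root@n1fbsd "iperf3 -B n1fbsd --cport 54321 -c Linux -p 5201 -l 1M -t 200 -i 1 -f m -VC cubic > flow1.log" &
ssh root@n2fbsd "iperf3 -B n2fbsd --cport 54321 -c Linux -p 5202 -l 1M -t 200 -i 1 -f m -VC cubic > flow2.log" &
wait    # both background jobs return after the 200 s runs complete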
FreeBSD VM sender1 & sender2 info | FreeBSD 15.0-CURRENT (GENERIC) #0 main-6f6c07813b38: Fri Mar 21 2025
Linux VM sender1 & sender2 info | Ubuntu server 24.04.2 LTS (GNU/Linux 6.8.0-55-generic x86_64)
Linux receiver info | Ubuntu desktop 24.04.2 LTS (GNU/Linux 6.11.0-19-generic x86_64)
link utilization under two competing TCP flows
- the link utilization of two competing TCP flows starting at the same time from each VM
TCP CC | version | flow1 average throughput | flow2 average throughput | link utilization: average(flow1 + flow2) | TCP fairness | comment
CUBIC in FreeBSD default stack | before patch | 31.1 Mbits/sec | 31.5 Mbits/sec | 62.6 Mbits/sec | 100% |
CUBIC in FreeBSD default stack | after patch | 23.5 Mbits/sec | 21.0 Mbits/sec | 44.5 Mbits/sec | 99.7% |
NewReno in FreeBSD default stack | 15-CURRENT | 21.1 Mbits/sec | 24.2 Mbits/sec | 45.3 Mbits/sec | 99.5% |
CUBIC in FreeBSD RACK stack | before patch | 21.8 Mbits/sec | 21.4 Mbits/sec | 43.2 Mbits/sec | 100% |
CUBIC in FreeBSD RACK stack | after patch | 19.4 Mbits/sec | 18.4 Mbits/sec | 37.8 Mbits/sec | 99.9% |
NewReno in FreeBSD RACK stack | 15-CURRENT | 16.0 Mbits/sec | 15.8 Mbits/sec | 31.8 Mbits/sec | 100% |
CUBIC in Linux | kernel 6.8.0 | 19.2 Mbits/sec | 19.2 Mbits/sec | 38.4 Mbits/sec | 100% | reference
NewReno in Linux | kernel 6.8.0 | 20.4 Mbits/sec | 20.5 Mbits/sec | 40.9 Mbits/sec | 100% | reference
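- the TCP fairness values in these tables are consistent with Jain's fairness index for two flows, (x1 + x2)^2 / (2 * (x1^2 + x2^2)); a minimal sketch to recompute it from the two average throughputs:

#!/bin/sh
# sketch: Jain's fairness index for two flows
# usage: ./fairness.sh 23.5 21.0   ->   0.997 (matching the 99.7% entry above)
awk -v x1="$1" -v x2="$2" 'BEGIN {
        printf "%.3f\n", (x1 + x2)^2 / (2 * (x1^2 + x2^2))
     }'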
CUBIC cwnd and throughput before patch in FreeBSD default stack:
CUBIC cwnd and throughput after patch in FreeBSD default stack:
NewReno cwnd and throughput in FreeBSD default stack:
CUBIC cwnd and throughput before patch in FreeBSD RACK stack:
CUBIC cwnd and throughput after patch in FreeBSD RACK stack:
NewReno cwnd and throughput in FreeBSD RACK stack:
CUBIC cwnd and throughput in Linux:
NewReno cwnd and throughput in Linux:
VM environment with a 1GE router
tri-point topology testbed
The three physical boxes are connected through a 5-Port Gigabit Router (EdgeRouter X).
It turns out that the internal virtual interface itf0, used by EdgeOS for inter-process communication, drops packets before the qdisc (queueing discipline) level takes effect, so no TCP vs. AQM policy effects can be observed at the bottleneck. Even with the maximum MTU of 2018, the qdisc still does not take effect. The five Gigabit Ethernet ports are also connected through a single internal switch chip, which may introduce limitations in certain configurations. In my test with the maximum MTU of 2018 and the subnet configuration from the wiki page Configure a 1Gbps EdgeRouter-X, iperf3 reports a maximum TCP throughput (IPv4 forwarding rate) of 404 Mbits/sec. Therefore, we can only observe TCP performance under this unknown inter-fabric buffer rather than under AQM.
- for example:
ubnt@EdgeRouter-X-5-Port:~$ ip -s link show itf0
2: itf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2018 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 80:2a:a8:5e:13:f7 brd ff:ff:ff:ff:ff:ff
    RX: bytes    packets   errors  dropped  overrun  mcast
    18897611234  18722289  0       24580    0        0      <<< RX dropped = packets dropped before reaching the qdisc
    TX: bytes    packets   errors  dropped  carrier  collsns
    18974731668  18571885  0       0        0        0
ubnt@EdgeRouter-X-5-Port:~$ sudo tc -s qdisc show dev itf0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 18826039239 bytes 18571902 pkt (dropped 0, overlimits 0 requeues 530)    <<< if it is dropped here, it means packets dropped by the queue itself, usually because it's full
 backlog 0b 0p requeues 530
ubnt@EdgeRouter-X-5-Port:~$

ubnt@EdgeRouter-X-5-Port# ip -s link show eth1
5: eth1@itf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2018 qdisc pfifo state UP mode DEFAULT group default qlen 1000
    link/ether 80:2a:a8:5e:13:f3 brd ff:ff:ff:ff:ff:ff
    alias Local
    RX: bytes    packets  errors  dropped  overrun  mcast
    13075123883  9714165  0       0        0        70
    TX: bytes    packets  errors  dropped  carrier  collsns
    3966141427   6636939  0       0        0        0
[edit]
ubnt@EdgeRouter-X-5-Port# ip -s link show eth2
6: eth2@itf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2018 qdisc pfifo state UP mode DEFAULT group default qlen 1000
    link/ether 80:2a:a8:5e:13:f4 brd ff:ff:ff:ff:ff:ff
    alias Local
    RX: bytes    packets   errors  dropped  overrun  mcast
    6654162760   8064107   0       0        0        139
    TX: bytes    packets   errors  dropped  carrier  collsns
    12910616151  10469529  0       0        0        0
[edit]
ubnt@EdgeRouter-X-5-Port# ip -s link show eth3
7: eth3@itf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2018 qdisc pfifo state UP mode DEFAULT group default qlen 1000
    link/ether 80:2a:a8:5e:13:f5 brd ff:ff:ff:ff:ff:ff
    alias Local
    RX: bytes    packets  errors  dropped  overrun  mcast
    7385125291   8078869  0       0        0        160
    TX: bytes    packets  errors  dropped  carrier  collsns
    10652730803  8729464  0       0        0        0
[edit]
ubnt@EdgeRouter-X-5-Port#
ubnt@EdgeRouter-X-5-Port# sudo tc -s qdisc show dev eth1
qdisc pfifo 8007: root refcnt 2 limit 3500p
 Sent 3906923804 bytes 5905040 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
[edit]
ubnt@EdgeRouter-X-5-Port# sudo tc -s qdisc show dev eth2
qdisc pfifo 8008: root refcnt 2 limit 3500p
 Sent 12887938442 bytes 10128384 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
[edit]
ubnt@EdgeRouter-X-5-Port# sudo tc -s qdisc show dev eth3
qdisc pfifo 8009: root refcnt 2 limit 3500p
 Sent 7430101832 bytes 6590019 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
[edit]
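- for comparison, the kind of AQM setup one would normally attach to the bottleneck port looks roughly like the sketch below (the fq_codel parameters are illustrative, not the exact EdgeOS configuration); on this hardware it has no observable effect, because packets are already dropped on itf0 before any per-port qdisc is consulted:

# sketch: attach an AQM qdisc to a bottleneck port with plain tc
sudo tc qdisc replace dev eth1 root fq_codel limit 1000 target 5ms interval 100ms
sudo tc -s qdisc show dev eth1      # verify the qdisc and watch its drop/mark counters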
- additional 40ms latency is added/emulated at the receiver, and the senders/receiver have 5MB buffers (100% BDP)
root@n1fbsd:~ # ping -c 4 Linux
PING Linux (192.168.3.10): 56 data bytes
64 bytes from 192.168.3.10: icmp_seq=0 ttl=63 time=41.228 ms
64 bytes from 192.168.3.10: icmp_seq=1 ttl=63 time=41.275 ms
64 bytes from 192.168.3.10: icmp_seq=2 ttl=63 time=41.190 ms
64 bytes from 192.168.3.10: icmp_seq=3 ttl=63 time=41.436 ms

--- Linux ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 41.190/41.282/41.436/0.094 ms
root@n1fbsd:~ #

root@n2fbsd:~ # ping -c 4 Linux
PING Linux (192.168.3.10): 56 data bytes
64 bytes from 192.168.3.10: icmp_seq=0 ttl=63 time=41.165 ms
64 bytes from 192.168.3.10: icmp_seq=1 ttl=63 time=41.304 ms
64 bytes from 192.168.3.10: icmp_seq=2 ttl=63 time=40.898 ms
64 bytes from 192.168.3.10: icmp_seq=3 ttl=63 time=41.280 ms

--- Linux ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 40.898/41.162/41.304/0.161 ms
root@n2fbsd:~ #

root@n1fbsd:~ # sysctl -f /etc/sysctl.conf
net.inet.tcp.hostcache.enable: 0 -> 0
kern.ipc.maxsockbuf: 10485760 -> 10485760
net.inet.tcp.sendbuf_max: 5242880 -> 5242880
net.inet.tcp.recvbuf_max: 5242880 -> 5242880
root@n1fbsd:~ #

root@n2fbsd:~ # sysctl -f /etc/sysctl.conf
net.inet.tcp.hostcache.enable: 0 -> 0
kern.ipc.maxsockbuf: 10485760 -> 10485760
net.inet.tcp.sendbuf_max: 5242880 -> 5242880
net.inet.tcp.recvbuf_max: 5242880 -> 5242880
root@n2fbsd:~ #

cc@Linux:~ % sudo sysctl -p
net.ipv4.tcp_no_metrics_save = 1
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
net.ipv4.tcp_rmem = 4096 131072 5242880
net.ipv4.tcp_wmem = 4096 16384 5242880
cc@Linux:~ %
- iperf3 commands
iperf3 -s -p 5201
iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5201 -l 1M -t 300 -i 1 -f m -VC ${name}
iperf3 -s -p 5202
iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5202 -l 1M -t 300 -i 1 -f m -VC ${name}
FreeBSD VM sender1 & sender2 info | FreeBSD 15.0-CURRENT (GENERIC) #0 main-6f6c07813b38: Fri Mar 21 2025
Linux VM sender1 & sender2 info | Ubuntu server 24.04.2 LTS (GNU/Linux 6.8.0-58-generic x86_64)
Linux receiver info | Ubuntu desktop 24.04.2 LTS (GNU/Linux 6.11.0-19-generic x86_64)
- the link utilization of two competing TCP flows starting at the same time from each VM
TCP CC | version | flow1 average throughput | flow2 average throughput | link utilization: average(flow1 + flow2) | TCP fairness | additional stats | comment
CUBIC in FreeBSD default stack | before patch | 80.1 Mbits/sec | 61.5 Mbits/sec | 141.6 Mbits/sec | 98.3% | flow1: avg_srtt: 99002, max_srtt: 180000 µs |
CUBIC in FreeBSD default stack | after patch | 66.8 Mbits/sec | 74.2 Mbits/sec | 141.0 Mbits/sec | 99.7% | flow1: avg_srtt: 125410, max_srtt: 170625 µs |
NewReno in FreeBSD default stack | 15-CURRENT | 76.5 Mbits/sec | 65.5 Mbits/sec | 142.0 Mbits/sec | 99.4% | flow1: avg_srtt: 97936, max_srtt: 173437 µs |
CUBIC in FreeBSD RACK stack | before patch | 68.1 Mbits/sec | 75.4 Mbits/sec | 143.5 Mbits/sec | 99.7% | flow1: avg_srtt: 116167, max_srtt: 190875 µs |
CUBIC in FreeBSD RACK stack | after patch | 75.4 Mbits/sec | 67.7 Mbits/sec | 143.1 Mbits/sec | 99.7% | flow1: avg_srtt: 107442, max_srtt: 194170 µs |
NewReno in FreeBSD RACK stack | 15-CURRENT | 68.8 Mbits/sec | 73.9 Mbits/sec | 142.7 Mbits/sec | 99.9% | flow1: avg_srtt: 100263, max_srtt: 170952 µs | the FreeBSD flows' srtt above ranges from about 98ms to 125ms
CUBIC in Linux | kernel 6.8.0 | 108 Mbits/sec | 125 Mbits/sec | 233 Mbits/sec | 99.5% | flow1: avg_srtt: 73931.1, max_srtt: 126305 µs | reference; I believe Linux's TCP small queues play a critical role in reducing queueing delay
NewReno in Linux | kernel 6.8.0 | 86.8 Mbits/sec | 149.0 Mbits/sec | 235.8 Mbits/sec | 93.5% | flow1: avg_srtt: 76593.3, max_srtt: 142904 µs | average queueing delay: 60ms ~ 75ms
CUBIC cwnd and throughput before patch in FreeBSD default stack:
CUBIC cwnd and throughput after patch in FreeBSD default stack:
NewReno cwnd and throughput in FreeBSD default stack:
CUBIC cwnd and throughput before patch in FreeBSD RACK stack:
CUBIC cwnd and throughput after patch in FreeBSD RACK stack:
NewReno cwnd and throughput in FreeBSD RACK stack:
CUBIC cwnd and throughput in Linux:
NewReno cwnd and throughput in Linux:
Emulab bare metal testbed with a 1GE Linux router
The testbed used here is a dumbbell topology (pc3000 nodes with 1Gbps links) from Emulab.net: Emulab hardware
testbed design
The topology is a dumbbell with nodes (s1, s2) acting as TCP traffic senders using iperf3, nodes (rt1, rt2) serving as routers with a single bottleneck link, and nodes (r1, r2) functioning as TCP traffic receivers. Nodes s1, s2 are running the OS targets, and nodes r1, r2 are running Ubuntu Linux 24.04 to take advantage of ACK compression. Router nodes rt1 and rt2 are running Ubuntu Linux 24.04, with NIC and Layer 3 tuning to optimize IP forwarding. For instance, the bottleneck interface can be configured with shallow L2 and L3 buffers totaling approximately 96 packets (48 for L2 and 48 for L3). A dummynet box introduces a 40ms round-trip delay (RTT) on the bottleneck link. Each sender transmits data to its corresponding receiver (e.g., s1 -> r1, s2 -> r2).
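The "NIC and Layer 3 tuning to optimize IP forwarding" on rt1/rt2 starts from plain Linux IPv4 forwarding; a minimal sketch (standard Ubuntu sysctl, with the further per-driver NIC tuning not shown):

# sketch: baseline router configuration on rt1/rt2 (Ubuntu): enable IPv4 forwarding
sudo sysctl -w net.ipv4.ip_forward=1                                        # for the current boot
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-forwarding.conf  # persist across reboots
sudo sysctl --system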
All physical links have 1Gbps bandwidth. A middle box running dummynet adds 40ms of delay emulation, which provides at least a 1000Mbps x 40ms == 5MB bandwidth-delay product (BDP) environment, so the senders/receivers are configured with 10MB TCP buffers. Active Queue Management (AQM) is applied on the bottleneck port of rt1.
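A minimal sketch of the shallow bottleneck-port buffers described above (the interface name eth5 is taken from the rt1 qdisc listing further below; the exact ring-size limits depend on the NIC driver):

# sketch: shrink the bottleneck port's buffers on rt1 to ~96 packets total
sudo ethtool -G eth5 rx 48 tx 48                     # L2: NIC rx/tx descriptor rings
sudo tc qdisc replace dev eth5 root pfifo limit 48   # L3: qdisc queue of 48 packets
sudo ethtool -g eth5                                 # confirm the ring sizes
sudo tc -s qdisc show dev eth5                       # confirm the pfifo limit (48p)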
- 40ms propagation delay and senders/receivers have 10MB buffer (200% BDP)
root@s1:~# ping -c 4 r1 PING r1-link5 (10.1.4.3) 56(84) bytes of data. 64 bytes from r1-link5 (10.1.4.3): icmp_seq=1 ttl=62 time=40.1 ms 64 bytes from r1-link5 (10.1.4.3): icmp_seq=2 ttl=62 time=40.0 ms 64 bytes from r1-link5 (10.1.4.3): icmp_seq=3 ttl=62 time=40.1 ms 64 bytes from r1-link5 (10.1.4.3): icmp_seq=4 ttl=62 time=40.0 ms --- r1-link5 ping statistics --- 4 packets transmitted, 4 received, 0% packet loss, time 3005ms rtt min/avg/max/mdev = 40.039/40.070/40.120/0.031 ms root@s1:~# root@s2:~# ping -c 4 r2 PING r2-link6 (10.1.1.3) 56(84) bytes of data. 64 bytes from r2-link6 (10.1.1.3): icmp_seq=1 ttl=62 time=40.2 ms 64 bytes from r2-link6 (10.1.1.3): icmp_seq=2 ttl=62 time=40.1 ms 64 bytes from r2-link6 (10.1.1.3): icmp_seq=3 ttl=62 time=40.1 ms 64 bytes from r2-link6 (10.1.1.3): icmp_seq=4 ttl=62 time=40.0 ms --- r2-link6 ping statistics --- 4 packets transmitted, 4 received, 0% packet loss, time 3004ms rtt min/avg/max/mdev = 40.033/40.107/40.214/0.071 ms root@s2:~# root@s1:~ # sysctl -f /etc/sysctl.conf kern.log_console_output: 0 -> 0 net.inet.tcp.hostcache.enable: 0 -> 0 kern.ipc.maxsockbuf: 10485760 -> 10485760 net.inet.tcp.sendbuf_max: 10485760 -> 10485760 net.inet.tcp.recvbuf_max: 10485760 -> 10485760 root@s1:~ # root@s2:~ # sysctl -f /etc/sysctl.conf kern.log_console_output: 0 -> 0 net.inet.tcp.hostcache.enable: 0 -> 0 kern.ipc.maxsockbuf: 10485760 -> 10485760 net.inet.tcp.sendbuf_max: 10485760 -> 10485760 net.inet.tcp.recvbuf_max: 10485760 -> 10485760 root@s2:~ # cc@r1:~ % sudo sysctl -p net.ipv4.tcp_no_metrics_save = 1 net.core.rmem_max = 10485760 net.core.wmem_max = 10485760 net.ipv4.tcp_rmem = 4096 131072 10485760 net.ipv4.tcp_wmem = 4096 16384 10485760 cc@r1:~ % cc@r2:~ % sudo sysctl -p net.ipv4.tcp_no_metrics_save = 1 net.core.rmem_max = 10485760 net.core.wmem_max = 10485760 net.ipv4.tcp_rmem = 4096 131072 10485760 net.ipv4.tcp_wmem = 4096 16384 10485760 cc@r2:~ %
- iperf3 commands
receiver: iperf3 -s -p 5201 --affinity 1 sender: iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5201 -l 1M -t 300 -i 1 -f m -VC ${name}
FreeBSD s1 & s2 info | FreeBSD 15.0-CURRENT (GENERIC) #0 main-d4763484f911: Mon Apr 21 2025
Linux r1 & r2 info | Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-53-generic x86_64)
Linux rt1 & rt2 info | Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-53-generic x86_64) with rx/tx driver and qdisc tuning for IP forwarding performance
link utilization under two competing TCP flows
It is worth noting how the Linux qdisc drops packets on the bottleneck port; for example, after 300 seconds of two competing TCP flows:
root@rt1:~# tc -s qdisc show dev eth2
qdisc pfifo 8007: root refcnt 2 limit 256p
 Sent 105132572 bytes 1592669 pkt (dropped 0, overlimits 0 requeues 8)
 backlog 0b 0p requeues 8
root@rt1:~# tc -s qdisc show dev eth4
qdisc pfifo 8009: root refcnt 2 limit 256p
 Sent 119886021 bytes 1816294 pkt (dropped 0, overlimits 0 requeues 3)
 backlog 0b 0p requeues 3
root@rt1:~# tc -s qdisc show dev eth5
qdisc pfifo 800a: root refcnt 2 limit 48p
 Sent 20842988722 bytes 13766899 pkt (dropped 789, overlimits 0 requeues 58847)    <<< bottleneck drops
 backlog 0b 0p requeues 58847
root@rt1:~#
link utilization of two competing TCP flows under a bottleneck of rx/tx=48p and qdisc pfifo limit=48p (MTU=1500: L2+L3 buffer = 96 packets ≈ 140KB, about 2.7% of the 5MB BDP); all the other router ports use the default rx/tx=256p and qdisc pfifo=256p
TCP CC | version | flow1 average throughput | flow2 average throughput | link utilization: average(flow1 + flow2) | TCP fairness | comment
CUBIC in FreeBSD default stack | before patch | 92.5 Mbits/sec | 101.0 Mbits/sec | 193.5 Mbits/sec | 99.8% | base
CUBIC in FreeBSD default stack | after patch | 199.0 Mbits/sec | 211.0 Mbits/sec | 410.0 Mbits/sec | 99.9% | +111.9% link utilization vs. base
NewReno in FreeBSD default stack | 15-CURRENT | 95.4 Mbits/sec | 83.9 Mbits/sec | 179.3 Mbits/sec | 99.6% |
CUBIC in FreeBSD RACK stack | before patch | 122.0 Mbits/sec | 129.0 Mbits/sec | 251.0 Mbits/sec | 99.9% | base
CUBIC in FreeBSD RACK stack | after patch | 117.0 Mbits/sec | 129.0 Mbits/sec | 246.0 Mbits/sec | 99.8% | -2.0% link utilization vs. base
NewReno in FreeBSD RACK stack | 15-CURRENT | 95.1 Mbits/sec | 106.0 Mbits/sec | 201.1 Mbits/sec | 99.7% |
CUBIC in Linux | kernel 6.8.0 | 304.0 Mbits/sec | 222.0 Mbits/sec | 526.0 Mbits/sec | 97.6% | reference
NewReno in Linux | kernel 6.8.0 | 199.0 Mbits/sec | 247.0 Mbits/sec | 446.0 Mbits/sec | 98.9% | reference