test patch D49540

patch link: D49540


VM environment with a 1GE switch

single path testbed

Virtual machines (VMs) acting as TCP traffic senders (using iperf3) are hosted by bhyve on a physical box (Beelink SER5 AMD Mini PC) running the FreeBSD 14.2 release. A second box of the same type runs the Ubuntu Linux desktop edition. The two boxes are connected through a 5-Port Gigabit Ethernet switch (TP-Link TL-SG105), which has a shared 1 Mb (125 KB) packet buffer memory.

testbed: attachment:single-path_topology.png

functionality showcase

This test shows the general TCP congestion window (cwnd) growth pattern before and after the patch, and also compares it with the cwnd growth pattern from the Linux kernel.

cc@Linux:~ % sudo tc qdisc add dev enp1s0 root netem delay 40ms
cc@Linux:~ % tc qdisc show dev enp1s0
qdisc netem 8001: root refcnt 2 limit 1000 delay 40ms
cc@Linux:~ %
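
The emulated delay only needs to be in place during a run; as a sketch (assuming the same interface name enp1s0), it can be adjusted or cleared with the usual tc commands:

## adjust the emulated delay in place, or remove it entirely after the test
sudo tc qdisc change dev enp1s0 root netem delay 40ms
sudo tc qdisc del dev enp1s0 root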

root@n1fbsd:~ # ping -c 4 -S n1fbsd Linux
PING Linux (192.168.50.46) from n1fbsdvm: 56 data bytes
64 bytes from 192.168.50.46: icmp_seq=0 ttl=64 time=41.053 ms
64 bytes from 192.168.50.46: icmp_seq=1 ttl=64 time=41.005 ms
64 bytes from 192.168.50.46: icmp_seq=2 ttl=64 time=41.011 ms
64 bytes from 192.168.50.46: icmp_seq=3 ttl=64 time=41.168 ms

--- Linux ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 41.005/41.059/41.168/0.066 ms
root@n1fbsd:~ #

root@n1ubuntu24:~ # ping -c 4 -I 192.168.50.161 Linux
PING Linux (192.168.50.46) from 192.168.50.161 : 56(84) bytes of data.
64 bytes from Linux (192.168.50.46): icmp_seq=1 ttl=64 time=40.9 ms
64 bytes from Linux (192.168.50.46): icmp_seq=2 ttl=64 time=40.9 ms
64 bytes from Linux (192.168.50.46): icmp_seq=3 ttl=64 time=41.0 ms
64 bytes from Linux (192.168.50.46): icmp_seq=4 ttl=64 time=41.1 ms

--- Linux ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3061ms
rtt min/avg/max/mdev = 40.905/40.981/41.108/0.078 ms
root@n1ubuntu24:~ #

root@n1fbsd:~ # sysctl -f /etc/sysctl.conf
net.inet.tcp.hostcache.enable: 0 -> 0
kern.ipc.maxsockbuf: 10485760 -> 10485760
net.inet.tcp.sendbuf_max: 10485760 -> 10485760
net.inet.tcp.recvbuf_max: 10485760 -> 10485760
root@n1fbsd:~ #
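
The sysctl -f output above implies an /etc/sysctl.conf on the FreeBSD sender roughly like the following sketch (host cache disabled, 10 MB socket buffer limits):

## /etc/sysctl.conf on the FreeBSD sender (reconstructed from the output above)
net.inet.tcp.hostcache.enable=0
kern.ipc.maxsockbuf=10485760
net.inet.tcp.sendbuf_max=10485760
net.inet.tcp.recvbuf_max=10485760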

root@n1ubuntu24:~ # sysctl -p
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
net.ipv4.tcp_rmem = 4096 131072 10485760
net.ipv4.tcp_wmem = 4096 16384 10485760
net.ipv4.tcp_no_metrics_save = 1
root@n1ubuntu24:~ #

cc@Linux:~ % sudo sysctl -p
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
net.ipv4.tcp_rmem = 4096 131072 10485760
net.ipv4.tcp_wmem = 4096 16384 10485760
net.ipv4.tcp_no_metrics_save = 1
cc@Linux:~ %

root@n1fbsd:~ # kldstat
Id Refs Address                Size Name
 1    5 0xffffffff80200000  1f75ca0 kernel
 2    1 0xffffffff82810000    368d8 tcp_rack.ko
 3    1 0xffffffff82847000     f0f0 tcphpts.ko
root@n1fbsd:~ # sysctl net.inet.tcp.functions_default=rack
net.inet.tcp.functions_default: freebsd -> rack
root@n1fbsd:~ #
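
If the RACK stack modules are not already loaded, something like the following sketch loads them and lists the available TCP stacks before switching the default (module name as shown by kldstat above):

## load the RACK TCP stack and check which stacks are available
kldload tcp_rack
sysctl net.inet.tcp.functions_available
sysctl net.inet.tcp.functions_default=rack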

FreeBSD VM sender info

FreeBSD 15.0-CURRENT (GENERIC) #0 main-6f6c07813b38: Fri Mar 21 2025

Linux VM sender info

Ubuntu server 24.04.2 LTS (GNU/Linux 6.8.0-55-generic x86_64)

Linux receiver info

Ubuntu desktop 24.04.2 LTS (GNU/Linux 6.11.0-19-generic x86_64)

receiver: iperf3 -s -p 5201 --affinity 1
sender:   iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5201 -l 1M -t 300 -i 1 -f m -VC ${name}
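
How the cwnd samples behind the charts below were collected is not shown here; on a Linux sender, a comparable once-per-second trace can be taken with an ss loop like this sketch (assuming the flow's destination port is 5201):

## sample the sender's cwnd once per second during the transfer (Linux side)
while sleep 1; do
    echo -n "$(date +%s) "
    ss -tin 'dport = :5201' | grep -o 'cwnd:[0-9]*'
done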

CUBIC growth pattern before/after patch in FreeBSD default stack: attachment:before_patch_singlepath_cubic_def_cwnd_chart100.png attachment:after_patch_singlepath_cubic_def_cwnd_chart100.png

CUBIC growth pattern before/after patch in FreeBSD RACK stack: attachment:before_patch_singlepath_cubic_rack_cwnd_chart100s.png attachment:after_patch_singlepath_cubic_rack_cwnd_chart100s.png

For reference, the CUBIC in Linux kernel growth pattern is also attached: attachment:linux_cwnd_chart100s.png

single-flow 300-second performance with 1 Gbps x 40 ms BDP

||TCP CC ||version ||iperf3 single flow 300s average throughput ||comment ||
||CUBIC in FreeBSD default stack ||before patch ||715 Mbits/sec ||base ||
||CUBIC in FreeBSD default stack ||after patch ||693 Mbits/sec ||a benign -3.1% performance loss (possibly caused by the RTO between 60s and 70s), but cwnd growth is finer grained with smaller deltas (grows more smoothly) ||
||CUBIC in FreeBSD RACK stack ||before patch ||738 Mbits/sec ||base ||
||CUBIC in FreeBSD RACK stack ||after patch ||730 Mbits/sec ||negligible -1.1% performance loss ||
||CUBIC in Linux ||kernel 6.8.0 ||725 Mbits/sec ||reference ||
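
The percentage deltas in the comment column are plain relative changes against the before-patch baseline, e.g. for the default stack:

## (after - before) / before for the default stack: 693 vs. 715 Mbits/sec
awk 'BEGIN { printf "%.1f%%\n", (693 - 715) / 715 * 100 }'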

CUBIC cwnd and throughput before patch in FreeBSD default stack:

attachment:before_patch_singlepath_cubic_def_cwnd_chart.png attachment:before_patch_singlepath_cubic_def_throughput_chart.png

CUBIC cwnd and throughput after patch in FreeBSD default stack:

attachment:after_patch_singlepath_cubic_def_cwnd_chart.png attachment:after_patch_singlepath_cubic_def_throughput_chart.png

CUBIC cwnd and throughput before patch in FreeBSD RACK stack:

attachment:before_patch_singlepath_cubic_rack_cwnd_chart.png attachment:before_patch_singlepath_cubic_rack_throughput_chart.png

CUBIC cwnd and throughput after patch in FreeBSD RACK stack:

attachment:after_patch_singlepath_cubic_rack_cwnd_chart.png attachment:after_patch_singlepath_cubic_rack_throughput_chart.png

CUBIC cwnd and throughput in Linux stack:

attachment:linux_singlepath_cubic_cwnd_chart.png attachment:linux_singlepath_cubic_throughput_chart.png

loss resilience in LAN test (no additional latency added)

Similar to testD46046, this configuration tests TCP CUBIC in the VM traffic sender with a 1%, 2%, 3%, or 4% incoming packet drop rate at the Linux receiver. No additional latency is added.

root@n1fbsd:~ # ping -c 4 -S n1fbsd Linux
PING Linux (192.168.50.46) from n1fbsdvm: 56 data bytes
64 bytes from 192.168.50.46: icmp_seq=0 ttl=64 time=0.499 ms
64 bytes from 192.168.50.46: icmp_seq=1 ttl=64 time=0.797 ms
64 bytes from 192.168.50.46: icmp_seq=2 ttl=64 time=0.860 ms
64 bytes from 192.168.50.46: icmp_seq=3 ttl=64 time=0.871 ms

--- Linux ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.499/0.757/0.871/0.151 ms
root@n1fbsd:~ #

root@n1ubuntu24:~ # ping -c 4 -I 192.168.50.161 Linux
PING Linux (192.168.50.46) from 192.168.50.161 : 56(84) bytes of data.
64 bytes from Linux (192.168.50.46): icmp_seq=1 ttl=64 time=0.528 ms
64 bytes from Linux (192.168.50.46): icmp_seq=2 ttl=64 time=0.583 ms
64 bytes from Linux (192.168.50.46): icmp_seq=3 ttl=64 time=0.750 ms
64 bytes from Linux (192.168.50.46): icmp_seq=4 ttl=64 time=0.590 ms

--- Linux ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3165ms
rtt min/avg/max/mdev = 0.528/0.612/0.750/0.082 ms
root@n1ubuntu24:~ #

## list of different packet loss rate
cc@Linux:~ % sudo iptables -A INPUT -p tcp --dport 5201 -m statistic --mode nth --every 100 --packet 0 -j DROP
cc@Linux:~ % sudo iptables -A INPUT -p tcp --dport 5201 -m statistic --mode nth --every 50 --packet 0 -j DROP
cc@Linux:~ % sudo iptables -A INPUT -p tcp --dport 5201 -m statistic --mode nth --every 33 --packet 0 -j DROP
cc@Linux:~ % sudo iptables -A INPUT -p tcp --dport 5201 -m statistic --mode nth --every 25 --packet 0 -j DROP
...
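
Only one DROP rule is active per run: "--every 100" drops one packet in 100 (roughly a 1% incoming loss rate), "--every 50" about 2%, and so on. Between runs the rule can be inspected and flushed, for example:

## show the installed drop rule(s), then remove them before the next run
sudo iptables -L INPUT -n -v --line-numbers
sudo iptables -F INPUT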

cc@Linux:~ % iperf3 -s -p 5201 --affinity 1
root@n1fbsd:~ # iperf3 -B n1fbsd --cport 54321 -c Linux -p 5201 -l 1M -t 30 -i 1 -f m -VC cubic

||TCP CC ||version ||1% loss rate ||2% loss rate ||3% loss rate ||4% loss rate ||comment ||
||CUBIC in FreeBSD default stack ||before patch ||854 Mbits/sec ||331 Mbits/sec ||53.1 Mbits/sec ||22.9 Mbits/sec || ||
||CUBIC in FreeBSD default stack ||after patch ||814 Mbits/sec ||150 Mbits/sec ||45.1 Mbits/sec ||24.2 Mbits/sec || ||
||NewReno in FreeBSD default stack ||15-CURRENT ||814 Mbits/sec ||110 Mbits/sec ||40.4 Mbits/sec ||21.5 Mbits/sec || ||
||CUBIC in FreeBSD RACK stack ||before patch ||552 Mbits/sec ||353 Mbits/sec ||185 Mbits/sec ||130 Mbits/sec || ||
||CUBIC in FreeBSD RACK stack ||after patch ||515 Mbits/sec ||337 Mbits/sec ||181 Mbits/sec ||127 Mbits/sec || ||
||NewReno in FreeBSD RACK stack ||15-CURRENT ||462 Mbits/sec ||281 Mbits/sec ||163 Mbits/sec ||118 Mbits/sec || ||
||CUBIC in Linux ||kernel 6.8.0 ||665 Mbits/sec ||645 Mbits/sec ||550 Mbits/sec ||554 Mbits/sec ||reference ||
||NewReno in Linux ||kernel 6.8.0 ||924 Mbits/sec ||862 Mbits/sec ||740 Mbits/sec ||599 Mbits/sec ||reference ||

tri-point topology testbed

Now there are two virtual machines (VMs) acting as traffic senders at the same time. They are hosted by bhyve on two separate physical boxes (Beelink SER5 AMD Mini PCs). The traffic receiver is the same Linux box as before, keeping the tri-point topology simple.

The three physical boxes are connected through a 5-Port Gigabit Ethernet switch (TP-Link TL-SG105). The switch has a shared 1 Mb (0.125 MB) packet buffer memory, which is just 2.5% of the 5 MB BDP (1000 Mbps x 40 ms).
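
The 2.5% figure follows directly from the buffer and BDP sizes; a quick check:

## BDP = 1000 Mbit/s x 40 ms = 5 MB; switch buffer = 1 Mbit = 0.125 MB
awk 'BEGIN { bdp = 1000e6 * 0.040 / 8; buf = 1e6 / 8; printf "BDP = %.0f bytes, buffer/BDP = %.1f%%\n", bdp, buf / bdp * 100 }'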

testbed: attachment:tri-point_topology.png

iperf3 -s -p 5201
iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5201 -l 1M -t 200 -i 1 -f m -VC ${name}
iperf3 -s -p 5202
iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5202 -l 1M -t 200 -i 1 -f m -VC ${name}

FreeBSD VM sender1 & sender2 info

FreeBSD 15.0-CURRENT (GENERIC) #0 main-6f6c07813b38: Fri Mar 21 2025

Linux VM sender1 & sender2 info

Ubuntu server 24.04.2 LTS (GNU/Linux 6.8.0-55-generic x86_64)

Linux receiver info

Ubuntu desktop 24.04.2 LTS (GNU/Linux 6.11.0-19-generic x86_64)
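
The TCP fairness column in the table below appears to be Jain's fairness index over the two flows' average throughputs (an assumption, but the numbers match); for example, the after-patch default-stack pair of 23.5 and 21.0 Mbits/sec gives 99.7%:

## Jain's fairness index for two flows: (x1 + x2)^2 / (2 * (x1^2 + x2^2))
awk -v x1=23.5 -v x2=21.0 'BEGIN { printf "%.1f%%\n", (x1 + x2)^2 / (2 * (x1^2 + x2^2)) * 100 }'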

||TCP CC ||version ||flow1 average throughput ||flow2 average throughput ||link utilization: average(flow1 + flow2) ||TCP fairness ||comment ||
||CUBIC in FreeBSD default stack ||before patch ||31.1 Mbits/sec ||31.5 Mbits/sec ||62.6 Mbits/sec ||100% || ||
||CUBIC in FreeBSD default stack ||after patch ||23.5 Mbits/sec ||21.0 Mbits/sec ||44.5 Mbits/sec ||99.7% || ||
||NewReno in FreeBSD default stack ||15-CURRENT ||21.1 Mbits/sec ||24.2 Mbits/sec ||45.3 Mbits/sec ||99.5% || ||
||CUBIC in FreeBSD RACK stack ||before patch ||21.8 Mbits/sec ||21.4 Mbits/sec ||43.2 Mbits/sec ||100% || ||
||CUBIC in FreeBSD RACK stack ||after patch ||19.4 Mbits/sec ||18.4 Mbits/sec ||37.8 Mbits/sec ||99.9% || ||
||NewReno in FreeBSD RACK stack ||15-CURRENT ||16.0 Mbits/sec ||15.8 Mbits/sec ||31.8 Mbits/sec ||100% || ||
||CUBIC in Linux ||kernel 6.8.0 ||19.2 Mbits/sec ||19.2 Mbits/sec ||38.4 Mbits/sec ||100% ||reference ||
||NewReno in Linux ||kernel 6.8.0 ||20.4 Mbits/sec ||20.5 Mbits/sec ||40.9 Mbits/sec ||100% ||reference ||

CUBIC cwnd and throughput before patch in FreeBSD default stack:

attachment:before_tri_fbsd_def_cubic_cwnd_chart.png attachment:before_tri_fbsd_def_cubic_throughput_chart.png

CUBIC cwnd and throughput after patch in FreeBSD default stack:

attachment:after_tri_fbsd_def_cubic_cwnd_chart.png attachment:after_tri_fbsd_def_cubic_throughput_chart.png

NewReno cwnd and throughput in FreeBSD default stack:

attachment:tri_fbsd_def_newreno_cwnd_chart.png attachment:tri_fbsd_def_newreno_throughput_chart.png

CUBIC cwnd and throughput before patch in FreeBSD RACK stack:

attachment:before_tri_fbsd_rack_cubic_cwnd_chart.png attachment:before_tri_fbsd_rack_cubic_throughput_chart.png

CUBIC cwnd and throughput after patch in FreeBSD RACK stack:

attachment:after_tri_fbsd_rack_cubic_cwnd_chart.png attachment:after_tri_fbsd_rack_cubic_throughput_chart.png

NewReno cwnd and throughput in FreeBSD RACK stack:

attachment:tri_fbsd_rack_newreno_cwnd_chart.png attachment:tri_fbsd_rack_newreno_throughput_chart.png

CUBIC cwnd and throughput in Linux:

attachment:tri_linux_cubic_cwnd_chart.png attachment:tri_linux_cubic_throughput_chart.png

NewReno cwnd and throughput in Linux:

attachment:tri_linux_reno_cwnd_chart.png attachment:tri_linux_reno_throughput_chart.png

VM environment with a 1GE router

tri-point topology testbed

The three physical boxes are connected through a 5-Port Gigabit Router (EdgeRouter X).

testbed: attachment:tri_point_router_topology.png

It turns out that the internal virtual interface itf0, used by EdgeOS for inter-process communication, drops packets before the qdisc (queueing discipline) level takes effect, so no TCP vs. AQM policy interaction can be observed at the bottleneck. Even with the maximum MTU of 2018, the qdisc still does not take effect. I also found that the five Gigabit Ethernet ports are connected through a single internal switch chip, which may introduce limitations in certain configurations. In my test with the maximum MTU of 2018 and the subnet configuration from the wiki page Configure a 1Gbps EdgeRouter-X, iperf3 reports a maximum TCP throughput, limited by the router's IPv4 forwarding rate, of 404 Mbits/sec.

Therefore, we can only observe TCP performance under this unknown inter-fabric buffer, rather than under AQM.

ubnt@EdgeRouter-X-5-Port:~$ ip -s link show itf0
2: itf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2018 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 80:2a:a8:5e:13:f7 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    18897611234 18722289 0       24580   0       0                              <<< RX dropped = packets dropped before reaching the qdisc
    TX: bytes  packets  errors  dropped carrier collsns
    18974731668 18571885 0       0       0       0
ubnt@EdgeRouter-X-5-Port:~$ sudo tc -s qdisc show dev itf0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 18826039239 bytes 18571902 pkt (dropped 0, overlimits 0 requeues 530)     <<< if it is dropped here, it means packets dropped by the queue itself, usually because it’s full
 backlog 0b 0p requeues 530
ubnt@EdgeRouter-X-5-Port:~$
ubnt@EdgeRouter-X-5-Port# ip -s link show eth1
5: eth1@itf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2018 qdisc pfifo state UP mode DEFAULT group default qlen 1000
    link/ether 80:2a:a8:5e:13:f3 brd ff:ff:ff:ff:ff:ff
    alias Local
    RX: bytes  packets  errors  dropped overrun mcast
    13075123883 9714165  0       0       0       70
    TX: bytes  packets  errors  dropped carrier collsns
    3966141427 6636939  0       0       0       0
[edit]
ubnt@EdgeRouter-X-5-Port# ip -s link show eth2
6: eth2@itf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2018 qdisc pfifo state UP mode DEFAULT group default qlen 1000
    link/ether 80:2a:a8:5e:13:f4 brd ff:ff:ff:ff:ff:ff
    alias Local
    RX: bytes  packets  errors  dropped overrun mcast
    6654162760 8064107  0       0       0       139
    TX: bytes  packets  errors  dropped carrier collsns
    12910616151 10469529 0       0       0       0
[edit]
ubnt@EdgeRouter-X-5-Port# ip -s link show eth3
7: eth3@itf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2018 qdisc pfifo state UP mode DEFAULT group default qlen 1000
    link/ether 80:2a:a8:5e:13:f5 brd ff:ff:ff:ff:ff:ff
    alias Local
    RX: bytes  packets  errors  dropped overrun mcast
    7385125291 8078869  0       0       0       160
    TX: bytes  packets  errors  dropped carrier collsns
    10652730803 8729464  0       0       0       0
[edit]
ubnt@EdgeRouter-X-5-Port#
ubnt@EdgeRouter-X-5-Port# sudo tc -s qdisc show dev eth1
qdisc pfifo 8007: root refcnt 2 limit 3500p
 Sent 3906923804 bytes 5905040 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
[edit]
ubnt@EdgeRouter-X-5-Port# sudo tc -s qdisc show dev eth2
qdisc pfifo 8008: root refcnt 2 limit 3500p
 Sent 12887938442 bytes 10128384 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
[edit]
ubnt@EdgeRouter-X-5-Port# sudo tc -s qdisc show dev eth3
qdisc pfifo 8009: root refcnt 2 limit 3500p
 Sent 7430101832 bytes 6590019 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
[edit]
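
During a transfer, the itf0 RX-dropped counter can be watched live to confirm that the losses happen before the qdisc (a sketch; assumes watch is available in the EdgeOS shell):

## refresh the interface statistics every second while iperf3 is running
watch -n 1 'ip -s link show itf0'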

root@n1fbsd:~ # ping -c 4 Linux
PING Linux (192.168.3.10): 56 data bytes
64 bytes from 192.168.3.10: icmp_seq=0 ttl=63 time=41.228 ms
64 bytes from 192.168.3.10: icmp_seq=1 ttl=63 time=41.275 ms
64 bytes from 192.168.3.10: icmp_seq=2 ttl=63 time=41.190 ms
64 bytes from 192.168.3.10: icmp_seq=3 ttl=63 time=41.436 ms

--- Linux ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 41.190/41.282/41.436/0.094 ms
root@n1fbsd:~ #

root@n2fbsd:~ # ping -c 4 Linux
PING Linux (192.168.3.10): 56 data bytes
64 bytes from 192.168.3.10: icmp_seq=0 ttl=63 time=41.165 ms
64 bytes from 192.168.3.10: icmp_seq=1 ttl=63 time=41.304 ms
64 bytes from 192.168.3.10: icmp_seq=2 ttl=63 time=40.898 ms
64 bytes from 192.168.3.10: icmp_seq=3 ttl=63 time=41.280 ms

--- Linux ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 40.898/41.162/41.304/0.161 ms
root@n2fbsd:~ #

root@n1fbsd:~ # sysctl -f /etc/sysctl.conf
net.inet.tcp.hostcache.enable: 0 -> 0
kern.ipc.maxsockbuf: 10485760 -> 10485760
net.inet.tcp.sendbuf_max: 5242880 -> 5242880
net.inet.tcp.recvbuf_max: 5242880 -> 5242880
root@n1fbsd:~ #

root@n2fbsd:~ # sysctl -f /etc/sysctl.conf
net.inet.tcp.hostcache.enable: 0 -> 0
kern.ipc.maxsockbuf: 10485760 -> 10485760
net.inet.tcp.sendbuf_max: 5242880 -> 5242880
net.inet.tcp.recvbuf_max: 5242880 -> 5242880
root@n2fbsd:~ #

cc@Linux:~ % sudo sysctl -p
net.ipv4.tcp_no_metrics_save = 1
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
net.ipv4.tcp_rmem = 4096 131072 5242880
net.ipv4.tcp_wmem = 4096  16384 5242880
cc@Linux:~ %

iperf3 -s -p 5201
iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5201 -l 1M -t 300 -i 1 -f m -VC ${name}
iperf3 -s -p 5202
iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5202 -l 1M -t 300 -i 1 -f m -VC ${name}

FreeBSD VM sender1 & sender2 info

FreeBSD 15.0-CURRENT (GENERIC) #0 main-6f6c07813b38: Fri Mar 21 2025

Linux VM sender1 & sender2 info

Ubuntu server 24.04.2 LTS (GNU/Linux 6.8.0-58-generic x86_64)

Linux receiver info

Ubuntu desktop 24.04.2 LTS (GNU/Linux 6.11.0-19-generic x86_64)

||TCP CC ||version ||flow1 average throughput ||flow2 average throughput ||link utilization: average(flow1 + flow2) ||TCP fairness ||additional stats ||comment ||
||CUBIC in FreeBSD default stack ||before patch ||80.1 Mbits/sec ||61.5 Mbits/sec ||141.6 Mbits/sec ||98.3% ||flow1: avg_srtt 99002 µs, max_srtt 180000 µs; flow2: avg_srtt 107239 µs, max_srtt 172343 µs || ||
||CUBIC in FreeBSD default stack ||after patch ||66.8 Mbits/sec ||74.2 Mbits/sec ||141.0 Mbits/sec ||99.7% ||flow1: avg_srtt 125410 µs, max_srtt 170625 µs; flow2: avg_srtt 124167 µs, max_srtt 173906 µs || ||
||NewReno in FreeBSD default stack ||15-CURRENT ||76.5 Mbits/sec ||65.5 Mbits/sec ||142.0 Mbits/sec ||99.4% ||flow1: avg_srtt 97936 µs, max_srtt 173437 µs; flow2: avg_srtt 104448 µs, max_srtt 155312 µs || ||
||CUBIC in FreeBSD RACK stack ||before patch ||68.1 Mbits/sec ||75.4 Mbits/sec ||143.5 Mbits/sec ||99.7% ||flow1: avg_srtt 116167 µs, max_srtt 190875 µs; flow2: avg_srtt 114196 µs, max_srtt 188734 µs || ||
||CUBIC in FreeBSD RACK stack ||after patch ||75.4 Mbits/sec ||67.7 Mbits/sec ||143.1 Mbits/sec ||99.7% ||flow1: avg_srtt 107442 µs, max_srtt 194170 µs; flow2: avg_srtt 110193 µs, max_srtt 206855 µs || ||
||NewReno in FreeBSD RACK stack ||15-CURRENT ||68.8 Mbits/sec ||73.9 Mbits/sec ||142.7 Mbits/sec ||99.9% ||flow1: avg_srtt 100263 µs, max_srtt 170952 µs; flow2: avg_srtt 104037 µs, max_srtt 161804 µs ||The FreeBSD flows' srtt above is 98 ms ~ 125 ms, so the router's inter-fabric buffer is estimated at roughly 2.5x ~ 3x BDP (12.5 MB ~ 15 MB). ||
||CUBIC in Linux ||kernel 6.8.0 ||108 Mbits/sec ||125 Mbits/sec ||233 Mbits/sec ||99.5% ||flow1: avg_srtt 73931.1 µs, max_srtt 126305 µs; flow2: avg_srtt 57389.8 µs, max_srtt 90480 µs ||reference; I believe Linux's TCP Small Queues plays a critical role in reducing queueing delay ||
||NewReno in Linux ||kernel 6.8.0 ||86.8 Mbits/sec ||149.0 Mbits/sec ||235.8 Mbits/sec ||93.5% ||flow1: avg_srtt 76593.3 µs, max_srtt 142904 µs; flow2: avg_srtt 60543.5 µs, max_srtt 109339 µs ||average queueing delay: 60 ms ~ 75 ms ||
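
One way to reproduce the 12.5 MB ~ 15 MB estimate above (an assumption about the arithmetic used): at 1 Gbps the data in flight is roughly bandwidth x srtt, so the 98 ms ~ 125 ms smoothed RTTs translate to:

## in-flight bytes = bandwidth x srtt, compared with the 5 MB BDP of the 40 ms base RTT
awk 'BEGIN { printf "98 ms  -> %.1f MB (%.1fx BDP)\n", 1000e6 * 0.098 / 8 / 1e6, 0.098 / 0.040;
             printf "125 ms -> %.1f MB (%.1fx BDP)\n", 1000e6 * 0.125 / 8 / 1e6, 0.125 / 0.040 }'

This gives roughly 12.3 MB (2.5x BDP) to 15.6 MB (3.1x BDP), consistent with the quoted range.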

CUBIC cwnd and throughput before patch in FreeBSD default stack:

attachment:before_tri_router_fbsd_def_cubic_cwnd_chart.png attachment:before_tri_router_fbsd_def_cubic_throughput_chart.png

CUBIC cwnd and throughput after patch in FreeBSD default stack:

attachment:after_tri_router_fbsd_def_cubic_cwnd_chart.png attachment:after_tri_router_fbsd_def_cubic_throughput_chart.png

NewReno cwnd and throughput in FreeBSD default stack:

attachment:tri_router_fbsd_def_newreno_cwnd_chart.png attachment:tri_router_fbsd_def_newreno_throughput_chart.png

CUBIC cwnd and throughput before patch in FreeBSD RACK stack:

attachment:before_tri_router_fbsd_rack_cubic_cwnd_chart.png attachment:before_tri_router_fbsd_rack_cubic_throughput_chart.png

CUBIC cwnd and throughput after patch in FreeBSD RACK stack:

attachment:after_tri_router_fbsd_rack_cubic_cwnd_chart.png attachment:after_tri_router_fbsd_rack_cubic_throughput_chart.png

NewReno cwnd and throughput in FreeBSD RACK stack:

attachment:tri_router_fbsd_rack_newreno_cwnd_chart.png attachment:tri_router_fbsd_rack_newreno_throughput_chart.png

CUBIC cwnd and throughput in Linux:

attachment:tri_router_linux_cubic_cwnd_chart.png attachment:tri_router_linux_cubic_throughput_chart.png

NewReno cwnd and throughput in Linux:

attachment:tri_router_linux_reno_cwnd_chart.png attachment:tri_router_linux_reno_throughput_chart.png

Emulab bare metal testbed with a 1GE Linux router

The testbed, in a dumbbell topology (pc3000 nodes with 1 Gbps links), is from Emulab.net: Emulab hardware

testbed design

The topology is a dumbbell with nodes (s1, s2) acting as TCP traffic senders using iperf3, nodes (rt1, rt2) serving as routers with a single bottleneck link, and nodes (r1, r2) functioning as TCP traffic receivers. Nodes s1, s2 are running the OS targets, and nodes r1, r2 are running Ubuntu Linux 24.04 to take advantage of ACK compression. Router nodes rt1 and rt2 are running Ubuntu Linux 24.04, with NIC and Layer 3 tuning to optimize IP forwarding. For instance, the bottleneck interface can be configured with shallow L2 and L3 buffers totaling approximately 96 packets (48 for L2 and 48 for L3). A dummynet box introduces a 40ms round-trip delay (RTT) on the bottleneck link. Each sender transmits data to its corresponding receiver (e.g., s1 -> r1, s2 -> r2).

All the physical links have 1 Gbps bandwidth. A middle box running Dummynet adds 40 ms of emulated delay, providing at least a 1000 Mbps x 40 ms == 5 MB bandwidth-delay product (BDP) environment, so I configured the senders and receivers with 10 MB TCP buffers. Active Queue Management (AQM) is applied on the bottleneck port of rt1.
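
One plausible way to get the shallow ~96-packet bottleneck buffer described above is to shrink both the qdisc and the NIC TX ring on the bottleneck interface of rt1 (a sketch, not necessarily the exact tuning used; eth5 is the bottleneck port shown further below, and ring resizing depends on driver support):

## L3 buffer: cap the bottleneck qdisc at 48 packets
tc qdisc replace dev eth5 root pfifo limit 48
## L2 buffer: shrink the NIC TX ring to 48 descriptors, if the driver allows it
ethtool -G eth5 tx 48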

testbed topology: attachment:emulab_dummbell_testbed.png

root@s1:~# ping -c 4 r1
PING r1-link5 (10.1.4.3) 56(84) bytes of data.
64 bytes from r1-link5 (10.1.4.3): icmp_seq=1 ttl=62 time=40.1 ms
64 bytes from r1-link5 (10.1.4.3): icmp_seq=2 ttl=62 time=40.0 ms
64 bytes from r1-link5 (10.1.4.3): icmp_seq=3 ttl=62 time=40.1 ms
64 bytes from r1-link5 (10.1.4.3): icmp_seq=4 ttl=62 time=40.0 ms

--- r1-link5 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 40.039/40.070/40.120/0.031 ms
root@s1:~#
root@s2:~# ping -c 4 r2
PING r2-link6 (10.1.1.3) 56(84) bytes of data.
64 bytes from r2-link6 (10.1.1.3): icmp_seq=1 ttl=62 time=40.2 ms
64 bytes from r2-link6 (10.1.1.3): icmp_seq=2 ttl=62 time=40.1 ms
64 bytes from r2-link6 (10.1.1.3): icmp_seq=3 ttl=62 time=40.1 ms
64 bytes from r2-link6 (10.1.1.3): icmp_seq=4 ttl=62 time=40.0 ms

--- r2-link6 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 40.033/40.107/40.214/0.071 ms
root@s2:~#

root@s1:~ # sysctl -f /etc/sysctl.conf
kern.log_console_output: 0 -> 0
net.inet.tcp.hostcache.enable: 0 -> 0
kern.ipc.maxsockbuf: 10485760 -> 10485760
net.inet.tcp.sendbuf_max: 10485760 -> 10485760
net.inet.tcp.recvbuf_max: 10485760 -> 10485760
root@s1:~ #

root@s2:~ # sysctl -f /etc/sysctl.conf
kern.log_console_output: 0 -> 0
net.inet.tcp.hostcache.enable: 0 -> 0
kern.ipc.maxsockbuf: 10485760 -> 10485760
net.inet.tcp.sendbuf_max: 10485760 -> 10485760
net.inet.tcp.recvbuf_max: 10485760 -> 10485760
root@s2:~ #

cc@r1:~ % sudo sysctl -p
net.ipv4.tcp_no_metrics_save = 1
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
net.ipv4.tcp_rmem = 4096 131072 10485760
net.ipv4.tcp_wmem = 4096  16384 10485760
cc@r1:~ %

cc@r2:~ % sudo sysctl -p
net.ipv4.tcp_no_metrics_save = 1
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
net.ipv4.tcp_rmem = 4096 131072 10485760
net.ipv4.tcp_wmem = 4096  16384 10485760
cc@r2:~ %

receiver: iperf3 -s -p 5201 --affinity 1
sender:   iperf3 -B ${src} --cport ${tcp_port} -c ${dst} -p 5201 -l 1M -t 300 -i 1 -f m -VC ${name}

FreeBSD s1 & s2 info

FreeBSD 15.0-CURRENT (GENERIC) #0 main-d4763484f911: Mon Apr 21 2025

Linux r1 & r2 info

Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-53-generic x86_64)

Linux rt1 & rt2 info

Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-53-generic x86_64) with rx/tx driver and qdisc tuning for IP forwarding performance

root@rt1:~# tc -s qdisc show dev eth2
qdisc pfifo 8007: root refcnt 2 limit 256p
 Sent 105132572 bytes 1592669 pkt (dropped 0, overlimits 0 requeues 8) 
 backlog 0b 0p requeues 8
root@rt1:~# tc -s qdisc show dev eth4
qdisc pfifo 8009: root refcnt 2 limit 256p
 Sent 119886021 bytes 1816294 pkt (dropped 0, overlimits 0 requeues 3) 
 backlog 0b 0p requeues 3
root@rt1:~# tc -s qdisc show dev eth5
qdisc pfifo 800a: root refcnt 2 limit 48p
 Sent 20842988722 bytes 13766899 pkt (dropped 789, overlimits 0 requeues 58847)  <<< bottleneck drops
 backlog 0b 0p requeues 58847
root@rt1:~# 

||TCP CC ||version ||flow1 average throughput ||flow2 average throughput ||link utilization: average(flow1 + flow2) ||TCP fairness ||comment ||
||CUBIC in FreeBSD default stack ||before patch ||92.5 Mbits/sec ||101.0 Mbits/sec ||193.5 Mbits/sec ||99.8% ||base ||
||CUBIC in FreeBSD default stack ||after patch ||199.0 Mbits/sec ||211.0 Mbits/sec ||410.0 Mbits/sec ||99.9% ||+111.9% ||
||NewReno in FreeBSD default stack ||15-CURRENT ||95.4 Mbits/sec ||83.9 Mbits/sec ||179.3 Mbits/sec ||99.6% || ||
||CUBIC in FreeBSD RACK stack ||before patch ||122.0 Mbits/sec ||129.0 Mbits/sec ||251.0 Mbits/sec ||99.9% ||base ||
||CUBIC in FreeBSD RACK stack ||after patch ||117.0 Mbits/sec ||129.0 Mbits/sec ||246.0 Mbits/sec ||99.8% ||-2.0% ||
||NewReno in FreeBSD RACK stack ||15-CURRENT ||95.1 Mbits/sec ||106.0 Mbits/sec ||201.1 Mbits/sec ||99.7% || ||
||CUBIC in Linux ||kernel 6.8.0 ||304.0 Mbits/sec ||222.0 Mbits/sec ||526.0 Mbits/sec ||97.6% ||reference ||
||NewReno in Linux ||kernel 6.8.0 ||199.0 Mbits/sec ||247.0 Mbits/sec ||446.0 Mbits/sec ||98.9% ||reference ||
