Re: Performance test for CUBIC in stable/14

From: void <void_at_f-m.fm>
Date: Wed, 23 Oct 2024 21:43:21 UTC
On Wed, Oct 23, 2024 at 03:14:08PM -0400, Cheng Cui wrote:
>I see. The result of `newreno` vs. `cubic` shows non-constant/infrequent
>packet retransmission. So TCP congestion control has little impact on
>improving the performance.
>
>The performance bottleneck may come from somewhere else. For example, the
>sender CPU shows 97.7% utilization. Would there be any way to reduce CPU
>usage?

There are 11 VMs running on the bhyve server. None of them are very busy, but
the server shows:
% uptime
  9:54p.m.  up 8 days,  6:08, 22 users, load averages: 0.82, 1.25, 1.74

The test vm vm4-fbsd14s:
% uptime
  9:55PM  up 2 days,  3:12, 5 users, load averages: 0.35, 0.31, 0.21

It has 
% sysctl hw.ncpu
hw.ncpu: 8

and
avail memory = 66843062272 (63746 MB)

so it's not short of resources.

A test just now gave these results:
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-20.04  sec  1.31 GBytes   563 Mbits/sec    0             sender
[  5]   0.00-20.06  sec  1.31 GBytes   563 Mbits/sec                  receiver
CPU Utilization: local/sender 94.1% (0.1%u/94.1%s), remote/receiver 15.5%
(1.5%u/13.9%s)
snd_tcp_congestion cubic
rcv_tcp_congestion cubic

iperf Done.

so I'm not sure how the utilization figure was synthesised, unless it's derived
from something like 'top', where 1.00 is 100%. Load when running the test got
to 0.83, as observed in 'top' in another terminal. Five minutes after the test,
load in the vm is: 0.32, 0.31, 0.26
on the bhyve host: 0.39, 0.61, 1.11
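
Next run I might watch per-CPU and per-thread usage in another terminal rather
than just the load average. Nothing exotic, just the base tools:

% top -SHP -s 1
% vmstat -w 1

top -SHP shows per-CPU states including kernel threads, and vmstat shows the
interrupt and context-switch rates, which should make it obvious if one core is
pinned servicing the NIC (vtnet, presumably, since it's a bhyve guest). Checking
that TSO4/LRO appear in the 'ifconfig vtnet0' options would also rule out the
obvious offload angle; vtnet0 is a guess at the interface name here.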

Before we began testing, I was looking at the speed issue as being caused by
something to do with interrupts and/or polling and/or HZ, something that Linux
handles differently and gives better results on the same bhyve host.
Maybe rebuilding the kernel with a different scheduler on both the host and the
FreeBSD VMs will give a better result for FreeBSD, if tweaking sysctls doesn't
make much of a difference.
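
Of those, only the scheduler swap actually needs a rebuild; HZ and the event
timer can be inspected and changed from userland. Roughly what I have in mind
(the value and the config name below are placeholders, not something I've
tested here):

% sysctl kern.hz kern.sched.name kern.eventtimer.timer

kern.hz is a loader tunable, so lowering it in a VM guest is just a line in
/boot/loader.conf:

kern.hz="100"

whereas swapping ULE out means building a custom kernel config along the lines
of:

include   GENERIC
ident     SCHEDTEST
nooptions SCHED_ULE
options   SCHED_4BSD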

In terms of real-world bandwidth, I found that the combination of your modified
cc_cubic + rack gave the best overall throughput in a speedtest context,
although it's slower to reach its max throughput than cubic alone. I'm still
testing in a webdav/rsync context (cubic against cubic+rack).
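
For reference, by cubic + rack I just mean switching the TCP stack for new
connections; the module and sysctl names below are the stock stable/14 ones,
with your modified cc_cubic standing in for the in-tree one:

% kldload tcp_rack
% sysctl net.inet.tcp.functions_default=rack
% sysctl net.inet.tcp.cc.algorithm=cubic

To make it persistent it's tcp_rack_load="YES" in /boot/loader.conf plus the
two sysctls in /etc/sysctl.conf. Connections that already exist keep whatever
stack they were opened with.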

The next lot of testing after changing the scheduler will be on a KVM host, 
with various *BSDs as guests.

There may be a tradeoff of stability against speed, I guess.
--