Re: Request for Testing: TCP RACK

From: Nuno Teixeira <eduardo_at_freebsd.org>
Date: Wed, 10 Apr 2024 11:40:16 UTC
Hello all,

@ current 1500018, fetching torrents with net-p2p/qbittorrent: a ~2GB
download finished and the connection stayed UP until the end:

---
Apr 10 11:26:46 leg kernel: re0: watchdog timeout
Apr 10 11:26:46 leg kernel: re0: link state changed to DOWN
Apr 10 11:26:49 leg dhclient[58810]: New IP Address (re0): 192.168.1.67
Apr 10 11:26:49 leg dhclient[58814]: New Subnet Mask (re0): 255.255.255.0
Apr 10 11:26:49 leg dhclient[58818]: New Broadcast Address (re0):
192.168.1.255
Apr 10 11:26:49 leg kernel: re0: link state changed to UP
Apr 10 11:26:49 leg dhclient[58822]: New Routers (re0): 192.168.1.1
---

In past tests I got more watchdog timeouts: the connection went down and a
reboot was needed to bring it back (`service netif restart` didn't work).

Another way to reproduce this is to use sysutils/restic (a backup program) to
read/check all files from a remote server via sftp:

`restic -r sftp:user@remote:restic-repo check --read-data` from a 60GB
compressed backup.

---
watchdog timeout x3 as above
---

restic check failure log @ 15% progress:
---
<snip>
Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after
1.7670599s: connection lost
Load(<data/d27a0abe0f>, 17456892, 0) returned error, retrying after
4.619104908s: connection lost
Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after
5.477648517s: connection lost
List(lock) returned error, retrying after 293.057766ms: connection lost
List(lock) returned error, retrying after 385.206693ms: connection lost
List(lock) returned error, retrying after 1.577594281s: connection lost
<snip>

The connection stays UP.

Cheers,

<tuexen@freebsd.org> wrote (Thursday, 28/03/2024 at 15:53):

> > On 28. Mar 2024, at 15:00, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >
> > Hello all!
> >
> > Running rack @ b7b78c1c169 "Optimize HPTS..." and very happy on my laptop
> > (amd64)!
> >
> > Thanks all!
> Thanks for the feedback!
>
> Best regards
> Michael
> >
> > Drew Gallatin <gallatin@freebsd.org> wrote (Thursday, 21/03/2024 at 12:58):
> > The entire point is to *NOT* go through the overhead of scheduling
> > something asynchronously, but to take advantage of the fact that a
> > user/kernel transition is going to trash the cache anyway.
> >
> > In the common case of a system which has fewer than the threshold number
> > of connections, we access the tcp_hpts_softclock function pointer, make
> > one function call, and access hpts_that_need_softclock, and then return.
> > So that's two variables and a function call.
> >
> > I think it would be preferable to avoid that call, and to move the
> > declarations of tcp_hpts_softclock and hpts_that_need_softclock so that
> > they are in the same cacheline.  Then we'd be hitting just a single line
> > in the common case.  (I've made comments on the review to that effect.)
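> >
> > A rough sketch of that layout idea (illustrative only: the struct and its
> > name are made up here; only tcp_hpts_softclock and hpts_that_need_softclock
> > are names from this thread):
> >
> > /* Needs <sys/param.h> and <sys/counter.h>; hypothetical layout. */
> > struct tcp_hpts_fastpath {
> >         void            (*softclock)(void);   /* tcp_hpts_softclock */
> >         counter_u64_t   need_softclock;       /* hpts_that_need_softclock */
> > } __aligned(CACHE_LINE_SIZE);   /* one cache line for the common case */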
> >
> > Also, I wonder if the threshold could be higher by default, so that hpts
> > is never called in this context unless we're at the point where we're
> > scheduling thousands of runs of the hpts thread (and taking all those
> > clock interrupts).
> >
> > Drew
> >
> > On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
> >> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> >>> Ok I have created
> >>>
> >>> https://reviews.freebsd.org/D44420
> >>>
> >>>
> >>> to address the issue.  I also attach a short version of the patch that
> >>> Nuno can try, to validate that it works.  Drew, you may want to try this
> >>> and validate that the optimization does kick in, since I can only test
> >>> that it does not on my local box :)
> >> The patch still causes access to all CPUs' cachelines on each userret.
> >> It would be much better to inc/check the threshold and only schedule the
> >> call when it is exceeded.  Then the call can occur in some dedicated
> >> context, like a per-CPU thread, instead of userret.
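> >>
> >> For illustration, one shape this direction could take (a sketch only:
> >> taskqueue_fast is a stand-in for whatever dedicated context would actually
> >> be chosen, and the helper and task plumbing here are hypothetical):
> >>
> >> /* Needs <sys/taskqueue.h>; hypothetical deferral of the softclock work. */
> >> static struct task hpts_softclock_task;        /* TASK_INIT()ed at load */
> >>
> >> static void
> >> hpts_softclock_taskfunc(void *arg, int pending)
> >> {
> >>         tcp_hpts_softclock();   /* the entry point discussed above */
> >> }
> >>
> >> /* Called from the hpts thread loop instead of polling in userret. */
> >> static void
> >> hpts_maybe_defer_softclock(struct tcp_hpts_entry *hpts)
> >> {
> >>         if (hpts->p_on_queue_cnt > conn_cnt_thresh)
> >>                 taskqueue_enqueue(taskqueue_fast, &hpts_softclock_task);
> >> }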
> >>
> >>>
> >>>
> >>> R
> >>>
> >>>
> >>>
> >>> On 3/18/24 3:42 PM, Drew Gallatin wrote:
> >>>> No.  The goal is to run on every return to userspace for every thread.
> >>>>
> >>>> Drew
> >>>>
> >>>> On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> >>>>> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> >>>>>> I got the idea from
> >>>>>> https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> >>>>>>
> >>>>>> The gist is that the TCP pacing stuff needs to run frequently, and
> >>>>>> rather than run it out of a clock interrupt, it's more efficient to
> >>>>>> run it out of a system call context at just the point where we return
> >>>>>> to userspace and the cache is trashed anyway.  The current
> >>>>>> implementation is fine for our workload, but probably not ideal for a
> >>>>>> generic system, especially one where something is banging on system
> >>>>>> calls.
> >>>>>>
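> >>>>>> For readers without the code at hand, the shape being described is
> >>>>>> roughly the following (a sketch, not the actual hook):
> >>>>>>
> >>>>>> /*
> >>>>>>  * Hypothetical illustration: the user/kernel transition has already
> >>>>>>  * disturbed the caches, so the pacing work piggybacks on the return
> >>>>>>  * path instead of being driven by a dedicated timer interrupt.
> >>>>>>  */
> >>>>>> static void
> >>>>>> hpts_on_return_to_userspace(void)
> >>>>>> {
> >>>>>>         tcp_hpts_softclock();
> >>>>>> }
> >>>>>>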
> >>>>>> ASTs could be the right tool for this, but I'm super unfamiliar with
> >>>>>> them, and I can't find any docs on them.
> >>>>>>
> >>>>>> Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
> >>>>>> what's happening here?
> >>>>> This call would need some AST number added, and then it registers the
> >>>>> AST to run on the next return to userspace, for the current thread.
> >>>>>
> >>>>> Is it enough?
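> >>>>>
> >>>>> To make that concrete, a registration along the lines being discussed
> >>>>> might look roughly like this (hypothetical: the callback, its signature,
> >>>>> and the AST slot are assumptions based only on the call quoted above):
> >>>>>
> >>>>> /* Hypothetical AST callback that would run the pacing work. */
> >>>>> static void
> >>>>> tcp_hpts_ast(struct thread *td, int tda)
> >>>>> {
> >>>>>         tcp_hpts_softclock();
> >>>>> }
> >>>>>
> >>>>> /* Registered once, e.g. at module load; a dedicated AST number
> >>>>>  * would replace the 0 here. */
> >>>>> ast_register(0, ASTR_UNCOND, 0, tcp_hpts_ast);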
> >>>>>>
> >>>>>> Drew
> >>>>>
> >>>>>>
> >>>>>> On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> >>>>>>> On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> >>>>>>>> On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> >>>>>>>>
> >>>>>>>>>> On 18. Mar 2024, at 12:42, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hello all!
> >>>>>>>>>>
> >>>>>>>>>> It works just fine!
> >>>>>>>>>> System performance is OK.
> >>>>>>>>>> Using patch on main-n268841-b0aaf8beb126(-dirty).
> >>>>>>>>>>
> >>>>>>>>>> ---
> >>>>>>>>>> net.inet.tcp.functions_available:
> >>>>>>>>>> Stack                           D Alias                            PCB count
> >>>>>>>>>> freebsd freebsd                          0
> >>>>>>>>>> rack                            * rack                             38
> >>>>>>>>>> ---
> >>>>>>>>>>
> >>>>>>>>>> It would be nice to have a sysctl tunable for this patch so we
> >>>>>>>>>> could do more tests without recompiling the kernel.
> >>>>>>>>> Thanks for testing!
> >>>>>>>>>
> >>>>>>>>> @gallatin: can you come up with a patch that is acceptable for
> >>>>>>>>> Netflix and allows mitigating the performance regression?
> >>>>>>>>
> >>>>>>>> Ideally, tcphpts could enable this automatically when it starts to
> >>>>>>>> be used (enough?), but a sysctl could select auto/on/off.
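> >>>>>>>>
> >>>>>>>> Something along these lines, perhaps (hypothetical knob name and
> >>>>>>>> values, shown only to make the auto/on/off idea concrete):
> >>>>>>>>
> >>>>>>>> /* 0 = off, 1 = on, 2 = auto (enable once hpts is busy enough). */
> >>>>>>>> static int tcp_hpts_userret_mode = 2;
> >>>>>>>> SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, userret_mode, CTLFLAG_RW,
> >>>>>>>>     &tcp_hpts_userret_mode, 2,
> >>>>>>>>     "Drive hpts softclock from userret: 0=off, 1=on, 2=auto");
> >>>>>>>>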
> >>>>>>> There is already a well-known mechanism to request execution of a
> >>>>>>> specific function on return to userspace, namely ASTs.  The
> >>>>>>> difference from the current hack is that the execution is requested
> >>>>>>> for a single callback, in the context of a specific thread.
> >>>>>>>
> >>>>>>> Still, it might be worth trying to use it; what is the reason to hit
> >>>>>>> a thread that does not do networking with TCP processing?
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Mike
> >>>>>>>>
> >>>>>>>>> Best regards
> >>>>>>>>> Michael
> >>>>>>>>>>
> >>>>>>>>>> Thanks all!
> >>>>>>>>>> Really happy here :)
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>>
> >>>>>>>>>> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday, 17/03/2024 at 20:26):
> >>>>>>>>>>>
> >>>>>>>>>>> Hello,
> >>>>>>>>>>>
> >>>>>>>>>>>> I don't have the full context, but it seems like the complaint
> >>>>>>>>>>>> is a performance regression in bonnie++ and perhaps other things
> >>>>>>>>>>>> when tcp_hpts is loaded, even when it is not used.  Is that
> >>>>>>>>>>>> correct?
> >>>>>>>>>>>>
> >>>>>>>>>>>> If so, I suspect it's because we drive the tcp_hpts_softclock()
> >>>>>>>>>>>> routine from userret(), in order to avoid tons of timer
> >>>>>>>>>>>> interrupts and context switches.  To test this theory, you could
> >>>>>>>>>>>> apply a patch like:
> >>>>>>>>>>>
> >>>>>>>>>>> It's affecting overall system performance; bonnie was just a way
> >>>>>>>>>>> to get some numbers to compare.
> >>>>>>>>>>>
> >>>>>>>>>>> Tomorrow I will test the patch.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks!
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Nuno Teixeira
> >>>>>>>>>>> FreeBSD Committer (ports)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Nuno Teixeira
> >>>>>>>>>> FreeBSD Committer (ports)
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>
> >>> diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c
> >>> index 8c4d2d41a3eb..eadbee19f69c 100644
> >>> --- a/sys/netinet/tcp_hpts.c
> >>> +++ b/sys/netinet/tcp_hpts.c
> >>> @@ -216,6 +216,7 @@ struct tcp_hpts_entry {
> >>> void *ie_cookie;
> >>> uint16_t p_num; /* The hpts number one per cpu */
> >>> uint16_t p_cpu; /* The hpts CPU */
> >>> + uint8_t hit_callout_thresh;
> >>> /* There is extra space in here */
> >>> /* Cache line 0x100 */
> >>> struct callout co __aligned(CACHE_LINE_SIZE);
> >>> @@ -269,6 +270,11 @@ static struct hpts_domain_info {
> >>> int cpu[MAXCPU];
> >>> } hpts_domains[MAXMEMDOM];
> >>>
> >>> +counter_u64_t hpts_that_need_softclock;
> >>> +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needsoftclock,
> CTLFLAG_RD,
> >>> +    &hpts_that_need_softclock,
> >>> +    "Number of hpts threads that need softclock");
> >>> +
> >>> counter_u64_t hpts_hopelessly_behind;
> >>>
> >>> SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, hopeless,
> CTLFLAG_RD,
> >>> @@ -334,7 +340,7 @@ SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO,
> precision, CTLFLAG_RW,
> >>>     &tcp_hpts_precision, 120,
> >>>     "Value for PRE() precision of callout");
> >>> SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, cnt_thresh, CTLFLAG_RW,
> >>> -    &conn_cnt_thresh, 0,
> >>> +    &conn_cnt_thresh, DEFAULT_CONNECTION_THESHOLD,
> >>>     "How many connections (below) make us use the callout based
> mechanism");
> >>> SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, logging, CTLFLAG_RW,
> >>>     &hpts_does_tp_logging, 0,
> >>> @@ -1548,6 +1554,9 @@ __tcp_run_hpts(void)
> >>> struct tcp_hpts_entry *hpts;
> >>> int ticks_ran;
> >>>
> >>> + if (counter_u64_fetch(hpts_that_need_softclock) == 0)
> >>> + return;
> >>> +
> >>> hpts = tcp_choose_hpts_to_run();
> >>>
> >>> if (hpts->p_hpts_active) {
> >>> @@ -1683,6 +1692,13 @@ tcp_hpts_thread(void *ctx)
> >>> ticks_ran = tcp_hptsi(hpts, 1);
> >>> tv.tv_sec = 0;
> >>> tv.tv_usec = hpts->p_hpts_sleep_time * HPTS_TICKS_PER_SLOT;
> >>> + if ((hpts->p_on_queue_cnt > conn_cnt_thresh) &&
> (hpts->hit_callout_thresh == 0)) {
> >>> + hpts->hit_callout_thresh = 1;
> >>> + counter_u64_add(hpts_that_need_softclock, 1);
> >>> + } else if ((hpts->p_on_queue_cnt <= conn_cnt_thresh) &&
> (hpts->hit_callout_thresh == 1)) {
> >>> + hpts->hit_callout_thresh = 0;
> >>> + counter_u64_add(hpts_that_need_softclock, -1);
> >>> + }
> >>> if (hpts->p_on_queue_cnt >= conn_cnt_thresh) {
> >>> if(hpts->p_direct_wake == 0) {
> >>> /*
> >>> @@ -1818,6 +1834,7 @@ tcp_hpts_mod_load(void)
> >>> cpu_top = NULL;
> >>> #endif
> >>> tcp_pace.rp_num_hptss = ncpus;
> >>> + hpts_that_need_softclock = counter_u64_alloc(M_WAITOK);
> >>> hpts_hopelessly_behind = counter_u64_alloc(M_WAITOK);
> >>> hpts_loops = counter_u64_alloc(M_WAITOK);
> >>> back_tosleep = counter_u64_alloc(M_WAITOK);
> >>> @@ -2042,6 +2059,7 @@ tcp_hpts_mod_unload(void)
> >>> free(tcp_pace.grps, M_TCPHPTS);
> >>> #endif
> >>>
> >>> + counter_u64_free(hpts_that_need_softclock);
> >>> counter_u64_free(hpts_hopelessly_behind);
> >>> counter_u64_free(hpts_loops);
> >>> counter_u64_free(back_tosleep);
> >>
> >>
> >
> >
> >
> > --
> > Nuno Teixeira
> > FreeBSD Committer (ports)
>
>

-- 
Nuno Teixeira
FreeBSD Committer (ports)