Re: Request for Testing: TCP RACK
- Reply: tuexen_a_freebsd.org: "Re: Request for Testing: TCP RACK"
- In reply to: tuexen_a_freebsd.org: "Re: Request for Testing: TCP RACK"
Date: Wed, 10 Apr 2024 11:40:16 UTC
Hello all,

@ current 1500018, fetching torrents with net-p2p/qbittorrent: a ~2GB download finished and the connection stayed UP until the end:

---
Apr 10 11:26:46 leg kernel: re0: watchdog timeout
Apr 10 11:26:46 leg kernel: re0: link state changed to DOWN
Apr 10 11:26:49 leg dhclient[58810]: New IP Address (re0): 192.168.1.67
Apr 10 11:26:49 leg dhclient[58814]: New Subnet Mask (re0): 255.255.255.0
Apr 10 11:26:49 leg dhclient[58818]: New Broadcast Address (re0): 192.168.1.255
Apr 10 11:26:49 leg kernel: re0: link state changed to UP
Apr 10 11:26:49 leg dhclient[58822]: New Routers (re0): 192.168.1.1
---

In past tests I got more watchdog timeouts: the connection went down and a reboot was needed to bring it back (`service netif restart` didn't work).

Another way to reproduce this is to use sysutils/restic (a backup program) to read/check all files from a remote server via sftp: `restic -r sftp:user@remote:restic-repo check --read-data` on a 60GB compressed backup.

---
watchdog timeout x3 as above
---

restic check fail log @ 15% progress:

---
<snip>
Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 1.7670599s: connection lost
Load(<data/d27a0abe0f>, 17456892, 0) returned error, retrying after 4.619104908s: connection lost
Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 5.477648517s: connection lost
List(lock) returned error, retrying after 293.057766ms: connection lost
List(lock) returned error, retrying after 385.206693ms: connection lost
List(lock) returned error, retrying after 1.577594281s: connection lost
<snip>

Connection continues UP.

Cheers,

<tuexen@freebsd.org> wrote (Thursday, 28/03/2024 at 15:53):
>
> > On 28. Mar 2024, at 15:00, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >
> > Hello all!
> >
> > Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop (amd64)!
> >
> > Thanks all!
> Thanks for the feedback!
>
> Best regards
> Michael
> >
> > Drew Gallatin <gallatin@freebsd.org> wrote (Thursday, 21/03/2024 at 12:58):
> > The entire point is to *NOT* go through the overhead of scheduling something
> > asynchronously, but to take advantage of the fact that a user/kernel transition
> > is going to trash the cache anyway.
> >
> > In the common case of a system which has less than the threshold number of
> > connections, we access the tcp_hpts_softclock function pointer, make one
> > function call, and access hpts_that_need_softclock, and then return.
> > So that's 2 variables and a function call.
> >
> > I think it would be preferable to avoid that call, and to move the declarations
> > of tcp_hpts_softclock and hpts_that_need_softclock so that they are in the same
> > cacheline. Then we'd be hitting just a single cacheline in the common case.
> > (I've made comments on the review to that effect.)
> >
> > Also, I wonder if the threshold could get higher by default, so that hpts is
> > never called in this context unless we're to the point where we're scheduling
> > thousands of runs of the hpts thread (and taking all those clock interrupts).
> >
> > Drew
> >
> > On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
> >> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> >>> Ok, I have created
> >>>
> >>> https://reviews.freebsd.org/D44420
> >>>
> >>> to address the issue. I also attach a short version of the patch that Nuno
> >>> can try and validate it works.
> >>> Drew, you may want to try this and validate that the optimization does kick in,
> >>> since I can only test that it does not on my local box :)
> >> The patch still causes access to all CPUs' cachelines on each userret.
> >> It would be much better to inc/check the threshold and only schedule the
> >> call when exceeded. Then the call can occur in some dedicated context,
> >> like a per-CPU thread, instead of userret.
> >>
> >>>
> >>> R
> >>>
> >>> On 3/18/24 3:42 PM, Drew Gallatin wrote:
> >>>> No. The goal is to run on every return to userspace for every thread.
> >>>>
> >>>> Drew
> >>>>
> >>>> On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> >>>>> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> >>>>>> I got the idea from
> >>>>>> https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> >>>>>> The gist is that the TCP pacing stuff needs to run frequently, and
> >>>>>> rather than run it out of a clock interrupt, it's more efficient to run
> >>>>>> it out of a system call context at just the point where we return to
> >>>>>> userspace and the cache is trashed anyway. The current implementation
> >>>>>> is fine for our workload, but probably not ideal for a generic system,
> >>>>>> especially one where something is banging on system calls.
> >>>>>>
> >>>>>> ASTs could be the right tool for this, but I'm super unfamiliar with
> >>>>>> them, and I can't find any docs on them.
> >>>>>>
> >>>>>> Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
> >>>>>> what's happening here?
> >>>>> This call would need some AST number added, and then it registers the
> >>>>> AST to run on next return to userspace, for the current thread.
> >>>>>
> >>>>> Is it enough?
> >>>>>
> >>>>>> Drew
> >>>>>
> >>>>>> On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> >>>>>>> On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> >>>>>>>> On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> >>>>>>>>
> >>>>>>>>>> On 18. Mar 2024, at 12:42, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hello all!
> >>>>>>>>>>
> >>>>>>>>>> It works just fine!
> >>>>>>>>>> System performance is OK.
> >>>>>>>>>> Using patch on main-n268841-b0aaf8beb126(-dirty).
> >>>>>>>>>>
> >>>>>>>>>> ---
> >>>>>>>>>> net.inet.tcp.functions_available:
> >>>>>>>>>> Stack     D  Alias     PCB count
> >>>>>>>>>> freebsd      freebsd   0
> >>>>>>>>>> rack      *  rack      38
> >>>>>>>>>> ---
> >>>>>>>>>>
> >>>>>>>>>> It would be so nice if we could have a sysctl tunable for this patch
> >>>>>>>>>> so we could do more tests without recompiling the kernel.
> >>>>>>>>> Thanks for testing!
> >>>>>>>>>
> >>>>>>>>> @gallatin: can you come up with a patch that is acceptable for Netflix
> >>>>>>>>> and allows mitigating the performance regression?
> >>>>>>>>
> >>>>>>>> Ideally, tcphpts could enable this automatically when it starts to be
> >>>>>>>> used (enough?), but a sysctl could select auto/on/off.
> >>>>>>> There is already a well-known mechanism to request execution of a
> >>>>>>> specific function on return to userspace, namely AST. The difference
> >>>>>>> with the current hack is that the execution is requested for one callback
> >>>>>>> in the context of the specific thread.
> >>>>>>>
> >>>>>>> Still, it might be worth a try to use it; what is the reason to hit a thread
> >>>>>>> that does not do networking with TCP processing?
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Mike
> >>>>>>>>
> >>>>>>>>> Best regards
> >>>>>>>>> Michael
> >>>>>>>>>>
> >>>>>>>>>> Thanks all!
> >>>>>>>>>> Really happy here :)
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>>
> >>>>>>>>>> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday, 17/03/2024 at 20:26):
> >>>>>>>>>>>
> >>>>>>>>>>> Hello,
> >>>>>>>>>>>
> >>>>>>>>>>>> I don't have the full context, but it seems like the complaint is a
> >>>>>>>>>>>> performance regression in bonnie++ and perhaps other things when
> >>>>>>>>>>>> tcp_hpts is loaded, even when it is not used. Is that correct?
> >>>>>>>>>>>>
> >>>>>>>>>>>> If so, I suspect it's because we drive the tcp_hpts_softclock() routine
> >>>>>>>>>>>> from userret(), in order to avoid tons of timer interrupts and context
> >>>>>>>>>>>> switches. To test this theory, you could apply a patch like:
> >>>>>>>>>>>
> >>>>>>>>>>> It's affecting overall system performance; bonnie was just a way to
> >>>>>>>>>>> get some numbers to compare.
> >>>>>>>>>>>
> >>>>>>>>>>> Tomorrow I will test the patch.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks!
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Nuno Teixeira
> >>>>>>>>>>> FreeBSD Committer (ports)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Nuno Teixeira
> >>>>>>>>>> FreeBSD Committer (ports)
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>
> >>> diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c
> >>> index 8c4d2d41a3eb..eadbee19f69c 100644
> >>> --- a/sys/netinet/tcp_hpts.c
> >>> +++ b/sys/netinet/tcp_hpts.c
> >>> @@ -216,6 +216,7 @@ struct tcp_hpts_entry {
> >>>  	void *ie_cookie;
> >>>  	uint16_t p_num;	/* The hpts number one per cpu */
> >>>  	uint16_t p_cpu;	/* The hpts CPU */
> >>> +	uint8_t hit_callout_thresh;
> >>>  	/* There is extra space in here */
> >>>  	/* Cache line 0x100 */
> >>>  	struct callout co __aligned(CACHE_LINE_SIZE);
> >>> @@ -269,6 +270,11 @@ static struct hpts_domain_info {
> >>>  	int cpu[MAXCPU];
> >>>  } hpts_domains[MAXMEMDOM];
> >>>
> >>> +counter_u64_t hpts_that_need_softclock;
> >>> +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needsoftclock, CTLFLAG_RD,
> >>> +    &hpts_that_need_softclock,
> >>> +    "Number of hpts threads that need softclock");
> >>> +
> >>>  counter_u64_t hpts_hopelessly_behind;
> >>>
> >>>  SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, hopeless, CTLFLAG_RD,
> >>> @@ -334,7 +340,7 @@ SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, precision, CTLFLAG_RW,
> >>>      &tcp_hpts_precision, 120,
> >>>      "Value for PRE() precision of callout");
> >>>  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, cnt_thresh, CTLFLAG_RW,
> >>> -    &conn_cnt_thresh, 0,
> >>> +    &conn_cnt_thresh, DEFAULT_CONNECTION_THESHOLD,
> >>>      "How many connections (below) make us use the callout based mechanism");
> >>>  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, logging, CTLFLAG_RW,
> >>>      &hpts_does_tp_logging, 0,
> >>> @@ -1548,6 +1554,9 @@ __tcp_run_hpts(void)
> >>>  	struct tcp_hpts_entry *hpts;
> >>>  	int ticks_ran;
> >>>
> >>> +	if (counter_u64_fetch(hpts_that_need_softclock) == 0)
> >>> +		return;
> >>> +
> >>>  	hpts = tcp_choose_hpts_to_run();
> >>>
> >>>  	if (hpts->p_hpts_active) {
> >>> @@ -1683,6 +1692,13 @@ tcp_hpts_thread(void *ctx)
> >>>  	ticks_ran = tcp_hptsi(hpts, 1);
> >>>  	tv.tv_sec = 0;
> >>>  	tv.tv_usec = hpts->p_hpts_sleep_time * HPTS_TICKS_PER_SLOT;
> >>> +	if ((hpts->p_on_queue_cnt > conn_cnt_thresh) && (hpts->hit_callout_thresh == 0)) {
> >>> +		hpts->hit_callout_thresh = 1;
> >>> +		counter_u64_add(hpts_that_need_softclock, 1);
> >>> +	} else if ((hpts->p_on_queue_cnt <= conn_cnt_thresh) && (hpts->hit_callout_thresh == 1)) {
> >>> +		hpts->hit_callout_thresh = 0;
> >>> +		counter_u64_add(hpts_that_need_softclock, -1);
> >>> +	}
> >>>  	if (hpts->p_on_queue_cnt >= conn_cnt_thresh) {
> >>>  		if(hpts->p_direct_wake == 0) {
> >>>  			/*
> >>> @@ -1818,6 +1834,7 @@ tcp_hpts_mod_load(void)
> >>>  	cpu_top = NULL;
> >>>  #endif
> >>>  	tcp_pace.rp_num_hptss = ncpus;
> >>> +	hpts_that_need_softclock = counter_u64_alloc(M_WAITOK);
> >>>  	hpts_hopelessly_behind = counter_u64_alloc(M_WAITOK);
> >>>  	hpts_loops = counter_u64_alloc(M_WAITOK);
> >>>  	back_tosleep = counter_u64_alloc(M_WAITOK);
> >>> @@ -2042,6 +2059,7 @@ tcp_hpts_mod_unload(void)
> >>>  	free(tcp_pace.grps, M_TCPHPTS);
> >>>  #endif
> >>>
> >>> +	counter_u64_free(hpts_that_need_softclock);
> >>> +	counter_u64_free(hpts_hopelessly_behind);
> >>>  	counter_u64_free(hpts_loops);
> >>>  	counter_u64_free(back_tosleep);
> >>
> >
> >
> > --
> > Nuno Teixeira
> > FreeBSD Committer (ports)
>
>

--
Nuno Teixeira
FreeBSD Committer (ports)
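
To make Drew's description of the common case concrete, here is a minimal sketch of the fast path the patch above creates on return to userspace: when no hpts instance has crossed conn_cnt_thresh, the only work is one counter fetch and an early return. This is illustration only, not the committed FreeBSD code; tcp_userret_hook() is a made-up name for the call made from userret().

#include <sys/types.h>
#include <sys/counter.h>

extern counter_u64_t hpts_that_need_softclock;	/* from the patch above */
void __tcp_run_hpts(void);

static void
tcp_userret_hook(void)
{
	/*
	 * Common case: no hpts thread has more than conn_cnt_thresh
	 * connections queued, so skip the softclock work entirely.
	 * (counter_u64_fetch() sums the per-CPU counter values, which is
	 * the cross-CPU cacheline traffic Konstantin objects to above.)
	 */
	if (counter_u64_fetch(hpts_that_need_softclock) == 0)
		return;

	/* Busy case: run the pacing softclock from syscall-return context. */
	__tcp_run_hpts();
}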
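Drew's cacheline remark can be illustrated with a small layout sketch. This is an assumption about how the two symbols might be grouped, not what was committed: keeping the softclock function pointer and the gating counter adjacent and cacheline-aligned means the fast path above touches a single cacheline in the common case.

#include <sys/param.h>
#include <sys/counter.h>

/*
 * Hypothetical grouping: place both hot, read-mostly symbols in one
 * cacheline so the userret fast path takes at most one cache miss.
 */
struct tcp_hpts_userret_hot {
	void		(*softclock)(void);	/* tcp_hpts_softclock */
	counter_u64_t	need_softclock;		/* hpts_that_need_softclock */
} __aligned(CACHE_LINE_SIZE);

static struct tcp_hpts_userret_hot hpts_hot;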
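The ast_register() call Drew asks about belongs to the kernel's AST framework, which runs a registered callback on return to userspace. A rough sketch of how it might be wired up follows; it is purely illustrative: TDA_TCPHPTS is a made-up AST number that would have to be added (as Konstantin notes), and the handler signature and headers are assumptions based on how in-tree AST handlers look, not a verified interface.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>

void __tcp_run_hpts(void);

/* Hypothetical AST handler: run the hpts softclock for this thread. */
static void
tcp_hpts_ast(struct thread *td __unused, int tda __unused)
{
	__tcp_run_hpts();
}

static void
tcp_hpts_ast_init(void)
{
	/*
	 * TDA_TCPHPTS does not exist today.  ASTR_UNCOND (as quoted in the
	 * thread) requests the handler on every return to userspace once it
	 * is scheduled for a thread.
	 */
	ast_register(TDA_TCPHPTS, ASTR_UNCOND, 0, tcp_hpts_ast);
}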