From nobody Tue Apr 18 15:58:12 2023 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4Q17qm74zTz45Htf; Tue, 18 Apr 2023 15:58:12 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Q17qm69Krz3lGN; Tue, 18 Apr 2023 15:58:12 +0000 (UTC) (envelope-from git@FreeBSD.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1681833492; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=n/kIu9/H6ELvRk1XW9axMgfCGklim/YDR20Ol3m0Omk=; b=EbGq+jYUJpARbF5qcMzPCaKnqWusKCIHKBNIN/AvBDyiGgGoc1s7n22PtAUiOZIyfA8A1a +fpugx4LhZZN4j+QdaXuIU2PGUhSzRB2rkYewbYsLZBUZIjqa3In+c7eXTY3RkQAw+vWjH OzpQ126+BIhQyEEwx854h3+Wi5pEIixDdCec206Qg3VUuosyxunG8PmyRLOzC6WWwNkgtN pWpsfTY8t4c7PaNpg2bitIIJBHfd6IaVPVJ71SeBQjmhj6hnVfNgdWqteRo7OMI73gK46W g+xQ0oBeuLzeVMD0/sUb4TAcnWXjncoLirrlU7x6iVT+rpF2y53r5KPLP8PtsA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1681833492; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=n/kIu9/H6ELvRk1XW9axMgfCGklim/YDR20Ol3m0Omk=; b=Tx6Jg7e/YNgL7vW6qE4R0hVg0FTLtu3cB+rV6rT0lGbwxfi9eMAkm7gx+W1ISpDPcxUCY1 HkK5lKnFVQ53ed6JZ/A5pfFjEVvaXrtZbwdWEJw+237ELWHkOA6oWvxdXql8dzgLUtpC4Z 9mBYMgdfKHnGW8uU8pAxVVLUM43v8clG8tNrFhqvVybzvxmPixYQxE2KgqYLSemUbBXZhD ZnKu9aVYFTWez6ZxwlY97BnaQQxoUJQRiUWu/cnp24d4pcRu2yTHifdCRVgdmGtRFCb1o3 6o22zzy7nGfsuf0G+CnyxI21g/qHQ+i3sNRR+dR7bYKcvGsc3S32fmbJJA3DSg== ARC-Authentication-Results: i=1; mx1.freebsd.org; none ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1681833492; a=rsa-sha256; cv=none; b=qj6Bdp9Qy8mkEkjpHw39fdy6tmcX7PontbSpQU4ooN0EHb0kXP63glXc5lY7o7arWf/rWI mvXr1NJQVkTMY7ZqdYyGJycrAuJn0QvaB7YR3zlg0bxNOEmDA2sWj8bmuRJIspSAWXtAjq W0o/0MWSab8BZh8fbC4x45wXeaBSXXcF17zWDbZ6F2WPVLRqqJ1+W6oLjGmR0vIjyWqxeT dXC14WikPmtXBfJbXmmlqtao4e2TvsZRgevvihsati4X9rmyphAfKBhfW1Qfel+iZO7bkw T/QzSIVLXg0HzrE2F0ielEJDNWWiIozjTQAIm9RJgtAW00fY/Z1kez4tYYHbzw== Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4Q17qm4y9qzHrb; Tue, 18 Apr 2023 15:58:12 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 33IFwCnX060916; Tue, 18 Apr 2023 15:58:12 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 33IFwCg4060915; Tue, 18 Apr 2023 15:58:12 GMT (envelope-from git) Date: Tue, 18 Apr 2023 15:58:12 GMT Message-Id: <202304181558.33IFwCg4060915@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Randall Stewart Subject: git: 2ad584c5551d - main - tcp: Inconsistent use of hpts_calling flag List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: rrs X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: 2ad584c5551d4d385742b6c6a46d726a8cfc7c57 Auto-Submitted: auto-generated X-ThisMailContainsUnwantedMimeParts: N The branch main has been updated by rrs: URL: https://cgit.FreeBSD.org/src/commit/?id=2ad584c5551d4d385742b6c6a46d726a8cfc7c57 commit 2ad584c5551d4d385742b6c6a46d726a8cfc7c57 Author: Randall Stewart AuthorDate: 2023-04-17 21:10:26 +0000 Commit: Randall Stewart CommitDate: 2023-04-17 21:10:26 +0000 tcp: Inconsistent use of hpts_calling flag Gleb has noticed there were some inconsistency's in the way the inp_hpts_calls flag was being used. One such inconsistency results in a bug when we can't allocate enough sendmap entries to entertain a call to rack_output().. basically a timer won't get started like it should. Also in cleaning this up I find that the "no_output" side of input needs to be adjusted to make sure we don't try to re-pace too quickly outside the hpts assurance of 250useconds. Another thing here is we end up with duplicate calls to tcp_output() which we should not. If packets go from hpts for processing the input side of tcp will call the output side of tcp on the last packet if it is needed. This means that when that occurs a second call to tcp_output would be made that is not needed and if pacing is going on may be harmful. Lets fix all this and explicitly state the contract that hpts is making with transports that care about the flag. Reviewed by: tuexen, glebius Sponsored by: Netflix Inc Differential Revision:https://reviews.freebsd.org/D39653 --- sys/netinet/tcp_hpts.c | 20 +++++++++++++++----- sys/netinet/tcp_stacks/bbr.c | 9 ++++++--- sys/netinet/tcp_stacks/rack.c | 22 ++++++++++++++++++---- 3 files changed, 39 insertions(+), 12 deletions(-) diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c index e61547104775..2aa4135c7a25 100644 --- a/sys/netinet/tcp_hpts.c +++ b/sys/netinet/tcp_hpts.c @@ -1330,6 +1330,17 @@ again: kern_prefetch(tp->t_fb_ptr, &did_prefetch); did_prefetch = 1; } + /* + * We set inp_hpts_calls to 1 before any possible output. + * The contract with the transport is that if it cares about + * hpts calling it should clear the flag. That way next time + * it is called it will know it is hpts. + * + * We also only call tfb_do_queued_segments() tcp_output() + * it is expected that if segments are queued and come in that + * the final input mbuf will cause a call to output if it is needed. + */ + inp->inp_hpts_calls = 1; if ((inp->inp_flags2 & INP_SUPPORTS_MBUFQ) && !STAILQ_EMPTY(&tp->t_inqueue)) { error = (*tp->t_fb->tfb_do_queued_segments)(tp, 0); @@ -1337,12 +1348,11 @@ again: /* The input killed the connection */ goto skip_pacing; } + } else { + error = tcp_output(tp); + if (error < 0) + goto skip_pacing; } - inp->inp_hpts_calls = 1; - error = tcp_output(tp); - if (error < 0) - goto skip_pacing; - inp->inp_hpts_calls = 0; if (ninp) { /* * If we have a nxt inp, see if we can diff --git a/sys/netinet/tcp_stacks/bbr.c b/sys/netinet/tcp_stacks/bbr.c index fab85e88a382..f5cf362a57dc 100644 --- a/sys/netinet/tcp_stacks/bbr.c +++ b/sys/netinet/tcp_stacks/bbr.c @@ -11581,6 +11581,9 @@ bbr_do_segment_nounlock(struct tcpcb *tp, struct mbuf *m, struct tcphdr *th, /* Do we have the correct timer running? */ bbr_timer_audit(tp, bbr, lcts, &so->so_snd); } + /* Clear the flag, it may have been cleared by output but we may not have */ + if ((nxt_pkt == 0) && (inp->inp_hpts_calls)) + inp->inp_hpts_calls = 0; /* Do we have a new state */ if (bbr->r_state != tp->t_state) bbr_set_state(tp, bbr, tiwin); @@ -11850,6 +11853,8 @@ bbr_output_wtime(struct tcpcb *tp, const struct timeval *tv) memcpy(&bbr->rc_tv, tv, sizeof(struct timeval)); cts = tcp_tv_to_usectick(&bbr->rc_tv); inp = bbr->rc_inp; + hpts_calling = inp->inp_hpts_calls; + inp->inp_hpts_calls = 0; so = inp->inp_socket; sb = &so->so_snd; if (tp->t_nic_ktls_xmit) @@ -11985,7 +11990,7 @@ bbr_output_wtime(struct tcpcb *tp, const struct timeval *tv) merged_val = bbr->rc_pacer_started; merged_val <<= 32; merged_val |= bbr->r_ctl.rc_last_delay_val; - bbr_log_pacing_delay_calc(bbr, inp->inp_hpts_calls, + bbr_log_pacing_delay_calc(bbr, hpts_calling, bbr->r_ctl.rc_agg_early, cts, delay_calc, merged_val, bbr->r_agg_early_set, 3); bbr->r_ctl.rc_last_delay_val = 0; @@ -12028,8 +12033,6 @@ bbr_output_wtime(struct tcpcb *tp, const struct timeval *tv) * c) We had room to send something. * */ - hpts_calling = inp->inp_hpts_calls; - inp->inp_hpts_calls = 0; if (bbr->r_ctl.rc_hpts_flags & PACE_TMR_MASK) { int retval; diff --git a/sys/netinet/tcp_stacks/rack.c b/sys/netinet/tcp_stacks/rack.c index c134c059ec89..40706d5ebfca 100644 --- a/sys/netinet/tcp_stacks/rack.c +++ b/sys/netinet/tcp_stacks/rack.c @@ -16427,6 +16427,8 @@ rack_do_compressed_ack_processing(struct tcpcb *tp, struct socket *so, struct mb } did_out = 1; } + if (rack->rc_inp->inp_hpts_calls) + rack->rc_inp->inp_hpts_calls = 0; rack_free_trim(rack); #ifdef TCP_ACCOUNTING sched_unpin(); @@ -16525,6 +16527,17 @@ rack_do_segment_nounlock(struct tcpcb *tp, struct mbuf *m, struct tcphdr *th, (*ts_ptr == TCP_LRO_TS_OPTION))) no_output = 1; } + if ((no_output == 1) && (slot_remaining < tcp_min_hptsi_time)) { + /* + * It is unrealistic to think we can pace in less than + * the minimum granularity of the pacer (def:250usec). So + * if we have less than that time remaining we should go + * ahead and allow output to be "early". We will attempt to + * make up for it in any pacing time we try to apply on + * the outbound packet. + */ + no_output = 0; + } } if (m->m_flags & M_ACKCMP) { /* @@ -16984,6 +16997,9 @@ do_output_now: rack_start_hpts_timer(rack, tp, cts, slot_remaining, 0, 0); rack_free_trim(rack); } + /* Clear the flag, it may have been cleared by output but we may not have */ + if ((nxt_pkt == 0) && (inp->inp_hpts_calls)) + inp->inp_hpts_calls = 0; /* Update any rounds needed */ if (rack_verbose_logging && tcp_bblogging_on(rack->rc_tp)) rack_log_hystart_event(rack, high_seq, 8); @@ -19637,6 +19653,7 @@ rack_output(struct tcpcb *tp) ts_val = get_cyclecount(); #endif hpts_calling = inp->inp_hpts_calls; + rack->rc_inp->inp_hpts_calls = 0; #ifdef TCP_OFFLOAD if (tp->t_flags & TF_TOE) { #ifdef TCP_ACCOUNTING @@ -19759,7 +19776,6 @@ rack_output(struct tcpcb *tp) counter_u64_add(rack_out_size[TCP_MSS_ACCT_INPACE], 1); return (0); } - rack->rc_inp->inp_hpts_calls = 0; /* Finish out both pacing early and late accounting */ if ((rack->r_ctl.rc_hpts_flags & PACE_PKT_OUTPUT) && TSTMP_GT(rack->r_ctl.rc_last_output_to, cts)) { @@ -19876,7 +19892,7 @@ again: while (rack->rc_free_cnt < rack_free_cache) { rsm = rack_alloc(rack); if (rsm == NULL) { - if (inp->inp_hpts_calls) + if (hpts_calling) /* Retry in a ms */ slot = (1 * HPTS_USEC_IN_MSEC); so = inp->inp_socket; @@ -19887,8 +19903,6 @@ again: rack->rc_free_cnt++; rsm = NULL; } - if (inp->inp_hpts_calls) - inp->inp_hpts_calls = 0; sack_rxmit = 0; len = 0; rsm = NULL;