From nobody Fri Dec 17 07:29:19 2021 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 5585719031D8; Fri, 17 Dec 2021 07:29:20 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4JFgbN0b1Cz3JFl; Fri, 17 Dec 2021 07:29:20 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id E971817C7B; Fri, 17 Dec 2021 07:29:19 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 1BH7TJOd036465; Fri, 17 Dec 2021 07:29:19 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 1BH7TJud036464; Fri, 17 Dec 2021 07:29:19 GMT (envelope-from git) Date: Fri, 17 Dec 2021 07:29:19 GMT Message-Id: <202112170729.1BH7TJud036464@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org From: Andriy Gapon Subject: git: 194be9b71121 - stable/12 - kern_tc: unify timecounter to bintime delta conversion List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: avg X-Git-Repository: src X-Git-Refname: refs/heads/stable/12 X-Git-Reftype: branch X-Git-Commit: 194be9b71121ba21eb6150a8c7873a2ed0e5e74e Auto-Submitted: auto-generated ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1639726160; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=gI5wZu4AiWmCNb+zVXdrCnmwvm3ceU/J4ejJhEAh78Y=; b=xn0MjwO0aAnTu76Hq73BE5abTD0XYr/rWbv7H303UWsH/3F4hNr7guXuO1hx0zoapo4F/H y6rMtL/t5A6FWzj4KanxTqDKqw7wHM6tz69gvGWFe+1iM8Vlf9FEYqTxvIAJVQCyThWNi1 kNLuQNbKqtmcrBZWyQYpfibMrXt3Nu++ycnX0kIXNxE44cJseKka7CnfvTkT/9wM9UzPbx drzbfgJJNoL2IwIKh+KX0YNtjMBXi4bRuucfpgkUzZxqBDYU+c6x9rVD5aZ/fmYq6gz/+d ytA+5Cygw8QnSo1d1L1tEnt3JlgSXd3SaNaUMxRODryu410Qj2vhjBYmwb37hg== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1639726160; a=rsa-sha256; cv=none; b=S9QvPC1NSxyc1tyMZtSWVbBkHG1di8XSzotGiQxuUxvpg1C32BPQKB376Qu+1rPSrsUxTL ZDARuKIG7IDhXWrYYNdEmJUVOaxPmK/1TeYmz/1kUvuu/SyiwMMcYyIZuFC/c/DB5VbKzI lLvBE/7RcGWICDTzCBjNPQx3+QJVWE9pfAn3PNvDlcj08EP7TtIUMY8PNOkdpJY4LFZ+iQ JaCRGN2VS5Pp831G0D65D962jBpM/b370jl2pHAtjWNrBFqTe4R1Ha34ygqLuc3QG8CcUa 6URc2VaPB4m4c5/peKvzhE1G4MQ8bzhAZgkhUCHxVQa0PBA7Oh3pbH0y/pCjPQ== ARC-Authentication-Results: i=1; mx1.freebsd.org; none X-ThisMailContainsUnwantedMimeParts: N The branch stable/12 has been updated by avg: URL: https://cgit.FreeBSD.org/src/commit/?id=194be9b71121ba21eb6150a8c7873a2ed0e5e74e commit 194be9b71121ba21eb6150a8c7873a2ed0e5e74e Author: Andriy Gapon AuthorDate: 2021-11-30 13:23:23 +0000 Commit: Andriy Gapon CommitDate: 2021-12-17 07:29:08 +0000 kern_tc: unify timecounter to bintime delta conversion There are two places where we convert from a timecounter delta to a bintime delta: tc_windup and bintime_off. Both functions use the same calculations when the timecounter delta is small. But for a large delta (greater than approximately an equivalent of 1 second) the calculations were different. Both functions use approximate calculations based on th_scale that avoid division. Both produce values slightly greater than a true value, calculated with division by tc_frequency, would be. tc_windup is slightly more accurate, so its result is closer to the true value and, thus, smaller than bintime_off result. As a consequence there can be a jump back in time when time hands are switched after a long period of time (a large delta). Just before the switch the time would be calculated with a large delta from th_offset_count in bintime_off. tc_windup does the switch using its own calculations of a new th_offset using the large delta. As explained earlier, the new th_offset may end up being less than the previously produced binuptime. So, for a period of time new binuptime values may be "back in time" comparing to values just before the switch. Such a jump must never happen. All the code assumes that the uptime is monotonically nondecreasing and some code works incorrectly when that assumption is broken. For example, we have observed sleepq_timeout() ignoring a timeout when the sbinuptime value obtained by the callout code was greater than the expiration value, but the sbinuptime obtained in sleepq_timeout() was less than it. In that case the target thread would never get woken up. The unified calculations should ensure the monotonic property of the uptime. The problem is quite rare as normally tc_windup should be called HZ times per second (typically 1000 or 100). But it may happen in VMs on very busy hypervisors where a VM's virtual CPU may not get an execution time slot for a second or more. Reviewed by: kib Sponsored by: Panzura LLC (cherry picked from commit 3d9d64aa1846217eac9229f8cba5cb6646a688b7) --- sys/kern/kern_tc.c | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c index 7f9b48cb1dd4..df900bd32394 100644 --- a/sys/kern/kern_tc.c +++ b/sys/kern/kern_tc.c @@ -198,6 +198,23 @@ tc_delta(struct timehands *th) tc->tc_counter_mask); } +static __inline void +bintime_add_tc_delta(struct bintime *bt, uint64_t scale, + uint64_t large_delta, uint64_t delta) +{ + uint64_t x; + + if (__predict_false(delta >= large_delta)) { + /* Avoid overflow for scale * delta. */ + x = (scale >> 32) * delta; + bt->sec += x >> 32; + bintime_addx(bt, x << 32); + bintime_addx(bt, (scale & 0xffffffff) * delta); + } else { + bintime_addx(bt, scale * delta); + } +} + /* * Functions for reading the time. We have to loop until we are sure that * the timehands that we operated on was not updated under our feet. See @@ -209,7 +226,7 @@ bintime_off(struct bintime *bt, u_int off) { struct timehands *th; struct bintime *btp; - uint64_t scale, x; + uint64_t scale; u_int delta, gen, large_delta; do { @@ -223,15 +240,7 @@ bintime_off(struct bintime *bt, u_int off) atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); - if (__predict_false(delta >= large_delta)) { - /* Avoid overflow for scale * delta. */ - x = (scale >> 32) * delta; - bt->sec += x >> 32; - bintime_addx(bt, x << 32); - bintime_addx(bt, (scale & 0xffffffff) * delta); - } else { - bintime_addx(bt, scale * delta); - } + bintime_add_tc_delta(bt, scale, large_delta, delta); } #define GETTHBINTIME(dst, member) \ do { \ @@ -1324,17 +1333,8 @@ tc_windup(struct bintime *new_boottimebin) #endif th->th_offset_count += delta; th->th_offset_count &= th->th_counter->tc_counter_mask; - while (delta > th->th_counter->tc_frequency) { - /* Eat complete unadjusted seconds. */ - delta -= th->th_counter->tc_frequency; - th->th_offset.sec++; - } - if ((delta > th->th_counter->tc_frequency / 2) && - (th->th_scale * delta < ((uint64_t)1 << 63))) { - /* The product th_scale * delta just barely overflows. */ - th->th_offset.sec++; - } - bintime_addx(&th->th_offset, th->th_scale * delta); + bintime_add_tc_delta(&th->th_offset, th->th_scale, + th->th_large_delta, delta); /* * Hardware latching timecounters may not generate interrupts on