From nobody Fri Dec 17 07:29:01 2021 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 9E1B81902ECD; Fri, 17 Dec 2021 07:29:01 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4JFgb12f2tz3JBY; Fri, 17 Dec 2021 07:29:01 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 3AD3017961; Fri, 17 Dec 2021 07:29:01 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 1BH7T19A036330; Fri, 17 Dec 2021 07:29:01 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 1BH7T1cC036329; Fri, 17 Dec 2021 07:29:01 GMT (envelope-from git) Date: Fri, 17 Dec 2021 07:29:01 GMT Message-Id: <202112170729.1BH7T1cC036329@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org From: Andriy Gapon Subject: git: 9c0050b0a64a - stable/13 - kern_tc: unify timecounter to bintime delta conversion List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: avg X-Git-Repository: src X-Git-Refname: refs/heads/stable/13 X-Git-Reftype: branch X-Git-Commit: 9c0050b0a64a69f46d4b0ed52f4cf29283e7eafa Auto-Submitted: auto-generated ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1639726141; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=sijh6CwX3SHR/MWNaecUcZw/2s2+Sh2jKaqF8ANEkTk=; b=WgtROORhLZhmSKf5bKzpVdMHkPfHkg3DsIB5TJS/7123CQxTKYc4Q9HU2JjjX3Z96FOY8I 9nI4uob8212VyYNOw6fhYjae8QThRFOEy+zdT2K/gn2VHroldUcdetyVGEzbyCC0nK9/v3 kW1QWhb7XI80E9+0qf9M+yhY1DDyuXtOsXwymui5nBoopdZurq70rekkx+B9j4kYL2NMu4 On40GPumrK79POpEqg7KpmBwO7BoJ27fBjSw0sBDuaCwCX+NibpkuZ0OCd72UPPuAq3Eb7 alsCYz3dDJMdkfcPUGCw4keMSNQhSQHJi1ecHcJXnYMzNuIM1uEPqIr3+gFSWg== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1639726141; a=rsa-sha256; cv=none; b=YkqSZXdMoICCCZL0pcVYbqB2FUj0OCP568pSVle3Cw0s92XqBJ3K9Z2BUFMu4aRNTdFa7O JzViaigij+XOVmRiW7UbWdDNtSkhTKI/samnLaJOxBDhcuFPO5ivIN9I0Jwmt45k5Ai+vc 8X5YqSmqAXlzf9b3ARbo+1knpyio5kedny3eE5d0IKHxDbHZMWbxGTGFMSvIctWdTpDDz5 ebqxJhObF7B0EVh+YG90pSQiP0CEMVUKWSrRBtdciH6/uKrIq9FrADMJ2uyB7hBa6cav+L 4CSwTvFQr83YjtiHM43TU1jGj0GrPqf2Dwx9vSUfpR0l5r5qaIb2Qf2Tc0BePg== ARC-Authentication-Results: i=1; mx1.freebsd.org; none X-ThisMailContainsUnwantedMimeParts: N The branch stable/13 has been updated by avg: URL: https://cgit.FreeBSD.org/src/commit/?id=9c0050b0a64a69f46d4b0ed52f4cf29283e7eafa commit 9c0050b0a64a69f46d4b0ed52f4cf29283e7eafa Author: Andriy Gapon AuthorDate: 2021-11-30 13:23:23 +0000 Commit: Andriy Gapon CommitDate: 2021-12-17 07:28:24 +0000 kern_tc: unify timecounter to bintime delta conversion There are two places where we convert from a timecounter delta to a bintime delta: tc_windup and bintime_off. Both functions use the same calculations when the timecounter delta is small. But for a large delta (greater than approximately an equivalent of 1 second) the calculations were different. Both functions use approximate calculations based on th_scale that avoid division. Both produce values slightly greater than a true value, calculated with division by tc_frequency, would be. tc_windup is slightly more accurate, so its result is closer to the true value and, thus, smaller than bintime_off result. As a consequence there can be a jump back in time when time hands are switched after a long period of time (a large delta). Just before the switch the time would be calculated with a large delta from th_offset_count in bintime_off. tc_windup does the switch using its own calculations of a new th_offset using the large delta. As explained earlier, the new th_offset may end up being less than the previously produced binuptime. So, for a period of time new binuptime values may be "back in time" comparing to values just before the switch. Such a jump must never happen. All the code assumes that the uptime is monotonically nondecreasing and some code works incorrectly when that assumption is broken. For example, we have observed sleepq_timeout() ignoring a timeout when the sbinuptime value obtained by the callout code was greater than the expiration value, but the sbinuptime obtained in sleepq_timeout() was less than it. In that case the target thread would never get woken up. The unified calculations should ensure the monotonic property of the uptime. The problem is quite rare as normally tc_windup should be called HZ times per second (typically 1000 or 100). But it may happen in VMs on very busy hypervisors where a VM's virtual CPU may not get an execution time slot for a second or more. Reviewed by: kib Sponsored by: Panzura LLC (cherry picked from commit 3d9d64aa1846217eac9229f8cba5cb6646a688b7) --- sys/kern/kern_tc.c | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c index 989457b82434..4a222df821be 100644 --- a/sys/kern/kern_tc.c +++ b/sys/kern/kern_tc.c @@ -208,6 +208,23 @@ tc_delta(struct timehands *th) tc->tc_counter_mask); } +static __inline void +bintime_add_tc_delta(struct bintime *bt, uint64_t scale, + uint64_t large_delta, uint64_t delta) +{ + uint64_t x; + + if (__predict_false(delta >= large_delta)) { + /* Avoid overflow for scale * delta. */ + x = (scale >> 32) * delta; + bt->sec += x >> 32; + bintime_addx(bt, x << 32); + bintime_addx(bt, (scale & 0xffffffff) * delta); + } else { + bintime_addx(bt, scale * delta); + } +} + /* * Functions for reading the time. We have to loop until we are sure that * the timehands that we operated on was not updated under our feet. See @@ -219,7 +236,7 @@ bintime_off(struct bintime *bt, u_int off) { struct timehands *th; struct bintime *btp; - uint64_t scale, x; + uint64_t scale; u_int delta, gen, large_delta; do { @@ -233,15 +250,7 @@ bintime_off(struct bintime *bt, u_int off) atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); - if (__predict_false(delta >= large_delta)) { - /* Avoid overflow for scale * delta. */ - x = (scale >> 32) * delta; - bt->sec += x >> 32; - bintime_addx(bt, x << 32); - bintime_addx(bt, (scale & 0xffffffff) * delta); - } else { - bintime_addx(bt, scale * delta); - } + bintime_add_tc_delta(bt, scale, large_delta, delta); } #define GETTHBINTIME(dst, member) \ do { \ @@ -1385,17 +1394,8 @@ tc_windup(struct bintime *new_boottimebin) #endif th->th_offset_count += delta; th->th_offset_count &= th->th_counter->tc_counter_mask; - while (delta > th->th_counter->tc_frequency) { - /* Eat complete unadjusted seconds. */ - delta -= th->th_counter->tc_frequency; - th->th_offset.sec++; - } - if ((delta > th->th_counter->tc_frequency / 2) && - (th->th_scale * delta < ((uint64_t)1 << 63))) { - /* The product th_scale * delta just barely overflows. */ - th->th_offset.sec++; - } - bintime_addx(&th->th_offset, th->th_scale * delta); + bintime_add_tc_delta(&th->th_offset, th->th_scale, + th->th_large_delta, delta); /* * Hardware latching timecounters may not generate interrupts on