[Bug 275155] TSC clocksource quickly becomes unstable in ubuntu VMs

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 17 Nov 2023 20:25:23 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275155

            Bug ID: 275155
           Summary: TSC clocksource quickly becomes unstable in ubuntu VMs
           Product: Base System
           Version: Unspecified
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bhyve
          Assignee: virtualization@FreeBSD.org
          Reporter: sean@rogue-research.com

All 3 of my Ubuntu VMs (20.04 and 22.04) show the following in their kernel
log shortly after booting:

```
clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
clocksource:                       'hpet' wd_nsec: 510151922 wd_now: dd0dbd1f wd_last: dc8b23ce mask: ffffffff
clocksource:                       'tsc' cs_nsec: 510575753 cs_now: 41a058cf4fac4e cs_last: 41a0588c680999 mask: ffffffffffffffff
clocksource:                       'tsc' is current clocksource.
tsc: Marking TSC unstable due to clocksource watchdog
TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
sched_clock: Marking unstable (5085057420454, 3573276862)<-(5088679512297, -48428374)
clocksource: Checking clocksource tsc synchronization from CPU 3 to CPUs 0-2.
clocksource: Switched to clocksource hpet
```
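
As a rough sketch of what the watchdog lines report, the skew can be recomputed
from the logged fields.  This assumes that wd_nsec and cs_nsec are the same
interval as measured by 'hpet' and 'tsc' respectively, and that the *_now and
*_last values are raw counter readings; the variable names are illustrative
only and not taken from the kernel source.

```python
# Values copied from the watchdog message in the log above.
wd_nsec = 510151922            # interval as measured by 'hpet', in ns
cs_nsec = 510575753            # same interval as measured by 'tsc', in ns

# Difference between the two clocks over the interval.
skew_ns = cs_nsec - wd_nsec
skew_ppm = skew_ns / wd_nsec * 1e6

# Raw counter deltas, wrapped to each clocksource's mask.
wd_delta = (0xdd0dbd1f - 0xdc8b23ce) & 0xffffffff
cs_delta = (0x41a058cf4fac4e - 0x41a0588c680999) & 0xffffffffffffffff

# Implied TSC rate, assuming cs_nsec is the elapsed time for cs_delta cycles.
tsc_hz = cs_delta / (cs_nsec * 1e-9)

print(f"skew: {skew_ns} ns over {wd_nsec} ns ({skew_ppm:.0f} ppm)")
print(f"hpet ticks: {wd_delta}, tsc cycles: {cs_delta}")
print(f"implied TSC rate: {tsc_hz / 1e9:.3f} GHz")
```

By this calculation the TSC ran ahead of the HPET by 423831 ns over an interval
of roughly 510 ms.  The implied TSC rate comes out close to the host CPU's
nominal 2.2 GHz.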

I'm using the bhyve shipped with the latest TrueNAS:

FreeBSD freenas.local 13.1-RELEASE-p7 FreeBSD 13.1-RELEASE-p7
n245428-4dfb91682c1 TRUENAS amd64

My host (running TrueNAS) is an Intel Xeon E5-2630 v4 @ 2.2 GHz on a Supermicro
X10 motherboard.  The host was previously using TSC as its clocksource, but I
switched it to HPET.  The above log appears in the guests either way.

I'm not certain whether these log messages are in themselves an indication of a
problem, but I'm filing this ticket on the assumption that they are.  (All 3 of
these VMs also have a problem where /var/log/kern.log frequently shows an entry
about CPU stalls, after which services fail and a reboot is required.  I
suspect, but don't know, that the root cause is the clocksource issue above,
hence this ticket.)
