[Bug 232914] kern/kern_resource: Integer overflow in function calcru1

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 29 Jul 2022 03:03:18 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232914

Kubilay Kocak <koobs@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |maintainer-feedback?(sjg@Fr
                   |                            |eeBSD.org),
                   |                            |maintainer-feedback?(cem@fr
                   |                            |eebsd.org),
                   |                            |maintainer-feedback?(jhb@Fr
                   |                            |eeBSD.org)
            Version|CURRENT                     |12.3-STABLE
                 CC|                            |cem@freebsd.org,
                   |                            |jhb@FreeBSD.org,
                   |                            |sjg@FreeBSD.org
           See Also|                            |https://bugs.freebsd.org/bu
                   |                            |gzilla/show_bug.cgi?id=7697
                   |                            |2
           Severity|Affects Only Me             |Affects Some People
             Status|New                         |Open
            Summary|Integer overflow in         |kern/kern_resource: Integer
                   |function calcru1            |overflow in function
                   |                            |calcru1
           Keywords|                            |needs-qa

--- Comment #1 from Kubilay Kocak <koobs@FreeBSD.org> ---
I believe I've reproduced this on a 12.3-RELEASE-p5 GENERIC amd64 VirtualBox
virtual machine.

After suspending the guest, upgrading VirtualBox (on the host), and resuming the
guest, the following was logged in /var/log/messages:

Jul 28 00:34:10 123-RELEASE-p5-amd64-9e36 kernel: calcru: runtime went backwards from 5425487903787745 usec to 18518189771 usec for pid 11 (idle)
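
For anyone wanting to scan logs for other occurrences, a minimal Python sketch
follows. The regular expression is modelled only on the single line quoted
above, and the helper names (parse_calcru, usec_to_days) are made up for
illustration; other releases may word the message differently.

```python
import re

# Pattern modelled on the one log line quoted in this comment (assumption:
# the wording is the same elsewhere).
CALCRU_RE = re.compile(
    r"calcru: runtime went backwards from (\d+) usec to (\d+) usec "
    r"for pid (\d+) \((.+)\)"
)

def parse_calcru(line):
    """Return (old_usec, new_usec, pid, name), or None if the line does not match."""
    m = CALCRU_RE.search(line)
    if m is None:
        return None
    old, new, pid, name = m.groups()
    return int(old), int(new), int(pid), name

def usec_to_days(usec):
    """Convert microseconds to days for easier reading."""
    return usec / 1_000_000 / 86_400

line = ("kernel: calcru: runtime went backwards from 5425487903787745 usec "
        "to 18518189771 usec for pid 11 (idle)")
old, new, pid, name = parse_calcru(line)
print(f"pid {pid} ({name}): {usec_to_days(old):.1f} days -> "
      f"{usec_to_days(new):.2f} days")
```

On the line above, the old runtime works out to roughly 62795 days, against a
new runtime of about a fifth of a day.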

Additionally, the following is observed in top output:

  PID   JID USERNAME    THR PRI NICE   SIZE    RES SWAP STATE    C   TIME    WCPU COMMAND
   11     0 root          2 155 ki31     0B    32K   0B RUN      0    ??? 199.70% [idle]

Initially, running `ps -p` on the idle process ID showed TIME and SYSTIME not
changing:

[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 17087 -o comm,time,systime
COMMAND TIME SYSTIME                                                           
                                                                        
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424922:45.67 90424922:45.67
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424922:45.67 90424922:45.67
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424922:45.67 90424922:45.67
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424922:45.67 90424922:45.67
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424922:45.67 90424922:45.67


But after a short period of time, the values started changing again:

[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424939:04.68 90424939:04.68
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424939:05.76 90424939:05.76
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424939:06.43 90424939:06.43
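
As a rough cross-check, the ps values can be converted to microseconds and
compared with the "from" value in the calcru message. The sketch below assumes
the TIME field is minutes:seconds.hundredths; ps_time_to_usec is a hypothetical
helper, not part of any tool.

```python
def ps_time_to_usec(field):
    """Convert 'minutes:seconds.hundredths' (assumed ps TIME layout) to usec."""
    minutes, seconds = field.split(":")
    whole, _, frac = seconds.partition(".")
    # Pad or trim the fractional part to exactly two digits (hundredths).
    centis = int(frac.ljust(2, "0")[:2]) if frac else 0
    return (int(minutes) * 60 + int(whole)) * 1_000_000 + centis * 10_000

logged_old_usec = 5425487903787745          # "from" value in the calcru message
stuck = ps_time_to_usec("90424922:45.67")   # value that was not changing
later = ps_time_to_usec("90424939:04.68")   # first value after it moved again

print("stuck - logged:", (stuck - logged_old_usec) / 1_000_000, "s")
print("later - stuck: ", (later - stuck) / 1_000_000, "s")
```

Under that assumption, the stuck value is about two hours above the logged
"from" runtime, so ps appears to be reporting the same oversized figure rather
than the smaller "to" value.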

Looking at kern/kern_resource history:

1) Conrad (cem) resolved a calcru1 overflow in base 23e5e43ccd0f via bug 76972
(Bruce followed up with base 23e5e43ccd0f)

2) Simon (sjg) disabled the message for virtual machines in base bacb140f31aa
via review D33902 (I'm unsure whether this was eventually MFC'd; the commit
didn't include an MFC tag)
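
For context on the class of bug in item 1, here is a sketch of how a 64-bit
product can wrap when a runtime of this magnitude is scaled by a ratio. It
assumes a proportional split of the form (total * part) / whole done in
unsigned 64-bit arithmetic; it is not the actual calcru1 code, and the tick
counts are invented for illustration.

```python
U64_MASK = (1 << 64) - 1  # used to emulate wraparound of a 64-bit unsigned int

def scale_truncated_u64(total, part, whole):
    """(total * part) / whole, with the product wrapped to 64 bits."""
    return ((total * part) & U64_MASK) // whole

def scale_exact(total, part, whole):
    """Same ratio, using Python's arbitrary-precision integers."""
    return (total * part) // whole

total_usec = 5425487903787745           # "from" runtime in the message above
part_ticks, whole_ticks = 9000, 10000   # hypothetical tick counts

exact = scale_exact(total_usec, part_ticks, whole_ticks)
wrapped = scale_truncated_u64(total_usec, part_ticks, whole_ticks)

print("product exceeds 64 bits:", total_usec * part_ticks > U64_MASK)
print("exact:  ", exact)
print("wrapped:", wrapped)
```

With these inputs the intermediate product no longer fits in 64 bits, so the
wrapped result is smaller than the exact one.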

There may be behaviour improvements possible in this case, particularly for the
'???' reported times in top. In particular:

1) Had the message been disabled in my (VM) case, I would not have been able to
correlate/isolate the cause via dmesg

2) What could we report for runtime in the event of overflow instead of '???'?
What do other OSes do?

Dynamic frequency/power scaling is the norm these days, system hardware is
becoming more heterogeneous (big.LITTLE etc.), and this message appears to be
intended only as a diagnostic report of the state change, not a fatal error.
Given that:

3) What other ways might FreeBSD more broadly/gracefully handle timing changes
going forward?

^Triage: Request feedback from Conrad re (remaining?) calcru1 overflow and
Simon/John re virtual machine situation.
