Re: widening ticks

From: Mark Johnston <markj_at_freebsd.org>
Date: Sat, 11 Jan 2025 22:35:36 UTC
On Sun, Jan 12, 2025 at 04:35:43AM +0900, Tomoaki AOKI wrote:
> On Sat, 11 Jan 2025 11:34:06 -0500
> Mark Johnston <markj@freebsd.org> wrote:
> 
> > On Sat, Jan 11, 2025 at 01:11:06PM +0900, Tomoaki AOKI wrote:
> > > On Wed, 8 Jan 2025 18:07:47 -0500
> > > Mark Johnston <markj@freebsd.org> wrote:
> > > 
> > > > On Thu, Jan 09, 2025 at 12:18:48AM +0200, Konstantin Belousov wrote:
> > > > > On Wed, Jan 08, 2025 at 04:31:16PM -0500, Mark Johnston wrote:
> > > > > > The global "ticks" variable counts hardclock ticks, it's widely used in
> > > > > > the kernel for low-precision timekeeping.  The linuxkpi provides a very
> > > > > > similar variable, "jiffies", but there's an incompatibility: the former
> > > > > > is a signed int and the latter is an unsigned long.  It's not
> > > > > > particularly easy to paper over this difference, which has been
> > > > > > responsible for some nasty bugs, and modifying drivers to store the
> > > > > > jiffies value in a signed int is error-prone and a maintenance burden
> > > > > > that the linuxkpi is supposed to avoid.
> > > > > > 
> > > > > > It would be nice to provide a compatible implementation of jiffies.  I
> > > > > > can see a few approaches:
> > > > > > - Define a 64-bit ticks variable, say ticks64, and make hardclock()
> > > > > >   update both ticks and ticks64.  Then #define jiffies ticks64 on 64-bit
> > > > > >   platforms.  This is the simplest to implement, but it adds extra work
> > > > > >   to hardclock() and is somewhat ugly.
> > > > > > - Make ticks an int64_t or a long and convert our native code
> > > > > >   accordingly.  This is cleaner but requires a lot of auditing to avoid
> > > > > >   introducing bugs, though perhaps some code could be left unmodified,
> > > > > >   implicitly truncating the value to an int.  For example I think
> > > > > >   sched_pctcpu_update() is fine.  I've gotten an amd64 kernel to compile
> > > > > >   and boot with this change, but it's hard to be confident in it.  This
> > > > > >   approach also has the potential downside of bloating structures that
> > > > > >   store a ticks value, and it can't be MFCed.
> > > > > > - Introduce a 64-bit ticks variable, ticks64, and
> > > > > >   #define ticks ((int)ticks64).  This requires renaming any struct
> > > > > >   fields and local vars named "ticks", of which there's a decent number,
> > > > > >   but that can be done fairly mechanically.
> > > > > > 
> > > > > > Is there another solution which avoids these pitfalls?  If not, should
> > > > > > we go ahead with one of these approaches?  If so, which one?
> > > > > 
> > > > > You cannot do this in C, but can in asm:
> > > > >         .data
> > > > >         .globl  ticksl, ticks
> > > > >         .type   ticksl, @object
> > > > >         .type   ticks, @object
> > > > > ticksl: .quad
> > > > >         .size   ticksl, 8
> > > > > ticks   =ticksl		/* for little-endian */
> > > > > /* ticks	=ticksl + 4  for big-endian */
> > > > >         .size   ticks, 4
> > > > > 
> > > > > 
> > > > > Then update only ticksl in the hardclock().
> > > > 
> > > > I implemented your suggestion here: https://reviews.freebsd.org/D48383
> > > 
> > > As this is already committed to main, commenting here instead of review
> > > D48383.
> > > 
> > > Maybe I'm too paranoid and overlooking something, but...
> > > 
> > > *If "jiffies" in LinuxKPI is really unsigned, isn't there any
> > >  possibilities that relies on its value to be larger than
> > >  0x7fffffffffffffff as a threshold?
> > >  (Yes, it should be silly and non-realistic, but theoretically
> > >   possible.)
> > 
> > Ideally we would have
> > 
> >     #define jiffies ((unsigned long)ticksl)
> > 
> > in the linuxkpi, but some Linux code uses "jiffies" as a struct field or
> > local variable name, so this doesn't quite work.
> > 
> > In practice, the value is usually assigned to an unsigned long or used
> > as an operand where it would be implicitly promoted to an unsigned type,
> > so we don't see any incompatibilities.
> > 
> > When jiffies is an int, code like the following can misbehave:
> > 
> > 	unsigned long remain, timeout = jiffies + const;
> > 	...
> > 	remain = timeout - jiffies;
> > 	if ((long)remain < 0)
> > 		/* timed out */
> > 
> > If (int)timeout and jiffies have different signs, as might happen close
> > to a rollover, the comparison won't work as expected.
> > 
> > Linux has some macros (time_after() etc.) which are supposed to be used
> > instead of direct comparisons, but they're not always used.
> 
> So ticksl should better be unsigned long if there's no reason to keep
> it signed, isn't it?

Well, I kept it signed since it's meant to be similar in usage to ticks.
With a signed counter, you can check test whether a value has passed by
looking at the sign of the difference between ticks(l) and that value
(modulo rollover).  With an unsigned counter, you need some casting, as
in the example above.

> > > *Is anywhere checking carry (sign) bit for int on LP32?
> > >  Maybe it would be the reason if "jiffies" in LinuxKPI is really
> > >  unsigned.
> > 
> > Could you provide an example of what you mean?
> 
> Not an example of code, but for example, when ticksl is at
> 0x7fffffffffffffff (positive value), ticks shoule be 0xffffffff
> (negative value), if I read the diff correctly.
> The same thing starts happening ticksl is at 0x0000000080000000 throug
> 0x00000000ffffffff and values alike. So signs (carry bits, usually the
> leftmost bit of each) should be checked separately for ticksl and ticks.

That's true, but I can't see why any code would care about this?

> Am I (hopefully) overlooking something?
> 
> -- 
> Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>