[PATCH] ACPI CMOS region support - submission?

Bruce Evans brde at optusnet.com.au
Sat Aug 9 00:13:27 UTC 2014


On Fri, 8 Aug 2014, Ian Smith wrote:

> On Wed, 6 Aug 2014 06:49:12 +1000, Bruce Evans wrote:
> > On Tue, 5 Aug 2014, Ian Smith wrote:
> >
> > > On Sat, 2 Aug 2014 18:10:05 -0400, Anthony Jenkins wrote:
> ...
> > > I don't know about the spec, but I can't see allowing ACPI write access
> > > to time/alarm/date registers as being a sensible idea; this could allow
> > > completely out-of-band modification of time/alarm/date regs without the
> > > necessary precautions needed for setting these, namely use of the SET
> > > bit in status reg B (0x0b) to disable updates while setting time & date,
> > > and anyway why allow changing time/date without the OS knowing about it?
> > > Reading should be no problem, though without proper observance of UIP
> > > bit timing, data may not be consistent.
> >
> > They might need to be read on resume.  I can't see how resume can possibly
> > work without reading them or some clock to reset the time.  But this should
> > be done by calling the standard time reading functions, not at a low level.
>
> Well, we've long - not sure about 'always' - saved current timestamp to
> RTC clocktime (ts_to_ct) on suspend and recalled on resume (ct_to_ts),
> shown when bootverbose is enabled, x86 anyway; this has nothing to do
> with ACPI.
>
> Actually, 9.x and presumably 10.x needs debug.clocktime=1 to see the
> messages re these, previously (eg below from 8.2-R) you see these by
> default every 30 minutes, having booted verbose, which was annoying:

No, not long.  This bug was intentionally left out originally, and
wasn't committed until a few years ago.  Only ntpd or the user can
know if the time is correct enough to write to the RTC.  Unfortunately,
writes from ntpd don't happen automatically, since ntpd normally uses
micro-adjustments on FreeBSD.  Setting the clock using settimeofday()
or clock_settime() works right for setting the RTC if and only if it
is done after ntpd has stabilized.  (It has some bugs in its clobbering
of the boot time, but works right for the hardware.)  Unfortunately,
ntpd usually only sets the clock like this for an initial step _before_
it has stabilized, at least in old versions of ntpd.  I use an old
version with ntpdate and set the clock at boot time using ntpdate from
an accurate timeserver.  This prevents ntpd doing a less accurate step.

The writing routine is of low quality and has an error of +-1 second
(by not synchronizing with seconds updates).  The reading routine is
not much better and also has an error of +-1 second.  The latter is
fixed in my version.  Synchronization is easier for reading.  For
writing, I just log the write and return if it would make a
less-than-useful change:

% 	/*
% 	 * If the RTC time is already correct (to within -0 + 1 second),
% 	 * then do nothing, since writing would just clobber the hidden
% 	 * state which is more likely to be correct than the (truncated)
% 	 * time.  We assume that the weekday is correct if everything
% 	 * else is.
% 	 */
% 	tv_sec = rtc_time(&s);
% 	nanotime(&ts);
% 	if (tv_sec == -1)
% 		s = splhigh();

Here rtc_time() is like inittodr() except it reads the RTC accurately
(to about 10 usec) and has the necessary locking to do this atomically
enough with nanotime().

% 	change = (tv_sec == -1 ||
% 	    (tv_sec != ts.tv_sec && tv_sec + 1 != ts.tv_sec));
% 	klocaltime(change ? ts.tv_sec : tv_sec,
% 	    &yyyy, &mm, &dd, &wday, &hh, &MM, &ss);

Here klocaltime() is like clock_cs_to_ct() and localtime() except it has
an easier to use API.

% 	yy = yyyy % 100;
% 	printf("resettodr: %schange\n", change ? "" : "no ");
% 	printf("old: %02x/%02x/%02x %02x:%02x:%02x, cent = %02x, wday = %d\n",
% 	       rtcin(RTC_YEAR), rtcin(RTC_MONTH), rtcin(RTC_DAY),
% 	       rtcin(RTC_HRS), rtcin(RTC_MIN), rtcin(RTC_SEC),
% 	       rtcin(RTC_CENTURY), rtcin(RTC_WDAY));
% 	printf("new: %02x/%02x/%02x %02x:%02x:%02x, yyyy = %d, wday = %d\n",
% 	       bin2bcd(yy), bin2bcd(mm), bin2bcd(dd),
% 	       bin2bcd(hh), bin2bcd(MM), bin2bcd(ss), yyyy, wday);
% 	if (change == 0) {
% 		splx(s);
% 		return;
% 	}

All this used to be protected by splhigh(), but splhigh() locking was
broken by SMPng.  It was used not just for exclusion, but also for
timing.  SMPng only gave the exclusion (by Giant locking, provided
Giant locking is actually used for all callers).  -current still has
only Giant locking in clock_settime(), and none extra in inittodr() or
resettodr().  This is most broken where inittodr() calls tc_setclock().
tc_setclock() is otherwise only locked by calling it only from the
hardclock interrupt handler, where it is locked by whatever locking
that has (certainly not Giant).

The above version benefits from some time-domain locking: rtc_time()
synchronizes with the seconds rollover.  After that the clock registers
are stable for almost 1 second, so it would take 1 second's worth of
interruption for the registers to change.

Since this is a write, the current state of the registers doesn't
matter so much, but 1 second of interruption would make the time
written wrong by 1 second.  The time written is certainly wrong by
many microseconds, but since there is no synchronization the time
written is always wrong by +-1 second.  If the clock hardware were
simple, then the error would usually be negative with an average of
-0.5 seconds, but ISTR that there are complications that make the
error less predictable.

On read, the error is usually negative with an average error of -0.5
seconds, except the broken locking for timing allows huge errors when
the seconds rollover also rolls over the minutes and/or hours.  It
only takes being interrupted for 244 usec to break the timing.

> Aug  8 00:41:32 t23 kernel: ts_to_ct(1407458492.000380514) = [2014-08-08 00:41:32]
> Aug  8 01:11:31 t23 kernel: ts_to_ct(1407460291.799401540) = [2014-08-08 01:11:31]
> Aug  8 01:41:31 t23 kernel: ts_to_ct(1407462091.598502848) = [2014-08-08 01:41:31]
> Aug  8 02:11:31 t23 kernel: ts_to_ct(1407463891.397341911) = [2014-08-08 02:11:31]
> Aug  8 02:41:31 t23 kernel: ts_to_ct(1407465691.196399111) = [2014-08-08 02:41:31]
> Aug  8 03:11:31 t23 kernel: ts_to_ct(1407467490.995395001) = [2014-08-08 03:11:30]
> Aug  8 03:41:30 t23 kernel: ts_to_ct(1407469290.794307971) = [2014-08-08 03:41:30]

Worse than I remembered (I don't run any kernels with this bug, so I
never saw these messages before).  Updating every 30 minutes is far too
frequent, and spams the logs.  My logs are spammed with the following
messages instead :-):

% Aug  8 05:46:49 besplex kernel: inittodr: delta = 0.000041737

This is the latency between the RTC read and using the result.  ~41 usec.

% Aug  8 05:46:56 besplex kernel: resettodr: change
% Aug  8 05:46:56 besplex kernel: old: 14/08/08 05:46:54, cent = 20, wday = 6
% Aug  8 05:46:56 besplex kernel: new: 14/08/08 05:46:56, yyyy = 2014, wday = 6

This is the change initiated by ntpdate -b after booting.  Nothing
except rebooting calls resettodr(), and this change of 2 seconds is for
1 day of drift in the RTC.  An error of 1-2 seconds per day means that
it is useless to sync to the RTC more often than once per day.  Twice
per hour is much more than once per day.

% Aug  8 08:03:13 besplex kernel: sampled delta = 0.533057039, tv_sec = 1407448993
% Aug  8 10:19:45 besplex kernel: sampled delta = 0.615658287, tv_sec = 1407457185
% Aug  8 12:36:17 besplex kernel: sampled delta = 0.699694553, tv_sec = 1407465377
% Aug  8 14:52:49 besplex kernel: sampled delta = 0.785025899, tv_sec = 1407473569
% Aug  8 17:09:21 besplex kernel: sampled delta = 0.874881660, tv_sec = 1407481761
% Aug  8 19:25:54 besplex kernel: sampled delta = 0.963000188, tv_sec = 1407489953
% Aug  8 21:42:26 besplex kernel: sampled delta = 1.046254684, tv_sec = 1407498145

This tracks the drift of the RTC every 2048 seconds.  Not much different
to twice per hour.  During development I used to log these messages even
more often.

These messages show the drift of the RTC almost exactly.  It is about 82
milliseconds per 2048 seconds.

My version tracks the drift every 16 seconds, and uses the delta for
things like resetting the system time (not the RTC time) after stopping
in ddb.  Stopping in ddb is like a short suspension.  Similar methods
should be used to reset the system time after suspension.  The new time
is the current RTC time plus a compensation, where the compensation is
the absolute difference of the times at the previous RTC read (done
every 16 seconds or so while not suspended) plus the suspension time
times the estimated RTC drift rate.

> This reveals some warts; seconds are always rounded down, so that on
> resume you can expect, on average, to lose half a second; worst case
> almost a second.  Thus the first ntpd correction after (even immediate)
> resume is often positive, maybe more than a second, sometimes two or so.

Indeed.  It is half a second on average from upper layers not having
any synchronization with seconds rollovers, then more from complications
in the RTC.  (I'm mostly writing about x86 RTCs.  The details of the
hardware complications are MD of course.)  According to Linux sources,
the complications are that settings must be made on half-second
boundaries, not seconds boundaries.  Apparently the rollover for a
newly set value occurs after 500 milliseconds.  You wouldn't want it
to be unsynchronized, and having it occur after 500 milliseconds is
less fragile than having it occur after 0 or 1000 milliseconds.  In
fact, 500 milliseconds compensates on average for low quality setters
like FreeBSD.  These setters have an average error of -0.5 seconds,
and the hardware behaviour gives a fixed adjustment of +0.5 seconds,
so the average error is 0.

Careful setting and reading doesn't matter much since the clock drift
overnight is typically 1 second.  It is not useful to adjust the clock
by 1 second for the day's use and then have it drift by 1 second
overnight.  The error on rebooting is just reduced from 2 seconds to
1 second.  Both are too large for ntpd, so you need ntpdate or worse
in the boot sequence to fix up the clocks before they are used much.
If the system is not turned off overnight or rebooted for other reasons,
then the RTC time matters even less.

> Similarly when restoring time from RTC on resume, there's no synching
> with UIP so, on average, another half second will be dropped before ntpd
> picks up the pieces.  Whether sync is worth the bother or time is moot.

Indeed.

I get the sync during normal operation using the seconds alarm.  No
waiting to read the clock then.  Otherwise, rtc_time() has to busy-wait
for up to 1 second.  Not good for an active system, but acceptable for
booting and resuming.

> > > I guess there may be a case for messing with the alarm bytes, though we
> > > don't currently allow use of the alarm for wake from poweroff (S5) or
> > > STR (S3) states, as doth Linux.  I looked into this some years ago when
> > > there were a few requests for the feature, however it would involve some
> > > major reworking to always preserve the alarm interrupt bit through init
> > > and suspend/resume, and appropriate clearing of same after use, along
> > > with a utility to setup the alarm .. a potential to leave open, perhaps?
> >
> > I use the seconds alarm for pps, but don't use suspend.
>
> Is it accurate enough for pps?  Are you using ntpd to update your RTC?

No, but I once planned to use the RTC to update ntpd (for systems not
connected to the internet for long periods).  It turned out that the
RTC and the TSC drift by about the same rates, so they are equally good
or bad as primary sources for ntpd.  I had one system drifting by about
0.1 seconds/day after drift compensations (the RTC was off by several
seconds per day, but after compensating for this, the drift from
temperature variations was only 0.1 seconds/day.  That is over a week
or 2 when the average temperature doesn't change much.  The compensation
should depend on the season and on actual temperature measurements, but
I never automated all that).

So I now just use pps to detect any long-term drift or errors in the RTC.
ntpdc -c kerninfo reports after 2 hours of uptime:

% pll offset:           -0.0011784 s
% pll frequency:        -2.453 ppm
% maximum error:        0.20712 s
% estimated error:      0.004653 s
% status:               2101  pll ppssignal nano
% pll time constant:    6
% precision:            1e-09 s
% frequency tolerance:  496 ppm
% pps frequency:        -14.001 ppm
% pps stability:        0.076 ppm
% pps jitter:           1.7625e-05 s
% calibration interval: 256 s
% calibration cycles:   48
% jitter exceeded:      2
% stability exceeded:   0
% calibration errors:   1

I forget how to interpret all of this.  The pps frequency of -14.001 ppm
is the average drift of the RTC relative to the (ntpd-disciplined) real
time (or something).  The pps stability of 0.076 ppm seems good.  The
pps jitter of 1.7625e-05 s seems like nonsense.  IIRC, this measurement
is not really jitter, but incorporates the average error of -14.001 ppm
and some other error (1.7625e-05 s translates to 17.625 ppm).

> Ok, and others may want to use alarm or periodic interrupts for whatever
> purpose, possibly at much higher rates than once per second, so there's
> still a case for maintaining the reg select optimisation noted below ..

Even once per second is too high.  The lapic timer can do short, variable
timeouts much better.  The RTC has a very high overhead for reprogramming.

> > > -static	int	rtc_reg = -1;
> ...
> > Removing this is a good re-pessimization.  rtcin() does 4 i/o's with the
> > pessimization.  This takes about 5 usec.  At 128 Hz this only wastes
> > 0.064% of a CPU.  At 1024 Hz it wastes 0.512%.  The non-pessimized
> > version only does 1 i/o so it wastes 1/4 as much.  The wastage should
> > be much larger.  1024 Hz is far too slow for profiling a modern system.
> > It is about right for a 4.77 MHz 8086, except the RTC overhead is too
> > high for that (I used the i8254).  Scaling this for a 2 GHz modern CPU
> > gives 429 kHz (or more, since instructions per cycle is more).  The
> > overhead for that with the pessimization would be 214% of a CPU.
> > Without the pessimization, it would only be 53.7% of a CPU.  Still too
> > much.  The RTC is just unsuitable for profiling interrupts at a useful
> > rate.  The i8254 is better and the lapic timer is much better, but
> > profhz has only been increased to 8192.  Other things don't scale so
> > well.  There are other i/o's for interrupts that make it difficult
> > to go much higher than 429 KHz though 429 Khz is certainly possible
> > using less than 100% of a CPU.
>
> I'm struggling to see why it's a 'good' re-pessimization?  You suggest
> below that adding another one or two I/O delays would be even better?

Good pessimization == optimization for reduced performance.  It is
even easy, by wasting more time.

> > It is a minor layering violation to have acpi functions in non-acpi drivers.
> > At least use the driver's spellings and not the Windows ones in drivers.
>
> I originally thought the former also, but Anthony has been encouraged by
> someone here to add the ACPI CMOS Region methods here.

The existence of the nvram driver and acpi_support/acpi_ibm.c show that
rtcin() and writertc() can be used without any changes.  The Region
methods can just loop calling these functions.  The only thing a special
routine can do better is lock all the device accesses together.  But
locking is neither necessary nor sufficient for getting a coherent
read -- the data can still change in a short time.  Reads that need
coherence should do something like looping until the whole region
stabilizes.  Writes are more complicated -- neither locking nor checking
for stability gives atomicity for volatile registers.

> > > 	RTC_LOCK;
> > >
> > > +	for (offset = 0U; offset < buflen; ++offset) {
> > > +		IO_DELAY();
> > > +		outb(IO_RTC_ADDR, ((address + offset) | 0x80) & 0xFF);
> > > +		IO_DELAY();
> > > +		buf[offset] = inb(IO_RTC_DATA);
> >
> > This should just call rtcin() in a loop instead of breaking it.  Then
> > there would be no need to put this in the driver.
>
> Yes, I think that'd be the right way to go; leave rtcin() and writertc()
> alone, warts and all allowing r/w of locations 0-0xff for its consumers,
> and as you say, calling rtcin() in the loop.  This preserves POLA and
> further would allow moving the ACPI stuff to anywhere? more appropriate,
> I guess ACPICA when that's updated to ACPI 5 or wherever this came in.

Except possibly for the bulk locking.  I think ACPI mainly wants to
access non-volatile registers so it doesn't need bulk locking.

Move it somewhere near acpi_ibm.c?

> > > Sorry, but I don't get '| 0x80' there, and below in acpi_cmos_write ?
> >,,,
> Ok, but if we don't bother disabling NMI, no need to reenable it, right?
> And that bit 0x80 isn't 'in' the RTC, it must be separately decoded on
> the address bus to a latch or such ..

Yes, and most hardware probably doesn't even have the latch.

> > ...
> > Why do we have to care about acpi doing unsafe accesses?  Only the
> > block access seems dangerous.  With scattered accesses like
> > rtcin(RTC_FOO) using the system (non-acpi) spelling for 'FOO', it is
> > very easy to see any dangerous ones.
>
> You can't easily 'see' what BIOS AML code is doing without examining it
> specifically, and it's both vendor- and user-updateable.  This is why I
> think it's important to print exactly which parts of RTC/NVRAM are being
> accessded, apart from developing some doc on what different vendors do.

If it uses my methods then I can stop it doing bad things by not
allowing it to access most registers, and find what it is doing by
logging all accesses :-).  Hard for the end user with unusual AML to
debug though.  I wouldn't allow it write access to any register that
the system uses (mainly RTC) or read access to volatile registers that
the system uses (mainly RTC statuses?).

> > rtcin() used to be only used for
> > memory sizing and in the fd driver (for dubious disk equipment byte
> > reads; the equipment bytes just tell you about equipment that systems
> > might have had in 1985, so it is almost useless, while other BIOS bytes
> > are even more useless since they are not standard).  It is now used in:
> > - nvram driver.  This seems to allow anyone with access to the device
> >   to read and/or write the offsets 14-127.  So they can manage or
> >   corrupt some BIOS bytes but not the ones used by FreeBSD :-).  acpia
> >   should be even more trustable, but might need read access to the
> >   clock bytes.
>
> I don't think it makes sense.  We already look after writing datetime to
> RTC on suspend, and on ntpd corrections.  We read it back on boot, or
> resume, albeit both ways ignoring synching seconds properly.  Why would
> we want to let AML code use OS ACPI services to write to those fields,
> without having actually requested a service, when the OS thinks it is in
> charge of managing time?  Perhaps I just have a naive view on this?
>
> Other bytes in (so-called) NVRAM that a particular machine wants saved
> to and perhaps later reloaded from there across boot or suspend/resume -
> that this code is designed to deal with in the first place - is all very
> well, and _perhaps_ some BIOS might want to manage updating the alarm
> bytes this way (although normally configured in BIOS SETUP), but any
> potential to mess with date or time itself seems out-of-bounds to me.

I would allow it to write RTC registers after the system part of suspend
finishes until after the system part of resume starts.  Might be
difficult to synchronize.

> Further to /sys/dev/nvram/nvram.c .. it allows access to bytes 0x0e
> (EQUIPMENT) through 0x7f, but also checks then recalculates the checksum
> for bytes 0x10 through 0x2d, stored in 0x2e and 0x2f.  So if ACPI wants
> to write anywhere in that area, it has to update the checksum too.  We
> still have yet to see examples of reads or writes that this code allows.

So it will need read access to all the bytes.  Actually, the volatile
RTC bytes and some others must be off-limits for the checksum.

> One other thing; I don't really know where to go to check Linux code;
> google landed me at mit.edu a lot so I gathered bits there (Linux 3.x)
> but couldn't find a modern definition for tHeir CMOS_READ / CMOS_WRITE
> macros.  This looks too old, maybe:

Yes, it is obfuscated in a different way than FreeBSD (more at lower levels,
less at higher).  Try the include directory.  There are squillions of
definitions, some even in files named mc146818rtc.h.

> http://members.xoom.virgilio.it/ozma/mulinux/Download/El-Torito_iso/usr.163
> from 1993 ..
>
> #define CMOS_READ(addr) ({ \
> outb_p((addr),RTC_PORT(0)); \
> inb_p(RTC_PORT(1)); \
> })
> #define CMOS_WRITE(val, addr) ({ \
> outb_p((addr),RTC_PORT(0)); \
> outb_p((val),RTC_PORT(1)); \
> })

The asm-x86_64 one in 2.6.10 (10 years old) is no different.

> .. but if that's all, why are we bothering with any I/O delay at all?
> Or do outb_p() and inb_p() take care of that?  Yes, too many questions!

Yes, they do.  Linux used the extra-i/o pessimization much more than
FreeBSD.  outb_p() adds the pessimization if the pessimization is needed
and/or configured.

Bruce

