RE: PowerMac G5 crashes with "instruction storage interrupt" on recent 13
- In reply to: Julio Merino : "RE: PowerMac G5 crashes with "instruction storage interrupt" on recent 13"
Date: Tue, 13 Sep 2022 13:17:48 UTC
Alright, did some more bisecting and reached this range of commits where the problem with the fans starts:

g5:/usr/src> git log --oneline f639aeb3fd3e..6f387a563206 sys
6f387a563206 vm_reserv: #include vm_extern.h explicitly, for arm.
bf27b9bc7f5b vm_phys: convert error back to warning
87e6f3d27eba vm_phys: #include vm_extern
c5a5a9dbcf38 vm_extern: use standard address checkers everywhere
f8da86347070 linux(4): Implement __vdso_time
00c933e9254c linux(4): Use saved cpu feature bits

I think we can safely discard the linux(4) commits. Other than that, the build seems broken at each intermediate vm_* step, so it’s hard now to pinpoint any of those specifically.

Does this ring a bell?

Thanks

From: Julio Merino<mailto:julio@meroh.net>
Sent: Friday, September 9, 2022 18:41
To: Justin Hibbits<mailto:jhibbits@FreeBSD.org>
Cc: freebsd-ppc@freebsd.org<mailto:freebsd-ppc@freebsd.org>
Subject: RE: PowerMac G5 crashes with "instruction storage interrupt" on recent 13

I have now tried to compare the dmesgs and sysctl of a good kernel (built at 9171b8068b92 with the workaround applied) and a recent bad kernel with the workaround applied as well.

The main differences comparing dmesg output, where the dash prefix is for the good kernel and the plus prefix is for the bad kernel:

-----
-bus_dmamem_alloc failed to align memory properly.
-firewire0: 2 nodes, maxhop <= 1 cable IRM irm(1) (me)
+firewire0: 2 nodes, maxhop <= 1 Not IRM capable irm(-1)
+pci1:5:4:0: VPD data does not start with ident (0x8)
+pci1:5:4:0: failed to read VPD data.
+pci1:5:4:0: no valid vpd ident found
+pci1:5:4:1: VPD data does not start with ident (0x8)
+pci1:5:4:1: failed to read VPD data.
+pci1:5:4:1: no valid vpd ident found
+WARNING: Current temperature (CPU A0 DIODE TEMP: 916.0 C) exceeds critical temperature (90.0 C); count=1
-----

Note here that the measured temperature is obviously wrong once the fans spin up like crazy.
And soon after this, count grows too high and the machine shuts down by itself.

Looking at differences for all sysctls that mention “temp”:

-----
dev.ds1631.0.%pnpinfo: name=temp-monitor compat=ds1631
-dev.ds1631.0.sensor.mlb_inlet_amb.temp: 27.5C
+dev.ds1631.0.sensor.mlb_inlet_amb.temp: 29.6C
dev.ds1775.0.%pnpinfo: name=temp-monitor compat=ds1775
-dev.ds1775.0.sensor.drive_bay.temp: 26.5C
+dev.ds1775.0.sensor.drive_bay.temp: 29.5C
dev.max6690.0.%pnpinfo: name=temp-monitor compat=max6690
-dev.max6690.0.sensor.backside.temp: 36.1C
-dev.max6690.0.sensor.kodiak_diode.temp: 48.7C
+dev.max6690.0.sensor.backside.temp: 42.2C
+dev.max6690.0.sensor.kodiak_diode.temp: 55.2C
dev.max6690.1.%pnpinfo: name=temp-monitor compat=max6690
-dev.max6690.1.sensor.tunnel.temp: 31.2C
-dev.max6690.1.sensor.tunnel_heatsink.temp: 33.7C
+dev.max6690.1.sensor.tunnel.temp: 34.7C
+dev.max6690.1.sensor.tunnel_heatsink.temp: 39.0C
-dev.smusat.0.cpu_a0_diode_temp: 34.2C
-dev.smusat.0.cpu_a1_diode_temp: 35.0C
kstat.zfs.misc.arcstats.arc_tempreserve: 0
-----

The fact that dev.smusat.* is gone from the “bad” kernel seems suspicious, but smusat0 is detected properly in both kernels according to dmesg…

Any thoughts? I can try to bisect this as well, but there are 1500+ changes to sort through, so this will take a while.

Thanks!

From: Justin Hibbits<mailto:jhibbits@FreeBSD.org>
Sent: Friday, September 9, 2022 12:12
To: Julio Merino<mailto:julio@meroh.net>
Cc: freebsd-ppc@freebsd.org<mailto:freebsd-ppc@freebsd.org>
Subject: Re: PowerMac G5 crashes with "instruction storage interrupt" on recent 13

That seems bizarre. There haven't been any changes to the controller thread (powermac_thermal.c) in more than 7 years. Are there any problems with sensors?

I tested the change I made back in 2015 on my dual core G5, with the intent that it would ramp the fans up sooner (non-linear), and back them down with hysteresis.
So when there's load that raises the temperature significantly, it will ramp the fans up as quickly as it can, hitting 100% fan long before it can reach maximum temperature.

- Justin

On Fri, 9 Sep 2022 19:01:06 +0000
Julio Merino <julio@meroh.net> wrote:

> Ah, thanks for the workaround. I applied it on top of 9171b8068b92
> and the kernel was able to boot successfully – and it seems stable so
> far.
>
> However, if I apply the hack on top of stable/13’s HEAD, there is
> still the issue of the fans going crazy at the slightest increase in
> CPU load, but they do drop back down to quiet when the load subsides.
> (For example, a simple “git log” in /usr/src makes the fans spin up
> within a couple of seconds and they stop soon after that.) Any ideas
> on where this might come from?
>
>
> From: Justin Hibbits<mailto:jhibbits@FreeBSD.org>
> Sent: Friday, September 9, 2022 09:09
> To: Julio Merino<mailto:julio@meroh.net>
> Cc: freebsd-ppc@freebsd.org<mailto:freebsd-ppc@freebsd.org>
> Subject: Re: PowerMac G5 crashes with "instruction storage interrupt"
> on recent 13
>
> Hi Julio,
>
> 971cb62e0b23 is the likely culprit. Alfredo has a patch at
> https://reviews.freebsd.org/D36234 that you can use until the problem
> is solved. The alternative is you could build everything into the
> kernel instead of using modules.
>
> The problem appears to be in either lld or the kernel linker.
>
> - Justin
>
> On Fri, 9 Sep 2022 16:00:33 +0000
> Julio Merino <julio@meroh.net> wrote:
> > Armed with a lot of patience, I was able to bisect where the crashes
> > are coming from.
> > They seem to be due to these three consecutive and
> > related commits (because the first one broke the build and required
> > two extra fixes for powerpc’s GENERIC64 to build):
> >
> > 9171b8068b92 cpuset: Fix the KASAN and KMSAN builds
> > 01f281d0ee52 Fix the build after 47a57144
> > 971cb62e0b23 cpuset: Byte swap cpuset for compat32 on big endian
> > architectures
> >
> > Any idea on how to look into these crashes further?
> >
> > Thank you!
> >
> >
> > From: Julio Merino<mailto:julio@meroh.net>
> > Sent: Sunday, July 31, 2022 07:45
> > To: freebsd-ppc@freebsd.org<mailto:freebsd-ppc@freebsd.org>
> > Subject: PowerMac G5 crashes with "instruction storage interrupt" on
> > recent 13
> >
> > Hi all,
> >
> > I have a PowerMac G5 that’s running an old build of FreeBSD 13
> > stable (from around October of last year) that I’m trying to
> > upgrade to recent stable/13.
> >
> > Booting into a new kernel brings two issues: the first is that the
> > fans spin up to jet engine levels right before transferring control
> > to userspace. An old patch I have locally to mitigate this (which I
> > got from whichever outstanding bug exists for this in the bug
> > tracker) doesn’t seem to work any longer.
> >
> > The second is that the kernel crashes (apparently) as soon as it
> > tries to mount a ZFS pool during early stages of the boot process,
> > but after successfully transferring control to userspace.
> > I'm typing this from a photo of the crash, so I'm omitting details
> > that I think aren’t going to be relevant here, like addresses. Here
> > is what I get:
> >
> > ----
> > Setting hostid: …
> > ZFS filesystem version: 5
> > ZFS storage pool version: features support (500)
> >
> > Fatal kernel trap:
> >
> > Exception = 0x400 (instruction storage interrupt)
> > …
> > pid = 64, comm = zpool
> >
> > panic: instruction storage interrupt trap
> > cpuid = 1
> > time = …
> > KDB: stack backtrace:
> > #0 kdb_backtrace
> > #1 vpanic
> > #2 panic
> > #3 trap
> > #4 powerpc_interrupt
> > Uptime: 7s
> > ----
> >
> > Any thoughts about what I could look into? Any “recent” commits that
> > you think may be at fault?
> >
> > Thanks!
> >
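Justin's description of the powermac_thermal.c controller (ramp the fans up early on a non-linear curve, then back them down only with hysteresis) can be illustrated with a small simulation. This is a hypothetical Python model, not the driver's code: the names `requested_duty` and `FanController` and every threshold (60 C target, 75 C full speed, 5 C hysteresis band, 20% minimum duty) are assumptions chosen for illustration. Only the 90 C critical limit comes from the dmesg warning in the thread.

```python
from dataclasses import dataclass

# Hypothetical tuning values, chosen for illustration only; they are NOT
# taken from powermac_thermal.c.
FAN_MIN_PCT = 20
FAN_MAX_PCT = 100
TEMP_TARGET = 60.0  # deg C: start ramping above this
TEMP_FULL = 75.0    # deg C: full speed here, well below the 90 C critical limit
HYST_C = 5.0        # deg C the sensor must cool before the fan slows down


def requested_duty(temp: float) -> int:
    """Map a temperature to a fan duty (percent) on a concave curve.

    The curve 1 - (1 - x)**2 rises steeply just above TEMP_TARGET, so the
    fans ramp up early instead of waiting until the temperature nears TEMP_FULL.
    """
    if temp <= TEMP_TARGET:
        return FAN_MIN_PCT
    if temp >= TEMP_FULL:
        return FAN_MAX_PCT
    x = (temp - TEMP_TARGET) / (TEMP_FULL - TEMP_TARGET)  # 0 < x < 1
    curve = 1.0 - (1.0 - x) ** 2
    return round(FAN_MIN_PCT + curve * (FAN_MAX_PCT - FAN_MIN_PCT))


@dataclass
class FanController:
    """Hypothetical controller: fast ramp-up, hysteresis on the way down."""

    duty: int = FAN_MIN_PCT
    peak_temp: float = float("-inf")  # hottest reading since the last slow-down

    def update(self, temp: float) -> int:
        self.peak_temp = max(self.peak_temp, temp)
        want = requested_duty(temp)
        if want > self.duty:
            # Ramp up immediately whenever demand increases.
            self.duty = want
        elif want < self.duty and temp <= self.peak_temp - HYST_C:
            # Back down only after cooling HYST_C below the recent peak.
            self.duty = want
            self.peak_temp = temp  # restart the hysteresis window from here
        return self.duty
```

Feeding a fresh controller readings of 67.5, 65, 62 and 64 C returns duties of 80, 80, 40 and 57 percent: the fan holds its speed until the sensor has cooled 5 C below its recent peak, then ramps straight back up when the load returns. In this model, any reading at or above the full-speed threshold, including an implausible one such as the 916.0 C in the warning, pins the duty at 100 percent.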