Machine lockup - WAS: Re: drmn0: error: GPU lockup CP stall for more than 10000msec

Tue Sep 16 20:53:08 UTC 2014

Since starting to use the DRI driver I am having problems with my machine
locking up.  Remote ssh sessions die, the keyboard is dead, and the screen
is blank.

I am asking for pointers on how to try and debug this.  My system is ZFS
on boot so I created a swap partition on USB /dev/da0s1b and added

dumpdev="/dev/da0p1"

to rc.conf.  After rebooting into single user mode and mounting the ZFS
filesystems with "zfs mount -a", I tried

# savecore /var/crash /dev/da0p1

but savecore complained that no core files were found. Is kernel dump to
USB swap not supported?
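
One thing that may be worth checking: the swap partition above is described
as /dev/da0s1b, while dumpdev and savecore both name /dev/da0p1.  Whether
that matters here is not known.  The following is only a minimal sh sketch,
assuming rc.conf holds a plain dumpdev="..." line with no trailing comment;
rc_dumpdev and check_dumpdev are hypothetical helper names, not part of
FreeBSD.

```shell
# Hypothetical helper: print the dumpdev value from an rc.conf-style file.
# Uses the last dumpdev= assignment found and strips surrounding quotes.
rc_dumpdev() {
    sed -n 's/^[[:space:]]*dumpdev=//p' "$1" | tail -n 1 | tr -d "\"'"
}

# Hypothetical helper: report whether the configured dump device exists.
check_dumpdev() {
    dev=$(rc_dumpdev "$1")
    case "$dev" in
        "")
            echo "no dumpdev set in $1"
            return 1 ;;
        AUTO|NO)
            # rc.conf also accepts these keywords instead of a device path
            echo "dumpdev is $dev (no explicit device)"
            return 0 ;;
    esac
    if [ -e "$dev" ]; then
        echo "dumpdev $dev exists"
    else
        echo "dumpdev $dev not found"
        return 1
    fi
}
```

Usage would be "check_dumpdev /etc/rc.conf".  If the device does exist,
"savecore -C /dev/da0p1" reports whether a dump is present on it without
extracting anything.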

I'm running 10.1-BETA1 with these kernel options added:

options         INVARIANTS
options         INVARIANT_SUPPORT
options         DEBUG_VFS_LOCKS

I have not been in front of the box during failure so I am not sure if
it exhibited the behavior shown below or not.

I have used this box for several years using VESA drivers without issue.

I'm posting here first since this seems to be related to a DRI issue.

Thanks.

--mikej

On 2014-09-12 14:02, Michael Jung wrote:
> This has happened twice so I thought I would report it.  I believe all
> the required info is available in the links below.
> 
> X.Org X Server 1.12.4 / 10.1-BETA1 #0 r271460
> 
> I was simply building ports when X11 puked, the console flipped back
> to VT and tried to fire up X11 again.  This happened over and over
> until reboot. I did not try unloading the kernel modules and reloading
> them. Please advise what else to do or to collect should this happen
> again.
> 
> Lastly, I have been using the VESA driver for a long time on this
> hardware.  I just started using the new ATI driver and VT console.
> 
> Thanks.
> 
> http://216.26.158.189/x11/devinfo.txt
> http://216.26.158.189/x11/dmesg.fail
> http://216.26.158.189/x11/pciconf.txt
> http://216.26.158.189/x11/pkg.txt
> xorg.conf auto generated
> 
> 
> drmn0: error: GPU lockup CP stall for more than 10000msec
> drmn0: warning: GPU lockup (waiting for 0x000000000000d969)
> drmn0: error: failed to get a new IB (-11)
> error: [drm:pid5702:radeon_cs_ib_chunk] *ERROR* Failed to get ib !
> drmn0: info: Saved 1591 dwords of commands on ring 0.
> drmn0: info: GPU softreset: 0x00000003
> drmn0: info:   GRBM_STATUS               = 0xA0003828
> drmn0: info:   GRBM_STATUS_SE0           = 0x00000007
> drmn0: info:   GRBM_STATUS_SE1           = 0x00000007
> drmn0: info:   SRBM_STATUS               = 0x20000040
> drmn0: info:   R_008674_CP_STALLED_STAT1 = 0x00000000
> drmn0: info:   R_008678_CP_STALLED_STAT2 = 0x00010000
> drmn0: info:   R_00867C_CP_BUSY_STAT     = 0x00020106
> drmn0: info:   R_008680_CP_STAT          = 0x80038647
> drmn0: info:   GRBM_SOFT_RESET=0x00007F6B
> drmn0: info:   GRBM_STATUS               = 0x00003828
> drmn0: info:   GRBM_STATUS_SE0           = 0x00000007
> drmn0: info:   GRBM_STATUS_SE1           = 0x00000007
> drmn0: info:   SRBM_STATUS               = 0x20000040
> drmn0: info:   R_008674_CP_STALLED_STAT1 = 0x00000000
> drmn0: info:   R_008678_CP_STALLED_STAT2 = 0x00000000
> drmn0: info:   R_00867C_CP_BUSY_STAT     = 0x00000000
> drmn0: info:   R_008680_CP_STAT          = 0x00000000
> drmn0: info: GPU reset succeeded, trying to resume
> info: [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
> drmn0: info: WB enabled
> drmn0: info: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0x0xfffff8018adfec00
> drmn0: info: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0x0xfffff8018adfec0c
> info: [drm] ring test on 0 succeeded in 2 usecs
> info: [drm] ring test on 3 succeeded in 1 usecs
> drmn0: error: GPU lockup CP stall for more than 10000msec
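
The log above prints the same status registers twice, once before and once
after GRBM_SOFT_RESET.  To see at a glance which ones changed across the
reset, something like the following sh/awk sketch might help.  It assumes
the "drmn0: info:   NAME = 0xVALUE" line layout shown above, and reset_diff
is a hypothetical helper name.  It only compares values; it does not decode
any register bits.

```shell
# Hypothetical helper: read a drm reset log on stdin and print each
# register whose value differs between its first and second appearance.
reset_diff() {
    awk '
    {
        sub(/^(> ?)*/, "")            # drop mail quote markers, if any
    }
    /^drmn0: info: +[A-Za-z0-9_]+ += 0x/ {
        name = $3; val = $5
        if (name in before) {
            # second (or later) appearance: report a change only once
            if (!(name in seen) && before[name] != val)
                print name, before[name], "->", val
            seen[name] = 1
        } else {
            before[name] = val        # first appearance: remember it
        }
    }'
}
```

Usage would be "reset_diff < dmesg.fail".  Fed the dump quoted above, it
should list GRBM_STATUS, R_008678_CP_STALLED_STAT2, R_00867C_CP_BUSY_STAT
and R_008680_CP_STAT as changed, and stay silent about the registers that
read the same before and after the reset.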