[Bug 253954] kernel: g_access(958): provider da8 has error 6 set
Date: Mon, 13 Jun 2022 20:20:06 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253954

jnaughto@ee.ryerson.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jnaughto@ee.ryerson.ca

--- Comment #4 from jnaughto@ee.ryerson.ca ---
Any update on this bug? I just experienced the exact same issue. I have 8
disks (all SATA) connected to a FreeBSD 12.3 system. The ZFS pool is set up
as a raidz3. I got in today and found one drive was "REMOVED":

# zpool status pool
  pool: pool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 0 days 02:32:26 with 0 errors on Sat Jun 11 05:32:26 2022
config:

        NAME                     STATE    READ WRITE CKSUM
        pool                     DEGRADED    0     0     0
          raidz3-0               DEGRADED    0     0     0
            ada0                 ONLINE      0     0     0
            ada1                 ONLINE      0     0     0
            ada2                 ONLINE      0     0     0
            ada3                 ONLINE      0     0     0
            ada4                 ONLINE      0     0     0
            8936423309855741075  REMOVED     0     0     0  was /dev/ada5
            ada6                 ONLINE      0     0     0
            ada7                 ONLINE      0     0     0

I assumed that the drive had died and pulled it. I put a new drive in its
place and attempted to replace it:

# zpool replace pool 8936423309855741075 ada5
cannot replace 8936423309855741075 with ada5: no such pool or dataset

It seems the old drive is somehow still remembered by the system. I dug
through the logs and found the following occurring when the new drive was
inserted:

Jun 13 13:03:15 server kernel: cam_periph_alloc: attempt to re-allocate valid device ada5 rejected flags 0x118 refcount 1
Jun 13 13:03:15 server kernel: adaasync: Unable to attach to new device due to status 0x6
Jun 13 13:04:23 server kernel: g_access(961): provider ada5 has error 6 set

I rebooted without the new drive in place. After the reboot the pool output
looked somewhat different:

# zpool status pool
  pool: pool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0 days 02:32:26 with 0 errors on Sat Jun 11 05:32:26 2022
config:

        NAME                      STATE    READ WRITE CKSUM
        pool                      DEGRADED    0     0     0
          raidz3-0                DEGRADED    0     0     0
            ada0                  ONLINE      0     0     0
            ada1                  ONLINE      0     0     0
            ada2                  ONLINE      0     0     0
            ada3                  ONLINE      0     0     0
            ada4                  ONLINE      0     0     0
            8936423309855741075   FAULTED     0     0     0  was /dev/ada5
            ada5                  ONLINE      0     0     0
            diskid/DISK-Z1W4HPXX  ONLINE      0     0     0

errors: No known data errors

I assumed this was because there was one less drive attached, so the system
assigned new adaX values to the drives. When I then inserted the new drive it
appeared as ada9, so I re-issued the zpool replace command with ada9 instead.
It took about 3 minutes for the command to return (which really concerned
me), but the server has quite a few users accessing the filesystem, so I
figured that as long as the new drive was resilvering I would be fine.
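For anyone who hits the same "no such pool or dataset" state, a minimal check
sequence before retrying the replace might look like the sketch below. The
pool name and device node are just the values from this report, and note that
zpool labelclear destroys whatever label it finds:

# camcontrol devlist            (how CAM currently enumerates the disks)
# geom disk list                (whether GEOM still holds the old ada5 provider)
# zdb -l /dev/ada5              (any stale ZFS label left on the new disk)
# zpool labelclear -f /dev/ada5
# zpool replace pool 8936423309855741075 /dev/ada5

zpool replace accepts the numeric GUID for the old vdev, which is why the
commands above reuse 8936423309855741075 from the zpool status output.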
I do a weekly scrub of the pool and I believe the error crept in after the
scrub. At 11am today the logs showed the following:

Jun 13 11:29:15 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jun 13 11:29:15 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
Jun 13 11:29:15 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): Retrying command, 0 more tries remain
Jun 13 11:30:35 172.16.20.66 kernel: ahcich5: Timeout on slot 5 port 0
Jun 13 11:30:35 172.16.20.66 kernel: ahcich5: is 00000000 cs 00000060 ss 00000000 rs 00000060 tfd c0 serr 00000000 cmd 0004c517
Jun 13 11:30:35 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jun 13 11:30:35 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
Jun 13 11:30:35 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): Retrying command, 0 more tries remain
Jun 13 11:31:08 172.16.20.66 kernel: ahcich5: AHCI reset: device not ready after 31000ms (tfd = 00000080)

At 11:39 I believe the following log entries are of note:

Jun 13 11:39:45 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
Jun 13 11:39:45 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): Error 5, Periph was invalidated
Jun 13 11:39:45 172.16.20.66 ZFS[92964]: vdev state changed, pool_guid=$5100646062824685774 vdev_guid=$8936423309855741075
Jun 13 11:39:45 172.16.20.66 ZFS[92966]: vdev is removed, pool_guid=$5100646062824685774 vdev_guid=$8936423309855741075
Jun 13 11:39:46 172.16.20.66 kernel: g_access(961): provider ada5 has error 6 set
Jun 13 11:39:47 reactor syslogd: last message repeated 1 times
Jun 13 11:39:47 172.16.20.66 syslogd: last message repeated 1 times
Jun 13 11:39:47 172.16.20.66 kernel: ZFS WARNING: Unable to attach to ada5.

Any idea what the issue was?
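The FLUSHCACHE48 command timeouts followed by the AHCI reset happen below
ZFS, in the CAM/AHCI layers, so they point at the drive or the ahcich5 link
rather than at the pool. A minimal triage sketch for that (ada5 and ahcich5
are the names from the logs above; smartctl comes from the
sysutils/smartmontools port, not the base system):

# camcontrol identify ada5           (does the drive still answer IDENTIFY?)
# smartctl -a /dev/ada5              (SMART health and the drive's error log)
# dmesg | grep -E 'ahcich5|ada5'     (controller resets vs. media errors)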