[zfs] filesystem reads hanging
Reshad Patuck
reshadpatuck1 at gmail.com
Tue Oct 1 05:35:12 UTC 2019
Hi Warner,
I will do a scrub the moment I reboot the box.
As mentioned, the zpool scrub command itself hangs (it does not return,
and I cannot kill it). I also do not see a scrub running in the zpool
status output for zroot.
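
A minimal sketch of how the stuck processes could be listed, so that their
kernel stacks can be captured before the next reboot. The helper name
`stuck_procs` is hypothetical. It only filters `ps` output for processes
whose state starts with D. The `ps` and `procstat -kk` lines are left as
comments, and they assume a FreeBSD host where those tools still respond
while the pool is hung.

```shell
#!/bin/sh
# stuck_procs: hypothetical helper, not part of FreeBSD.
# Reads `ps` output on stdin (header line first, state in column 2) and
# prints the rows whose state starts with "D" (uninterruptible wait).
# Assigning $1 to itself makes awk rebuild the row with single spaces.
stuck_procs() {
    awk 'NR > 1 && $2 ~ /^D/ { $1 = $1; print }'
}

# Intended use on the affected box (left commented out on purpose):
#   ps -axo pid,state,wchan,command | stuck_procs
#   procstat -kk <pid>    # kernel stack of one stuck process
```

The wchan column and the kernel stack together may show what the hung
`cat`, `ls -l` and `zpool scrub` processes are waiting on.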
Is there any other way I can check the disk/hardware? (the pool is running
on a single SSD)
I see nothing that looks like a disk error to me in /var/log/all.log or
/var/log/messages.
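
A rough sketch of the hardware checks that could be tried, under some
assumptions: the device name `ada0` is a guess for the single SSD, and
`smartctl` comes from the smartmontools package rather than the base
system. The names `run_if_present` and `check_disk` are hypothetical.
`zpool status` may itself hang while the pool is in this state.

```shell
#!/bin/sh
# DISK is an assumed device name; adjust it to the actual SSD.
DISK="${DISK:-ada0}"

# run_if_present: hypothetical helper. Runs a tool only when it is
# installed; otherwise reports that it was skipped.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not found"
    fi
}

# check_disk: hypothetical wrapper around three read-only checks.
check_disk() {
    run_if_present camcontrol devlist        # list attached CAM devices
    run_if_present smartctl -a "/dev/$DISK"  # SMART health and error log
    run_if_present zpool status -v zroot     # pool state and error counts
}
```

Calling `check_disk` as root after the reboot, and again when the hang
reappears, would give two sets of output to compare.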
Thanks,
Reshad
On Tue, Oct 1, 2019 at 10:51 AM Warner Losh <imp at bsdimp.com> wrote:
>
>
> On Mon, Sep 30, 2019, 10:56 PM Reshad Patuck <reshadpatuck1 at gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a FreeBSD 12.0-RELEASE-p9 system running ZFS.
>> The system runs an application that uses postgres, and python (among other
>> services).
>>
>> I have noticed that python is suddenly unable to connect to postgres.
>> When I investigate further, certain files on disk cannot be read.
>> The commands `cat` and `ls -l` hang (no output, and I cannot ctrl-c or
>> `kill -9` them); `ps -aux` shows them in a D+ state.
>> After I kill the SSH session these processes keep running as orphans, and
>> I am still not able to kill them.
>>
>> Someone on IRC suggested running a zfs scrub to check for data corruption,
>> but running `zpool scrub zroot` has the same effect.
>> The command does not return, ctrl-c does not kill it and `zpool scrub -s
>> zroot` says "cannot cancel scrubbing zroot: there is no active scrub".
>>
>> This has happened to two of my production servers in the past month.
>> Since the application was critical, they were rebooted, and the boxes
>> functioned normally after the reboot.
>> After the reboot, the files that could not be read with cat were readable
>> again, and a zfs scrub completed with 0 errors and 0 fixes.
>> One of these boxes needed a hard reboot because it got stuck in the
>> shutdown stage of a soft reboot.
>>
>> I am not sure where to start debugging this, or whether there is any way
>> to get metrics from a box stuck in this state.
>> Please let me know if you would like me to fetch any metrics or run any
>> commands, etc. for you.
>> Any help would be much appreciated.
>>
>
> Step 1 should be to make sure there are no disk errors... the successful
> scrub suggests not, but it doesn't hurt to rule out hardware...
>
> Warner
>
>> Best regards,
>>
>> Reshad
>> _______________________________________________
>> freebsd-fs at freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
>>
>