ZFS related hang with FreeBSD 9.2

Rod Taylor rod.taylor at gmail.com
Sat Dec 14 03:14:50 UTC 2013


On Fri, Dec 13, 2013 at 7:55 PM, Steven Hartland <killing at multiplay.co.uk> wrote:

> Are you doing any snapshot sends as well as interacting with
> snapshots such as listing files in them via the .zfs?
>

I didn't spend much time debugging it, as the snapshots created by zfSnap
were for local backups only. Off-site backups are a simple pg_dump. The
snapshots were unmounted and untouched by me, and they do not show up in
zfs list by default. They were not copied or imported to any other machines.
No clones were in use.

With the zfSnap periodics enabled using the configuration below, the machine
spontaneously reboots about once a week. 9.0 was much worse; I could push
it over with simple heavy IO, such as a query performing a sequential table
scan in PostgreSQL on a 40GB table. 9.2 only seemed to go down during
periodic runs. I've not been able to push it over at any other time.

Anyway, /var/crash remains empty after a reboot (dumpdev="AUTO").

I *think* the problem is related to creating snapshots during high load,
although the problem is significantly reduced if I disable deletes. I've
been unable to manually trigger a crash on 9.2 using zfSnap commands, but
crashes still occur with regularity during periodics and spontaneously
during the day.

ZFS v28 under 9.0/9.1, and all feature flags enabled under 9.2.

Nothing is logged.


Relevant snippet from periodic.conf:

# Filesystem snapshots
daily_zfsnap_enable="YES"
daily_zfsnap_recursive_fs="tank0"
daily_zfsnap_flags="-s -S"
daily_zfsnap_ttl=2m

monthly_zfsnap_enable="YES"
monthly_zfsnap_recursive_fs="tank0"
monthly_zfsnap_flags="-s -S"
monthly_zfsnap_ttl=6m

reboot_zfsnap_enable="YES"
reboot_zfsnap_flags="-s -S"
reboot_zfsnap_recursive_fs="tank0"

weekly_zfsnap_delete_enable="YES"
weekly_zfsnap_delete_flags="-s -S"
weekly_zfsnap_recursive_fs="tank0"

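For anyone wanting to check what zfSnap has left behind: snapshots are hidden
from a plain zfs list by default, so they have to be requested explicitly. A
minimal sketch, assuming the pool name tank0 from the configuration above; the
function names list_snapshots and count_snapshots are made up for illustration
and are not part of zfSnap or ZFS.

```shell
#!/bin/sh
# Sketch only: helpers for inspecting snapshots under a dataset.
# The function names are hypothetical; only "zfs list" is a real command.

# list_snapshots DATASET
# Prints one snapshot name per line for DATASET and all of its descendants.
list_snapshots() {
    # -H: no header line, -t snapshot: snapshots only,
    # -o name: print only the name column, -r: recurse into child datasets.
    zfs list -H -t snapshot -o name -r "$1"
}

# count_snapshots DATASET
# Prints the number of snapshots found under DATASET.
count_snapshots() {
    # tr strips the padding that some wc implementations add.
    list_snapshots "$1" | wc -l | tr -d ' '
}
```

After sourcing the file, list_snapshots tank0 prints the snapshot names and
count_snapshots tank0 prints how many there are, which makes it easy to see
whether the delete periodic is keeping up.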


> If so make sure you have the following patch applied as
> that can cause a deadlock between these two operations
> http://svnweb.freebsd.org/changeset/base/258595
>

I have not tried this patch, but I can over the holidays.



> ----- Original Message -----
> From: "Rod Taylor" <rod.taylor at gmail.com>
> To: "Ryan Baldwin" <ryan.baldwin at nexusalpha.com>
> Cc: <freebsd-fs at freebsd.org>
> Sent: Friday, December 13, 2013 11:21 PM
> Subject: Re: ZFS related hang with FreeBSD 9.2
>
>
>
>> Are you using snapshots?
>>
>> I've found ZFS snapshots on 9.0, 9.1, and 9.2 regularly crash the system.
>> Delete the snapshots and don't create any new ones and suddenly it's stable
>> for months.
>>
>>
>>
>> On Fri, Dec 13, 2013 at 12:14 AM, Ryan Baldwin
>> <ryan.baldwin at nexusalpha.com> wrote:
>>
>>> Hi,
>>>
>>> We have a server based on FreeBSD 9.2 which hangs at times on a daily
>>> basis. The longest uptime we have achieved is 5 days; conversely, it has
>>> stopped daily several days in a row.
>>>
>>> When this occurs, it appears there are two processes stuck in the 'tx->tx'
>>> state. In the top output shown, these are snapshot-manager processes, which
>>> generally create and destroy snapshots and sometimes roll back filesystems
>>> to snapshots. When the lockup occurs, other processes that try to access
>>> the file system can end up stuck in state 'rrl->r'. The reboot command that
>>> was issued to try to reboot the server has ended up stuck in this state,
>>> as can be seen.
>>>
>>> The server is not under particularly heavy load.
>>>
>>> It has remained in this state for hours. The 'deadman handler'? does not
>>> appear to restart the system. Once this has occurred there is no further
>>> disk activity.
>>>
>>> We did not experience this problem at all previously using 9.1, although
>>> we had fewer snapshot-manager processes before. We have now rebuilt this
>>> server on 9.1, but it has only been running for one day so far.
>>>
>>> We can try and reproduce this problem again on 9.2 if by doing so we can
>>> gather any additional information that could help resolve this problem.
>>> Please let me know what other information would be helpful.
>>>
>>> The hardware is a Dell R420 with a PERC H310 RAID controller in JBOD mode,
>>> with the pool mirrored on two SAS disks.
>>>
>>> Thanks
>>>
>>> top and procstat output follow: ...
>>>
>>

