Panic on BETA1 in the ZFS subsystem

Fri Jul 22 13:39:05 UTC 2016

On 21/07/2016 13:52, Andriy Gapon wrote:
> On 21/07/2016 15:25, Karl Denninger wrote:
>> The crash occurred during a backup script operating, which is (roughly)
>> the following:
>>
>> zpool import -N backup (mount the pool to copy to)
>>
>> iterate over a list of zfs filesystems and...
>>
>> zfs rename fs at zfs-base fs at zfs-old
>> zfs snapshot fs at zfs-base
>> zfs send -RI fs at zfs-old fs at zfs-base | zfs receive -Fudv backup
>> zfs destroy -vr fs at zfs-old
>>
>> The first filesystem to be done is the rootfs, that is when it panic'd,
>> and from the traceback it appears that the Zio's in there are from the
>> backup volume, so the answer to your question is "yes".
> I think that what happened here was that a quite large number of TRIM
> requests was queued by ZFS before it had a chance to learn that the
> target vdev in the backup pool did not support TRIM.  So, when the the
> first request failed with ENOTSUP the vdev was marked as not supporting
> TRIM.  After that all subsequent requests were failed without sending
> them down the storage stack.  But the way it is done means that all the
> requests were processed by the nested zio_execute() calls on the same
> stack.  And that lead to the stack overflow.
>
> Steve, do you think that this is a correct description of what happened?
>
> The state of the pools that you described below probably contributed to
> the avalanche of TRIMs that caused the problem.
>
Yes does indeed sound like what happened to me.

     Regards
     Steve