OpenZFS/FreeBSD only: potential space leak with async_destroy

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Mon, 23 Sep 2024 10:40:27 UTC
This is a so far theoretical only concern based on a real problem that I 
encountered at work.

It seems that only on FreeBSD ERESTART == TRAVERSE_VISIT_NO_CHILDREN == -1.
 From a quick search, Linux and illumos use positive numbers for ERESTART.

The potential problem is that dsl_process_async_destroys uses bptree_iterate to 
traverse blocks of a destroyed dataset.  bptree_iterate internally uses 
traverse_dataset_destroyed and all the common traversal machinery.
If dsl_scan_free_block_cb decides that the async destroy activity should stop 
(pause) in the current sync pass, then it would return ERESTART.

traverse_dnode can misinterpret that return value as TRAVERSE_VISIT_NO_CHILDREN 
and stop visiting data blocks while continuing the traversal.

As a result:
1. bptree_iterate may continue traversing despite dsl_scan_async_block_should_pause;
2. bptree_iterate may believe that it visited data blocks when, in fact, they 
were skipped.

A test would require destroying a large enough dataset (on a throwaway pool), 
waiting for feature@async_destroy to switch from active to enabled and then
checking that the pool's freeing property is not zero.

What do you think?

-- 
Andriy Gapon