DELETE support in the VOP_STRATEGY(9)?

Warner Losh imp at bsdimp.com
Tue Dec 8 19:19:16 UTC 2015


> On Dec 8, 2015, at 12:06 PM, Konstantin Belousov <kostikbel at gmail.com> wrote:
> 
> On Tue, Dec 08, 2015 at 08:58:18PM +0200, Konstantin Belousov wrote:
>> On Tue, Dec 08, 2015 at 07:44:38PM +0100, Dag-Erling Smørgrav wrote:
>>> Warner Losh <imp at bsdimp.com> writes:
>>>> Dag-Erling Smørgrav <des at des.no> writes:
>>>>> But the filesystem does not know whether the underlying storage is
>>>>> electromechanical or solid-state, nor does it know whether the user
>>>>> cares much about seek times (unless we introduce the heuristic
>>>>> "avoid creating holes unless the file already has them, in which
>>>>> case the userland probably does not care").
>>>> Actually, the filesystem does know. Or has some knowledge of what
>>>> is supported and what isn't. BIO_DELETE support is a strong indicator
>>>> of a flash or other log-type system.
>>> 
>>> The filesystem can ask the layer below if BIO_DELETE is supported, but
>>> should not assume anything about what it means.  For instance, I could
>>> write a gnop-like module that translates BIO_DELETE into an all-zeroes
>>> BIO_WRITE and passes everything else unmodified.  It would provide a
>>> stronger guarantee than, say, SATA TRIM but would also have a completely
>>> different performance profile (even on SSDs, since it would do its work
>>> synchronously whereas TRIM works asynchronously).
>> I again agree.  This is how UFS issues TRIM.  When the data block is freed
>> and there are no dangling pointers in the inode copy on disk pointing to
>> the block, BIO_DELETE is issued if the volume reports support for it.
>> Everything else is up to the geom stack and driver.
> I am sorry for the follow-up mail, but I probably have to explain more.
> The freed block, for which BIO_DELETE is issued, is not marked as free
> in the bitmap until the BIO_DELETE completion is reported. In other
> words, we do not reuse the freed block while the TRIM command may still
> be executing.

Since BIO_DELETE requests queue up in the storage layer, and we make no requirement
on ordering[*] there, this is prudent. If the block were reused early, a BIO_WRITE to it
could overtake the BIO_DELETE, and the delete could then discard the newly written data.
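
The rule described above (a freed block stays out of the free bitmap until
its delete completes) can be illustrated with a toy model in C. This is only
a sketch under stated assumptions, not the UFS code: the names blk_release,
trim_done and blk_alloc, the three-state bitmap, and NBLOCKS are all
hypothetical and exist only for illustration.

```c
#include <assert.h>

#define NBLOCKS 8   /* hypothetical toy volume size */

/* Hypothetical per-block state; real UFS uses a free bitmap plus other bookkeeping. */
enum blk_state { BLK_IN_USE, BLK_TRIM_PENDING, BLK_FREE };

/* Zero-initialized, so every block starts out as BLK_IN_USE. */
static enum blk_state bitmap[NBLOCKS];

/* The file system frees a block: the delete is issued here (not modeled),
 * but the block is NOT marked free yet. */
static void blk_release(int b)
{
    assert(b >= 0 && b < NBLOCKS);
    bitmap[b] = BLK_TRIM_PENDING;
}

/* Completion of the delete: only now does the block become reusable. */
static void trim_done(int b)
{
    assert(b >= 0 && b < NBLOCKS);
    if (bitmap[b] == BLK_TRIM_PENDING)
        bitmap[b] = BLK_FREE;
}

/* Allocator: hands out the lowest free block, or -1 if none is free.
 * Blocks with a delete still in flight are skipped, so no write to a
 * reused block can race with the pending delete. */
static int blk_alloc(void)
{
    for (int b = 0; b < NBLOCKS; b++) {
        if (bitmap[b] == BLK_FREE) {
            bitmap[b] = BLK_IN_USE;
            return b;
        }
    }
    return -1;
}
```

With every block initially in use, calling blk_release(3) and then
blk_alloc() returns -1, because block 3 is still waiting on its delete.
After trim_done(3), blk_alloc() returns 3. The cost of this choice is that
a block stays unavailable for as long as the delete is outstanding.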

Warner

[*] Well, apart from BIO_ORDERED, which isn’t used for BIO_DELETE requests
and is generally only needed in FreeBSD to ensure that writes are flushed out
(with BIO_FLUSH) in a particular order for power-fail recovery.


More information about the freebsd-hackers mailing list