Understanding the rationale behind dropping "block devices"
Konstantin Belousov
kostikbel at gmail.com
Mon Jan 16 11:00:16 UTC 2017
On Mon, Jan 16, 2017 at 05:20:25PM +0800, Julian Elischer wrote:
> On 16/01/2017 4:49 PM, Aijaz Baig wrote:
> > Oh yes, I was actually running an old release inside a VM, and yes, I had
> > changed the device names myself while jotting down notes (to give them more
> > descriptive names, like OS X does). So now I've checked it on a
> > recent release and yes, there is indeed no block device.
> >
> > root at bsd-client:/dev # gpart show
> > =>      34  83886013  da0  GPT  (40G)
> >         34      1024    1  freebsd-boot  (512K)
> >       1058  58719232    2  freebsd-ufs  (28G)
> >   58720290   3145728    3  freebsd-swap  (1.5G)
> >   61866018  22020029       - free -  (10G)
> >
> > root at bsd-client:/dev # ls -lrt da*
> > crw-r----- 1 root operator 0x4d Dec 19 17:49 da0p1
> > crw-r----- 1 root operator 0x4b Dec 19 17:49 da0
> > crw-r----- 1 root operator 0x4f Dec 19 23:19 da0p3
> > crw-r----- 1 root operator 0x4e Dec 19 23:19 da0p2
> >
> > So this shows that I have a single SATA or SAS drive and there are
> > apparently 3 partitions (or is it four? Why does it show unused space
> > when I had used the entire disk?)
> >
> > Nevertheless my question still holds. What does 'removing support for block
> > devices' mean in this context? Was my earlier understanding correct, viz.
> > that all disk devices now have a character (or raw) interface and are no
> > longer served via the "page cache" but rather the "buffer cache"? Does that
> > mean all disk accesses are now direct, bypassing the file system?
>
> Basically, FreeBSD never really buffered/cached by device.
>
> Buffering and caching are done per vnode in the filesystem.
> We have no device-based block cache. If you want file X at offset Y,
> then we can satisfy that from cache.
> VM objects map closely to vnode objects, so the VM system IS the file
> system buffer cache.
This is not true.
We do have a buffer cache for blocks read through the device (special)
vnode. This is how the metadata is typically handled for filesystems
that are clients of the buffer cache, e.g. UFS, msdosfs, cd9660, etc.
It is up to the filesystem not to create aliased cached copies of the
same blocks both on the device vnode's buffer list and on the filesystem
vnode. In fact, filesystems sometimes consciously break this rule:
UFS, for instance, reads blocks of a user vnode through the disk cache
during the SU (soft updates) truncation of indirect blocks.
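
To make this concrete, here is a minimal sketch (not verbatim kernel
code; the helper name, includes and error handling are mine) of the
pattern a buffer-cache client typically uses to read metadata through
the device vnode:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/buf.h>
#include <sys/ucred.h>
#include <sys/vnode.h>

/*
 * Read one metadata block of 'size' bytes at device block 'blkno'
 * through the buffer cache attached to the device (special) vnode.
 * The resulting buffer hangs off devvp's buffer lists, not off any
 * regular file vnode.
 */
static int
read_meta_block(struct vnode *devvp, daddr_t blkno, int size, void *dst)
{
        struct buf *bp;
        int error;

        /*
         * bread() satisfies the request from the buffer cache when
         * possible and only calls into the disk driver on a miss.
         */
        error = bread(devvp, blkno, size, NOCRED, &bp);
        if (error != 0)
                return (error);
        bcopy(bp->b_data, dst, size);
        brelse(bp);     /* release; the clean buffer stays cached */
        return (0);
}

UFS superblock and cylinder-group reads follow essentially this shape,
with devvp being the vnode of the underlying disk device.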
> If you want device M at offset N, we will fetch it for you from the
> device, DMA'd directly into your address space,
> but there is no cached copy.
> Having said that, it would be trivial to add a 'caching' geom layer to
> the system, but that has never been needed.
The useful interpretation of the claim that FreeBSD does not cache
disk blocks is that the cache is not accessible via user-initiated
I/O (read(2) and write(2)) through opened devfs nodes. If a program
issues such a request, it indeed goes directly to or from the disk
driver, which is supplied a kernel buffer formed by remapping the user
pages. Note that if the device was or is mounted and the filesystem
keeps some metadata in the buffer cache, such devfs I/O would make the
cache inconsistent.
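
A small userland sketch of such a request (the device path is only an
example; a real program should not assume a particular sector size,
hence the DIOCGSECTORSIZE query):

#include <sys/types.h>
#include <sys/disk.h>
#include <sys/ioctl.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
        u_int secsize;
        char *buf;
        ssize_t n;
        int fd;

        /* /dev/da0 is just an example device node. */
        fd = open("/dev/da0", O_RDONLY);
        if (fd == -1)
                err(1, "open");
        if (ioctl(fd, DIOCGSECTORSIZE, &secsize) == -1)
                err(1, "DIOCGSECTORSIZE");
        /* devfs disk I/O must be a multiple of the sector size. */
        buf = malloc(secsize);
        if (buf == NULL)
                err(1, "malloc");
        /*
         * This read goes straight to the disk driver; the kernel wires
         * and remaps these user pages and keeps no cached copy.
         */
        n = read(fd, buf, secsize);
        if (n != (ssize_t)secsize)
                err(1, "read");
        printf("read first %u bytes of the device, uncached\n", secsize);
        free(buf);
        close(fd);
        return (0);
}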
> The added complexity of carrying around two alternate interfaces to
> the same devices was judged by those who did the work to be not worth
> the small gain available to the very few people who used raw devices.
> Interestingly, since that time ZFS has implemented a block-layer cache
> for itself, which is of course not integrated with the non-existent
> block-level cache in the system :-).
We do carry two interfaces in the cdev drivers, which are lumped into
one. In particular, it is not easy to implement mapping of block
devices precisely because the interfaces are mixed. If a cdev disk
device is mapped, the VM system would try to use the cdevsw d_mmap (or
the newer mapping interfaces) to handle user page faults, which is
incorrect for the purpose of mapping disk blocks.
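
For reference, these are the character-device mapping hooks in question
(approximately as declared in sys/conf.h); d_mmap is expected to
translate an offset directly into a physical page address, which is not
a meaningful operation for cached disk blocks:

/* Legacy per-page hook: offset -> physical address. */
typedef int d_mmap_t(struct cdev *dev, vm_ooffset_t offset,
    vm_paddr_t *paddr, int nprot, vm_memattr_t *memattr);

/* Newer hook: hand back a VM object backing the whole mapping. */
typedef int d_mmap_single_t(struct cdev *cdev, vm_ooffset_t *offset,
    vm_size_t size, struct vm_object **object, int nprot);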