ZFS v28 for 8.2-STABLE
Pierre Lamy
pierre at userid.org
Sat Apr 30 21:03:15 UTC 2011
On 4/29/2011 8:15 PM, Jeremy Chadwick wrote:
> On Fri, Apr 29, 2011 at 11:20:21PM +0300, Volodymyr Kostyrko wrote:
>> 28.04.2011 07:37, Ruslan Yakovlev wrote:
>>> Does actually patch exist for 8.2-STABLE ?
>>> I probe
>>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20110317.patch.xz
>>>
>>> Building failed with:
>>> can't cd to /usr/src/cddl/usr.bin/zstreamdump
>>> Also sys/cddl/compat/opensolaris/sys/sysmacros.h failed to patch.
>>>
>>> Current FreeBSD 8.2-STABLE #35 Mon Apr 18 03:40:38 EEST 2011 i386
>>> periodically frozen on high load like backup by rsync or find -sx ...
>>> (from default cron tasks).
>> Well ZFSv28 should be very close to STABLE for now?
>>
>> http://lists.freebsd.org/pipermail/freebsd-current/2011-February/023152.html
> It's now a matter of opinion. The whole idea of ZFSv28 being committed
> to HEAD was to be tested. I haven't seen any indication of a progress
> report provided for anything on HEAD that pertains to ZFSv28, have you?
>
> Furthermore, the FreeBSD Quarterly Status Report just came out on 04/27
> for the months of January-March (almost a 2 month delay, sigh):
>
> 1737 04/27 10:58 Daniel Gerzo ( 41K) FreeBSD Status Report January-March, 2011
>
> http://www.freebsd.org/news/status/report-2011-01-2011-03.html
>
> Which states that ZFSv28 is "now available in CURRENT", which we've
> known for months:
>
> http://www.freebsd.org/news/status/report-2011-01-2011-03.html#ZFSv28-available-in-FreeBSD-9-CURRENT
>
> But again, no progress report, so nobody except those who follow
> HEAD/CURRENT know what the progress is. And that progress has not been
> relayed to any of the non-HEAD/CURRENT lists.
>
> I'm a total hard-ass about this stuff, and have been for years, because
> it all boils down to communication (or lack thereof). It seems very
> hasty to say "Yeah! MFC this!" when we (folks who only follow STABLE)
> have absolutely no idea if what's in CURRENT is actually broken in some
> way or if there are outstanding problems -- and if there are, what those
> are so users can be aware of them in advance.
>
Hello,
Here's a summary of my recent end-user work with ZFS on -current. I was
recently lucky enough to purchase 2 NAS systems, consisting of 2 cheap
new PCs, each loaded with 6 hard drives: one simple 1 TB GPT boot device
and five 2 TB data drives. The mobo has 6 SATA connectors, but I needed
to purchase an additional PCI-E SATA adapter since the DVD drive also
uses a SATA port. Each system has 4 GB of memory and a new, inexpensive
quad-core AMD CPU. I've been running it (recent -current) for a couple
of weeks with heavy single-user use; the pool currently sits at 2.5 TB
used of 7.1 TB.
The only problem I found was that deleting a file-backed log device
from a degraded pool would immediately panic the system. I'm not running
stock -current, so I didn't report it.
Resilvering seems absurdly slow, but since I won't be doing it often, I
didn't worry about it. My NAS setup is side-by-side redundant, so if
resilvering ever takes more than 2 days I will simply replicate from my
other NAS.
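For reference, a resilver is triggered by a replace and its progress
shows up in the status output; a minimal sketch against my pool (the
replacement label name here is hypothetical):

```shell
# Swap a failed raidz member for a fresh, labeled disk;
# ZFS begins resilvering onto the new device automatically.
zpool replace tank label/g_ada3 /dev/label/g_ada3_new

# The "scan:" line of the output reports resilver progress and speed.
zpool status tank
```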
Throughput without a log device was in the range of 30 Mbit/sec (3% of
my 1 Gbit interface). Adding a file-backed log device on the UFS
partition used for boot produced a 10x jump, saturating the SATA bus of
the machine I was sending data from over the network. Throughput spiked
to 30% of the interface (the source disk's maximum bus speed) and did
not vary much. This resolved the very spiky data transfers that a lot
of other people have posted about on the internet. I had first used a
40mb/sec USB device as the log device, which smoothed the transfers
dramatically, but there were still ~15-second stretches where no data
would transfer while the log was flushed from USB to disk. After
researching, I discovered that I could use a file-backed log device
instead, and that fixed the spiky-transfer problems completely.
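For an existing pool, the log device does not have to be declared at
creation time; a sketch of retrofitting one, using the same file path I
used (zpool add attaches it to the live pool):

```shell
# Preallocate a ~5 GB backing file on the UFS boot partition
dd if=/dev/zero of=/var/preserve/zfs/log_device bs=1m count=5000

# Attach the file as a log (ZIL) device to the running pool
zpool add tank log /var/preserve/zfs/log_device
```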
Before that I had tuned the sysctls, as the poor out-of-the-box settings
were giving me very slow speeds (in the range of 1% of network
throughput, before the log device). I played around with the vfs.zfs
tunables but found that I did not need them once I added the log device;
the out-of-the-box settings for that sysctl tree were just fine.
I had first set this up before CAM became the default in -current, and
did not use labels. While troubleshooting some unrelated disk issues, I
ended up switching to CAM without problems, and subsequently labeled the
disks (recreating the zpool after labeling). I am now using CAM and
AHCI without any issues.
Here are some personal notes about the tunables I set; I am sure not
all of them are helpful. I didn't add them one by one, I simply
mass-changed them and saw a positive result. Also noted are the
commands I used and the current system status.
sysctl -w net.inet.tcp.sendspace=373760
sysctl -w net.inet.tcp.recvspace=373760
sysctl -w net.local.stream.sendspace=82320
sysctl -w net.local.stream.recvspace=82320
sysctl -w vfs.zfs.prefetch_disable=1
sysctl -w net.local.stream.recvspace=373760
sysctl -w net.local.stream.sendspace=373760
sysctl -w net.local.inflight=1
sysctl -w net.inet.tcp.ecn.enable=1
sysctl -w net.inet.flowtable.enable=0
sysctl -w net.raw.recvspace=373760
sysctl -w net.raw.sendspace=373760
sysctl -w net.inet.tcp.local_slowstart_flightsize=10
sysctl -w net.inet.tcp.delayed_ack=0
sysctl -w kern.maxvnodes=600000
sysctl -w net.local.dgram.recvspace=8192
sysctl -w net.local.dgram.maxdgram=8192
sysctl -w net.inet.tcp.slowstart_flightsize=10
sysctl -w net.inet.tcp.path_mtu_discovery=0
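Settings made with sysctl -w are lost at reboot; to persist them, the
same knobs can go in /etc/sysctl.conf (a fragment mirroring a few of
the values above; note that some vfs.zfs knobs are loader tunables and
belong in /boot/loader.conf instead):

```shell
# /etc/sysctl.conf -- applied at boot by rc(8)
net.inet.tcp.sendspace=373760
net.inet.tcp.recvspace=373760
net.inet.tcp.delayed_ack=0
kern.maxvnodes=600000
```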
<root.wheel at zfs-slave> [/var/preserve/root] # glabel label g_ada0 /dev/ada0
<root.wheel at zfs-slave> [/var/preserve/root] # glabel label g_ada1 /dev/ada1
<root.wheel at zfs-slave> [/var/preserve/root] # glabel label g_ada3 /dev/ada3
<root.wheel at zfs-slave> [/var/preserve/root] # glabel label g_ada4 /dev/ada4
<root.wheel at zfs-slave> [/var/preserve/root] # glabel label g_ada5 /dev/ada5
The labels are so that I will be able to identify disks more easily
later. My mobo has a single legacy ATA bus whose slave port is used for
SATA; the disk on that port would "disappear" from the box. Moving the
drive to a master SATA port resolved the issue (very odd).
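The labels can be sanity-checked before building the pool:

```shell
# List every glabel name together with its underlying provider
glabel status

# The labeled devices also show up under /dev/label/
ls /dev/label/
```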
gnop create -S 4096 /dev/label/g_ada0
mkdir /var/preserve/zfs
dd if=/dev/zero of=/var/preserve/zfs/log_device bs=1m count=5000
zpool create -f tank raidz /dev/label/g_ada0.nop /dev/label/g_ada1 \
    /dev/label/g_ada3 /dev/label/g_ada4 /dev/label/g_ada5 \
    log /var/preserve/zfs/log_device
The four commands above set the alignment to 4 KB (via the gnop
device), create the backing file for the log device, and build the
pool.
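Since the gnop trick only matters at vdev-creation time, it is worth
verifying afterwards that the 4 KB alignment stuck; a quick check
(ashift is the base-2 log of the alignment, so 12 means 4096 bytes):

```shell
# ashift is recorded per top-level vdev when the pool is created;
# look for "ashift: 12" in the output for 4 KB alignment.
zdb tank | grep ashift
```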
zfs set atime=off tank
I decided not to use dedup because my files don't contain many
duplicates; they're mostly large media files, ISOs, etc.
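That judgment can be checked without turning dedup on, assuming the v28
zdb supports the -S simulation mode as it did upstream; the bottom of
the resulting histogram shows the dedup ratio the data would achieve:

```shell
# Simulate building the dedup table over the existing pool data;
# read-only, makes no changes to the pool.
zdb -S tank
```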
<root.wheel at zfs-slave> [/var/preserve/root] # zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME                            STATE     READ WRITE CKSUM
        tank                            ONLINE       0     0     0
          raidz1-0                      ONLINE       0     0     0
            label/g_ada0                ONLINE       0     0     0
            label/g_ada1                ONLINE       0     0     0
            label/g_ada3                ONLINE       0     0     0
            label/g_ada4                ONLINE       0     0     0
            label/g_ada5                ONLINE       0     0     0
        logs
          /var/preserve/zfs/log_device  ONLINE       0     0     0

errors: No known data errors
<root.wheel at zfs-slave> [/var/preserve/root] #
<root.wheel at zfs-slave> [/var/preserve/root] # df
Filesystem          Size    Used   Avail  Capacity  Mounted on
/dev/gpt/pyros-a    9.7G    3.3G    5.6G      37%   /
/dev/gpt/pyros-c    884G    6.1G    808G       1%   /var
tank                7.1T    2.5T    4.6T      35%   /tank
<root.wheel at zfs-slave> [/var/preserve/root] #
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1 at ahcich2 bus 0 scbus3 target 0 lun 0
ada1: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada2 at ahcich3 bus 0 scbus4 target 0 lun 0
ada2: <ST31000520AS CC32> ATA-8 SATA 2.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada3 at ahcich4 bus 0 scbus5 target 0 lun 0
ada3: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4 at ahcich5 bus 0 scbus6 target 0 lun 0
ada4: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5 at ata1 bus 0 scbus8 target 0 lun 0
ada5: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
ada5: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes)
ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
CPU: AMD Phenom(tm) II X4 920 Processor (2800.19-MHz K8-class CPU)
...
real memory = 4294967296 (4096 MB)
avail memory = 3840598016 (3662 MB)
ZFS filesystem version 5
ZFS storage pool version 28
Best practices:
- Tune the sysctls related to buffer sizes / queue depth.
- Label your disks before you build the zpool.
- Use gnop to 4 KB-align the disks; only one disk in the pool needs
  this before you create it.
- Use CAM.
- *** USE A LOG DEVICE! ***
-Pierre