mfi(4) IO performance regression, post 8.1
John Baldwin
jhb at freebsd.org
Fri Jun 15 17:18:23 UTC 2012
On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
> Hello FreeBSD folk,
>
> We're seeing what appears to be a storage performance regression as we
> try to move from 8.1 (i386) to 8.3. We looked at 8.2 also and it
> appears that the regression happened between 8.1 and 8.2.
>
> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
> Storage is an LSI MegaSAS 1078 controller (mfi) in a RAID-10
> configuration, using UFS + geom_journal for filesystem.
>
> Postgresql performance, as seen via pgbench, dropped by approx 20%.
> This testing was done with our usual PAE-enabled kernels. We then went
> back to GENERIC kernels and did comparisons using "bonnie", results
> below. Following that is a kernel boot log.
>
> Notably, we're seeing this regression only with our RAID mfi(4)-based
> systems. Also, from looking at FreeBSD source changelogs it appears
> that the mfi(4) code has seen some changes since 8.1.
Between 8.1 and 8.2 mfi has not had any significant changes. The only change
made to sys/dev/mfi was to add two new constants:
> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
Index: mfireg.h
===================================================================
--- mfireg.h (.../8.1/sys/dev/mfi) (revision 237134)
+++ mfireg.h (.../8.2/sys/dev/mfi) (revision 237134)
@@ -975,7 +975,9 @@
 	MFI_PD_STATE_OFFLINE = 0x10,
 	MFI_PD_STATE_FAILED = 0x11,
 	MFI_PD_STATE_REBUILD = 0x14,
-	MFI_PD_STATE_ONLINE = 0x18
+	MFI_PD_STATE_ONLINE = 0x18,
+	MFI_PD_STATE_COPYBACK = 0x20,
+	MFI_PD_STATE_SYSTEM = 0x40
 };
 
 union mfi_ld_ref {
The difference in write performance must be due to something else. You
mentioned you are using UFS + gjournal. I think gjournal uses BIO_FLUSH, so I
wonder if this is related:
------------------------------------------------------------------------
r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines
MFC 212160:

Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

The barrier semantics of bioq_insert_tail() were broken in two ways:

 o In bioq_disksort(), an added bio could be inserted at the head of
   the queue, even when a barrier was present, if the sort key for
   the new entry was less than that of the last queued barrier bio.

 o The last_offset used to generate the sort key for newly queued bios
   did not stay at the position of the barrier until either the
   barrier was de-queued, or a new barrier (which updates last_offset)
   was queued. When a barrier is in effect, we know that the disk
   will pass through the barrier position just before the
   "blocked bios" are released, so using the barrier's offset for
   last_offset is the optimal choice.

sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
 o Update last_offset in bioq_insert_tail().

 o Only update last_offset in bioq_remove() if the removed bio is
   at the head of the queue (typically due to a call via
   bioq_takefirst()) and no barrier is active.

 o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
   set prev to the barrier and cur to it's next element. Now that
   last_offset is kept at the barrier position, this change isn't
   strictly necessary, but since we have to take a decision branch
   anyway, it does avoid one, no-op, loop iteration in the while
   loop that immediately follows.

 o In bioq_disksort(), bypass the normal sort for bios with the
   BIO_ORDERED attribute and instead insert them into the queue
   with bioq_insert_tail(). bioq_insert_tail() not only gives
   the desired command order during insertion, but also provides
   barrier semantics so that commands disksorted in the future
   cannot pass the just enqueued transaction.

sys/sys/bio.h:
   Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
   Use an ordered command for SCSI/ATA-NCQ commands issued in
   response to bios with the BIO_ORDERED flag set.

sys/cam/scsi/scsi_da.c
   Use an ordered tag when issuing a synchronize cache command.

   Wrap some lines to 80 columns.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
   Mark bios with the BIO_FLUSH command as BIO_ORDERED.

Sponsored by: Spectra Logic Corporation
Sponsored by: Spectra Logic Corporation
------------------------------------------------------------------------
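To make the barrier behaviour in that log concrete, here is a minimal
user-space sketch. It is a toy model under stated assumptions, not the kernel
code: struct toyq and the toyq_*() functions are hypothetical names, the queue
is a fixed-size array, and the real sort key (which depends on last_offset) is
replaced by plain ascending offset. It only illustrates the rule that a
request queued with the tail-insert call acts as a barrier that later sorted
inserts cannot pass.

```c
#include <assert.h>
#include <stddef.h>

#define TOYQ_MAX 32

/* Hypothetical toy queue; NOT the kernel's struct bio_queue_head. */
struct toyq {
	long	off[TOYQ_MAX];	/* queued request offsets, head at index 0 */
	size_t	len;		/* number of queued requests */
	size_t	first_sortable;	/* index just past the most recent barrier */
};

void
toyq_init(struct toyq *q)
{
	q->len = 0;
	q->first_sortable = 0;
}

/*
 * Toy analogue of bioq_insert_tail(): append the request and treat it as
 * a barrier, so nothing sorted in later can be placed ahead of it.
 */
void
toyq_insert_tail(struct toyq *q, long off)
{
	assert(q->len < TOYQ_MAX);
	q->off[q->len++] = off;
	q->first_sortable = q->len;
}

/*
 * Toy analogue of bioq_disksort(), simplified to ascending offset order:
 * the new request is sorted only among entries queued after the barrier.
 */
void
toyq_disksort(struct toyq *q, long off)
{
	size_t i, pos;

	assert(q->len < TOYQ_MAX);

	/* Start searching just past the barrier, never before it. */
	pos = q->first_sortable;
	while (pos < q->len && q->off[pos] <= off)
		pos++;

	/* Shift later entries down one slot and insert. */
	for (i = q->len; i > pos; i--)
		q->off[i] = q->off[i - 1];
	q->off[pos] = off;
	q->len++;
}
```

With this model, queuing offsets 500 and 100 via toyq_disksort(), then 900 via
toyq_insert_tail(), then 50 and 700 via toyq_disksort() leaves the queue as
100 500 900 50 700. The request at offset 50 would sort to the head without
the barrier, but it has to wait behind the 900 request. If every BIO_FLUSH is
ordered, each flush splits the queue like this, which is why I suspect it
could matter for a flush-heavy workload.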
Can you try commenting out the 'bp->bio_flags |= BIO_ORDERED' line
added to geom_io.c in 8.2? That would effectively revert this
portion of the diff:
Index: geom_io.c
===================================================================
--- geom_io.c (.../8.1/sys/geom) (revision 237134)
+++ geom_io.c (.../8.2/sys/geom) (revision 237134)
@@ -265,6 +265,7 @@
 	g_trace(G_T_BIO, "bio_flush(%s)", cp->provider->name);
 	bp = g_alloc_bio();
 	bp->bio_cmd = BIO_FLUSH;
+	bp->bio_flags |= BIO_ORDERED;
 	bp->bio_done = NULL;
 	bp->bio_attribute = NULL;
 	bp->bio_offset = cp->provider->mediasize;
--
John Baldwin