bhyve ahcich0: Timeout on slot 0 port 0, , regression with stable/11->releng/11.1

Harry Schmalzbauer freebsd at omnilan.de
Fri Jun 30 19:17:56 UTC 2017


 Hello,

on releng/11.1 I noticed a severe performace degradation during file
unlinking in a FreeBSD guest. Host was running quiet recent stable/11
before.

On the host, the vm is started with
ahci,hd:/dev/adaN

The guest attaches:
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <BHYVE SATA DISK 001> ACS-2 ATA SATA 3.x device

The geust has very high Sys-load during unlinking (50-75%@2 cores).
Also, the host logs these errors:
ahcich0: Timeout on slot 0 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr
00000000 cmd 0001cd17
ahcich0: ... waiting for slots fffb7ffe
ahcich0: Timeout on slot 10 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr
00000000 cmd 0001cd17
ahcich0: ... waiting for slots fffb7bfe
ahcich0: Timeout on slot 14 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr
00000000 cmd 0001cd17
ahcich0: ... waiting for slots fffb3bfe
ahcich0: Timeout on slot 17 port 0
…
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr
00000000 cmd 0001c617
ahcich0: ... waiting for slots 00018000
ahcich0: Timeout on slot 15 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr
00000000 cmd 0001c617
ahcich0: ... waiting for slots 00010000
ahcich0: Timeout on slot 16 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr
00000000 cmd 0001c617
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 00 e8 30 40 04 00
00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 40 e8 30 40 04 00
00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 80 e8 30 40 04 00
00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
…
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 c0 ff 44 40 04 00
00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 00 00 45 40 04 00
00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 40 00 45 40 04 00
00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 80 00 45 40 04 00
00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01
00 00 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 c0 00 45 40 04 00
00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
…
And so on.

I always saw performance penalty using ahci instead of virtio-blk, most
likely due to TRIM support, but never noticed such a huge difference:
obj-tree deleting takes <1min with virtio-blk and usually took about 8
minutes with ahci on stable/11.
Now (releng/11.1) it takes >20min (not yet finished) and I get really
lots of these errors.

Can someone (mav?) interpret the command errors and tell if it could be
a new problem due to recent MFCs?
Will bisect stable/11 revisions to see where it starts if nobody has a
quick idea about the cause.

Thanks,

-harry


More information about the freebsd-scsi mailing list