9.2 ixgbe tx queue hang (packets that exceed 65535 bytes in length)
Rick Macklem
rmacklem at uoguelph.ca
Fri Mar 21 22:22:08 UTC 2014
Christopher Forgeron wrote:
> (Pardon me, for some reason my gmail is sending on my cut-n-pastes if
> I cr down too fast)
>
> First set of logs:
>
> Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116
Ok, so this isn't a TSO segment then, unless I don't understand how
the csum flags are used, which is quite possible.
Assuming that you printed this out in decimal:
4116->0x1014
Looking in mbuf.h, 0x1014 is
CSUM_SCTP_VALID | CSUM_FRAGMENT | CSUM_UDP
Alternately, if 4116 is hex, then it is:
CSUM_TCP_IPV6 | CSUM_IP_CHECKED | CSUM_FRAGMENT | CSUM_UDP | CSUM_TCP
either way, it doesn't appear to be a TCP TSO?
(But you said that disabling TSO fixed the problem, so colour me
confused by this.;-)
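
For anyone who wants to repeat the decoding, here is a minimal sketch in
user-space C. The flag values in the table are an assumption: they are the
ones implied by the decoding above, plus CSUM_TCP (0x0002) and CSUM_TSO
(0x0020) as recalled from a 9.x-era sys/mbuf.h, so check them against the
mbuf.h in your own tree. The names csum_names and csum_decode are
hypothetical helpers, not kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed 9.x-era values; verify against sys/mbuf.h in your tree. */
struct csum_name {
	unsigned int bit;
	const char *name;
};

static const struct csum_name csum_names[] = {
	{ 0x0002, "CSUM_TCP" },
	{ 0x0004, "CSUM_UDP" },
	{ 0x0010, "CSUM_FRAGMENT" },
	{ 0x0020, "CSUM_TSO" },
	{ 0x0100, "CSUM_IP_CHECKED" },
	{ 0x1000, "CSUM_SCTP_VALID" },
	{ 0x4000, "CSUM_TCP_IPV6" },
};

/*
 * Write the names of the known bits set in flags into buf, joined by " | ".
 * Returns whatever bits the table above could not name.
 */
static unsigned int
csum_decode(unsigned int flags, char *buf, size_t buflen)
{
	unsigned int left = flags;
	size_t i, used = 0;
	int n;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(csum_names) / sizeof(csum_names[0]); i++) {
		if ((flags & csum_names[i].bit) == 0)
			continue;
		n = snprintf(buf + used, buflen - used, "%s%s",
		    used > 0 ? " | " : "", csum_names[i].name);
		if (n < 0)
			break;
		used += (size_t)n;
		if (used >= buflen)
			used = buflen - 1;	/* output was truncated */
		left &= ~csum_names[i].bit;
	}
	return (left);
}
```

Calling csum_decode(4116, buf, sizeof(buf)) treats the logged value as
decimal (0x1014), and csum_decode(0x4116, buf, sizeof(buf)) treats it as
hex. With the assumed table, neither reading has the CSUM_TSO bit set.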
Sorry, but my rusty networking knowledge is confused by this, so maybe someone
else can explain it? (I don't think any packet handed to the net interface
should exceed 65535 bytes. Am I right?)
Anyhow, all I can say is that I think these mbuf chains should fail with EFBIG,
since they are too big. I have no idea where they come from and I don't
know why this would lead to exhaustion of the transmit descriptor entries,
which seems to be when things get really wedged.
(From what little I can see in the driver sources, these transmit descriptor
entries should be released via interrupts, but I've just glanced at it.)
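
To make the size argument concrete, here is a rough user-space sketch of
the kind of check meant above: walk a chain, add up the segment lengths
(the "actl" value in the logs), and fail with EFBIG when the total exceeds
IP_MAXPACKET. struct seg is a hypothetical stand-in for an mbuf, and
chain_length/check_chain are made-up names. This is not what the ixgbe
driver actually does, only an illustration of the limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Largest value the 16-bit IP total length field can hold. */
#define IP_MAXPACKET	65535

/* Hypothetical stand-in for an mbuf: just a link and a length. */
struct seg {
	struct seg *next;
	int len;
};

/* Sum the lengths of every segment in the chain. */
static int
chain_length(const struct seg *s)
{
	int total = 0;

	for (; s != NULL; s = s->next) {
		assert(s->len >= 0);
		total += s->len;
	}
	return (total);
}

/* Return 0 if the chain fits in one IP packet, EFBIG if it does not. */
static int
check_chain(const struct seg *s)
{
	return (chain_length(s) > IP_MAXPACKET ? EFBIG : 0);
}
```

A chain totalling 65542 bytes, as in the first set of logs, is 7 bytes
over the limit and would get EFBIG from this check. One totalling exactly
65535 would pass.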
Sorry, but I think this will need someone conversant with the networking side
to figure out.

rick
> Mar 21 11:07:00 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116
> Mar 21 11:07:00 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116
> Mar 21 11:07:00 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116
> Mar 21 11:07:00 SAN0 kernel: after mbcnt=33 pklen=65542 actl=65542
> Mar 21 11:07:00 SAN0 kernel: before pklen=65542 actl=65542 csum=4116
>
> Here's a few later on.
>
> Mar 21 11:10:09 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:10:09 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:10:09 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:10:09 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:10:09 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:10:09 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:10:09 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:10:09 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
>
> Mar 21 11:23:00 SAN0 kernel: after mbcnt=33 pklen=65546 actl=65546
> Mar 21 11:23:01 SAN0 kernel: before pklen=65546 actl=65546 csum=4116
> Mar 21 11:23:01 SAN0 kernel: after mbcnt=33 pklen=65546 actl=65546
> Mar 21 11:23:03 SAN0 kernel: before pklen=65546 actl=65546 csum=4116
> Mar 21 11:23:03 SAN0 kernel: after mbcnt=33 pklen=65546 actl=65546
> Mar 21 11:23:04 SAN0 kernel: before pklen=65546 actl=65546 csum=4116
> Mar 21 11:23:04 SAN0 kernel: after mbcnt=33 pklen=65546 actl=65546
>
> Mar 21 11:41:25 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:25 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:41:25 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:25 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:41:25 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:25 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:41:25 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:25 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:41:26 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:26 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
> Mar 21 11:41:26 SAN0 kernel: before pklen=65538 actl=65538 csum=4116
> Mar 21 11:41:26 SAN0 kernel: after mbcnt=33 pklen=65538 actl=65538
>
> To be clear, I changed tp->t_tsomax to IP_MAXPACKET at around line 777
> in sys/netinet/tcp_output.c, like so:
>
> if (len > IP_MAXPACKET - hdrlen) {
>         len = IP_MAXPACKET - hdrlen;
>         sendalot = 1;
> }
>
> I notice there is more that is different between 9.1 and 10 for this
> file:
> http://fxr.watson.org/fxr/diff/netinet/tcp_output.c?v=FREEBSD10;diffval=FREEBSD91;diffvar=v
>
> I'm going to attempt inserting a 9.1 tcp_output.c and see if that
> makes any difference.
>
> Otherwise, I await further ideas from the list.
>
> Thanks.