netmap/vale periodic deadlock
Harry Schmalzbauer
freebsd at omnilan.de
Wed Nov 22 09:19:02 UTC 2017
Regarding Harry Schmalzbauer's message from 22.11.2017 09:39 (localtime):
> Regarding Vincenzo Maffione's message from 22.11.2017 09:04 (localtime):
>>
>> 2017-11-21 21:48 GMT+01:00 Harry Schmalzbauer <freebsd at omnilan.de
>> <mailto:freebsd at omnilan.de>>:
>>
>> Regarding Vincenzo Maffione's message from 21.11.2017 09:39
>> (localtime):
>> …
>> >
>> > If this is the case, although you are allowed to do that, I don't think
>> > it's a convenient way to use netmap.
>> > Since VLAN interfaces like vlan0 do not have (and cannot have) native
>> > netmap support, you are falling back to emulated netmap adapters (which
>> > are probably buggy on FreeBSD, especially when combined with VALE).
>> > Apart from bugs I think that with this setup you can't get decent
>> > performance that would justify using netmap rather than the standard
>> > kernel bridge and TAP devices.
>>
>> Hello,
>>
>> lockup happened earlier than expected.
>> This time 'vale-ctl' still reported (-l) the configuration.
>> One guest, using if_vtnet(4)-virtio-net#vale2:korso, showed:
>> dmz: watchdog timeout on queue 0
>> (dmz is the renamed if_vtnet(4))
>>
>> I could attach tcpdump to the uplink interface and also to all vlan
>> children.
>> Complete silence everywhere. So it seems the nic stopped processing
>> anything.
>>
>> Do you think that symptom could be caused by my special vale
>> integration, so that bugs in netmap emulation could crash the NIC?
>> Or is it unlikely that this is related?
>>
>> I hadn't prepared a debug kernel for the host, so the machine rebooted
>> without one again.
>> I think I'll have to start with replacing vale first, to narrow down
>> possible causes. Today I was lucky, the lockup happened after business
>> hours, but I won't rely on that.
>> At least then I'll know whether I really need to look for a debug netmap
>> kernel, or whether there's possibly something else...
>>
>> Thanks,
>>
>> -harry
>>
>>
>>
>> I can't really say anything without a stack trace or meaningful logs.
>> There is a thing that you may do to see if the bug comes out of a bad
>> interaction between
>> emulated netmap and VALE.
>> Instead of attaching the vlan interfaces to VALE you can connect VALE to
>> the vlan interface
>> through the "bridge" program. In this way nothing changes from the
>> functional point of view,
>> but you are no longer attaching the VLAN interface to VALE (and you
>> are using an additional process).
>>
>> So instead of
>>
>> # vale-ctl vale0:vlan0
>>
>> you would have
>>
>> # bridge netmap:vlan0 vale0:vv # "vv" can be anything
> Hello Vincenzo,
>
> thank you very much for that interesting hint.
> I prepared a netgraph setup yesterday evening, but I'll try your
> suggestion first. Unfortunately I don't have time to prepare a debug
Since this doesn't need a reboot and I'm in adventure mood, I just tried
it at runtime.
Unfortunately I can't find bridge documentation besides the source code.
It doesn't detach from terminal here:
bridge built Oct 8 2017 12:59:57
060.359974 main [244] ------- zerocopy NOT supported
060.359987 main [251] Wait 4 secs for link to come up...
064.365872 main [255] Ready to go, nic1_egn 0x0/1 <-> vale4:nic1egn 0x0/1.
068.084364 main [306] poll timeout [0] ev 1 0 rx 0 at 33 tx 1022, [1] ev 1 0 rx 0 at 34 tx 1023
072.565559 main [306] poll timeout [0] ev 1 0 rx 0 at 34 tx 1022, [1] ev 1 0 rx 0 at 35 tx 1023
…
In general, things are working.
Is bridge staying in the foreground by design?
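If it is by design, a possible workaround would be to wrap it in daemon(8). The following is only an untested sketch: the helper name start_bridge, the BRIDGE_PIDFILE variable and the pidfile path are made up for illustration, and the port names have to be replaced with the real ones. It assumes daemon(8) with -f (redirect stdio to /dev/null) and -p (write the child's PID to a file), and the positional "bridge ifa ifb" form from the suggestion above.

```shell
# Hypothetical helper: start the netmap "bridge" tool detached from the terminal.
# $1: first port  (e.g. netmap:vlan0)
# $2: second port (e.g. vale0:vv)
start_bridge() {
    # daemon(8): -f redirects stdin/stdout/stderr to /dev/null,
    #            -p writes the PID of the started program to the given file.
    # BRIDGE_PIDFILE is an assumed, optional override for the pidfile path.
    daemon -f -p "${BRIDGE_PIDFILE:-/var/run/netmap-bridge.pid}" bridge "$1" "$2"
}
```

It would be called as "start_bridge netmap:vlan0 vale0:vv"; the status messages that bridge prints would be discarded in that case, since -f sends them to /dev/null.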
Thanks,
-harry