Swapping ring slots between NIC ring and host ring does not always succeed

Fri Nov 24 20:49:58 UTC 2017

Hi Vincenzo,

Thanks for your reply.

Let me clarify my problem.

I have a program that is an extension of bridge.c.

https://github.com/luigirizzo/netmap/blob/788f25dcc48dfec2e481573277b662968f690042/LINUX/ixgbe_netmap_linux.h#L377

On Wed, Nov 22, 2017 at 7:39 AM, Vincenzo Maffione <v.maffione at gmail.com> wrote:

> Hi,
>
> 2017-11-21 7:51 GMT+01:00 Xiaoye Sun <Xiaoye.Sun at rice.edu>:
>
>> Hi,
>>
>> Recently I found another problem with netmap. I think this new problem
>> could be related to the problems in this thread, so I am posting it
>> here.
>>
>> In my setup, I have a sender program with a netmap ring (a pair of
>> RX/TX rings) for the NIC and a ring for the host stack. The sender
>> program puts customized packets directly into the NIC TX ring (each
>> packet has a unique sequence number, and the sender sends packets in
>> increasing sequence-number order). It also forwards packets from the
>> host RX ring to the NIC TX ring using "zerocopy", i.e. by swapping the
>> buffer indices.
>> However, the receiver sees duplicated customized packets. For example,
>> when the ring size is 32 (32 slots in a ring), the receiver sees the
>> sequence numbers in this order:
>> 1,2,3,4,5,...,68,69,*70*,71,72,73,...,99,100,*70*,101,102,103,...
>> Interestingly, the "gap" between the two duplicated packets (70 in the
>> example) is always very close to the ring size, 32 in this example. In
>> my experiment, I use a ring with 4096 slots, and the gap is always more
>> than 4090 and close to 4096. I verified that the duplication happens at
>> the sender, not at the receiver. Assuming my sender's implementation is
>> correct, the duplication may happen in netmap or in the NIC driver
>> (ixgbe).
>>
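The zero-copy forwarding described above roughly follows the pattern used by netmap's bridge.c: the buffer indices of an RX slot and a TX slot are exchanged, and both slots are flagged so the kernel knows their buffers changed. The sketch below models this with hypothetical stand-in structs (`sim_slot`, `sim_ring`) instead of the real `struct netmap_slot`/`struct netmap_ring` from `net/netmap_user.h`, so it compiles without netmap installed; `SIM_BUF_CHANGED` plays the role of netmap's `NS_BUF_CHANGED`. It is an illustration under these assumptions, not the sender's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct netmap_slot / struct netmap_ring. */
#define SIM_BUF_CHANGED 0x0001u /* plays the role of NS_BUF_CHANGED */

struct sim_slot {
    uint32_t buf_idx; /* index of the packet buffer attached to this slot */
    uint16_t len;     /* packet length */
    uint16_t flags;
};

struct sim_ring {
    uint32_t num_slots;
    uint32_t head, cur, tail;
    struct sim_slot *slot;
};

/* Same wrap-around rule as nm_ring_next(). */
static uint32_t sim_ring_next(const struct sim_ring *r, uint32_t i)
{
    return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Slots between cur and tail, computed like nm_ring_space(). */
static uint32_t sim_ring_space(const struct sim_ring *r)
{
    int ret = (int)r->tail - (int)r->cur;
    if (ret < 0)
        ret += (int)r->num_slots;
    return (uint32_t)ret;
}

/* Forward up to 'limit' packets from rx to tx by swapping buffer indices.
 * Returns the number of packets moved. */
static uint32_t zerocopy_forward(struct sim_ring *rx, struct sim_ring *tx,
                                 uint32_t limit)
{
    uint32_t j = rx->cur, k = tx->cur, n;
    uint32_t rx_avail = sim_ring_space(rx), tx_avail = sim_ring_space(tx);

    assert(rx->num_slots > 0 && tx->num_slots > 0);
    if (limit > rx_avail)
        limit = rx_avail;
    if (limit > tx_avail)
        limit = tx_avail;

    for (n = 0; n < limit; n++) {
        struct sim_slot *rs = &rx->slot[j];
        struct sim_slot *ts = &tx->slot[k];
        uint32_t tmp = ts->buf_idx;

        ts->buf_idx = rs->buf_idx; /* TX slot now owns the received buffer */
        rs->buf_idx = tmp;         /* RX slot gets the old TX buffer back */
        ts->flags |= SIM_BUF_CHANGED;
        rs->flags |= SIM_BUF_CHANGED;
        ts->len = rs->len;

        j = sim_ring_next(rx, j);
        k = sim_ring_next(tx, k);
    }
    /* Release the consumed RX slots and publish the filled TX slots. */
    rx->head = rx->cur = j;
    tx->head = tx->cur = k;
    return n;
}
```

Note that after the swap each buffer index still appears in exactly one slot; the flag is what tells the kernel to reload the buffer address for that slot.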
>
> Netmap itself doesn't duplicate packets, nor does it look at their
> contents. It just passes ring->cur/ring->head down to the ixgbe driver
> (after validation).
> The ixgbe driver datapath is bypassed and replaced with a netmap-enabled
> datapath (see
> https://github.com/luigirizzo/netmap/blob/master/LINUX/ixgbe_netmap_linux.h#L294-L461);
> no duplication should happen there, as each netmap slot (1 TX packet) is
> used only once.
>
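The "validation" mentioned above means, roughly, that the application may only advance head through slots it currently owns, i.e. forward from the previous head up to tail. The function below is a hypothetical, simplified version of that kind of check written for illustration; it is not netmap's kernel code, and the names `ring_dist` and `head_is_valid` are invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Forward (circular) distance from slot a to slot b in a ring of n slots. */
static uint32_t ring_dist(uint32_t a, uint32_t b, uint32_t n)
{
    return (b >= a) ? b - a : b + n - a;
}

/* Hypothetical check: a new head is acceptable only if it lies between
 * the previous head and tail, walking forward around the ring. */
static bool head_is_valid(uint32_t old_head, uint32_t new_head,
                          uint32_t tail, uint32_t n)
{
    assert(n > 0);
    if (old_head >= n || new_head >= n || tail >= n)
        return false; /* index out of range */
    return ring_dist(old_head, new_head, n) <= ring_dist(old_head, tail, n);
}
```

Because head can only move forward within that window, every slot is handed to the driver once per trip around the ring, which matches the "each slot is used only once" remark.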
>>
>>
>> Thinking back to the original problem in this thread, I think these
>> problems may be related. It seems to me that multiple threads could be
>> pulling packets from the NIC TX ring (or the thread could be migrating
>> to other CPUs when the problem occurs), and these threads may run on
>> different cores, so that stale content in a buffer may be sent out
>> while new content is being written to it.
>>
>>
> There are no such threads pulling from the NIC TX ring. Your application
> directly puts the packets to be transmitted into the netmap buffers
> referenced by the netmap TX ring. When you then call NIOCTXSYNC or
> poll(), all the new TX buffers (i.e. all the ones from the previous
> value of ring->head (included) to the new value of ring->head
> (excluded)) are moved to the NIC TX ring. This happens in the context
> of your application thread; no worker threads are used. Then the NIC
> hardware starts the transmission.
>
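The range rule in the reply above (previous head included, new head excluded, wrapping at the end of the ring) can be made concrete with a small helper. This is only arithmetic for illustration; `slots_to_submit` is a hypothetical name, not a netmap function, and the example uses the 32-slot ring size from the original report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store in 'out' the indices of the slots a txsync would hand to the NIC:
 * from old_head (included) to new_head (excluded), wrapping at n.
 * Returns how many indices were written (never more than cap). */
static size_t slots_to_submit(uint32_t old_head, uint32_t new_head,
                              uint32_t n, uint32_t *out, size_t cap)
{
    size_t count = 0;

    assert(n > 0 && old_head < n && new_head < n);
    for (uint32_t i = old_head; i != new_head && count < cap;
         i = (i + 1 == n) ? 0 : i + 1)
        out[count++] = i;
    return count;
}
```

For example, moving head from 30 to 2 on a 32-slot ring submits slots 30, 31, 0 and 1; if head did not move, nothing is submitted.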
>
>> I am wondering whether there is a way to pin the NIC driver of the
>> netmap module to a specific core, or whether there is a way to find the
>> root cause of such a problem.
>>
>
> The only threads involved are those of your application.
> Maybe your problem comes from concurrent accesses to the netmap TX ring
> from different threads? Only one thread at a time should update a netmap
> TX/RX ring; otherwise the behaviour is unspecified.
>
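If several application threads must share one TX ring, one way to respect the "one thread at a time" rule is to make filling the slots and the sync call a single critical section. The sketch below assumes POSIX threads and a hypothetical `guarded_ring` wrapper around a simulated ring; the `submitted` counter merely stands in for the real `ioctl(fd, NIOCTXSYNC, NULL)` call, which would also have to run while the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical wrapper: every update of one simulated TX ring, including
 * the sync step, happens while holding the ring's mutex. */
struct guarded_ring {
    pthread_mutex_t lock;
    uint32_t head;      /* next slot the application will fill */
    uint32_t num_slots;
    uint64_t submitted; /* total slots handed over so far */
};

static void guarded_ring_init(struct guarded_ring *g, uint32_t num_slots)
{
    assert(num_slots > 0);
    pthread_mutex_init(&g->lock, NULL);
    g->head = 0;
    g->num_slots = num_slots;
    g->submitted = 0;
}

/* Fill 'count' slots and "sync" them in one critical section, so no other
 * thread can modify head between the writes and the sync. */
static void guarded_ring_send(struct guarded_ring *g, uint32_t count)
{
    pthread_mutex_lock(&g->lock);
    for (uint32_t i = 0; i < count; i++) {
        /* A real sender would write a packet into slot g->head here. */
        g->head = (g->head + 1 == g->num_slots) ? 0 : g->head + 1;
    }
    g->submitted += count; /* stands in for ioctl(fd, NIOCTXSYNC, NULL) */
    pthread_mutex_unlock(&g->lock);
}

/* Example worker: sends 1000 bursts of 3 packets on the shared ring. */
static void *worker(void *arg)
{
    struct guarded_ring *g = arg;

    for (int i = 0; i < 1000; i++)
        guarded_ring_send(g, 3);
    return NULL;
}
```

A lock-free alternative is to give each thread its own ring, for example by opening one netmap file descriptor per ring, so that no ring ever has more than one writer.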
> Cheers,
>   Vincenzo
>
>
>>
>> Best,
>> Xiaoye
>>
>>
> --
> Vincenzo Maffione
>