Linux netmap memory allocation

Charlie Smurthwaite charlie at atech.media
Tue Jan 2 23:07:10 UTC 2018


Hi Vincenzo,

I am using poll(), and I am not specifying NETMAP_NO_TX_POLL, but I have found that sometimes frames are sent only when the TX buffer is full, and sometimes they are not sent at all. They are never sent on every invocation of poll(), as I would expect. If I run ioctl(NIOCTXSYNC) manually, everything works correctly. I assume I have simply missed something from my nmreq.

I don't think you have missed anything within nmreq.  I see that you are waiting for POLLIN only (and this is right in your router case), so poll() will actually invoke txsync on interface #i only when netmap intercepts an RX or TX interrupt on interface #i. This means that packets may stall for a long time in the TX rings if you don't call ioctl(TXSYNC). The manual is not wrong, however. You can look at the apps/bridge/bridge.c example to understand where this "poll automatically calls txsync" behaviour is useful.
Thank you for the clarification. I have now altered my code to call TXSYNC after each iteration, but only if I have modified the TX ring for that interface. This seems to work perfectly. The patch can be seen at https://github.com/catphish/netmap-router/commit/2961ab16f14a8b2a2561c9d73f73857e523cc177
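For anyone following the thread, the pattern in that patch boils down to roughly the sketch below (illustrative only, not the actual code from the repository; the function and variable names are made up):

#include <sys/ioctl.h>
#include <net/netmap.h>

/* After each iteration of the main loop, sync only the interfaces whose
 * TX ring was actually touched.  fds[i] is interface i's netmap file
 * descriptor and modified[i] records whether any frame was placed on its
 * TX ring during this iteration. */
static void flush_modified_tx_rings(const int *fds, const int *modified, int nports)
{
    for (int i = 0; i < nports; i++) {
        if (modified[i])
            ioctl(fds[i], NIOCTXSYNC, NULL);
    }
}

This keeps the number of NIOCTXSYNC syscalls per iteration down to the number of ports that actually received traffic in that batch.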



You also mentioned: "whether netmap calls or does not call txsync/rxsync on certain rings depends on the parameters passed to nm_open()". I do not use the nm_open helper method, but I am extremely interested to know what parameters would affect this behaviour, as this would seem very relevant to my problem.

Yes, we do not normally use the low-level interface (ioctl(REGIF)), because it's just simpler to use the nm_open() interface. Within the first parameter of nm_open() you can specify to open just one RX/TX ring pair, e.g. with "enp1f0s1-3". Then you usually want to mmap() just once (as you do in your program); with nm_open(), you do that with the NM_OPEN_NO_MMAP flag.
I did look at nm_open, and even read the source of nm_open to discover how to implement the shared memory, but (for no good reason) I preferred to set up the interface manually.
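For reference, my understanding of what Vincenzo describes with nm_open() is roughly the following (a sketch only; the interface name is just a placeholder and error handling is minimal):

#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>

/* Open two hardware ring pairs on the same NIC, sharing a single mmap(). */
static int open_ring_pairs(struct nm_desc **d0, struct nm_desc **d1)
{
    /* The first descriptor performs the mmap() of the netmap memory. */
    *d0 = nm_open("netmap:enp1f0s1-0", NULL, 0, NULL);
    if (*d0 == NULL)
        return -1;

    /* The second descriptor reuses d0's mapping instead of mmap()ing again. */
    *d1 = nm_open("netmap:enp1f0s1-1", NULL, NM_OPEN_NO_MMAP, *d0);
    if (*d1 == NULL)
        return -1;

    return 0;
}

The "-0" / "-1" suffixes select a single RX/TX ring pair each, as Vincenzo mentions, and NM_OPEN_NO_MMAP tells the second nm_open() to borrow the memory region already mapped by the first descriptor.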

If you are interested or if it helps explain my question, my code (hopefully well commented, though far from a complete router) can be found here: https://github.com/catphish/netmap-router/blob/58a9b957c19b0a012088c491bd58bc3161a56ff1/router.c

Specifically, if the ioctl call at line 92 is removed, the code does not work: packets are either not transmitted at all, or are only transmitted when the buffer is full (which of these two behaviours occurs seems to be random). However, I would expect it to work, because I do not specify NETMAP_NO_TX_POLL, and I would therefore hope that the poll() call on line 80 would have the same effect.

Yes, that depends on when netmap_poll() is called by the kernel, which in turn depends on when something is ready to be received on the file descriptor.
Looking at your program, I think you need to call ioctl(TXSYNC), at least because you don't want to introduce artificial/unbounded latency. However, since these calls are expensive, you could issue them only when necessary (e.g. when nm_ring_space(txring) == 0, or when you have actually forwarded some packets on txring).
Per the patch above, I now call TXSYNC on an interface only after pushing a batch of packets to it, and this seems to work perfectly, with a good balance between performance and latency. If nm_ring_space(txring) == 0 I just drop frames until the next batch. I don't TXSYNC part-way through a batch; it hasn't yet seemed necessary, but I may need to look into this later.
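If I do end up needing to sync part-way through a batch, I expect it would look something like this (an untested sketch; the helper name is made up):

#include <sys/ioctl.h>
#include <net/netmap_user.h>

/* Try to make room on a full TX ring by syncing it once.  Returns non-zero
 * if at least one slot is free afterwards, 0 if the ring is still full
 * (in which case I would drop the frame, as I do now). */
static int make_tx_room(int fd, struct netmap_ring *txring)
{
    if (nm_ring_space(txring) > 0)
        return 1;
    ioctl(fd, NIOCTXSYNC, NULL);        /* push out what is already queued */
    return nm_ring_space(txring) > 0;   /* still full means the link is saturated */
}

This would trade a little extra per-packet cost for fewer drops when a single output port is briefly oversubscribed.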

I'm running this on a 6-core 2.8GHz Xeon with a 4-port i350-T4 NIC. I thought I'd just post some stats of the performance I observe using my code (excluding the routing table lookup as this isn't relevant to netmap). Not really looking for any advice here, just thought I'd share my results.

All examples are with 1.488Mpps (1 x 1Gbps) input and no packet loss observed:
1 thread  - CPU usage = 100%, batch size = 4
2 threads - CPU usage = 54% (27% x 2), batch size = 12
4 threads - CPU usage = 98% (25% x 4), batch size = 8
6 threads - CPU usage = 124% (21% x 6), batch size = 8

And again with 2.976Mpps (2 x 1Gbps) input and no packet loss observed:
1 thread  - CPU usage = 100%, batch size = 12
2 threads - CPU usage = 68% (34% x 2), batch size = 21
4 threads - CPU usage = 100% (25% x 4), batch size = 17
6 threads - CPU usage = 105% (18% x 6), batch size = 16

These results seem excellent and demonstrate that netmap is scaling as expected with both thread count and packet volume. The higher thread count will be more beneficial when I am doing more processing on each packet.


I hope this all makes sense, and again, I hope I have simply missed something from the nmreq I pass to NIOCREGIF.

It is worth mentioning that with the exception of this problem / confusion, I am getting extremely good results from this code and netmap in general.

That's nice to hear :)
Your program looks simple enough that we could even add it to the examples (as an example of routing logic).
I'd be very happy to contribute to the documentation in any way that may be helpful. I have added a permissive licence to my GitHub repository just in case my code is of use to anyone else. It is currently somewhat incomplete as an IPv4 router, as it doesn't update MAC addresses on frames before forwarding them and the interface names are hardcoded, but when it's more complete I'd be very happy for it to be contributed to the examples. Of course anyone is free to use my code for any purpose too.

Thanks for all your assistance! I'm happy enough with this that I will move on to looking at my IP routing code.

Charlie



Charlie Smurthwaite
Technical Director

tel. email. charlie at atech.media web. https://atech.media


