Fwd: Re: Disappointing packets-per-second performance results on a Dell PE R530
Sepherosa Ziehau
sepherosa at gmail.com
Tue Feb 28 02:47:53 UTC 2017
Did you compile and install a GENERIC-NODEBUG kernel for the -CURRENT test?
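
(For reference, a minimal sketch of building and booting that kernel on
-CURRENT; the -j value is only an example:)

cd /usr/src
make -j8 buildkernel KERNCONF=GENERIC-NODEBUG   # GENERIC minus WITNESS/INVARIANTS etc.
make installkernel KERNCONF=GENERIC-NODEBUG
shutdown -r now
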
On Tue, Feb 28, 2017 at 10:13 AM, Caraballo-vega, Jordan A.
(GSFC-6062)[COMPUTER SCIENCE CORP] <jordancaraballo87 at gmail.com>
wrote:
> As a summary, we have a Dell R530 with a Chelsio T580 card running -CURRENT.
>
> In an attempt to reduce the time the system was spending looking for a
> CPU, we changed the BIOS settings to expose only 8 cores and tested both
> the cxl* and vcxl* Chelsio interfaces. The numbers are still well below
> what we expected:
>
> cxl interface:
>
> root at router1:~ # netstat -w1 -h
> input (Total) output
> packets errs idrops bytes packets errs bytes colls
> 4.1M 0 3.4M 2.1G 725k 0 383M 0
> 3.7M 0 3.1M 1.9G 636k 0 336M 0
> 3.9M 0 3.2M 2.0G 684k 0 362M 0
> 4.0M 0 3.3M 2.1G 702k 0 371M 0
> 3.8M 0 3.2M 2.0G 658k 0 348M 0
> 3.9M 0 3.2M 2.0G 658k 0 348M 0
> 3.9M 0 3.2M 2.0G 721k 0 381M 0
> 3.3M 0 2.6M 1.7G 681k 0 360M 0
> 3.2M 0 2.5M 1.7G 666k 0 352M 0
> 2.6M 0 2.0M 1.4G 620k 0 328M 0
> 2.8M 0 2.1M 1.4G 615k 0 325M 0
> 3.2M 0 2.6M 1.7G 612k 0 323M 0
> 3.3M 0 2.7M 1.7G 664k 0 351M 0
>
>
> vcxl interface:
> input (Total) output
> packets errs idrops bytes packets errs bytes colls drops
> 590k 7.5k 0 314M 590k 0 314M 0 0
> 526k 6.6k 0 280M 526k 0 280M 0 0
> 588k 7.1k 0 313M 588k 0 313M 0 0
> 532k 6.6k 0 283M 532k 0 283M 0 0
> 578k 7.2k 0 307M 578k 0 307M 0 0
> 565k 7.0k 0 300M 565k 0 300M 0 0
> 558k 7.0k 0 297M 558k 0 297M 0 0
> 533k 6.7k 0 284M 533k 0 284M 0 0
> 588k 7.3k 0 313M 588k 0 313M 0 0
> 553k 6.9k 0 295M 554k 0 295M 0 0
> 527k 6.7k 0 281M 527k 0 281M 0 0
> 585k 7.4k 0 311M 585k 0 311M 0 0
>
> The related pmcstat results are:
>
> root at router1:~/PMC_Stats/Feb22 # pmcstat -R sample.out -G - | head
> @ CPU_CLK_UNHALTED_CORE [2091 samples]
>
> 15.35% [321] lock_delay @ /boot/kernel/kernel
> 94.70% [304] _mtx_lock_spin_cookie
> 100.0% [304] __mtx_lock_spin_flags
> 57.89% [176] pmclog_loop @ /boot/kernel/hwpmc.ko
> 100.0% [176] fork_exit @ /boot/kernel/kernel
> 41.12% [125] pmclog_reserve @ /boot/kernel/hwpmc.ko
> 100.0% [125] pmclog_process_callchain
> 100.0% [125] pmc_process_samples
>
> root at router1:~/PMC_Stats/Feb22 # pmcstat -R sample0.out -G - | head
> @ CPU_CLK_UNHALTED_CORE [480 samples]
>
> 37.29% [179] acpi_cpu_idle_mwait @ /boot/kernel/kernel
> 100.0% [179] acpi_cpu_idle
> 100.0% [179] cpu_idle_acpi
> 100.0% [179] cpu_idle
> 100.0% [179] sched_idletd
> 100.0% [179] fork_exit
>
> 12.92% [62] cpu_idle @ /boot/kernel/kernel
>
> When we tried to run pmcstat with the vcxl interfaces enabled, the system
> simply stopped responding.
>
> Based on previous numbers with CentOS 7 (over 3M pps), we can assume that
> it is not the hardware. However, we are still looking for a reason why we
> are getting these numbers.
>
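> (Two generic places to look for where those idrops are being charged;
> nothing cxgbe-specific is assumed here:)
>
> netstat -Q    # netisr queue setup and per-protocol queue drops
> vmstat -i     # interrupt rates and their distribution across CPUs
>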
> Any feedback or suggestions would be greatly appreciated.
>
> - Jordan
>
> On 2/9/17 11:34 AM, Navdeep Parhar wrote:
>> The vcxl interfaces should work under -CURRENT or 11-STABLE. Let me know
>> if you run into any trouble when trying to use netmap with the cxgbe driver.
>>
>> Regards,
>> Navdeep
>>
>> On Thu, Feb 09, 2017 at 10:29:08AM -0500, John Jasen wrote:
>>> It's not the hardware.
>>>
>>> Jordan booted up CentOS on the box and, untuned, was able to obtain
>>> over 3 Mpps.
>>>
>>> He has some pmcstat output from freebsd-current, but basically, the
>>> system appears to spend most of its time looking for a CPU to service
>>> the interrupts, and keeps landing on one or two of them rather than any
>>> of the other 16 cores on the physical silicon.
>>>
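>>> (For reference, a sketch of how the interrupt placement can be inspected
>>> and pinned; the IRQ number below is only an example, not taken from this
>>> box:)
>>>
>>> vmstat -i | grep t5nex     # find the T5 queue interrupt vectors
>>> cpuset -l 0-7 -x 270       # bind IRQ 270 to CPUs 0-7
>>>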
>>> We also tried swapping out the T5 card for a Mellanox, tried different
>>> PCIe slots, and adjusted cpuset for the low and the high CPUs; no matter
>>> what we try, the results have been poor.
>>>
>>> Our network test environment is under reconstruction at the moment, but
>>> our plans afterwards are to:
>>>
>>> a) test netmap-fwd again (does enabling the vcxl interfaces work under -CURRENT?)
>>>
>>> b) test without netmap-fwd, and with reduced cores/physical cpus (BIOS
>>> setting)
>>>
>>> c) potentially, test with netmap-fwd and reduced core count.
>>>
>>> Any other ideas out there?
>>>
>>> Thanks!
>>>
>>>
>>> On 02/05/2017 12:55 PM, Navdeep Parhar wrote:
>>>> I've been following the email thread on freebsd-net on this. The
>>>> numbers you're getting are well below what the hardware is capable of.
>>>>
>>>> Have you tried netmap-fwd or something that bypasses the kernel? That
>>>> will be a very quick way to make sure that the hardware is doing ok.
>>>>
>>>> In case you try netmap:
>>>> cxgbe has virtual interfaces now and those are used for netmap (instead
>>>> of the main interface). Add this line to /boot/loader.conf and you'll
>>>> see a 'vcxl' interface for every cxl interface.
>>>> hw.cxgbe.num_vis=2
>>>> It has its own MAC address and can be used like any other interface,
>>>> except it has native netmap support too. You can run netmap-fwd between
>>>> these vcxl ports.
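>>>>
>>>> A minimal sketch of the whole sequence (the pkt-gen flags are only an
>>>> illustration, not a tuned setup; pkt-gen lives in tools/tools/netmap):
>>>>
>>>> # /boot/loader.conf
>>>> hw.cxgbe.num_vis=2
>>>>
>>>> # after a reboot there is one vcxl per cxl port
>>>> ifconfig vcxl0 up
>>>> ifconfig vcxl1 up
>>>>
>>>> # quick hardware sanity check with netmap's packet generator
>>>> pkt-gen -i vcxl0 -f tx -l 64
>>>> pkt-gen -i vcxl1 -f rx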
>>>>
>>>> Regards,
>>>> Navdeep
>>>>
>>>> On Tue, Jan 31, 2017 at 01:57:37PM -0400, Jordan Caraballo wrote:
>>>>> Navdeep, Troy,
>>>>>
>>>>> I forwarded you this email to see if we could get feedback from both of
>>>>> you. I talked with Troy in November about this R530 system and the use
>>>>> of a 40G Chelsio T-580-CR card. So far, we have not seen results above
>>>>> 1.4 million pps or so.
>>>>>
>>>>> Any help would be appreciated.
>>>>>
>>>>> - Jordan
>>>>>
>>>>> -------- Forwarded Message --------
>>>>>
>>>>> Subject: Re: Disappointing packets-per-second performance results on a
>>>>> Dell PE R530
>>>>> Date: Tue, 31 Jan 2017 13:53:15 -0400
>>>>> From: Jordan Caraballo <jordancaraballo87 at gmail.com>
>>>>> To: Slawa Olhovchenkov <slw at zxy.spb.ru>
>>>>> CC: freebsd-net at freebsd.org
>>>>>
>>>>> These are the most recent stats. No progress so far. The system is
>>>>> running -CURRENT right now.
>>>>>
>>>>> Any help or feedback would be appreciated.
>>>>> Hardware Configuration:
>>>>> Dell PowerEdge R530 with 2 Intel(R) Xeon(R) E5-2695 CPUs, 18 cores per
>>>>> CPU, equipped with a Chelsio T-580-CR dual-port card in a PCIe x8 slot.
>>>>>
>>>>> BIOS tweaks:
>>>>> Hyperthreading (or Logical Processors) is turned off.
>>>>>
>>>>> loader.conf:
>>>>> # Chelsio Modules
>>>>> t4fw_cfg_load="YES"
>>>>> t5fw_cfg_load="YES"
>>>>> if_cxgbe_load="YES"
>>>>>
>>>>> rc.conf:
>>>>> # Gateway Configuration
>>>>> ifconfig_cxl0="inet 172.16.1.1/24"
>>>>> ifconfig_cxl1="inet 172.16.2.1/24"
>>>>> gateway_enable="YES"
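>>>>>
>>>>> (gateway_enable="YES" only sets the sysctl below at boot; it can be
>>>>> checked or toggled at runtime:)
>>>>> sysctl net.inet.ip.forwarding    # 1 = the box forwards packets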
>>>>>
>>>>> Last Results:
>>>>> packets errs idrops bytes packets errs bytes colls drops
>>>>> 2.7M 0 2.0M 1.4G 696k 0 368M 0 0
>>>>> 2.7M 0 2.0M 1.4G 686k 0 363M 0 0
>>>>> 2.6M 0 2.0M 1.4G 668k 0 353M 0 0
>>>>> 2.7M 0 2.0M 1.4G 661k 0 350M 0 0
>>>>> 2.8M 0 2.1M 1.5G 697k 0 369M 0 0
>>>>> 2.8M 0 2.1M 1.4G 684k 0 361M 0 0
>>>>> 2.7M 0 2.1M 1.4G 674k 0 356M 0 0
>>>>>
>>>>> root at router1:~ # vmstat -i
>>>>>
>>>>> interrupt total rate
>>>>> irq9: acpi0 73 0
>>>>> irq18: ehci0 ehci1 1155973 3
>>>>> cpu0:timer 3551157 10
>>>>> cpu29:timer 9303048 27
>>>>> cpu9:timer 71693455 207
>>>>> cpu16:timer 9798380 28
>>>>> cpu18:timer 9287094 27
>>>>> cpu26:timer 9342495 27
>>>>> cpu20:timer 9145888 26
>>>>> cpu8:timer 9791228 28
>>>>> cpu22:timer 9288116 27
>>>>> cpu35:timer 9376578 27
>>>>> cpu30:timer 9396294 27
>>>>> cpu23:timer 9248760 27
>>>>> cpu10:timer 9756455 28
>>>>> cpu25:timer 9300202 27
>>>>> cpu27:timer 9227291 27
>>>>> cpu14:timer 10083548 29
>>>>> cpu28:timer 9325684 27
>>>>> cpu11:timer 9906405 29
>>>>> cpu34:timer 9419170 27
>>>>> cpu31:timer 9392089 27
>>>>> cpu33:timer 9350540 27
>>>>> cpu15:timer 9804551 28
>>>>> cpu32:timer 9413182 27
>>>>> cpu19:timer 9231505 27
>>>>> cpu12:timer 9813506 28
>>>>> cpu13:timer 10872130 31
>>>>> cpu4:timer 9920237 29
>>>>> cpu2:timer 9786498 28
>>>>> cpu3:timer 9896011 29
>>>>> cpu5:timer 9890207 29
>>>>> cpu6:timer 9737869 28
>>>>> cpu7:timer 9790119 28
>>>>> cpu1:timer 9847913 28
>>>>> cpu21:timer 9192561 27
>>>>> cpu24:timer 9300259 27
>>>>> cpu17:timer 9786186 28
>>>>> irq264: mfi0 151818 0
>>>>> irq266: bge0 30466 0
>>>>> irq272: t5nex0:evt 4 0
>>>>> Total 402604945 1161
>>>>> top -PHS
>>>>> last pid: 18557; load averages: 2.58, 1.90, 0.95 up 4+00:39:54 18:30:46
>>>>> 231 processes: 40 running, 126 sleeping, 65 waiting
>>>>> CPU 0: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 4: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 5: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 6: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle
>>>>> CPU 7: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 8: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 9: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 10: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 11: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 12: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 13: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 14: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 15: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 16: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 17: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 18: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 19: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 20: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 21: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 22: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 23: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 24: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 25: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 26: 0.0% user, 0.0% nice, 0.0% system, 59.6% interrupt, 40.4% idle
>>>>> CPU 27: 0.0% user, 0.0% nice, 0.0% system, 96.3% interrupt, 3.7% idle
>>>>> CPU 28: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 29: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 30: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 31: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 32: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 33: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> CPU 34: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle
>>>>> CPU 35: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>>>> Mem: 15M Active, 224M Inact, 1544M Wired, 393M Buf, 29G Free
>>>>> Swap: 3881M Total, 3881M Free
>>>>>
>>>>> pmcstat -R sample.out -G - | head
>>>>> @ CPU_CLK_UNHALTED_CORE [159 samples]
>>>>>
>>>>> 39.62% [63] acpi_cpu_idle_mwait @ /boot/kernel/kernel
>>>>> 100.0% [63] acpi_cpu_idle
>>>>> 100.0% [63] cpu_idle_acpi
>>>>> 100.0% [63] cpu_idle
>>>>> 100.0% [63] sched_idletd
>>>>> 100.0% [63] fork_exit
>>>>>
>>>>> 17.61% [28] cpu_idle @ /boot/kernel/kernel
>>>>>
>>>>> root at router1:~ # pmcstat -R sample0.out -G - | head
>>>>> @ CPU_CLK_UNHALTED_CORE [750 samples]
>>>>>
>>>>> 31.60% [237] acpi_cpu_idle_mwait @ /boot/kernel/kernel
>>>>> 100.0% [237] acpi_cpu_idle
>>>>> 100.0% [237] cpu_idle_acpi
>>>>> 100.0% [237] cpu_idle
>>>>> 100.0% [237] sched_idletd
>>>>> 100.0% [237] fork_exit
>>>>>
>>>>> 10.67% [80] cpu_idle @ /boot/kernel/kernel
>>>>>
>>>>> On 03/01/17 13:46, Slawa Olhovchenkov wrote:
>>>>>
>>>>> On Tue, Jan 03, 2017 at 12:35:42PM -0400, Jordan Caraballo wrote:
>>>>>
>>>>>
>>>>> We recently tested a Dell R530 with a Chelsio T580 card under FreeBSD 10.3, 11.0, -STABLE, and -CURRENT, and under CentOS 7.
>>>>>
>>>>> Based on our research, including netmap-fwd and the routing improvements project (https://wiki.freebsd.org/ProjectsRoutingProposal),
>>>>> we hoped for packets-per-second (pps) rates in the 5+ million range, or even higher.
>>>>>
>>>>> Based on prior testing (http://marc.info/?t=140604252400002&r=1&w=2), we expected 3-4 million pps to be easily obtainable.
>>>>>
>>>>> Unfortunately, our current results top out at no more than 1.5M pps (64-byte packets) with FreeBSD, and
>>>>> surprisingly around 3.2M pps (128-byte packets) with CentOS 7, and we are at a loss as to why.
>>>>>
>>>>> Server Description:
>>>>> Dell PowerEdge R530 with 2 Intel(R) Xeon(R) E5-2695 CPUs, 18 cores per
>>>>> CPU, equipped with a Chelsio T-580-CR dual-port card in a PCIe x8 slot.
>>>>>
>>>>> ** Could this be a lack-of-support issue with the R530's hardware? **
>>>>>
>>>>> Any help appreciated!
>>>>>
>>>>> What hardware configuration?
>>>>> What BIOS setting?
>>>>> What loader.conf/sysctl.conf setting?
>>>>> What `vmstat -i`?
>>>>> What `top -PHS`?
>>>>> And the output of:
>>>>> ====
>>>>> pmcstat -S CPU_CLK_UNHALTED_CORE -l 10 -O sample.out
>>>>> pmcstat -R sample.out -G out.txt
>>>>> pmcstat -c 0 -S CPU_CLK_UNHALTED_CORE -l 10 -O sample0.out
>>>>> pmcstat -R sample0.out -G out0.txt
>>>>> ====
>
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
--
Tomorrow Will Never Die