new sk driver [was: nve timeout (and down) regression?]

Pieter de Goeje pieter at degoeje.nl
Tue Mar 28 14:38:23 UTC 2006


On Tuesday 28 March 2006 12:40, you wrote:
<snip>
> probably you do not have the traffic to make the box crash or less than
> 1/2GB of RAM in use

The box has 1GB RAM. Traffic is approx. 2-3Mbit/s.

>
> in fact the problem does not happen on UP machines, only sometimes a
> device timeout which only occasionally causes rx/tx to stop
>
> The problem is appearing on SMP machines
>
> when you have less than 2GB of RAM the problem occurs once a day or so and
> seems to depend on memory use and amount of traffic
>
> as soon as the traffic reaches more than 1Mbit/s the crash is predictable and
> you can wait to see it

The box has actually crashed once, but I am not sure it was because of the NIC.

~> uptime
 4:19PM  up 3 days,  9:59, 1 user, load averages: 1.38, 1.20, 1.03

>
> on machines with 4GB of RAM and more traffic the crash is immediate and worse:
> when the box crashes under load (4-6Mbit/s) and comes back, the high
> demand strikes it and it crashes within minutes, or immediately once the
> network is up
>
> so probably mpsafenet may help by not processing concurrent packets but
> this is a workaround not a solution (for me)

Agreed.

>
> last time I checked mpsafenet=0 cut almost 1Mbit/s of traffic and the
> overall performance/response was bad, higher HZ did not resolve anything
> and disabling polling made it even worse (I have other NICs installed),
> the machines are working as GW

I can't really tell if the performance is impaired by mpsafenet=0, because the
box is mostly busy doing userland stuff. Typical traffic looks like this:

~> netstat -w 1
            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
      1186     0      97134       1302     0     276430     0
      1206     0      97484       1382     0     264315     0
      1193     0      97048       1366     0     278901     0
      1198     0      98251       1403     0     273428     0
      1205     0      99283       1393     0     270364     0
      1162     0      94746       1376     0     265909     0
      1162     0      93011       1420     0     258514     0
      1187     0      94366       1467     0     263162     0
      1178     0      93441       1441     0     248875     0
      1176     0      93116       1484     0     266285     0
      1146     0      91615       1424     0     256180     0
      1222     0      96597       1560     0     432862     0
      1222     0      93796       1591     0     444466     0

This is all UDP. The traffic generates around 2000 interrupts/sec on sk.
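
To put the byte counters above into Mbit/s, here is a rough sketch of the arithmetic. The sample values are copied from the netstat output; the helper name `mbit_per_s` is made up for illustration, and I am assuming 1 Mbit = 1,000,000 bits.

```python
# (input bytes, output bytes) per one-second interval, copied from
# the `netstat -w 1` output above.
samples = [
    (97134, 276430), (97484, 264315), (97048, 278901), (98251, 273428),
    (99283, 270364), (94746, 265909), (93011, 258514), (94366, 263162),
    (93441, 248875), (93116, 266285), (91615, 256180), (96597, 432862),
    (93796, 444466),
]


def mbit_per_s(byte_counts):
    """Average per-second byte counts and convert to Mbit/s (hypothetical helper)."""
    avg_bytes = sum(byte_counts) / len(byte_counts)
    return avg_bytes * 8 / 1_000_000  # assumes 1 Mbit = 10**6 bits


rx = mbit_per_s([s[0] for s in samples])  # inbound average
tx = mbit_per_s([s[1] for s in samples])  # outbound average
print(f"in {rx:.2f} Mbit/s, out {tx:.2f} Mbit/s, total {rx + tx:.2f} Mbit/s")
```

Outbound dominates, and the combined figure is in the same ballpark as the traffic level I mentioned earlier.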

>
> until January the machines didn't crash, only timeouts and rx/tx stops
> I used Pyun's driver and the timeouts went away, thanks again!
>
> so then I got confused by some if_sk talk on stable and thought the driver
> was committed and the boxes started crashing until I got it last week and
> used Pyun's driver again and my sk problems are gone again, the machines
> have been stable for 4/5 days now

I'm going to test the new driver to see if I can disable mpsafenet. To be 
specific on the NIC:

skc0@pci0:10:0: class=0x020000 card=0x811a1043 chip=0x432011ab rev=0x13 hdr=0x00
    vendor   = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
    device   = '88E8001/8003/8010 Gigabit Ethernet Controller with Integrated PHY (copper)'
    class    = network
    subclass = ethernet
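
For anyone comparing against their own hardware: the `chip=` and `card=` values pack two 16-bit IDs into one 32-bit number, with the vendor ID in the low half and the device ID in the high half. A small sketch of the split (the function name `split_pci_id` is just illustrative):

```python
def split_pci_id(value):
    """Split a 32-bit chip=/card= value into (device_id, vendor_id)."""
    device_id = (value >> 16) & 0xFFFF  # high 16 bits
    vendor_id = value & 0xFFFF          # low 16 bits
    return device_id, vendor_id


chip_device, chip_vendor = split_pci_id(0x432011AB)  # chip= from above
card_device, card_vendor = split_pci_id(0x811A1043)  # card= from above

print(f"chip: vendor=0x{chip_vendor:04x} device=0x{chip_device:04x}")
print(f"card: subvendor=0x{card_vendor:04x} subdevice=0x{card_device:04x}")
```

Vendor 0x11ab is Marvell, which agrees with the vendor string pciconf prints.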

Pieter de Goeje

