vr(4) troubles for AMD Geode CS5536 chipset
YongHyeon PYUN
pyunyh at gmail.com
Mon Sep 3 02:41:03 UTC 2012
On Fri, Aug 31, 2012 at 12:45:53PM +0700, Eugene Grosbein wrote:
> In my previous letter I described my attempts to try vr(4) from HEAD.
> Now I'd like to explain why I tried it.
>
> The problem is that the stock vr(4) from 8.3-STABLE/i386 has serious issues on my system.
> I have a home router with two vr interfaces: vr0 is for LAN (IPoE) and vr1 is for WAN (PPPoE/mpd).
>
> Presently, every day my WAN vr interface stops running correctly:
> sometimes it stops receiving packets altogether - tcpdump shows none of them.
> Sometimes, it receives some but with a great delay - up to 10 seconds (not milliseconds)
> and even more. tcpdump shows that the delay occurs on the receive path.
> Sometimes, it even reorders packets - tcpdump shows that some incoming ICMP echo requests
> with lower sequence numbers come in later than already answered higher-numbered requests.
Hmm, it seems the driver's consumer/producer indices for the RX path
were corrupted.
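
To illustrate why a bad index would produce exactly these symptoms, here is a toy model of an RX ring. It is only a sketch, not the actual vr(4) code: the ring size, the names simulate/consumer_offset and the one-packet-per-step timing are all assumptions made for the example. The hardware fills slots at the producer index and the driver drains filled slots starting at its consumer index.

```python
RING_SIZE = 8  # assumed ring size, chosen only to keep the example small


def simulate(num_packets, consumer_offset):
    """Return packet sequence numbers in the order the driver delivers them.

    consumer_offset is how far the driver's consumer index is displaced
    from the slot the hardware will fill next (0 means indices agree).
    """
    ring = [None] * RING_SIZE            # None = empty slot, owned by hardware
    prod = 0                             # next slot the hardware fills
    cons = consumer_offset % RING_SIZE   # next slot the driver inspects
    delivered = []
    for seq in range(num_packets):
        # Hardware side: store the packet if the slot is free.
        # A busy slot would mean the packet is lost (no RX buffer).
        if ring[prod] is None:
            ring[prod] = seq
            prod = (prod + 1) % RING_SIZE
        # Driver side: drain every filled slot starting at the consumer index,
        # stopping at the first empty slot.
        while ring[cons] is not None:
            delivered.append(ring[cons])
            ring[cons] = None
            cons = (cons + 1) % RING_SIZE
    return delivered
```

With the indices in agreement, simulate(12, 0) delivers packets 0 through 11 in order. With the consumer index displaced by three slots, simulate(12, 3) returns [3, 4, 5, 6, 7, 0, 1, 2, 11]: packets 0-2 sit in the ring until the driver wraps around, so they arrive late and after higher-numbered ones, and packets 8-10 are still waiting when the run ends.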
>
> ifconfig vr1 down/up revives the interface completely until the next morning.
> sysctl net.inet.ip.fw.enable=0 does not solve the problem.
>
> I have control over the WAN switching/routing network and can assure you it runs just fine.
> However, I can't guarantee it has no "soft" anomalies like short storms or some silly broadcasts.
>
> I've tried generating an incoming UDP flood with ng_source(4) at a 100M rate
> for 60 seconds and failed to reproduce the problem artificially.
>
> I've tried moving WAN from vr1 to vr0 and the problem moved to vr0 too.
> My LAN has very little traffic and the corresponding vr interface exhibits no problems.
>
> This router also routinely runs transmission (torrent client from ports)
> serving torrents from a USB-attached HDD, which causes severe CPU load, so I suspect
> the problem may be related to CPU load.
>
> I've also checked mbuf/mbuf clusters usage and they are all right:
>
> # netstat -m
> 1539/2076/3615 mbufs in use (current/cache/total)
> 1200/1278/2478/65536 mbuf clusters in use (current/cache/total/max)
> 1200/306 mbuf+clusters out of packet secondary zone in use (current/cache)
> 318/181/499/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 4056K/3799K/7855K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/4/6656 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> # vmstat -z | egrep -i 'ITEM|mbuf'
> ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
> mbuf_packet: 256, 0, 1429, 77, 112854470, 0
> mbuf: 256, 0, 489, 1620, 369073316, 0
> mbuf_cluster: 2048, 65536, 1506, 604, 5401864, 0
> mbuf_jumbo_page: 4096, 12800, 469, 158, 8306777, 0
> mbuf_jumbo_9k: 9216, 6400, 0, 0, 0, 0
> mbuf_jumbo_16k: 16384, 3200, 0, 0, 0, 0
> mbuf_ext_refcnt: 4, 0, 0, 0, 0, 0
> NetGraph items: 36, 4130, 1, 117, 263123, 0
> NetGraph data items: 36, 531, 0, 295, 106663377, 0
>
> While ifconfig vr1 down/up solves the problem completely (for a long time),
> taking the link down/up at the switch solves it only "halfway" - the huge packet delays disappear
> and turn into 25% packet loss happening at regular short intervals, about once a second.
>
> ifconfig down/up clears this mess too.
>
> Please help me to debug this, it's pretty annoying.
By chance, did vr(4) print any diagnostic messages to the
console? If I remember correctly, vr(4) automatically restarts the
controller and shows these errors when it detects an abnormal
condition. Abnormal conditions for vr(4) would be:
- TX/RX MAC stuck
- RX MAC stop due to FIFO overflow or no RX buffers
- PCI bus errors
- TX abort
- TX underrun
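
If the console is not watched, the system log can be searched for such messages afterwards. The following is a minimal sketch, assuming the messages carry the usual "vrN: " device prefix and that syslog writes to /var/log/messages (the FreeBSD default). The names vr_lines and scan_log are made up for this example, and the exact wording of the vr(4) messages is deliberately not assumed.

```python
import re

# Kernel messages from a device are normally prefixed with "<name><unit>: ".
VR_PREFIX = re.compile(r"\bvr\d+: ")


def vr_lines(lines):
    """Return only the lines that look like vr(4) device messages."""
    return [line.rstrip("\n") for line in lines if VR_PREFIX.search(line)]


def scan_log(path="/var/log/messages"):
    """Read a log file and return its vr(4) lines.

    The default path is an assumption; pass another file if syslog is
    configured differently.
    """
    with open(path, errors="replace") as log:
        return vr_lines(log)
```

Calling scan_log() and printing each returned line around the time the interface stalls would show whether the driver noticed any of the conditions above.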
> I had hoped the new vr(4) driver would help but it takes my system down under average load
> and is unusable.
>
> Here is the start of dmesg.boot:
>
> Copyright (c) 1992-2012 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.3-STABLE #1: Wed Aug 29 22:49:45 NOVT 2012
> root at grosbein.pp.ru:/usr/local/obj/nanobsd.gw/i386/usr/local/src/sys/GW i386
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Geode(TM) Integrated Processor by AMD PCS (499.91-MHz 586-class CPU)
> Origin = "AuthenticAMD" Id = 0x5a2 Family = 5 Model = a Stepping = 2
> Features=0x88a93d<FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CLFLUSH,MMX>
> AMD Features=0xc0400000<MMX+,3DNow!+,3DNow!>
> real memory = 1065025536 (1015 MB)
> avail memory = 1032929280 (985 MB)
> K6-family MTRR support enabled (2 registers)
>
> I must also note that this system runs with ACPI disabled in /boot/loader.conf:
> hint.acpi.0.disabled=1
>
> Otherwise, its timekeeping becomes broken.
>
> Eugene Grosbein