cvs commit: src/sys/dev/bge if_bge.c
Scott Long
scottl at samsco.org
Mon Dec 18 21:24:54 PST 2006
Bruce Evans wrote:
> On Sat, 16 Dec 2006, I wrote:
>
>> On Thu, 14 Dec 2006, I wrote:
>>
>>> On Wed, 13 Dec 2006, Jung-uk Kim wrote:
>>>
>>>> On Wednesday 13 December 2006 03:51 pm, Scott Long wrote:
>>>>> scottl 2006-12-13 20:51:51 UTC
>>>>>
>>>>> FreeBSD src repository
>>>>>
>>>>> Modified files:
>>>>> sys/dev/bge if_bge.c
>>>>> Log:
>>>>> Remove a redundant write of the firmware reset magic number. It
>>>>> ...
>>>> I am still getting firmware handshake timeouts and/or watchdog
>>>> timeouts. Most importantly, it panics or gets witness warnings (lots
>>>> of 'memory modified after free'). The panic goes like this (while
>>>> running kldunload if_bge with dhclient enabled):
>>>>
>>>> brgphy0: detached
>>>> miibus0: detached
>>>> bge0: firmware handshake timed out, found 0x4b657654
>>>> bge0: firmware handshake timed out, found 0x4b657654
>>>
>>> I have seen these for debugging the redundant-write problem (not for
>>> detach but for bringing up the interface for the first time). My 5701
>>> just hangs if there is any redundant write (2 where the first one was
>>> in bge_reset(), or 2 separate, or 2 where the second one was). My
>>> 5705 survives two separate sets of 256 repeated writes; however, then
>>> the firmware handshake times out; however2, everything works normally
>>> after ignoring the timeout except for printing the message. I
>>> just noticed that this error wasn't ignored until recently -- I noticed
>>> the return statement being removed but not that it was in a critical
>>> area.
>>
>> The debugging code doesn't seem to have been responsible for this.
>> Now, without it I almost (?) always get handshake errors on the 5705,
>> but never (?) on the 5701. Apparently, the 3rd write (the one that
>> was removed) was the only correctly placed one.
>
> Avoiding the "write_op" part of the changes fixes the handshake errors
> on my non-PCIE 5705. write_op is only used to write the reset value and
> one other value to BGE_MISC_CFG. bge_writemem_ind() apparently writes
> the reset to nowhere, but bge_writereg() still works.
>
> %%%
> Index: if_bge.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
> retrieving revision 1.165
> diff -u -2 -r1.165 if_bge.c
> --- if_bge.c 15 Dec 2006 00:27:06 -0000 1.165
> +++ if_bge.c 18 Dec 2006 10:44:05 -0000
> @@ -2544,4 +2634,7 @@
>  	if (sc->bge_flags & BGE_FLAG_PCIE)
>  		write_op = bge_writemem_direct;
> +	/* XXX bge_writemem_ind is wrong for at least reset of 5705. */
> +	else if (sc->bge_asicrev == BGE_ASICREV_BCM5705)
> +		write_op = bge_writereg_ind;
>  	else
>  		write_op = bge_writemem_ind;
> %%%
>
> The panics might be caused by the change making the reset null. Resetting
> might be much more necessary for uninitialization than for initialization.
>
> The bug caused the following behaviour here:
> - the problem with taking a long time to start serving nfs requests (with
> /usr nfs-mounted) became larger. Normally, nfs tries to start before
> the interface is really up and then it takes about a minute to start.
> With the bug, it often got portmap errors and sometimes never started.
> - after "ifconfig down", it took a reboot to bring the interface back up.
>
> Bruce
OK, this looks like a result of me misunderstanding a bit of the Linux
code that I read. When doing the reset, the Linux equivalent of
bge_writemem_ind() is specifically avoided.
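
For reference, the selection logic in the patch quoted above can be
sketched outside the driver. This is only an illustration under stated
assumptions: every demo_* name, the flag and ASIC revision values, and
the stub writers are hypothetical stand-ins, not the definitions in
if_bge.c. The stubs just record which access path was chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values are arbitrary, not the driver's. */
#define DEMO_FLAG_PCIE      0x1u
#define DEMO_ASICREV_5705   1u
#define DEMO_ASICREV_OTHER  2u

enum demo_path {
	DEMO_PATH_NONE,
	DEMO_PATH_MEM_DIRECT,
	DEMO_PATH_MEM_IND,
	DEMO_PATH_REG_IND
};

struct demo_softc {
	uint32_t	flags;
	uint32_t	asicrev;
	enum demo_path	last_path;	/* which writer ran last */
};

typedef void (*demo_write_fn)(struct demo_softc *, uint32_t, uint32_t);

/* Stub writers: no hardware access, they only record the path taken. */
static void
demo_writemem_direct(struct demo_softc *sc, uint32_t off, uint32_t val)
{
	(void)off; (void)val;
	sc->last_path = DEMO_PATH_MEM_DIRECT;
}

static void
demo_writemem_ind(struct demo_softc *sc, uint32_t off, uint32_t val)
{
	(void)off; (void)val;
	sc->last_path = DEMO_PATH_MEM_IND;
}

static void
demo_writereg_ind(struct demo_softc *sc, uint32_t off, uint32_t val)
{
	(void)off; (void)val;
	sc->last_path = DEMO_PATH_REG_IND;
}

/*
 * Pick the writer used for the reset, in the same order as the patch:
 * PCIe first, then the 5705 special case, then the default.
 */
static demo_write_fn
demo_pick_reset_writer(const struct demo_softc *sc)
{
	assert(sc != NULL);
	if (sc->flags & DEMO_FLAG_PCIE)
		return (demo_writemem_direct);
	if (sc->asicrev == DEMO_ASICREV_5705)
		return (demo_writereg_ind);
	return (demo_writemem_ind);
}
```

The order of the tests matters: a PCIe part keeps the direct path, and
only a non-PCIe 5705 is diverted to the register-indirect writer.
Whether other chips need the same treatment is not established here.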

I'm on vacation for the next 10 days, but I'll try to put together a
patch that addresses this and other problems soon. Ping me after the
first of the year otherwise.

Scott