kern/162110: Releng_9 panics on boot in IGB driver - regression from 8.2

Gleb Smirnoff glebius at FreeBSD.org
Mon Oct 31 19:40:10 UTC 2011


The following reply was made to PR kern/162110; it has been noted by GNATS.

From: Gleb Smirnoff <glebius at FreeBSD.org>
To: Frank Terhaar-Yonkers <fty at cisco.com>
Cc: freebsd-gnats-submit at FreeBSD.org, jfv at FreeBSD.org
Subject: Re: kern/162110: Releng_9 panics on boot in IGB driver - regression
 from 8.2
Date: Mon, 31 Oct 2011 22:37:28 +0300

 --LTeJQqWS0MN7I/qa
 Content-Type: text/plain; charset=koi8-r
 Content-Disposition: inline
 
 On Fri, Oct 28, 2011 at 07:43:28PM +0000, Frank Terhaar-Yonkers wrote:
 F> 
 F> >Number:         162110
 F> >Category:       kern
 F> >Synopsis:       Releng_9 panics on boot in IGB driver - regression from 8.2
 F> >Confidential:   no
 F> >Severity:       critical
 F> >Priority:       high
 F> >Responsible:    freebsd-bugs
 F> >State:          open
 F> >Quarter:        
 F> >Keywords:       
 F> >Date-Required:
 F> >Class:          sw-bug
 F> >Submitter-Id:   current-users
 F> >Arrival-Date:   Fri Oct 28 19:50:08 UTC 2011
 F> >Closed-Date:
 F> >Last-Modified:
 F> >Originator:     Frank Terhaar-Yonkers
 F> >Release:        Releng_9 CVSUP 2011-October-28
 F> >Organization:
 F> Cisco
 F> >Environment:
 F> FreeBSD fty-zfs-01 9.0-RC1 FreeBSD 9.0-RC1 #1: Fri Oct 28 06:50:23 EDT 2011     toot at fty-zfs-01:/usr/obj/usr/src/sys/GENERIC  amd64
 F> >Description:
 F> if_igb driver panics during bootup.
 F> 
 F> The IGB driver probes the device at line 591 of if_igb.c and punts:
 F>                 if (e1000_validate_nvm_checksum(&adapter->hw) < 0) {
 F>                         device_printf(dev,
 F>                             "The EEPROM Checksum Is Not Valid\n");
 F>                         error = EIO;
 F>                         goto err_late;
 F>                 }
 F> 
 F> The kernel immediately panics with a page fault.  The traceback shows it's in the if_igb driver, as the console messages suggest.
 F> 
 F> Releng_8 did not panic, so this is a regression.  The IGB NIC most likely has some sort of problem which is properly diagnosed.
 F> 
 F> Email me if you want the screen shot of the panic, or have a fix to try out.
 
 To reproduce your problem, I added an '|| 1' condition to the check quoted
 above. It turns out that calling igb_detach() after an igb_attach() failure
 is full of landmines. The attached patch fixes many of them: at least the
 kernel no longer panics when e1000_validate_nvm_checksum() fails. I am not
 sure about other cases.
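
As an illustration of the forced failure described above, here is a minimal
userland sketch in C. It is not the actual change made to if_igb.c: the names
mock_hw, mock_validate_nvm_checksum() and mock_attach_check() are hypothetical
stand-ins, and the force_failure parameter plays the role of the temporary
'|| 1' so that the error branch is taken even when the checksum is valid.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the hardware state passed as &adapter->hw. */
struct mock_hw {
	bool nvm_checksum_ok;
};

/* Hypothetical stand-in for e1000_validate_nvm_checksum(): negative on failure. */
static int
mock_validate_nvm_checksum(const struct mock_hw *hw)
{
	assert(hw != NULL);
	return (hw->nvm_checksum_ok ? 0 : -1);
}

/*
 * Same shape as the check quoted from if_igb.c.  When force_failure is true,
 * the condition behaves like the temporary "|| 1" and always takes the
 * error branch, even though the checksum itself is fine.
 */
static int
mock_attach_check(const struct mock_hw *hw, bool force_failure)
{
	if (mock_validate_nvm_checksum(hw) < 0 || force_failure) {
		fprintf(stderr, "The EEPROM Checksum Is Not Valid\n");
		return (EIO);
	}
	return (0);
}
```

Forcing the branch this way exercises the error exit on healthy hardware,
which is what makes the cleanup problems visible without a faulty NIC.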
 
 Unfortunately, the patch will not fix your NIC; it only cures the panic.
 
 I've Cc'ed Jack Vogel, who maintains the Intel NIC drivers in FreeBSD.
 Maybe he can help you.
 
 Jack, please consider including my patch in the next version of the driver.
 The issues fixed:
 
 - igb_detach() may be called with an uninitialized ifp
 - igb_stop() may be called with an uninitialized ifp
 - igb_detach() already frees the transmit/receive structures
 - igb_detach() already frees adapter->mta
 - igb_detach() already destroys the core lock
 
 There are probably other edge cases where the kernel panics after some
 failure in igb_attach(); not all possible error exits were tested.
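
The class of problem listed above (teardown running on a half-initialized
object, and resources released twice) can be sketched in userland C as
follows. This is an illustration under stated assumptions, not the patch:
mock_adapter, mock_attach() and mock_detach() are hypothetical, and unlike the
attached diff this sketch also resets each pointer to NULL after freeing it,
so that a repeated teardown is harmless.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for the driver's softc. */
struct mock_adapter {
	int	*ifp;		/* NULL until the interface is set up */
	int	*mta;		/* stands in for the multicast table array */
	bool	 lock_valid;	/* stands in for the core lock */
	int	 frees;		/* counts releases, to expose double frees */
};

/* Teardown that tolerates a half-built adapter and releases each item once. */
static void
mock_detach(struct mock_adapter *sc)
{
	assert(sc != NULL);
	if (sc->ifp != NULL) {		/* attach may have failed before setup */
		free(sc->ifp);
		sc->ifp = NULL;
		sc->frees++;
	}
	if (sc->mta != NULL) {
		free(sc->mta);
		sc->mta = NULL;
		sc->frees++;
	}
	sc->lock_valid = false;
}

/* Attach that can be told to fail before the interface is allocated. */
static int
mock_attach(struct mock_adapter *sc, bool fail_early)
{
	assert(sc != NULL);
	sc->ifp = NULL;
	sc->frees = 0;
	sc->lock_valid = true;
	sc->mta = calloc(8, sizeof(*sc->mta));
	if (sc->mta == NULL) {
		sc->lock_valid = false;
		return (ENOMEM);
	}
	if (fail_early) {
		/* Late error exit: detach owns the cleanup, nothing is freed here. */
		mock_detach(sc);
		return (EIO);
	}
	sc->ifp = calloc(1, sizeof(*sc->ifp));
	if (sc->ifp == NULL) {
		mock_detach(sc);
		return (ENOMEM);
	}
	return (0);
}
```

The design choice shown is that exactly one routine owns each release, and
that routine checks whether the resource was ever set up before touching it.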
 
 -- 
 Totus tuus, Glebius.
 
 --LTeJQqWS0MN7I/qa
 Content-Type: text/x-diff; charset=koi8-r
 Content-Disposition: attachment; filename="if_igb.c.diff"
 
 Index: if_igb.c
 ===================================================================
 --- if_igb.c	(revision 226966)
 +++ if_igb.c	(working copy)
 @@ -670,11 +670,12 @@
  
  err_late:
  	igb_detach(dev);
 -	igb_free_transmit_structures(adapter);
 -	igb_free_receive_structures(adapter);
  	igb_release_hw_control(adapter);
  	if (adapter->ifp != NULL)
  		if_free(adapter->ifp);
 +	igb_free_pci_resources(adapter);
 +	return (error);
 +
  err_pci:
  	igb_free_pci_resources(adapter);
  	free(adapter->mta, M_DEVBUF);
 @@ -701,26 +702,37 @@
  
  	INIT_DEBUGOUT("igb_detach: begin");
  
 -	/* Make sure VLANS are not using driver */
 -	if (adapter->ifp->if_vlantrunk != NULL) {
 -		device_printf(dev,"Vlan in use, detach first\n");
 -		return (EBUSY);
 -	}
 +	IGB_CORE_LOCK(adapter);
 +	adapter->in_detach = 1;
 +	igb_stop(adapter);
 +	IGB_CORE_UNLOCK(adapter);
  
 -	ether_ifdetach(adapter->ifp);
 +	/* Unregister VLAN events */
 +	if (adapter->vlan_attach != NULL)
 +		EVENTHANDLER_DEREGISTER(vlan_config, adapter->vlan_attach);
 +	if (adapter->vlan_detach != NULL)
 +		EVENTHANDLER_DEREGISTER(vlan_unconfig, adapter->vlan_detach);
  
 -	if (adapter->led_dev != NULL)
 -		led_destroy(adapter->led_dev);
 +	callout_drain(&adapter->timer);
  
 +	if (ifp != NULL) {
 +		/* Make sure VLANS are not using driver */
 +		if (ifp->if_vlantrunk != NULL) {
 +			device_printf(dev,"Vlan in use, detach first\n");
 +			return (EBUSY);
 +		}
 +
 +		ether_ifdetach(ifp);
 +
  #ifdef DEVICE_POLLING
 -	if (ifp->if_capenable & IFCAP_POLLING)
 -		ether_poll_deregister(ifp);
 +		if (ifp->if_capenable & IFCAP_POLLING)
 +			ether_poll_deregister(ifp);
  #endif
 +		if_free(ifp);
 +	}
  
 -	IGB_CORE_LOCK(adapter);
 -	adapter->in_detach = 1;
 -	igb_stop(adapter);
 -	IGB_CORE_UNLOCK(adapter);
 +	if (adapter->led_dev != NULL)
 +		led_destroy(adapter->led_dev);
  
  	e1000_phy_hw_reset(&adapter->hw);
  
 @@ -734,17 +746,8 @@
  		igb_enable_wakeup(dev);
  	}
  
 -	/* Unregister VLAN events */
 -	if (adapter->vlan_attach != NULL)
 -		EVENTHANDLER_DEREGISTER(vlan_config, adapter->vlan_attach);
 -	if (adapter->vlan_detach != NULL)
 -		EVENTHANDLER_DEREGISTER(vlan_unconfig, adapter->vlan_detach);
 -
 -	callout_drain(&adapter->timer);
 -
  	igb_free_pci_resources(adapter);
  	bus_generic_detach(dev);
 -	if_free(ifp);
  
  	igb_free_transmit_structures(adapter);
  	igb_free_receive_structures(adapter);
 @@ -2135,7 +2138,8 @@
  	callout_stop(&adapter->timer);
  
  	/* Tell the stack that the interface is no longer active */
 -	ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
 +	if (ifp != NULL)
 +		ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
  
  	/* Unarm watchdog timer. */
  	for (int i = 0; i < adapter->num_queues; i++, txr++) {
 
 --LTeJQqWS0MN7I/qa--

