GEOM probes fail on aac with EARLY_AP_STARTUP

Scott Long scottl at samsco.org
Fri Sep 8 15:12:09 UTC 2017


Hi John,

Great bug report and analysis.  I think you’re right, behavior in the system
changed with EARLY_AP_STARTUP and the intrhook is being released too
soon now, before the driver is ready for concurrent access.  I’ll shepherd it
into SVN.  There’s a similar pattern in most of the non-CAM drivers, so I’ll
review them as well.

Scott

> On Sep 7, 2017, at 7:19 PM, john hood <cgull at glup.org> wrote:
> 
> I've got a devel machine here which was failing to boot on our vendored
> FreeBSD 11.1, because GEOM was unable to find the partitions on the boot
> drive and so the root mount failed.  This started happening on many but
> not all boots after I upgraded the machine from 9.3.
> 
> The machine is an Intel S5520UR motherboard with 2x Xeon E5620 CPUs
> (Hyperthreading enabled, so hw.ncpu=16) and an Adaptec 5805, and 2 RAID
> volumes configured on 6 SATA drives.
> 
> When booting, it sees the aac0 controller and aacd0
> volume but GEOM does not find any of the partitions on that volume, and the
> initial mount of root on /dev/aacd0p2 fails.  aacd0 is available and
> readable, but the expected aacd0p{1,2,3} devices do not exist.
> (However, aacd1 and its partitions/devices are configured normally.)
> 
> I think it's a race condition between the aac driver and GEOM probing,
> probably newly triggered/exposed by EARLY_AP_STARTUP.  I've reproduced
> the problem on upstream FreeBSD 11.1 and -current.  Disabling
> EARLY_AP_STARTUP, or setting kern.smp.disabled=1, causes the kernel to
> start correctly. 'boot -v' also causes the kernel to start correctly.
> 
> The kernel calls aac_attach(), which uses config_intrhook_establish()
> to run aac_startup() later.  When that runs, it adds devices via
> aac_add_container()/device_add_child()/bus_generic_attach().
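
For readers following along, the deferral described above can be modeled in
userland C roughly as follows.  This is only a sketch, not driver code: every
model_* name is hypothetical, the single pending-hook slot is a
simplification, and the real kernel runs its hooks from its own boot-time
machinery rather than from an explicit call.

```c
#include <stddef.h>

/* Hypothetical userland model; none of these names are the real driver's. */
#define MODEL_STATE_SUSPEND 0x01

struct model_softc {
	int state;        /* holds MODEL_STATE_SUSPEND until startup finishes */
	int ncontainers;  /* child disks added so far */
};

typedef void (*model_hook_fn)(void *);

/* One pending hook is enough for the illustration. */
static model_hook_fn pending_fn;
static void *pending_arg;

/* Stand-in for registering a deferred-configuration hook. */
static void
model_hook_establish(model_hook_fn fn, void *arg)
{
	pending_fn = fn;
	pending_arg = arg;
}

/* Stand-in for the kernel running the pending hook later in boot. */
static void
model_run_hooks(void)
{
	if (pending_fn != NULL) {
		model_hook_fn fn = pending_fn;

		pending_fn = NULL;
		fn(pending_arg);
	}
}

/* Deferred half: add the containers, then mark the controller ready. */
static void
model_startup(void *arg)
{
	struct model_softc *sc = arg;

	sc->ncontainers = 2;                 /* e.g. aacd0 and aacd1 */
	sc->state &= ~MODEL_STATE_SUSPEND;   /* cleared last */
}

/* Immediate half: mark the controller suspended and defer the rest. */
static void
model_attach(struct model_softc *sc)
{
	sc->state |= MODEL_STATE_SUSPEND;
	sc->ncontainers = 0;
	model_hook_establish(model_startup, sc);
}
```

Calling model_attach() and then model_run_hooks() on a zeroed model_softc
leaves two containers attached and the suspend flag clear.  Between the two
calls the flag is still set, which is the window the report is about.
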
> 
> However, at the beginning of aac_attach(), an AAC_STATE_SUSPEND flag
> is set.  It is cleared at the end of aac_startup().  It appears that
> GEOM probes call aac_disk_open(), which checks the flag and returns
> an error if it is set.  On my system the race is that the GEOM probes
> happen before that flag is cleared, possibly because GEOM is tasting
> aacd0 while the aac driver is still attaching aacd1.  So the GEOM probes
> fail and the geom nodes never get created.  If I boot with the -v flag,
> the kernel boots successfully, I think because the message printing
> takes long enough to delay GEOM probing past aac_startup() completion.
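
The ordering problem can be made concrete with a small deterministic model in
userland C.  It is a sketch under stated assumptions: the sketch_* names are
hypothetical, ENXIO is an assumed error code (the report only says the open
"returns error"), and a failed taste is never retried here, matching the
observation that the geom nodes never get created.

```c
#include <errno.h>

/* Hypothetical stand-in for the driver's AAC_STATE_SUSPEND flag. */
#define SKETCH_STATE_SUSPEND 0x01

struct sketch_ctrl {
	int state;
};

struct sketch_disk {
	struct sketch_ctrl *ctrl;
	int has_partitions;   /* set once a taste succeeds */
};

/* Models the open path: refuse while the controller is flagged suspended. */
static int
sketch_disk_open(struct sketch_disk *d)
{
	if (d->ctrl->state & SKETCH_STATE_SUSPEND)
		return (ENXIO);   /* assumed error code */
	return (0);
}

/* Models one GEOM taste: partitions appear only if the open succeeds. */
static void
sketch_taste(struct sketch_disk *d)
{
	if (sketch_disk_open(d) == 0)
		d->has_partitions = 1;
}

/*
 * Replays the boot in one of two orders and reports whether the
 * partitions on the first volume were found.
 */
static int
sketch_boot(int taste_before_clear)
{
	struct sketch_ctrl c = { SKETCH_STATE_SUSPEND };
	struct sketch_disk d0 = { &c, 0 };

	if (taste_before_clear)
		sketch_taste(&d0);             /* GEOM wins the race */
	c.state &= ~SKETCH_STATE_SUSPEND;      /* end of startup */
	if (!taste_before_clear)
		sketch_taste(&d0);             /* GEOM probes afterwards */
	return (d0.has_partitions);
}
```

sketch_boot(1) models the failing boots, where the taste lands before the
flag is cleared, and sketch_boot(0) models the boots that work, such as the
'boot -v' case where the probe is delayed.
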
> 
> I've attached a patch which resolves the problem on FreeBSD-current
> (and 11.1).  Would anybody care to adopt it and shepherd it into SVN?
> 
> regards,
> 
>  --John Hood
> 
> <aac.diff>
