GEOM probes fail on aac with EARLY_AP_STARTUP
Scott Long
scottl at samsco.org
Fri Sep 8 15:12:09 UTC 2017
Hi John,
Great bug report and analysis. I think you’re right, behavior in the system
changed with EARLY_AP_STARTUP and the intrhook is being released too
soon now, before the driver is ready for concurrent access. I’ll shepherd it
into SVN. There’s a similar pattern in most of the non-CAM drivers, so I’ll
review them as well.
Scott
> On Sep 7, 2017, at 7:19 PM, john hood <cgull at glup.org> wrote:
>
> I've got a devel machine here which was failing to boot on our vendored
> FreeBSD 11.1, because GEOM was unable to find the partitions on the boot
> drive and so the root mount failed. This started happening on many but
> not all boots after I upgraded the machine from 9.3.
>
> The machine is an Intel S5520UR motherboard with 2x Xeon E5620 CPUs
> (Hyperthreading enabled, so hw.ncpu=16) and an Adaptec 5805, and 2 RAID
> volumes configured on 6 SATA drives.
>
> When booting, it sees the aac0 controller and aacd0
> volume but GEOM does not find any of the partitions on that volume, and the
> initial mount of root on /dev/aacd0p2 fails. aacd0 is available and
> readable, but the expected aacd0p{1,2,3} devices do not exist.
> (However, aacd1 and its partitions/devices are configured normally.)
>
> I think it's a race condition between the aac driver and GEOM probing,
> probably newly triggered/exposed by EARLY_AP_STARTUP. I've reproduced
> the problem on upstream FreeBSD 11.1 and -current. Disabling
> EARLY_AP_STARTUP, or setting kern.smp.disabled=1, causes the kernel to
> start correctly. 'boot -v' also causes the kernel to start correctly.
>
> The kernel calls aac_attach(), which uses
> config_intrhook_establish() to run aac_startup() later. When that
> runs, it adds devices via
> aac_add_container()/device_add_child()/bus_generic_attach().
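
The deferred-startup pattern described above can be sketched in userland C.
This is only a toy model, not the aac driver or the kernel intrhook code:
every name below (model_hook_establish, model_run_hooks, model_attach,
model_startup, struct model_softc) is hypothetical, and the child count of 2
is assumed from the two RAID volumes in the report.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_HOOKS 8

/* Hypothetical stand-in for a deferred-startup hook: a callback plus its argument. */
struct model_hook {
	void (*func)(void *);
	void *arg;
};

static struct model_hook model_hooks[MODEL_MAX_HOOKS];
static size_t model_nhooks;

/* Queue a callback to be run later in boot, after attach has returned. */
static void
model_hook_establish(void (*func)(void *), void *arg)
{
	assert(model_nhooks < MODEL_MAX_HOOKS);
	model_hooks[model_nhooks].func = func;
	model_hooks[model_nhooks].arg = arg;
	model_nhooks++;
}

/* Run every queued callback once, then empty the queue. */
static void
model_run_hooks(void)
{
	for (size_t i = 0; i < model_nhooks; i++)
		model_hooks[i].func(model_hooks[i].arg);
	model_nhooks = 0;
}

/* Hypothetical per-controller state. */
struct model_softc {
	int attached;
	int nchildren;
};

/* Deferred part: child devices only appear here (assumed: two volumes). */
static void
model_startup(void *arg)
{
	struct model_softc *sc = arg;

	sc->nchildren = 2;
}

/* Attach does the early work and defers the rest to the hook. */
static void
model_attach(struct model_softc *sc)
{
	sc->attached = 1;
	sc->nchildren = 0;
	model_hook_establish(model_startup, sc);
}
```

Calling model_attach() leaves the controller with no children; they show up
only once model_run_hooks() runs, so anything that looks at the controller in
between sees a half-initialized device.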
>
> However, at the beginning of aac_attach(), an AAC_STATE_SUSPEND flag
> is set. It is cleared at the end of aac_startup(). It appears that
> GEOM probes call aac_disk_open(), which checks the flag and returns
> an error if it is set. On my system the race is that the GEOM probes
> happen before that flag is cleared, possibly because GEOM is tasting
> aacd0 while the aac driver is still attaching aacd1. So the GEOM probes
> fail and the geom nodes never get created. If I boot with the -v flag,
> the kernel boots successfully, I think because the message printing
> takes long enough to delay GEOM probing until after aac_startup() completes.
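
The ordering problem described above can be modelled deterministically in
userland C. This is a sketch under stated assumptions, not the driver code:
MODEL_STATE_SUSPEND stands in for AAC_STATE_SUSPEND, the ENXIO return value is
chosen for illustration, all model_* names are hypothetical, and the "second
CPU" is simulated by a flag that decides when the first disk gets tasted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_STATE_SUSPEND 0x01	/* stands in for AAC_STATE_SUSPEND */
#define MODEL_NDISKS 2

struct model_ctrl {
	int state;
};

struct model_disk {
	struct model_ctrl *ctrl;
	bool present;
	bool has_partitions;
};

/* Open refuses to proceed while the controller is still flagged as suspended. */
static int
model_disk_open(const struct model_disk *d)
{
	if (d->ctrl->state & MODEL_STATE_SUSPEND)
		return (ENXIO);	/* assumed error code, for illustration only */
	return (0);
}

/* A taste happens once per disk; a failed open is never retried. */
static void
model_taste(struct model_disk *d)
{
	assert(d->present);
	d->has_partitions = (model_disk_open(d) == 0);
}

/*
 * Simulate one boot and return how many disks ended up with partitions.
 * taste_during_startup == true models a probe of disk 0 running while
 * the driver is still adding disk 1 and the suspend flag is still set.
 */
static int
model_boot(bool taste_during_startup)
{
	struct model_ctrl ctrl = { .state = 0 };
	struct model_disk disks[MODEL_NDISKS] = {
		{ .ctrl = &ctrl }, { .ctrl = &ctrl },
	};
	int found = 0;

	ctrl.state |= MODEL_STATE_SUSPEND;	/* set at the start of attach */

	disks[0].present = true;		/* first disk becomes visible */
	if (taste_during_startup)
		model_taste(&disks[0]);		/* probe races ahead of startup */
	disks[1].present = true;		/* second disk becomes visible */

	ctrl.state &= ~MODEL_STATE_SUSPEND;	/* cleared at the end of startup */

	for (size_t i = 0; i < MODEL_NDISKS; i++) {
		if (taste_during_startup && i == 0)
			continue;		/* already tasted, no retry */
		model_taste(&disks[i]);
	}
	for (size_t i = 0; i < MODEL_NDISKS; i++)
		found += disks[i].has_partitions ? 1 : 0;
	return (found);
}
```

With the early probe, model_boot(true) finds partitions only on the second
disk, which matches the reported symptom (aacd0 bare, aacd1 fine). With the
probe delayed, model_boot(false) finds them on both, which is roughly what
'boot -v' appears to achieve by slowing things down.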
>
> I've attached a patch which resolves the problem on FreeBSD-current (and
> 11.1). Would anybody care to adopt it and shepherd it into SVN?
>
> regards,
>
> --John Hood
>
> <aac.diff>