Thinkpad R60 hangs when booting recent 8.4-STABLE
John Baldwin
jhb at freebsd.org
Fri May 2 18:06:54 UTC 2014
On Friday, May 02, 2014 5:39:19 am Don Lewis wrote:
> On 1 May, John Baldwin wrote:
> > On Thursday, May 01, 2014 1:55:41 am Don Lewis wrote:
> >> On 30 Apr, John Baldwin wrote:
> >>
> >> > Are you up for doing some printf sleuthing? There are two odd things that I
> >> > see so far:
> >> >
> >> > 1) the base address of 0. The question here is if pci_add_map() in
> >> > sys/dev/pci/pci.c decides to set start to 0 explicitly, or if it happens
> >> > further up the callchain (should be bus_alloc_resource calls in
> >> > sys/dev/acpica/acpi_pcib_acpi.c, sys/x86/x86/nexus.c and then in the
> >> > rman code itself in sys/kern/subr_rman.c)
> >> >
> >> > 2) The 'reserved' printfs during boot probe. Those come from a printf in
> >> > pci_alloc_resource() in sys/dev/pci/pci.c. However, that should not be called
> >> > until a driver attaches to a device and calls bus_alloc_resource(). It should
> >> > not be called from pci_add_child() as it seems to be now.
> >>
> >> The call graph for the four earlier ones that you previously pointed
> >> out (not hostb0) is:
> >> pci_add_child()
> >> pci_add_resources()
> >> *_early_takeover()
> >> [I suspect]
> >> bus_alloc_resource_any()
> >> pci_alloc_resource()
> >>
> >> These are the three system uhci controllers and the system ehci
> >> controller, which apparently pass this test:
> >> pci_get_class(dev) == PCIC_SERIALBUS &&
> >> pci_get_subclass(dev) == PCIS_SERIALBUS_USB
> >
> > Oh, ok. That is fine, and that explains why it was selective in the past (and
> > only for I/O resources). That just leaves 1) then. It would be especially good
> > to know what pci_add_map() does when it sees this BAR during the bus probe.
>
> In either case after reading the BAR, base=0xd0000000,
> testval=0xf0000008. and the following gets printed:
> map[10]: type Prefetchable Memory, range 32, base 0xd0000000, size 28, enabled
> Then we call resource_list_add(..., d0000000, dfffffff, 10000000)
>
> When we don't pass the size in flags (w/o r262226),
> resource_list_alloc() eventually calls nexus_alloc_resource() with the
> expected start, end, and count values. After returning from
> pci_add_map, we write 0xd0000000 to the BAR.
>
> If we *do* pass the size in flags (with r262226), then
> resource_list_alloc() never calls nexus_alloc_resource(), but it does
> return a non-NULL value. We then write 0 to the BAR.
>
> Digging into resource_list_alloc(), I see start, count, and end having
> the expected values before the second call to BUS_ALLOC_RESOURCE()
> but afterwards rman_get_start(rle->res) is 0 and rman_get_end(rle->res)
> is 0xfffffff. We don't call nexus_alloc_resource(). Setting flags to 0
> for this specific call to BUS_ALLOC_RESOURCE() based on the value of
> start fixes the problem.
>
> The culprit appears to be the call to rman_reserve_resource() in
> acpi_alloc_resource(). With r262226 it "succeeds", but the returned
> resource has the wrong start and end addresses. Without r262226,
> rman_reserve_resource returns NULL and we call BUS_ALLOC_RESOURCE(),
> which calls nexus_alloc_resource(), which returns the expected range.
>
> I enabled debug.rman_debug and saw the following:
>
> rman_reserve_resource_bound: <ACPI I/O memory addresses> request:
> [0xd0000000, 0xdfffffff], length 0x10000000, flags 28736, device (null)
> considering [0xfec00000, 0xffffffff]
> truncated region: [0, 0xdfffffff]; size 0xe0000000 (requested
> 0x10000000)
> candidate region [0, 0xdfffffff], size 0xe0000000
> allocating at the end
>
> This is with subr_rman.c from HEAD.
>
> This expression will overflow:
> if (s->r_start + count - 1 > end) {
> I don't understand the point of this test. Is it an optimization to
> exit the loop early based on an assumption about the ordering of
> elements in the list?
Yes. The elements in an rman list are supposed to be sorted.
> This expression will also overflow:
> rstart = (rstart + amask) & ~amask;
> which is where I think the start address is getting set to 0.
>
>
> After reverting subr_rman.c and applying the following patch, I see:
> considering [0xfec00000, 0xffffffff]
> no unshared regions found
> Then nexus_alloc_resource() gets called and I see
> considering [0x3ff60000, 0xfebfffff]
> and then all the right stuff happens.
>
>
> Index: sys/kern/subr_rman.c
> ===================================================================
> --- sys/kern/subr_rman.c (revision 262226)
> +++ sys/kern/subr_rman.c (working copy)
> @@ -468,11 +468,9 @@
> */
> for (s = r; s; s = TAILQ_NEXT(s, r_link)) {
> DPRINTF(("considering [%#lx, %#lx]\n", s->r_start, s->r_end));
> - if (s->r_start + count - 1 > end) {
> - DPRINTF(("s->r_start (%#lx) + count - 1> end (%#lx)\n",
> - s->r_start, end));
> - break;
> - }
> + if (s->r_end < start + count - 1 ||
> + s->r_start > end - count + 1)
> + continue;
> if (s->r_flags & RF_ALLOCATED) {
> DPRINTF(("region is allocated\n"));
> continue;
>
>
I think this looks good.
> I have no idea why 9.2-STABLE was able to boot on this machine ...
acpi_alloc_resource() works differently in 9.x and later. It only
tries the internal rmans if nexus_alloc_resource() fails. In 8.x it
tries the internal rmans first. That change was made in r216674.
--
John Baldwin
More information about the freebsd-stable
mailing list