Re: pciconf -lbvV crashes kernel main-8d72c409c

From: Stefan Esser <se_at_FreeBSD.org>
Date: Sun, 06 Feb 2022 11:03:07 UTC
Am 06.02.22 um 01:19 schrieb Michael Jung:
> Dump header from device: /dev/ada0p2
> Architecture: amd64
> Architecture Version: 2
> Dump Length: 900231168
> Blocksize: 512
> Compression: none
> Dumptime: 2022-02-04 15:48:08 -0500
> Hostname: draid.mikej.com
> Magic: FreeBSD Kernel Dump
> Version String: FreeBSD 14.0-CURRENT #1 main-8d72c409c: Thu Feb 3 18:14:01 EST 2022
> mikej@draid:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> Panic String: length mismatch
> Dump Parity: 1692982593
> Bounds: 2
> Dump Status: good

This is caused by the following code fragments:

        /*
         * Calculate the amount of space needed in the data buffer.  An
         * identifier element is always present followed by the read-only
         * and read-write keywords.
         */
        len = sizeof(struct pci_vpd_element) + strlen(vpd->vpd_ident);
        for (i = 0; i < vpd->vpd_rocnt; i++)
                len += sizeof(struct pci_vpd_element) + vpd->vpd_ros[i].len;
        for (i = 0; i < vpd->vpd_wcnt; i++)
                len += sizeof(struct pci_vpd_element) + vpd->vpd_w[i].len;
[...]
        vpd_user = lvio->plvi_data;
[...]
        vpd_user = PVE_NEXT_LEN(vpd_user, vpd_element.pve_datalen);
        vpd_element.pve_flags = 0;
        for (i = 0; i < vpd->vpd_rocnt; i++) {
                vpd_element.pve_keyword[0] = vpd->vpd_ros[i].keyword[0];
                vpd_element.pve_keyword[1] = vpd->vpd_ros[i].keyword[1];
                vpd_element.pve_datalen = vpd->vpd_ros[i].len;
                error = copyout(&vpd_element, vpd_user, sizeof(vpd_element));
                if (error)
                        return (error);
                error = copyout(vpd->vpd_ros[i].value, vpd_user->pve_data,
                    vpd->vpd_ros[i].len);
                if (error)
                        return (error);
                vpd_user = PVE_NEXT_LEN(vpd_user, vpd_element.pve_datalen);
        }
        vpd_element.pve_flags = PVE_FLAG_RW;
        for (i = 0; i < vpd->vpd_wcnt; i++) {
                vpd_element.pve_keyword[0] = vpd->vpd_w[i].keyword[0];
                vpd_element.pve_keyword[1] = vpd->vpd_w[i].keyword[1];
                vpd_element.pve_datalen = vpd->vpd_w[i].len;
                error = copyout(&vpd_element, vpd_user, sizeof(vpd_element));
                if (error)
                        return (error);
                error = copyout(vpd->vpd_w[i].value, vpd_user->pve_data,
                    vpd->vpd_w[i].len);
                if (error)
                        return (error);
                vpd_user = PVE_NEXT_LEN(vpd_user, vpd_element.pve_datalen);
        }
        KASSERT((char *)vpd_user - (char *)lvio->plvi_data == len,
            ("length mismatch"));

The KASSERT triggered, which means that the number of bytes actually
copied out differs from the length that was calculated beforehand.

It would be interesting to compare the pre-computed "len" and the actual
amount of data (i.e. the operands of == in the KASSERT).
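
To make the two operands concrete, here is a minimal user-space sketch of
the same bookkeeping. It is not the kernel code: struct vpd_element,
NEXT_LEN, struct kw, expected_len(), walk_len() and check() are stand-ins
invented for illustration, and the element layout (a 4-byte header with a
one-byte pve_datalen) is an assumption modelled on struct pci_vpd_element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, modelled on struct pci_vpd_element. */
struct vpd_element {
	char	pve_keyword[2];
	uint8_t	pve_flags;
	uint8_t	pve_datalen;
	uint8_t	pve_data[];
};

/* Assumed to behave like PVE_NEXT_LEN: skip the header plus datalen bytes. */
#define NEXT_LEN(pve, datalen)						\
	((struct vpd_element *)((char *)(pve) +				\
	    sizeof(struct vpd_element) + (datalen)))

/* Hypothetical stand-in for one cached keyword (vpd_ros[i] or vpd_w[i]). */
struct kw {
	char		keyword[2];
	int		len;
	const char	*value;
};

/* Right-hand operand: the size computed before any data is copied. */
static size_t
expected_len(const char *ident, const struct kw *kws, int cnt)
{
	size_t len = sizeof(struct vpd_element) + strlen(ident);

	for (int i = 0; i < cnt; i++)
		len += sizeof(struct vpd_element) + kws[i].len;
	return (len);
}

/*
 * Left-hand operand: fill buf the way the copy loops do and return the
 * number of bytes consumed.  buf must hold at least expected_len() bytes.
 * Flags are left at zero to keep the sketch short.
 */
static size_t
walk_len(char *buf, const char *ident, const struct kw *kws, int cnt)
{
	struct vpd_element hdr, *p = (struct vpd_element *)buf;

	memset(&hdr, 0, sizeof(hdr));
	hdr.pve_datalen = (uint8_t)strlen(ident);
	memcpy(p, &hdr, sizeof(hdr));
	memcpy(p->pve_data, ident, hdr.pve_datalen);
	p = NEXT_LEN(p, hdr.pve_datalen);

	for (int i = 0; i < cnt; i++) {
		hdr.pve_keyword[0] = kws[i].keyword[0];
		hdr.pve_keyword[1] = kws[i].keyword[1];
		hdr.pve_datalen = (uint8_t)kws[i].len;
		memcpy(p, &hdr, sizeof(hdr));
		memcpy(p->pve_data, kws[i].value, kws[i].len);
		p = NEXT_LEN(p, hdr.pve_datalen);
	}
	return ((size_t)((char *)p - buf));
}

/* User-space analogue of the KASSERT: both views must agree. */
static void
check(char *buf, const char *ident, const struct kw *kws, int cnt)
{
	assert(walk_len(buf, ident, kws, cnt) ==
	    expected_len(ident, kws, cnt));
}
```

With a 3-byte identifier and two keywords of 3 and 5 bytes, both functions
return 23 under these assumptions. Printing the two values just before the
KASSERT in the real code would show which side is off, and by how much.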

The definition of PVE_NEXT_LEN looks correct, but in order to completely
understand what the issue is, a dump of the VPD range should be analyzed
(or you could add trace output to both the calculation of "len" and to
the fetching of the VPD data that advances vpd_user).
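
The kind of trace output meant here could look roughly like the helper
below. This is only a hedged user-space sketch: trace_add() and
ELEM_HDR_SIZE are hypothetical names, and the 4-byte header size is an
assumption about sizeof(struct pci_vpd_element). In the kernel the same
idea would use printf() at the points where "len" grows and where
vpd_user advances.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed value of sizeof(struct pci_vpd_element). */
#define ELEM_HDR_SIZE	4

/*
 * Account for one element and log the running total.  'pass' is a
 * free-form label such as "calc" (length pre-computation) or "copy"
 * (the loop that advances vpd_user); 'kw' is the two-character keyword.
 */
static size_t
trace_add(FILE *out, const char *pass, const char *kw, size_t datalen,
    size_t total)
{
	total += ELEM_HDR_SIZE + datalen;
	fprintf(out, "%s %.2s: datalen=%zu total=%zu\n",
	    pass, kw, datalen, total);
	return (total);
}
```

Calling the helper once per element in both passes yields two columns of
running totals; the first line where the "calc" and "copy" totals differ
points at the element responsible for the mismatch.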

Regards, Stefan

PS: You may want to build a kernel with the attached patch, which prints
    the calculated lengths after each element that is added to "len".
    The KASSERT will only trigger if the actual length exceeds the expected
    value, and the printf() output should go to the console device.
    My system does not seem to have a single device that provides VPD,
    so the patch has only been compile-tested.