powerpc64 head -r314687 (PowerMac G5 so-called "Quad Core", clang based): CAM status: Command timeout (always?)

Mark Millard markmi at dsl-only.net
Tue Mar 7 19:40:50 UTC 2017


On 2017-Mar-7, at 9:15 AM, Mark Johnston <markj at FreeBSD.org> wrote:

> On Mon, Mar 06, 2017 at 08:03:08PM -0800, Mark Millard wrote:
>> On 2017-Mar-6, at 5:02 PM, Mark Johnston <markj at FreeBSD.org> wrote:
>> 
>>> On Mon, Mar 06, 2017 at 02:01:06PM -0800, Mark Millard wrote:
>>>> [scsi_pass.c -r314624 is the problem file vintage of the two files.]
>>>> 
>>>> On 2017-Mar-6, at 10:36 AM, Mark Millard <markmi at dsl-only.net> wrote:
>>>> 
>>>>> On 2017-Mar-6, at 8:43 AM, Mark Johnston <markj at FreeBSD.org> wrote:
>>>>> 
>>>>>> On Mon, Mar 06, 2017 at 02:05:39AM -0800, Mark Millard wrote:
>>>>>>> On 2017-Mar-6, at 1:37 AM, Mark Millard <markmi at dsl-only.net> wrote:
>>>>>>> [...]
>>>>>>> Yep: reverting the two files allowed the PowerMac G5 so-called
>>>>>>> "Quad Core" to boot fully and I could log in.
>>>>>> 
>>>>>> Do you have a full dmesg of the failed boot? Am I correct in thinking
>>>>>> that the boot failed before making it to user mode?
>>>>> 
>>>>> . . .
>>>>>> If so I'm rather
>>>>>> puzzled, as the change should only affect userland applications.
>>>>>> Specifically, it modified a couple of ioctl handlers.
>>>>>> 
>>>>>>> 
>>>>>>> It appears that if such powerpc64 machines are to stay bootable
>>>>>>> then other things need to be cleaned up before the two updated
>>>>>>> files from -r314624 should be used.
>>>>>>> 
>>>>>>> Should the 2 files be reverted until other things are cleaned up?
>>>>>> 
>>>>>> I don't mind reverting the change, but my suspicion is that it uncovered
>>>>>> a problem rather than introducing it. If you're willing to narrow things
>>>>>> down a bit, could you try booting with one of the file modifications and
>>>>>> not the other? They are independent.
>>>>> 
>>>>> In a while I'll try each of the files individually, one old, one modern
>>>>> each time.
>>>> 
>>>> scsi_pass.c -r314624 (new) and cam_xpt.c -r314283 (old): fails.
>>>> 
>>>> cam_xpt.c -r314624 (new) and scsi_pass.c -r308451 (old) : works fine so far.
>>>> 
>>>> Prior results:
>>>> 
>>>> cam_xpt.c and scsi_pass.c both being -r314624 (both new): fails
>>>> 
>>>> cam_xpt.c -r314283 and scsi_pass.c -r308451 (both old): works fine.
>>> 
>>> Thank you. I'm still failing to see how the change is connected with the
>>> symptoms you're seeing. Are you testing with a kernel that has
>>> INVARIANTS and WITNESS configured?
>>> 
>>> I've broken up the scsi_pass.c change into several patches. They are
>>> sequential; can you try testing the result of each patch in the series?
>> 
>> I'm no longer able to reproduce the problem, not even with an
>> "svnlite update -r314687" based build where "svnlite status
>> /usr/src/" does not list either of the files. This was after
>> trying the patch sequence, which had no failures at any stage.
>> 
>> This suggests some sort of intermittent problem someplace.
>> 
>> At least it fits with your not finding a way for your code
>> update to cause the results that I got.
>> 
>> But finding such an intermittent problem is a pain. I've
>> no clue if/when I'll even see an example again, much less
>> find a way to investigate it if I do. (PowerMacs do not
>> take ddb input early.)
>> 
>> There is the possibility that the recent atomic_fcmpset based
>> locking changes still have some sort of problem, just not seen
>> often. Not easy to find if true.
>> 
>> Anyway I'm now running -r314687 with:
> 
> Indeed, this kind of problem is tricky to track down. A couple of
> thoughts:
> - Were you using the same compiler for all of your tests? I noticed your
>  post yesterday about clang 3.9 vs. 4.0 for powerpc and powerpc64.

All the powerpc64 builds were cross builds from amd64 -r314687 --
and so all are system-clang 4.0 based.

Those notes exist because I've been a long-term tester and issue
reporter for clang targeting the powerpc family. I also report
these issues to the llvm bugzilla. I have history to compare
against without running new tests for 3.9.1.

> - Was the rest of the source tree (i.e., everything but cam_xpt.c and
>  scsi_pass.c) the same in all of your testing? I've noticed in the past
>  that unrelated changes to the source tree can result in various kernel
>  linker sets having a different order than they would have otherwise,
>  and that can expose or hide bugs. See this recent post for an example:
>  https://lists.freebsd.org/pipermail/freebsd-current/2016-December/064122.html

Yes: the same. In fact I use reproducible builds now, and my
2017-Mar-4 /boot/kerc40/* exactly matches the 2017-Mar-6 build
at issue. (This is not a debug-kernel build context.) Booting
kerc40 no longer gets the problem either, which is part of why
I ran that diff -r and discovered that the build with neither
file reverted is an exact match.
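
A comparison of this kind can be sketched roughly as follows. This is
only an illustration: the function name same_kernel_dirs is hypothetical,
and /boot/kernel as the second directory in the usage note is an
assumption, not something stated above.

```shell
#!/bin/sh
# Hypothetical helper: report whether two installed kernel directories
# hold byte-identical files. Relies only on diff -r, which recurses into
# both trees; -q lists the differing files instead of their contents.
same_kernel_dirs() {
    # diff exits 0 when nothing differs, 1 when something differs,
    # and 2 on trouble (for example, a missing directory).
    diff -rq "$1" "$2"
}
```

Usage would be along the lines of
"same_kernel_dirs /boot/kerc40 /boot/kernel && echo identical". With
reproducible builds, an exit status of 0 indicates the two builds
produced the same files.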

===
Mark Millard
markmi at dsl-only.net

