Re: Git clone failures on armv7, was Re: Git core dump checking out main on armv7

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 28 Jun 2024 06:26:34 UTC
[Example problems can be reproduced in a context not involving
Ethernet or such networking, just system-local activity.]

On Jun 27, 2024, at 15:54, Mark Millard <marklmi@yahoo.com> wrote:

> On Jun 27, 2024, at 14:40, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> On Jun 27, 2024, at 12:15, Warner Losh <imp@bsdimp.com> wrote:
>> 
>> On Thu, Jun 27, 2024 at 10:24 AM bob prohaska <fbsd@www.zefox.net> wrote:
>>> On Wed, Jun 26, 2024 at 06:24:59PM -0700, Mark Millard wrote:
>>>> 
>>>> Does using the chroot setup using --depth=1 on the
>>>> RPi2B consistently work when tried repeatedly? Or
>>>> was this just an example of a rare success?
>>>> 
>>> Apparently it was a rare success. Five back-to-back retries
>>> all failed in an orderly way, though with different messages;
>>> invalid fetch-pack and missing blob object.
>>> The transcript is at
>>> http://www.zefox.net/~fbsd/rpi2/git_problems/readme_shallow_armv7_chroot_gittests
>>> 
>>> Is it worthwhile to repeat with different --depth= values? I'm not sure
>>> what the sensible range might be, maybe a 1, 2, 5 sequence? It would
>>> be convenient to avoid a panic, as that slows repetition.
>>> 
>>> What happens if you start limiting the memory resources inside an armv7 jail
>>> on an aarch64 machine?
>>> 
>>> "Sometimes it works, sometimes it doesn't" triggers "memory shortage" or
>>> "marginal amounts of memory available" bug-hunting memories for me.
>> 
>> As I reported in an earlier submission to the list, I've
>> replicated the problem on an armv7 system running main [15]
>> with RAM+SWAP being:
>> 
>> 2048 MiBytes RAM + 3685 MiBytes SWAP == 5733 MiBytes OVERALL
>> 
>> This was on an Orange Pi+ 2ed. A variant of top that monitors and
>> reports various maximum observed figures did not show any
>> large memory use, even relative to 1024 MiBytes. Any limitation
>> would appear to have to be some more specific kind
>> of constraint rather than overall system RAM or RAM+SWAP.
>> 
>>> Warner

I should have noted that, in my context, /tmp is not a tmpfs
area. For example:

# df -m
Filesystem         1M-blocks  Used  Avail Capacity  Mounted on
/dev/gpt/BPIM3root    823229 66636 690735     9%    /
devfs                      0     0      0     0%    /dev
/dev/gpt/BPIM3efi        259    86    173    33%    /boot/efi
devfs                      0     0      0     0%    /usr/obj/DESTDIRs/main-armv7-chroot-ports-official/dev

(Also: the activity does not actually involve dev/fd, so
it is not necessary.)

> FYI:
> 
> So far, doing the likes of "truss -o ~/truss.txt -f -a -H -p 2136"
> towards the end of "Receiving objects" (where 2136 was the original
> git process) has always resulted in a normal completion of the clone.
> 
> In contrast, running the clone under gdb via
> 
> (gdb) run clone --depth=1 -o freebsd ssh://anongit@192.158.248.9/src.git /tmp/DOES-NOT-EXIST
> 
> still gets the errors:
> 
> (gdb) run clone --depth=1 -o freebsd ssh://anongit@192.158.248.9/src.git /tmp/DOES-NOT-EXIST
> Starting program: /usr/local/bin/git clone --depth=1 -o freebsd ssh://anongit@192.158.248.9/src.git /tmp/DOES-NOT-EXIST
> Cloning into '/tmp/DOES-NOT-EXIST'...
> [Detaching after fork from child process 2172]
> [New LWP 100254 of process 2171]
> [Detaching after fork from child process 2173]
> remote: Enumerating objects: 104642, done.
> remote: Counting objects: 100% (104642/104642), done.
> remote: Compressing objects: 100% (88919/88919), done.
> remote: Total 104642 (delta 22161), reused 43523 (delta 11808), pack-reused 0 (from 0)
> Receiving objects: 100% (104642/104642), 344.50 MiB | 1.11 MiB/s, done.
> [LWP 100254 of process 2171 exited]
> Resolving deltas: 100% (22161/22161), done.
> [Detaching after fork from child process 2176]
> fatal: missing blob object '64981a94f867c4c6f9c4aaa26c1117cc8d85de34'
> fatal: remote did not send all necessary objects
> [Inferior 1 (process 2171) exited with code 0200]

Yet another type of test:

I'll note that the file:///usr/official-src/ notation used
does not automatically involve --local style handling and
so is closer to the remote cloning process than --local
use would be.

Inside a chroot context, with its /usr/official-src/ provided by
a prior mount_nullfs, using the file:///usr/official-src/ notation
gets failures such as:

# git clone --depth=1 -o freebsd file:///usr/official-src/ /tmp/DOES-NOT-EXIST
Cloning into '/tmp/DOES-NOT-EXIST'...
remote: Enumerating objects: 102713, done.
remote: Counting objects: 100% (102713/102713), done.
remote: Compressing objects: 100% (88060/88060), done.
Receiving objects: 100% (102713/102713), 342.13 MiB | 1.10 MiB/s, done.
remote: Total 102713 (delta 21887), reused 43790 (delta 10827), pack-reused 0 (from 0)
Resolving deltas: 100% (21887/21887), done.
fatal: missing blob object 'b6f146fd872e8006434a846d2536a98b696b9f09'
fatal: remote did not send all necessary objects

Avoiding mount_nullfs instead, by duplicating the repository
directory tree into the chroot area before doing the chroot,
did not lead to any failure in my testing.

Not involving chroot at all, use of file:///... notation
got no failures, independent of direct vs. mount_nullfs
use.
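
Since the failures are intermittent, each variation above needs
repeated attempts before it can be called working or failing. A
minimal sketch of how such repetition could be scripted, assuming a
POSIX sh; the function name repeat_attempts is hypothetical and is
not something used in this thread:

```sh
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not from the thread):
# run a command COUNT times and report how many attempts failed.
# TARGET is removed before each attempt so every clone starts clean.
# The command's own output goes to stderr so the differing error
# messages stay visible; only the summary is written to stdout.
repeat_attempts() {
    count=$1
    target=$2
    [ -n "$target" ] || { echo "usage: repeat_attempts COUNT TARGET CMD..." >&2; return 2; }
    shift 2
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        rm -rf -- "$target"
        if ! "$@" 1>&2; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    rm -rf -- "$target"
    echo "$failures of $count attempts failed"
}

# Example use with the paths from the chroot test above:
# repeat_attempts 5 /tmp/DOES-NOT-EXIST \
#     git clone --depth=1 -o freebsd file:///usr/official-src/ /tmp/DOES-NOT-EXIST
```

Running the same call inside and outside the chroot, and with and
without mount_nullfs, would give comparable failure counts for the
four combinations described above.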


>>> Thanks for reading,
>>> 
>>> bob prohaska
>>> 
>>> 
>>>>> A second try without chroot resulted in failure but no panic:
>>>> 
>>>>> <jemalloc>: Should own extent_mutex_pool(17)
>>>> 
>>>> That looks like it would be interesting to someone
>>>> appropriately knowledgeable. If jemalloc can see bad
>>>> mutex ownership, that could lead to all sorts of
>>>> later problems: garbage in, garbage out.
>>>> 
>>>> I do not know if the message means that various
>>>> corruptions might be in place afterwards, in which
>>>> case various later problems would be unsurprising
>>>> consequences.
>>>> 
>>>>> 47.25 MiB | 1.35 MiB/s  
>>>>> error: index-pack died of signal 6
>>>>> 
>>>>> A repeat session produced an oft-seen failure:
>>>>> 
>>>>> root@www:/mnt # mkdir 3rdarmv7gittest
>>>>> root@www:/mnt # cd 3rdarmv7gittest
>>>>> root@www:/mnt/3rdarmv7gittest # git clone  -o freebsd ssh://anongit@192.158.248.9/src.git .
>>>>> Cloning into '.'...
>>>>> remote: Enumerating objects: 4511481, done.
>>>>> remote: Counting objects: 100% (383480/383480), done.
>>>>> remote: Compressing objects: 100% (28955/28955), done.
>>>> 
>>>>> <jemalloc>: Should own extent_mutex_pool(17)
>>>> 
>>>> That is the same error notice as above that looked
>>>> to be interesting.
>>>> 
>>>> Note that it happens before the later message
>>>> "error: index-pack died of signal 6". So that
>>>> last may just be a later consequence of the
>>>> earlier error(s).
>>>> 
>>>>> 47.25 MiB | 1.35 MiB/s  
>>>>> error: index-pack died of signal 6
>>>>> fatal: index-pack failed
>>>>> root@www:/mnt/3rdarmv7gittest # ls
>>>>> root@www:/mnt/3rdarmv7gittest # cd ..
>>>>> root@www:/mnt # mkdir 4tharmv7gittest
>>>>> root@www:/mnt # cd 4tharmv7gittest
>>>>> root@www:/mnt/4tharmv7gittest # git clone -o freebsd ssh://anongit@192.158.248.9/src.git .
>>>>> Cloning into '.'...
>>>>> remote: Enumerating objects: 4511481, done.
>>>>> remote: Counting objects: 100% (383480/383480), done.
>>>>> remote: Compressing objects: 100% (28955/28955), done.
>>>>> Receiving objects:  43% (1966916/4511481), 926.00 MiB | 626.00 KiB/s 
>>>>> remote: Total 4511481 (delta 377747), reused 354525 (delta 354525), pack-reused 4128001 (from 1)
>>>>> Receiving objects: 100% (4511481/4511481), 1.64 GiB | 705.00 KiB/s, done.
>>>>> fatal: pack is corrupted (SHA1 mismatch)
>>>>> fatal: index-pack failed
>>>> 
>>>> Note the lack of a local message:
>>>> 
>>>> <jemalloc>: Should own extent_mutex_pool
>>>> 
>>>> But the prior jemalloc message(s) may be sufficient
>>>> context to not be surprised about this.
>>>> 
>>>>> root@www:/mnt/4tharmv7gittest # 
>>>>> 
>>>>> No panic, however, and it seems reproducible:
>>>>> root@www:/mnt # mkdir 5tharmv7gittest
>>>>> root@www:/mnt # cd 5tharmv7gittest
>>>>> root@www:/mnt/5tharmv7gittest # git clone -o freebsd ssh://anongit@192.158.248.9/src.git .
>>>>> Cloning into '.'...
>>>>> remote: Enumerating objects: 4511513, done.
>>>>> remote: Counting objects: 100% (383480/383480), done.
>>>>> remote: Compressing objects: 100% (28955/28955), done.
>>>>> remote: Total 4511513 (delta 377756), reused 354525 (delta 354525), pack-reused 4128033 (from 1)
>>>>> Receiving objects: 100% (4511513/4511513), 1.64 GiB | 1.28 MiB/s, done.
>>>>> fatal: pack is corrupted (SHA1 mismatch)
>>>>> fatal: index-pack failed
>>>> 
>>>> Note the lack of a local message:
>>>> 
>>>> <jemalloc>: Should own extent_mutex_pool
>>>> 
>>>> But the prior jemalloc message(s) may be sufficient
>>>> context to not be surprised about this (again).
>>>> 
>>>>> root@www:/mnt/5tharmv7gittest 
>>>>> 
>>>>> Not sure what to try next, thanks for reading this far! 
>>>>> 
>>>>> bob prohaska
>>>>> 
>>>>> 
>>>>> Archived at 
>>>>> http://www.zefox.net/~fbsd/rpi2/git_problems/readme_armv7
>>>> 
>>>> 
> 



===
Mark Millard
marklmi at yahoo.com