[Bug 258339] ports-mgmt/poudriere: Poudriere host loses network connectivity during bulk run

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 07 Sep 2021 14:00:04 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258339

            Bug ID: 258339
           Summary: ports-mgmt/poudriere:  Poudriere host loses network
                    connectivity during bulk run
           Product: Ports & Packages
           Version: Latest
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: Individual Port(s)
          Assignee: bdrewery@FreeBSD.org
          Reporter: trix@basement.net
             Flags: maintainer-feedback?(bdrewery@FreeBSD.org)

Overview:  
OS: FreeBSD 13.0-RELEASE-p4 amd64 (root@amd64-builder.daemonology.net) local
ZFS filesystems (root on ZFS)
Poudriere: 3.3.7 (built from Ports)
Hardware: HP Pavilion Desktop 590-p0xxx
  CPU: AMD Ryzen 5 2400G (8) @ 3.593GHz
  RAM: 12GB
    $ grep memory /var/run/dmesg.boot
    real memory  = 12884901888 (12288 MB)
    avail memory = 11296428032 (10773 MB)
    $ 
  NIC: re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet>

/usr/local/etc/poudriere.conf contains NO_ZFS=yes and BASEFS=/opt/poudriere.

/opt/poudriere is an NFS mount (from TrueNAS) with options "rw,hard,nfsv3,tcp"
over IPv4.
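
For readers trying to reproduce the setup, the configuration described above
might look roughly like the sketch below.  Only NO_ZFS, BASEFS and the mount
options come from this report; the NFS server name and export path are
hypothetical placeholders, and the rest of poudriere.conf is not shown.

```shell
# /usr/local/etc/poudriere.conf (excerpt; the two settings named in the report)
NO_ZFS=yes
BASEFS=/opt/poudriere

# /etc/fstab entry for the NFS-backed BASEFS.
# "truenas.example" and "/mnt/tank/poudriere" are placeholders (assumptions);
# the report does not give the real server name or export path.
# truenas.example:/mnt/tank/poudriere  /opt/poudriere  nfs  rw,hard,nfsv3,tcp  0  0
```

With NO_ZFS=yes, poudriere does not use ZFS datasets for its own data, so
everything under BASEFS lives on the NFS mount.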

At a seemingly random point in the bulk run, the OS reports that the NFS mount
has timed out, and this message repeats.  Console messages indicate that re0's
watchdog has timed out, and the interface then reports that it has gone down
and come back up.  This sequence repeats.

ifconfig(8) reports IPv4 and IPv6 addresses.  DHCP address is a reservation,
and should not time out.  Pings sent to local gateway (or any address, really)
report "no route to host" even though 'netstat -rn4' appears normal.

Only fix appears to be a power-cycle.  'shutdown -r now' eventually terminates
due to timeout after signalling all processes.

Closest I've come to pinpointing a failure is large, memory-intensive port
builds, like multiple C compilers (llvm _and_ gcc) building at the same time.
