Date: Mon, 17 Jan 2022 18:07:43 -0800
From: Gleb Smirnoff
To: dev-commits-src-main@freebsd.org, current@freebsd.org
Cc: bz@freebsd.org, zec@freebsd.org
Subject: netinet & netpfil tests failing

  Hi,

just a reminder that I'm responsible for multiple failing tests, and
a refresh of the context, kind of explaining why the hell they aren't
fixed yet?!

The long old discussion can be found in this thread from December; link
to the last message:

https://lists.freebsd.org/archives/dev-commits-src-main/2021-December/002581.html

A summarized, refreshed context follows.

The reason for the failing tests is complex. There is a constellation of
factors [bugs] that contribute to it:

* Jails are reference counted, and jail destruction may be delayed. The
  test suite usually didn't trigger delayed jail destruction, and many
  tests expect that immediately after 'jail -r' all resources are
  released; in particular, that network interfaces in the jail are
  if_vmove()'d to vnet0.

* My original change to the inpcb database protection ignored the fact
  that inp->inp_cred->cr_prison is dereferenced and read during a fast
  pcb lookup. The prison has neither network epoch nor SMR protection.
  That was a bug, and to fix it Mark & I decided that an elegant idea
  would be to move the crfree() done on pcb destruction from an immediate
  call to the SMR-delayed destructor. This fixed the race, but created
  another bug.
  Since every vnet had its own pcb zone, a dying jail would never free
  its resources; it would stay forever. This was mitigated by making the
  pcb zone global. Now pcbs are correctly recycled, but there is no
  guarantee that upon return from 'jail -r' the jail is already fully
  cleared.

* Back to the tests. Tests expect 'jail -r' to free resources
  immediately. Right after 'jail -r' they run
  'ifconfig ${ifname} destroy', where ifname is the interface that is
  supposed to have just popped back up in vnet0 from the destroyed jail.
  Now this 'ifconfig destroy' fails, but the test suite ignores the
  error, so the test succeeds. However, some time later, usually after
  other tests have run, the jail is indeed destroyed and, surprise,
  interfaces pop up in vnet0 out of nowhere. Of course this is a definite
  memory & resource leak, but it is not the reason why tests are failing.

* Another factor - scapy. The Python scapy library emits a warning to
  stderr if it sees an interface without any IP address. This happens
  right at 'import scapy'. The test suite considers a test failed if it
  wrote anything to stderr, even if it returned success.

So, the result is that some test (absolutely unrelated to pcbs) leaves
a jail with interfaces, then the jail is released, the interfaces pop up
in vnet0, and then some other test (absolutely unrelated to pcbs) that
uses scapy writes a warning to stderr and triggers a failure.

Mark & I now see three approaches to the problem:

* Reclaim the memory from the pcb zone(s) when a jail is destroyed,
  restoring the old behaviour where, for the test suite, 'jail -r' is
  always synchronous. Some prerequisites for this approach are here:

  https://reviews.freebsd.org/D33868

* Protect jails with epoch, bypass the cred pointer in the inpcb, and in
  the lookup check inp->inp_prison->pr_foo. After that, the crfree() can
  be moved back to the immediate inpcb free procedure. Mark has a quick
  & dirty proof of concept for this approach.

* In the test suite, destroy the interface from inside the jail:
  'jexec jname ifconfig ${ifname} destroy'.
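
As an aside, the pass/fail rule behind the scapy failures (exit status 0
but output on stderr still counts as a failure) can be sketched in a few
lines of Python. This is only an illustration of the rule as described
in this message, not the test framework's actual code; the names judge(),
Verdict, NOISY and QUIET and the warning text are hypothetical, and a
small child Python process stands in for a test that imports scapy.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    reason: str


def judge(argv):
    """Run a test program and apply the 'stderr must be empty' rule."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        return Verdict(False, f"exit status {proc.returncode}")
    if proc.stderr:
        # The program returned success, but any stderr output is a failure.
        return Verdict(False, "unexpected stderr output")
    return Verdict(True, "ok")


# Hypothetical stand-in for a test whose import-time warning goes to stderr.
NOISY = [sys.executable, "-c",
         "import sys; sys.stderr.write('WARNING: interface has no address\\n')"]
# Stand-in for the same test run when no stray interface is present.
QUIET = [sys.executable, "-c", "pass"]

if __name__ == "__main__":
    print(judge(QUIET))
    print(judge(NOISY))
```

Under this rule the noisy run is reported as failed although the program
itself exited with status 0, which is how an unrelated leftover interface
can fail a test that never touches pcbs.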

I'd like to add a few words on the last option. To me it seems the most
elegant, as we would be improving the test suite instead of changing the
kernel to meet the demands of the suite. However, it doesn't work :( Why?
Why does 'jexec jname ifconfig epair0b destroy' or
'jexec jname ifconfig lo1 destroy' return ENXIO? Because the interface
was created within vnet0 and is linked on the vnet0 cloner's list.

To repeat: the epair0b ifnet is linked on the jail's list of network
interfaces, but it is linked on the vnet0 list of the epair(4) ifcloner.
Likewise, some lo4 interface would also be on the jail's list of
interfaces, but on the vnet0 if_loop cloner. This makes it impossible to
destroy such an interface from inside the jail. Nor is it possible to
destroy it from the outside, for obvious reasons.

There are more side effects of this. For example, the only reason why we
can't create an interface with the same name inside a jail using its own
cloner list is the call to ifunit() at the beginning of
if_clone_createif(). This definitely is part of the design, since
if_clone_create()/if_clone_destroy() look up the vnet0 cloner list if the
interface is not found on the current vnet's list.

To put it short, it is yet another problem created by if_vmove :( It is
not an easy one to fix, and it makes the third approach to the problem
complicated.

To sum up: I'm sorry for the broken tests, I'm working on it, and it
isn't an easy problem. Suggestions and help are welcome.

-- 
Gleb Smirnoff