From: Michael Gmelin
To: James Gritton
Cc: jail@freebsd.org, Michael Gmelin
Date: Fri, 25 Jun 2021 18:58:59 +0200
Subject: Re: POSIX shared memory and dying jails
Message-ID: <20210625185859.40fead46@bsd64.grem.de>
In-Reply-To: <03809b2655a40134dd802386afa6be7d@freebsd.org>
List-Id: Discussion about FreeBSD jail(8)
List-Archive: https://lists.freebsd.org/archives/freebsd-jail

On Fri, 25 Jun 2021 09:19:05 -0700
James Gritton wrote:

> On 2021-06-25 07:41, Michael Gmelin wrote:
> > It seems like non-anonymous POSIX shared memory is not freed
> > automatically when a jail is removed and keeps it in a dying state,
> > until the shared memory segment is deleted manually.
> >
> > See below for the most basic example:
> >
> >   [root@jailhost ~]# jail -c path=/ command=/bin/sh
> >   # posixshmcontrol create /removeme
> >   # exit
> >   [root@jailhost ~]# jls -dv -j shmtest dying
> >   true
> >
> > So at this point, the jail is stuck in a dying state.
> >
> > Checking POSIX shared memory segments shows the shared memory
> > segment which is stopping the jail from crossing the Styx:
> >
> >   [root@jailhost ~]# posixshmcontrol list
> >   MODE       OWNER  GROUP  SIZE  PATH
> >   rw-------  root   wheel  0     /removeme
> >
> > After removing the shared memory segment manually...
> >
> >   [root@jailhost ~]# posixshmcontrol rm /removeme
> >
> > the jail passes away peacefully:
> >
> >   [root@jailhost ~]# jls -dv -j shmtest dying
> >   jls: jail "shmtest" not found
> >
> > I wonder if it wouldn't make sense to always remove POSIX shared
> > memory created by a jail automatically when it's removed.
>
> That does seem reasonable, though it would take some bookkeeping to do
> right. There is currently no concrete idea of a jail's ownership of a
> POSIX shm object, as it uses only uid and gid for access permissions,
> same as files. The tie to the jail is in the underlying vm_object,
> which holds a cred that references the jail - that seems to be what's
> keeping the jail from going away.

Interesting - I was wondering how that worked, thanks. Would there be a
way to cut that tie somehow (for use cases that deliberately want to
leave the shared memory segment behind)?

>
> Like files, POSIX shared memory is one way a jail may communicate with
> the rest of the system. So it's theoretically conceivable that shared
> memory created by a defunct jail may still be in use by a parent jail,
> in the same way that shared memory created by a defunct process is
> still visible to other processes, but that may be a rare enough case
> to disregard.

This could theoretically be controlled by a parameter set on the jail
(something like "noposixshmcleanup"), the default being to remove the
segments on jail removal.

Another problem caused by the lack of jail ownership is that access
semantics are a bit strange. E.g., a jail based on / can easily list
(and remove) all shared memory allocations in the system, while for
other jails it depends. They can stat their own allocations, as in:

  # posixshmcontrol stat /xyz
  output as expected...

But not list them:

  # posixshmcontrol ls
  posixshmcontrol: cannot get kern.ipc.posix_shm_list length: Operation not permitted

This is probably related to matching the path of the allocation; I
didn't look into the code.

For practical purposes, we implemented a primitive workaround in the
scripts that stop jails. It simply lists all allocations matching a
jail's path and removes them:

  # Garbage collect POSIX shared memory
  if command -v posixshmcontrol >/dev/null; then
    _shm_paths=$( posixshmcontrol ls | cut -f 5 | grep "^$_pdir/" )
    for _shm_path in $_shm_paths ; do
      posixshmcontrol rm "$_shm_path"
    done
  fi

but having something automatic in the OS would be nice. Or being able
to run `posixshmcontrol -j shmtest ls`. It seems like getting this right
would take quite some effort though, also in terms of who can access
what - right now, access is simply based on the path, which also gives
a lot of flexibility.

By the way, this was all triggered by running postgresql in a jail -
depending on how it was started (non-persistent/exec.start vs
persistent/jexec) it would not clean up after itself when the jail was
removed, leading to jails and POSIX shared memory leaking on each jail
restart[0]. Probably something about signal handling, but that's
material for a different thread :).

Best,
Michael

[0] https://github.com/pizzamig/pot/issues/150

-- 
Michael Gmelin
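
The filtering step of the workaround above can also be sketched as a
standalone function, which makes it possible to dry-run the selection
before removing anything. This is only a sketch under assumptions: the
function name shm_paths_under and the SHM_PREFIX variable are
hypothetical, and it assumes that `posixshmcontrol ls` prints
tab-separated columns with the path in the fifth field, as the
`cut -f 5` in the workaround implies.

```shell
#!/bin/sh
# Hypothetical helper: read `posixshmcontrol ls`-style output on stdin and
# print the paths of all segments located below the directory given as $1.
# Assumption: columns are tab-separated and the path is the fifth column.
shm_paths_under() {
    # Normalize the directory so that it ends in exactly one slash;
    # this keeps /jails/a from matching /jails/ab/...
    _prefix="${1%/}/"
    # index() does a literal comparison, so the directory is never
    # interpreted as a regular expression. The header line is skipped
    # implicitly because "PATH" does not start with the prefix.
    SHM_PREFIX="$_prefix" awk -F '\t' \
        'index($5, ENVIRON["SHM_PREFIX"]) == 1 { print $5 }'
}
```

Piping `posixshmcontrol ls` into `shm_paths_under "$_pdir"` only prints
the candidates; appending
`| while IFS= read -r p; do posixshmcontrol rm "$p"; done` would perform
the removal. Unlike `grep "^$_pdir/"`, the literal prefix comparison is
not affected by regex metacharacters such as `.` in the jail path.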