From: Michael Gmelin
To: Mark Johnston
Cc: Konstantin Belousov, jail@freebsd.org
Date: Mon, 2 Aug 2021 22:38:54 +0200
Subject: Re: POSIX shared memory, jails, and (lack of) limits
Message-Id: <26D98CA9-B4ED-4BCB-935D-1EB8EBDA8F5D@grem.de>
List-Id: Discussion about FreeBSD jail(8)
List-Archive: https://lists.freebsd.org/archives/freebsd-jail

> On 2. Aug 2021, at 21:40, Mark Johnston wrote:
>
> On Mon, Aug 02, 2021 at 10:03:27PM +0300, Konstantin Belousov wrote:
>>> On Mon, Aug 02, 2021 at 05:06:43PM +0200, Michael Gmelin wrote:
>>>
>>>
>>>> On 2. Aug 2021, at 15:56, Konstantin Belousov wrote:
>>>>
>>>> On Mon, Aug 02, 2021 at 02:19:00PM +0200, Michael Gmelin wrote:
>>>>> Hi,
>>>>>
>>>>> I've been playing a bit with POSIX shared memory and, unlike for SysV
>>>>> shared memory, I couldn't find any way to limit its use by jails.
>>>>>
>>>>> First, I looked at racct/rctl, but there is no resource for POSIX shared
>>>>> memory and memoryuse/vmemoryuse don't seem to have an effect (which
>>>>> makes sense).
>
> Cyril has written a few patches for racct, including one which includes
> POSIX shared memory objects in rctl's "nshm" and "shmsize" resources,
> which currently only apply to SysV shm objects:
> https://reviews.freebsd.org/D30775
> We plan to get them committed in the next couple of weeks.
>
> "memoryuse" and "vmemoryuse" only count objects that are mapped into
> some process' address space, so they're not the right way to limit
> allocations of POSIX shm objects, see below.
>
>>>>>
>>>>> Then I checked if there are jail parameters that could help, but there
>>>>> doesn't seem to be anything like "allow.sysvshm" for POSIX shared
>>>>> memory to limit access to the feature.
>>>>>
>>>>> So, unless I'm missing something, it seems like all jails on a system
>>>>> have unlimited access to POSIX shared memory and therefore any single
>>>>> jail can use up the jailhost's virtual memory until the jailhost comes
>>>>> to a grinding halt.
>>>>>
>>>>> I wrote a little test program that keeps allocating POSIX shared memory
>>>>> inside of a jail and it can easily bring the host down to its knees:
>>>>>
>>>>> login: Aug 2 12:12:09 test kernel: pid 11825 (getty), jid 0, uid 0,
>>>>> was killed: out of swap space
>>>>> Aug 2 12:12:10 test init[11827]: getty repeating too quickly on port
>>>>> /dev/ttyu0, sleeping 30 secs
>>>>> Aug 2 12:12:10 test kernel: pid 11826 (getty), jid 0, uid 0, was
>>>>> killed: out of swap space
>>>>
>>>> Posix shm is limited by the swap accounting. For non-jail consumers,
>>>> it is per-uid RLIMIT_SWAP. I do not know if other mechanisms make
>>>> RLIMIT_SWAP per-jail per-uid.
>
> racct/rctl provides the "swapuse" resource which should account for
> this. It does not apply to largepage objects, though.

I tried to limit swapuse for a jail and it doesn’t limit posix shared memory created within the jail (I can still create shared memory segments within the jail until the machine runs out of virtual memory).

Should I share the test case to make sure I didn’t mess up?
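
For reference, the kind of allocation loop described above can be sketched as follows. This is not the actual test program, which is not included in this message. It is a deliberately bounded Python sketch that creates a fixed number of POSIX shm segments, touches every page so the memory is really backed, and then unlinks them again. The names `allocate_segments`, `count` and `segment_size` are illustrative assumptions.

```python
import mmap
from multiprocessing import shared_memory


def allocate_segments(count, segment_size):
    """Create `count` POSIX shm segments, touch each page, return bytes requested.

    Illustrative only: the loop is bounded and cleans up after itself,
    unlike a stress test that allocates until the host runs out of swap.
    """
    segments = []
    total = 0
    try:
        for _ in range(count):
            # Uses shm_open()/ftruncate()/mmap() under the hood on POSIX systems.
            seg = shared_memory.SharedMemory(create=True, size=segment_size)
            segments.append(seg)
            # Write one byte per page so the pages are actually allocated.
            for offset in range(0, segment_size, mmap.PAGESIZE):
                seg.buf[offset] = 1
            total += segment_size
        return total
    finally:
        # Unmap and remove every named object that was created.
        for seg in segments:
            seg.close()
            seg.unlink()


if __name__ == "__main__":
    print(allocate_segments(count=4, segment_size=1 << 20))
```

Running a loop like this inside a jail with a "swapuse" rule in place, and with a much larger `count`, is one way to check whether the limit is enforced. The segments here are never mapped by any other process, which matches the case where "memoryuse" and "vmemoryuse" are not expected to help.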

-m

>
>>> Unfortunately it seems like POSIX shared memory is not linked to the jail it was created in (we discussed this on this list in June and I created a few PRs about that), so per jail rctl rules don’t apply (and limiting uid 0 won’t have the desired effect ^_^).
>>>
>>
>> In what sense 'not linked'? The backing vm_object is created with the
>> current process credentials, which are jailed if creator belongs to a jail.
>
> I believe the problem that Michael is referring to is that named POSIX
> shm objects created within a jail do not disappear when the jail is
> destroyed, and the vm object cred reference is leaked. But this is
> unrelated to swap space accounting.
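
The lifetime issue mentioned in the last paragraph comes from the general behaviour of named POSIX shm objects: they exist independently of the process that created them and stay around until they are explicitly unlinked. The following Python sketch illustrates only that general behaviour within a single process. It does not involve jails or the leaked credential reference, and the function name `persists_until_unlinked` is a hypothetical one chosen for this example.

```python
from multiprocessing import shared_memory


def persists_until_unlinked(payload=b"jail"):
    """Show that a named shm object outlives its creator's mapping.

    Returns (data read back after reattaching, whether the name is gone
    after unlink).
    """
    creator = shared_memory.SharedMemory(create=True, size=len(payload))
    name = creator.name
    creator.buf[:len(payload)] = payload
    creator.close()  # drop the creator's mapping; the named object remains

    # Reattach purely by name, as an unrelated process could do.
    reader = shared_memory.SharedMemory(name=name)
    data = bytes(reader.buf[:len(payload)])
    reader.close()
    reader.unlink()  # only now is the name removed

    try:
        shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        gone = True
    else:
        gone = False
    return data, gone


if __name__ == "__main__":
    print(persists_until_unlinked())
```

If the unlink step is skipped, nothing else removes the object when the creating process exits, which is why objects created inside a jail can outlive it unless the kernel cleans them up on jail destruction.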