[patch] unprivileged mlock(2)
Alan Cox
alan.l.cox at gmail.com
Wed Aug 29 04:39:12 UTC 2012
On Tue, Aug 28, 2012 at 4:15 PM, Andrey Zonov <zont at freebsd.org> wrote:
> On 8/29/12 12:20 AM, Andriy Gapon wrote:
> > on 28/08/2012 21:07 Bryan Drewery said the following:
> >> On 8/28/2012 11:37 AM, Andrey Zonov wrote:
> >>> Hi,
> >>>
> >>> We've got RLIMIT_MEMLOCK for years, but this limit is useless, because
> >>> only root may call mlock(2), and root may raise any limits.
> >>>
> >>> I suggest patch that allows to call mlock(2) for unprivileged users.
> >>> Are there any objections to got it in tree?
> >>>
> >>
> >> FYI, see previous recent thread on this here:
> >>
> >> http://lists.freebsd.org/pipermail/freebsd-arch/2012-May/012552.html
> >> and
> >> http://lists.freebsd.org/pipermail/freebsd-arch/2012-June/012606.html
> >
> > Yes, Andrey, I highly suggest that you read those threads completely.
> >
> > Here are some observations.
> >
> > It doesn't look like mlockall and mlockall(MCL_FUTURE) in particular
> properly
> > honor RLIMIT_MEMLOCK. If this is not fixed, then it would be premature
> to
> > enable the privilege for non-privileged users.
> >
>
> This should be surely fixed, but I don't know how. Any suggestions are
> welcome.
>
> > I am against adding the sysctl knob. If RLIMIT_MEMLOCK limit is properly
> > implemented then it is sufficient to effectively deny the privilege (and
> with
> > much finer granularity).
> >
>
> Until all bugs around this problem will be fixed, to have such sysctl
> would be nice, and even keep it turned off to not change default
> behavior (not like in patch).
>
> > When the privilege is allowed to ordinary users, then memorylocked in the
> > default login.conf would need to be set to something much lower than the
> current
> > 'unlimited' :-)
> >
>
> It's not a problem to set it to a new reasonable value in the tree, but
> it will be a problem to fix this everywhere.
>
> > Also, note that currently RLIMIT_MEMLOCK is abused at least in vslock()
> (used at
> > least by sysctl's kernel side). The temporary wirings performed as an
> > implementation detail or side-effect should not be accounted against the
> limit.
> > The limit is for wirings that a user makes and controls explicitly. It
> should
> > not be applied to something that kernel does behind the scenes without
> user's
> > knowledge.
> >
>
> I was surprised when I stepped on this few years ago on machine with
> thousands processes. top(8) ate 100% CPU in a forever loop, ps(1)
> didn't work, and that is because memorylocked limit was set to low.
> Later I submitted two patches which fixed kvm (r230873) and sockstat
> (r230874), but I totally agree with you here, we shouldn't check for
> limits in vslock().
>
>
I agree with Andriy's argument for making the following change. Please go
ahead and commit it.
Alan
Index: sys/vm/vm_glue.c
> ===================================================================
> --- sys/vm/vm_glue.c (revision 239772)
> +++ sys/vm/vm_glue.c (working copy)
> @@ -183,7 +183,6 @@ int
> vslock(void *addr, size_t len)
> {
> vm_offset_t end, last, start;
> - unsigned long nsize;
> vm_size_t npages;
> int error;
>
> @@ -195,18 +194,6 @@ vslock(void *addr, size_t len)
> npages = atop(end - start);
> if (npages > vm_page_max_wired)
> return (ENOMEM);
> - PROC_LOCK(curproc);
> - nsize = ptoa(npages +
> - pmap_wired_count(vm_map_pmap(&curproc->p_vmspace->vm_map)));
> - if (nsize > lim_cur(curproc, RLIMIT_MEMLOCK)) {
> - PROC_UNLOCK(curproc);
> - return (ENOMEM);
> - }
> - if (racct_set(curproc, RACCT_MEMLOCK, nsize)) {
> - PROC_UNLOCK(curproc);
> - return (ENOMEM);
> - }
> - PROC_UNLOCK(curproc);
> #if 0
> /*
> * XXX - not yet
> @@ -222,14 +209,6 @@ vslock(void *addr, size_t len)
> #endif
> error = vm_map_wire(&curproc->p_vmspace->vm_map, start, end,
> VM_MAP_WIRE_SYSTEM | VM_MAP_WIRE_NOHOLES);
> -#ifdef RACCT
> - if (error != KERN_SUCCESS) {
> - PROC_LOCK(curproc);
> - racct_set(curproc, RACCT_MEMLOCK,
> -
> ptoa(pmap_wired_count(vm_map_pmap(&curproc->p_vmspace->vm_map))));
> - PROC_UNLOCK(curproc);
> - }
> -#endif
> /*
> * Return EFAULT on error to match copy{in,out}() behaviour
> * rather than returning ENOMEM like mlock() would.
>
> --
> Andrey Zonov
>
>
More information about the freebsd-arch
mailing list