svn commit: r308691 - in head/sys: cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs fs/tmpfs kern vm
Konstantin Belousov
kostikbel at gmail.com
Fri Nov 18 10:37:40 UTC 2016
On Fri, Nov 18, 2016 at 10:22:35AM +0000, Ruslan Bukin wrote:
> On Thu, Nov 17, 2016 at 10:51:40AM -0600, Alan Cox wrote:
> > On 11/16/2016 11:52, Ruslan Bukin wrote:
> > > On Wed, Nov 16, 2016 at 04:59:39PM +0000, Ruslan Bukin wrote:
> > >> On Wed, Nov 16, 2016 at 06:53:43PM +0200, Konstantin Belousov wrote:
> > >>> On Wed, Nov 16, 2016 at 01:37:18PM +0000, Ruslan Bukin wrote:
> > >>>> I have a panic with this on RISC-V. Any ideas?
> > >>> How did you check that the revision you replied to causes the problem?
> > >>> Note that the backtrace below is not reasonable.
> > >> I reverted this commit like that and rebuilt kernel:
> > >> git show 2fa36073055134deb2df39c7ca46264cfc313d77 | patch -p1 -R
> > >>
> > >> So the problem is reproducible on dual-core with a 32 MB mdroot.
> > >>
> > > I just found another interesting behavior,
> > > depending on the amount of physical memory:
> > > 700m - panic
> > > 800m - works fine
> > > 1024m - panic
> >
> > I think that this behavior is not inconsistent with your report of the
> > system crashing if you enabled two cores but not one. Specifically,
> > changing the number of active cores will slightly affect the amount of
> > memory that is allocated during initialization.
> >
> > There is nothing unusual in the sysctl output that you sent out.
> >
> > I have two suggestions. Try these in order.
> >
> > 1. r308691 reduced the size of struct vm_object. Try undoing the one
> > snippet that reduced the vm object size and see if that makes a difference.
> >
> >
> > @@ -118,7 +118,6 @@
> > vm_ooffset_t backing_object_offset;/* Offset in backing object */
> > TAILQ_ENTRY(vm_object) pager_object_list; /* list of all objects of this pager type */
> > LIST_HEAD(, vm_reserv) rvq; /* list of reservations */
> > - struct vm_radix cache; /* (o + f) root of the cache page radix trie */
> > void *handle;
> > union {
> > /*
> >
> >
> > 2. I'd like to know if vm_page_scan_contig() is being called.
> >
> > Finally, to simplify the situation a little, I would suggest that you
> > disable superpage reservations in vmparam.h. You have no need for them.
> >
> >
>
> I did another merge from svn head and the problem disappeared for 700m and 1024m of physical memory, but now I am able to reproduce it with 900m of physical memory.
>
> Restoring 'struct vm_radix cache' in struct vm_object makes no difference in behavior.
>
> Adding a panic() call to vm_page_scan_contig() gives the original panic (so vm_page_scan_contig() is not called);
> it looks like the size of the function changed, and that unhides the original problem.
>
> Disabling superpage reservations changes the behavior and gives the same panic on a 1024m boot.
>
> Finally, if I comment out the ruxagg() call in kern_resource.c, then I can no longer reproduce the problem with any amount of memory in any setup:
>
> --- a/sys/kern/kern_resource.c
> +++ b/sys/kern/kern_resource.c
> @@ -1063,7 +1063,7 @@ rufetch(struct proc *p, struct rusage *ru)
> *ru = p->p_ru;
> if (p->p_numthreads > 0) {
> FOREACH_THREAD_IN_PROC(p, td) {
> - ruxagg(p, td);
> + //ruxagg(p, td);
> rucollect(ru, &td->td_ru);
> }
> }
>
> I found this patch in my early RISC-V development directory, so it looks like the problem has persisted for the whole life of FreeBSD/RISC-V, but was hidden until now.
>
If you comment out the rufetch() call in proc0_post(), does the problem go
away as well?
I suggest starting with fixing the backtrace anyway, because the backtrace
you posted is wrong.
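
To make the question about proc0_post() concrete, here is a minimal
userspace sketch of the call path in question: proc0_post() calls rufetch()
for each process, and rufetch() calls ruxagg() for each thread. Everything
below is a hypothetical stand-in (the mock_* names, the reduced structures,
the call_rufetch switch), not the kernel code; it only assumes that ruxagg()
folds per-thread runtime into the process and that proc0_post() then zeroes
the process runtime.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct thread and struct proc. */
struct mock_thread {
	long td_incruntime;		/* runtime not yet aggregated */
	struct mock_thread *next;	/* next thread in the process */
};

struct mock_proc {
	long rux_runtime;		/* aggregated process runtime */
	struct mock_thread *threads;	/* list of threads */
};

/* Counts how often the aggregation step is reached. */
int ruxagg_calls;

/* Fold one thread's pending runtime into the process, then clear it. */
void
mock_ruxagg(struct mock_proc *p, struct mock_thread *td)
{
	ruxagg_calls++;
	p->rux_runtime += td->td_incruntime;
	td->td_incruntime = 0;
}

/* Walk every thread of the process, as the quoted rufetch() loop does. */
void
mock_rufetch(struct mock_proc *p)
{
	struct mock_thread *td;

	for (td = p->threads; td != NULL; td = td->next)
		mock_ruxagg(p, td);
}

/*
 * Boot-time pass over all processes. call_rufetch == 0 models the
 * experiment of commenting out the rufetch() call.
 */
void
mock_proc0_post(struct mock_proc *procs, size_t nprocs, int call_rufetch)
{
	size_t i;

	for (i = 0; i < nprocs; i++) {
		if (call_rufetch)
			mock_rufetch(&procs[i]);
		procs[i].rux_runtime = 0;
	}
}
```

With call_rufetch set to 0 the aggregation step is never reached from this
path and the per-thread counters are left untouched. If the panic goes away
in that configuration too, the early call from proc0_post() is the one to
look at.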