application coredump behavior differences between FreeBSD 7.0 and FreeBSD 10.1
Gavin Mu
gavin.mu at qq.com
Sun Dec 6 08:55:00 UTC 2015
Hi, kib,
It really is related to madvise behavior. I checked the code related to MADV_SEQUENTIAL, and it seems there is something wrong with vm_fault() in FreeBSD 10.1.
I did a simple patch:
diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index b5ac58f..135fc67 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -966,6 +966,8 @@ vnode_locked:
 	 */
 	if (hardfault)
 		fs.entry->next_read = fs.pindex + faultcount - reqpage;
+	else
+		fs.entry->next_read = fs.pindex + 1;
 	vm_fault_dirty(fs.entry, fs.m, prot, fault_type, fault_flags, TRUE);
 	vm_page_assert_xbusied(fs.m);
Without this, next_read is not updated and stays at zero in my testing. I think next_read should be updated to pindex + 1 here. Is my understanding correct? Thanks.
Regards,
Gavin Mu
------------------ Original ------------------
From: "Gavin Mu";<gavin.mu at qq.com>;
Date: Sun, Dec 6, 2015 08:14 AM
To: "Konstantin Belousov"<kostikbel at gmail.com>;
Cc: "freebsd-stable"<freebsd-stable at freebsd.org>;
Subject: Re: application coredump behavior differences between FreeBSD 7.0 and FreeBSD 10.1
Hi, kib,
It does not help.
I added:
	ret = madvise(shm_handle, size * 1024 * 1024 * 1024, MADV_SEQUENTIAL);
	if (ret != 0) {
		printf("madvise return %d\n", ret);
	}
top shows that it still uses the full amount of memory. Below is a snapshot taken during the core dump.
last pid: 3656; load averages: 1.84, 1.29, 1.04 up 0+00:18:06 23:58:37
43 processes: 2 running, 41 sleeping
CPU: 1.2% user, 0.0% nice, 85.2% system, 7.8% interrupt, 5.9% idle
Mem: 924M Active, 57M Inact, 745M Wired, 8980K Cache, 103M Buf, 34M Free
Swap: 4096M Total, 188M Used, 3908M Free, 4% Inuse
  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 3646 root        1  84    0  1036M   710M RUN      0:13 42.29% tt
Regards,
Gavin Mu
------------------ Original ------------------
From: "Konstantin Belousov";<kostikbel at gmail.com>;
Date: Sat, Dec 5, 2015 10:24 PM
To: "Gavin Mu"<gavin.mu at qq.com>;
Cc: "freebsd-stable"<freebsd-stable at freebsd.org>;
Subject: Re: application coredump behavior differences between FreeBSD 7.0 and FreeBSD 10.1
On Sat, Dec 05, 2015 at 01:09:31PM +0800, Gavin Mu wrote:
> Hi, kib,
>
>
> Please see my testing on FreeBSD 7.0.
> freebsd7# sysctl kern.ipc.shmall
> kern.ipc.shmall: 819200
> freebsd7# sysctl kern.ipc.shmmax
> kern.ipc.shmmax: 3355443200
> freebsd7# uname -a
> FreeBSD freebsd7.localdomain 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Sun Feb 24 10:35:36 UTC 2008 root at driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
>
>
>
> testing code:
> freebsd7# cat tt.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <machine/param.h>
> #include <sys/types.h>
> #include <sys/ipc.h>
> #include <sys/shm.h>
>
>
> int
> main(int argc, char **argv)
> {
> 	char **p;
> 	int size;
> 	int i;
> 	char *c = NULL;
> 	int shmid;
> 	void *shm_handle;
> 	size = atoi(argv[1]);
> 	printf("will alloc %dGB\n", size);
>
> 	shmid = shmget(100, size * 1024 * 1024 * 1024, 0644 | IPC_CREAT);
> 	if (shmid == -1) {
> 		printf("shmid = %d\n", shmid);
> 	}
>
> 	shm_handle = shmat(shmid, NULL, 0);
(shm_handle is not a handle).
> 	if (shm_handle == -1) {
> 		printf("null shm_handle\n");
> 	}
>
What if you add
	madvise(shm_handle, size, MADV_SEQUENTIAL);
there? Does the 10.x behaviour become similar to that of 7.x?
>
> 	*c = 0;
> 	return 0;
> }
>
>
>
> freebsd7# ./a.out 1
> will alloc 1GB
> Segmentation fault (core dumped)
>
>
>
> While a.out is running, RES stays at 2024K without increasing:
>
>
> last pid: 735; load averages: 0.00, 0.01, 0.03 up 0+00:15:11 04:43:35
> 25 processes: 1 running, 24 sleeping
> CPU states: 0.0% user, 0.0% nice, 22.6% system, 0.8% interrupt, 76.7% idle
> Mem: 13M Active, 6380K Inact, 52M Wired, 32K Cache, 39M Buf, 910M Free
> Swap: 2015M Total, 2015M Free
>
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
>   734 root        1 -16    0  1027M  2024K wdrain   0:02 13.27% a.out
>
>
>
> But when the same code runs on FreeBSD 10.1, RES keeps increasing up to 1GB. In my testing, if the memory is allocated by malloc(), RES keeps increasing on both 7.0 and 10.1; only sysv_shm on 7.0 behaves differently. I have checked the coredump() code but did not find any clue as to why it is different.
>
>
> Regards,
> Gavin Mu
>
>
> ------------------ Original ------------------
> From: "Konstantin Belousov";<kostikbel at gmail.com>;
> Date: Fri, Dec 4, 2015 05:45 PM
> To: "Gavin Mu"<gavin.mu at qq.com>;
> Cc: "freebsd-stable"<freebsd-stable at freebsd.org>;
> Subject: Re: application coredump behavior differences between FreeBSD 7.0 and FreeBSD 10.1
>
>
>
> On Fri, Dec 04, 2015 at 09:35:54AM +0800, Gavin Mu wrote:
> > Hi,
> >
> > We have an application running on the old FreeBSD 7.0, and we are upgrading the base system to FreeBSD 10.1. The application uses sysv_shm and allocates a lot of shared memory, though most of the time only part of the allocated memory is used, i.e. a large SIZE and a small RES in the /usr/bin/top view.
> >
> > When the application dumps core, the core file is large. In FreeBSD 7.0 the dump uses only a little extra memory, but in FreeBSD 10.1 it seems that all of the shared memory is touched and a lot of physical memory is used (RES in the /usr/bin/top output increases considerably), which drains memory.
> >
> > I have been debugging but cannot find any clue yet. Could someone point out where the issue might occur? Thanks.
>
> Both stable/7 and latest HEAD do read the whole mapped segment to write
> the coredump. This behaviour has not changed, probably since the introduction
> of ELF support into FreeBSD. And how else could the coredump file
> contain the content of the mapped segments?
>
> What changed in FreeBSD 10 in this regard is a fix for a deadlock which
> could occur in some scenarios, including coredumping. In stable/7,
> the page instantiation or swap-in for pages accessed by the core write
> was done while owning several VFS locks. This sometimes caused a deadlock.
> In stable/10 the deadlock avoidance code is enabled by default, and
> when the kernel detects the possibility of a deadlock, it switches to reading
> carefully in small chunks.
>
> Still, this does not explain the effect that you describe. In fact, I
> am more suspicious of the claim that stable/7 did not increase the RSS of
> the dumping process or did not access the whole mapped shared segment
> than of the claim that there is a regression in stable/10.