amd64/180885: panic: kmem_map too small at heavy packet traffic
tugrul
h.tugrul.erdogan at gmail.com
Sat Jul 27 07:30:02 UTC 2013
>Number: 180885
>Category: amd64
>Synopsis: panic: kmem_map too small at heavy packet traffic
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-amd64
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sat Jul 27 07:30:01 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator: tugrul
>Release: 10.0/Head
>Organization:
>Environment:
FreeBSD test 10.0-CURRENT FreeBSD 10.0-CURRENT #0: Thu Jul 18 17:07:53 EEST 2013 root at test:/usr/obj/usr/src/sys/GENERIC amd64
>Description:
I am running 10.0-CURRENT on an Intel(R) Xeon(R) E5620 with 16 GB of RAM. I get the
"panic: kmem_malloc(-548663296): kmem_map too small: 539459584 total allocated"
message with the configuration below:
[root@ ~]# sysctl vm.kmem_size_min vm.kmem_size_max vm.kmem_size vm.kmem_size_scale
vm.kmem_size_min: 0
vm.kmem_size_max: 329853485875
vm.kmem_size: 16686845952
vm.kmem_size_scale: 1
[root@ ~]# sysctl hw.physmem hw.usermem hw.realmem
hw.physmem: 17151787008
hw.usermem: 8282652672
hw.realmem: 18253611008
[root@ ~]# sysctl hw.pagesize hw.pagesizes hw.availpages
hw.pagesize: 4096
hw.pagesizes: 4096 2097152 0
hw.availpages: 4187448
When I compare the vmstat and netstat output taken at boot time with output taken later, the major difference is (boot time | later):
pf_temp 0 0K - 79309736 128 | pf_temp 1077640 134705K - 84330076 128
and after the panic, the major vmstat difference in the core dump file is:
temp 110 15K - 76212305 16,32,64,128,256 | temp 117 6742215K - 655115 16,32,64,128,2
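As a consistency check on the pf_temp figures quoted above, the sketch below multiplies the number of live allocations by the allocation size. It assumes the last column is the allocation size in bytes and that "K" means KiB; both are assumptions about the vmstat output layout, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: memory held by `in_use` live allocations of
 * `alloc_size` bytes each, expressed in KiB (integer division). */
static uint64_t malloc_type_kib(uint64_t in_use, uint64_t alloc_size)
{
    return (in_use * alloc_size) / 1024;
}
```

Under those assumptions, malloc_type_kib(1077640, 128) gives 134705, which agrees with the 134705K reported for pf_temp.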
Specifically, I get this panic during an IP spoofing attack while synproxy is active. The relevant sysctl values are below:
kern.malloc_count: 315
vm.md_malloc_wait: 0
vfs.bufmallocspace: 0
vfs.maxmallocbufspace: 86269952
vm.kmem_size: 16686845952
vm.kmem_size_min: 0
vm.kmem_size_max: 329853485875
vm.kmem_size_scale: 1
vm.kmem_map_size: 543973376
vm.kmem_map_free: 15974895616
kern.maxvnodes: 350097
kern.minvnodes: 87524
vfs.numvnodes: 112329
vfs.wantfreevnodes: 87524
vfs.freevnodes: 87502
[root@ ~]# pfctl -si
No ALTQ support in kernel
ALTQ related functions disabled
Status: Enabled for 0 days 00:17:39 Debug: Urgent
State Table Total Rate
current entries 5142886
searches 26982141 25478.9/s
inserts 29055053 27436.3/s
removals 24218654 22869.4/s
Counters
match 24901305 23514.0/s
bad-offset 0 0.0/s
fragment 0 0.0/s
short 0 0.0/s
normalize 0 0.0/s
memory 0 0.0/s
bad-timestamp 0 0.0/s
congestion 0 0.0/s
ip-option 18 0.0/s
proto-cksum 0 0.0/s
state-mismatch 0 0.0/s
state-insert 0 0.0/s
state-limit 0 0.0/s
src-limit 0 0.0/s
synproxy 29378439 27741.7/s
[root@ ~]# panic: kmem_malloc(-1 814 425 600): kmem_map too small: 543 956 992 total allocated
cpuid = 8
Uptime: 1d18h2m14s
(ada0:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich1:0:0:0): CAM status: CCB request is in progress
(ada0:ahcich1:0:0:0): Error 5, Retries exhausted
(ada0:ahcich1:0:0:0): Synchronize cache failed
(ada1:ahcich2:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada1:ahcich2:0:0:0): CAM status: CCB request is in progress
(ada1:ahcich2:0:0:0): Error 5, Retries exhausted
(ada1:ahcich2:0:0:0): Synchronize cache failed
Dumping 8243 out of 16357 MB:..1%..11%
Reading the kernel source (vm_kern.c and vm_map.c), I see that the panic can occur in the following cases:
* a negative malloc size parameter
* a request larger than the free space between kmem_map's min_offset and max_offset values
* an allocation attempt when the root entry of the map is the rightmost entry of the map
* a request larger than the map's max_free value
I think the panic occurs during mbuf creation, when malloc() fails to allocate memory, but I do not understand why any of these cases is being triggered. The memory is almost entirely free, yet the kernel reports that kmem_map is too small with only about 0.5 GB in use.
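To put "almost entirely free" into numbers, the following sketch computes the share of the configured kernel memory arena that was in use. It assumes vm.kmem_map_size is the amount currently allocated and vm.kmem_size is the configured limit; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: percentage of the configured kmem arena in use,
 * given the values of vm.kmem_map_size and vm.kmem_size in bytes. */
static double kmem_used_percent(uint64_t kmem_map_size, uint64_t kmem_size)
{
    return 100.0 * (double)kmem_map_size / (double)kmem_size;
}
```

With the values reported above (543973376 and 16686845952), the result is a little over 3 percent.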
The negative value printed is malloc's size parameter itself (after some page-size alignment rounding). The parameter is declared as "unsigned long" but is printed with "%ld", as a signed long. So if the size is extremely large (2^63 or more on amd64), the signed conversion treats the top bit as the sign bit and prints a minus sign. I do not think the size can really be that large, so this looks like a bug: either the size parameter arrives as a negative value or the rounding corrupts it.
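The signed-versus-unsigned printing effect described above can be reproduced in user space. This is a minimal sketch, not the kernel's code: the function names are hypothetical, and it assumes a two's-complement platform where converting an out-of-range 64-bit unsigned value to a signed one wraps around (the conversion is implementation-defined in C).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit size as a signed number, which is what printing an
 * unsigned long with a signed conversion shows on amd64. */
static void format_as_signed(uint64_t size, char *buf, size_t len)
{
    snprintf(buf, len, "%" PRId64, (int64_t)size);
}

/* Format the same size as an unsigned number for comparison. */
static void format_as_unsigned(uint64_t size, char *buf, size_t len)
{
    snprintf(buf, len, "%" PRIu64, size);
}
```

For the value 18446744073160888320 (2^64 - 548663296), format_as_signed() produces "-548663296", the number in the first panic message, while format_as_unsigned() produces the full 20-digit value. This shows only how such a number would be displayed; it does not establish what size the kernel actually received.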
>How-To-Repeat:
It happens on every IP spoofing attack while pf synproxy is active. While the attack continues, cron also collects information with "pfctl -ss" at 2-second intervals. The panic occurs after 15 seconds.
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: