[Bug 275594] High CPU usage by arc_prune; analysis and fix
Date: Fri, 23 Feb 2024 19:25:52 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275594

--- Comment #67 from Peter Much <pmc@citylink.dinoex.sub.org> ---
So, now I read all the material here. Great work!

I had upgraded my deploy engine from 13.2-RELEASE to 13.3-BETA, and found (among some spurious messages from git) that it can no longer build gcc12.

There is apparently no problem with rust or llvm15, but trying to build gcc12 does reproducibly crash (10 core, 16081M ram).

Apparently the crash happens when gcc fully powers up its LTO for the first time:

last pid: 37369;  load averages:  9.35,  9.93,  9.27    up 0+03:15:25  07:21:42
417 threads:   14 running, 379 sleeping, 24 waiting
CPU: 55.4% user,  0.0% nice, 35.6% system,  0.1% interrupt,  8.8% idle
Mem: 7047M Active, 6121M Inact, 2392M Wired, 984M Buf, 60M Free
ARC: 518M Total, 45M MFU, 451M MRU, 128K Anon, 3990K Header, 17M Other
     467M Compressed, 997M Uncompressed, 2.14:1 Ratio
Swap: 15G Total, 15G Free

  PID USERNAME PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root      -8    -     0B  2432K CPU4     4   3:14  99.79% kernel{arc_p
    7 root     -16    -     0B    48K CPU6     6   2:45  99.79% pagedaemon{d
   15 root      52    -     0B    16K CPU0     0   3:00  99.70% vnlru
37334 root      52    0   891M   789M pfault   1   0:37  89.24% lto1
37270 root      52    0  1017M   915M pfault   3   0:43  88.63% lto1
37324 root      52    0   831M   770M pfault   8   0:39  88.59% lto1
37338 root      52    0   843M   785M pfault   2   0:36  88.50% lto1
37333 root      52    0   889M   788M pfault   7   0:37  82.76% lto1
37269 root      52    0  1001M   882M pfault   5   0:42  82.09% lto1
37274 root      52    0  1004M   885M pfault   9   0:42  80.24% lto1
    5 root      20    -     0B  1568K t->zth   9   0:02   1.02% zfskern{arc_
37360 root      20    0    14M  4940K CPU9     9   0:00   0.87% top

This is the last output, at this point the system becomes unresponsive, and, when allowed neither to oom-kill nor panic, continues to consume 300% compute.
Apparently these are the visible three apocalyptic riders (arc_prune, pagedaemon, vnlru) entertaining themselves. :/

Implementing the patch (i.e. five new git commits from the github repo) solves the issue, and afterwards it looks like this:

last pid: 11944;  load averages:  7.13,  5.29,  5.77    up 0+03:48:45  16:12:46
424 threads:   19 running, 381 sleeping, 24 waiting
CPU: 67.9% user,  0.0% nice,  5.1% system,  0.0% interrupt, 27.0% idle
Mem: 9308M Active, 2285M Inact, 20M Laundry, 3643M Wired, 865M Buf, 336M Free
ARC: 1638M Total, 855M MFU, 575M MRU, 128K Anon, 11M Header, 198M Other
     1305M Compressed, 2980M Uncompressed, 2.28:1 Ratio
Swap: 15G Total, 15G Free

  PID USERNAME PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
11579 root     103    0  1269M  1066M CPU6     6   4:09 100.00% lto1
11605 root     103    0  1263M  1052M CPU3     3   4:08  99.87% lto1
11589 root     103    0  1295M  1091M CPU8     8   4:09  99.87% lto1
11599 root     103    0  1259M  1027M CPU9     9   4:08  99.87% lto1
11588 root     103    0  1263M  1035M CPU7     7   4:09  99.87% lto1
11590 root     103    0  1287M  1058M CPU5     5   4:08  99.87% lto1
11598 root     103    0  1311M  1082M CPU1     1   4:08  99.74% lto1
    0 root      -8    -     0B  2448K -        6   0:03   6.83% kernel{arc_p
    5 root      -8    -     0B  1568K RUN      9   0:03   5.80% zfskern{arc_
    7 root     -16    -     0B    48K psleep   2   0:37   3.11% pagedaemon{d

I'm a bit worried the thing is still reluctant to page out, but otherwise this looks good.

-- 
You are receiving this mail because:
You are the assignee for the bug.