Pop quiz: what is wrong with this picture?
    top - 17:15:07 up 5 days, 17:35,  1 user,  load average: 3.76, 4.97, 3.18
    Tasks: 135 total,   2 running, 133 sleeping,   0 stopped,   0 zombie
    Cpu(s):  0.0%us,  0.5%sy,  0.0%ni, 52.3%id, 47.2%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   2097152k total,  2088744k used,     8408k free,     2564k buffers
    Swap:  1048568k total,   946156k used,   102412k free,    45548k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     1507 root      15   0 1987m 1.2g 2824 S  0.3 62.3   1096:21 setroubleshootd
The system was under severe memory pressure: only 8408k of RAM was free and about 90% of swap was in use. The resulting swapping caused huge amounts of I/O wait (47.2% wa) and slowed the system down to a crawl.
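The swap figure in that top header is easy to check directly. A minimal sketch, assuming a Linux /proc filesystem; the function name swap_used_pct is hypothetical, not a standard tool. It reads /proc/meminfo-formatted text and prints the share of swap in use. With the values above (1048568k total, 102412k free) it prints 90.2.

```sh
# Hypothetical helper (not a standard tool): print the percentage of swap
# in use, reading /proc/meminfo-formatted text (values in kB) on stdin.
swap_used_pct() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END {
             # No swap configured: report 0.0 instead of dividing by zero.
             if (total > 0) printf "%.1f\n", (total - free) * 100 / total; else print "0.0"
         }'
}

# Usage on a live system:
#   swap_used_pct < /proc/meminfo
```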
According to top, setroubleshootd is using 1.2 GB out of a total of 2 GB of RAM. This is ridiculous. The daemon basically only checks for SELinux-related audit messages and displays a dialog informing you of the event. Why does it need to use more than 60% of physical memory on my system?
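top's %MEM column is the process's resident memory as a share of total physical memory. A rough sketch that reproduces the calculation, assuming a Linux /proc filesystem and using a hypothetical helper name, rss_pct. It takes VmRSS from /proc/<pid>/status and MemTotal from /proc/meminfo.

```sh
# Hypothetical helper: approximate top's %MEM for a single process.
#   $1: a /proc/<pid>/status file (VmRSS line, value in kB)
#   $2: a /proc/meminfo file      (MemTotal line, value in kB)
rss_pct() {
    awk 'FILENAME == ARGV[1] && /^VmRSS:/    { rss = $2 }
         FILENAME == ARGV[2] && /^MemTotal:/ { total = $2 }
         END { if (total > 0) printf "%.1f\n", rss * 100 / total }' "$1" "$2"
}

# Usage on a live system (PID taken from the top output above):
#   rss_pct /proc/1507/status /proc/meminfo
```

On the machine above this should print something close to top's 62.3; small differences are expected because the two files are read at slightly different moments.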
As far as I can tell, there’s no real need to run this daemon on a server; it is probably only useful on a graphical workstation.
    chkconfig setroubleshoot off
    service setroubleshoot stop
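To confirm the service will also stay off after a reboot, the runlevel table printed by chkconfig --list can be checked. A small sketch, assuming the init script is named setroubleshoot as on stock RHEL 5 (adjust if yours differs) and using a hypothetical helper, all_runlevels_off. The helper succeeds only if it sees a service line with no runlevel marked "on".

```sh
# Hypothetical helper: succeed only if the `chkconfig --list <service>`
# output on stdin has at least one service line and no runlevel set to "on".
all_runlevels_off() {
    awk 'NF > 1 {
             seen = 1
             for (i = 2; i <= NF; i++) if ($i ~ /:on$/) enabled = 1
         }
         END { if (seen && !enabled) exit 0; exit 1 }'
}

# Usage (service name assumed to be "setroubleshoot"):
#   chkconfig --list setroubleshoot | all_runlevels_off && echo disabled
```

Requiring at least one service line means an unknown service name (chkconfig prints nothing on stdout) is reported as a failure rather than silently passing.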
Problem solved ;-)
This was not an isolated incident; a quick “ps aux | grep” confirmed that setroubleshootd was using a lot of memory on most systems:
    USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START    TIME COMMAND
    root      2926  3.0 13.0  545672  317292 ?    Ssl  Nov11  265:09 /usr/bin/python -E /usr/sbin/setroubleshootd

    USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START    TIME COMMAND
    root      3107 11.9 48.5 1664092 1437740 ?    Ssl  Nov11 1033:51 /usr/bin/python -E /usr/sbin/setroubleshootd

    USER       PID %CPU %MEM     VSZ     RSS TTY  STAT START    TIME COMMAND
    root      1503 13.7 74.7 2524604 1566892 ?    Ssl  Nov11 1187:44 /usr/bin/python -E /usr/sbin/setroubleshootd
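Reading these by eye across many hosts gets tedious. A minimal sketch, with the hypothetical helper name setroubleshootd_mem, that reads ps aux output and prints the PID, %MEM and resident size in MiB for each setroubleshootd process (ps reports RSS in KiB). For the second system above it would print "PID 3107: 48.5% of RAM, 1404 MiB resident".

```sh
# Hypothetical helper: summarize setroubleshootd memory use from `ps aux`
# output on stdin. Fields: $2 = PID, $4 = %MEM, $6 = RSS in KiB.
# The slashes in the regex are escaped, so awk's own command line (which
# also shows up in the ps listing) does not match the literal path.
setroubleshootd_mem() {
    awk '/\/usr\/sbin\/setroubleshootd/ {
             printf "PID %s: %s%% of RAM, %.0f MiB resident\n", $2, $4, $6 / 1024
         }'
}

# Usage on each host ("ww" keeps ps from truncating the COMMAND column):
#   ps auxww | setroubleshootd_mem
```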
These are all 64-bit RHEL/CentOS 5.5 systems with the latest updates installed…