Pop quiz: what is wrong with this picture?

     top - 17:15:07 up 5 days, 17:35,  1 user,  load average: 3.76, 4.97, 3.18
     Tasks: 135 total,   2 running, 133 sleeping,   0 stopped,   0 zombie
     Cpu(s):  0.0%us,  0.5%sy,  0.0%ni, 52.3%id, 47.2%wa,  0.0%hi,  0.0%si,  0.0%st
     Mem:   2097152k total,  2088744k used,     8408k free,     2564k buffers
     Swap:  1048568k total,   946156k used,   102412k free,    45548k cached

     PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     1507 root      15   0 1987m 1.2g 2824 S  0.3 62.3   1096:21 setroubleshootd

The system was under severe memory pressure: nearly all of the swap was in use, and the CPU was spending 47.2% of its time in I/O wait, which slowed the system to a crawl.
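
To put a number on the swap usage without reading it off the top header, something like the following could work. It is only a sketch: `swap_used_pct` is a helper name made up for this post, and it assumes the usual `SwapTotal:` / `SwapFree:` lines (in kB) found in /proc/meminfo on Linux.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and
# print the percentage of swap in use, with one decimal place.
swap_used_pct() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { sfree = $2 }
         END { if (total > 0) printf "%.1f\n", (total - sfree) * 100 / total }'
}

# Typical use on a live system:
#   swap_used_pct < /proc/meminfo
```

Fed the swap figures from the top output above (1048568k total, 102412k free), this prints 90.2.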

According to top, setroubleshootd is using 1.2GB out of a total of 2GB of RAM. This is ridiculous. The daemon basically only checks for SELinux-related audit messages and displays a dialog informing you of the event. Why does it need to use more than 60% of the physical memory on my system?

As far as I can tell, there’s no real need to run this daemon, except perhaps on a graphical workstation. It certainly isn’t needed on a server.

     chkconfig setroubleshootd off
     service setroubleshootd stop

Problem solved ;-)

This was not an isolated incident; a quick “ps aux | grep” confirmed that setroubleshootd was using a lot of memory on most systems:

     USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
     root      2926  3.0 13.0 545672 317292 ?       Ssl  Nov11 265:09 /usr/bin/python -E /usr/sbin/setroubleshootd

     USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
     root      3107 11.9 48.5 1664092 1437740 ?     Ssl  Nov11 1033:51 /usr/bin/python -E /usr/sbin/setroubleshootd

     USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
     root      1503 13.7 74.7 2524604 1566892 ?     Ssl  Nov11 1187:44 /usr/bin/python -E /usr/sbin/setroubleshootd

These are all RHEL / CentOS 5.5 systems, 64-bit with latest updates installed…
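
When checking a whole batch of machines, it helps to boil the ps output down to one short line per process. A minimal sketch, assuming the standard `ps aux` column order (PID in column 2, %MEM in column 4, RSS in kB in column 6); `rss_report` is a name invented here, not an existing tool:

```shell
# Hypothetical helper: read `ps aux` output on stdin and print
# "PID RSS-in-MiB %MEM" for every line matching the regex in $1.
rss_report() {
    awk -v pat="$1" '$0 ~ pat { printf "%s %d MiB %s%%\n", $2, $6 / 1024, $4 }'
}

# Typical use; the [s] bracket keeps the awk process itself
# from matching its own command line:
#   ps aux | rss_report '[s]etroubleshootd'
```

For the first host listed above this prints `2926 309 MiB 13.0%`.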

Updated: