Recently our OpenNMS GUI has had issues with responsiveness. Symptoms have included a generic "OpenNMS has encountered an error it doesn't know how to handle" error page when accessing, for example, the "All Nodes" page, and when I've viewed top on the server it has shown the java process taking up a lot of CPU. The responsiveness issue has also affected OpenNMS' ability to properly poll the services it's monitoring, and so I've had a lot of false "service down" errors.
ICMP polling doesn't appear to be affected, but service polling (HTTP, SSH, SNMP, etc.) definitely is. I'm getting a lot of outage events in the logs with the explanation "Too many open files".
Does anyone know what "too many open files" means and how I can track down what's causing the problem? I'm not sure where to even start.
Thanks.
The default values for soft and hard limits can be checked with
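As a minimal sketch, assuming a POSIX-style shell such as bash or dash, the ulimit builtin reports both values for the current user session. The variable names are only for illustration.

```shell
#!/bin/sh
# Soft limit: the number of open file descriptors currently enforced
soft_limit=$(ulimit -Sn)
# Hard limit: the ceiling up to which the soft limit may be raised
hard_limit=$(ulimit -Hn)

echo "soft limit (open files): $soft_limit"
echo "hard limit (open files): $hard_limit"
```

Either value may be printed as "unlimited" instead of a number.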
The limits apply per user, and each new process inherits them. When started with the default init script, OpenNMS changes the hard limit during startup with
If you start OpenNMS you can see the limits for the OpenNMS JVM with
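On Linux, the limits of a running process are exposed in /proc/<pid>/limits, so one way to do this might look like the sketch below. OPENNMS_PID is a hypothetical variable that you would set to the PID of the OpenNMS JVM yourself; if it is unset, the sketch falls back to the current shell so it still runs.

```shell
#!/bin/sh
# OPENNMS_PID is an assumed, user-supplied variable (not something OpenNMS sets).
# It defaults to the PID of this shell so the example works as-is.
pid="${OPENNMS_PID:-$$}"

# The "Max open files" row lists the soft limit, the hard limit, and the unit
limits_line=$(grep 'Max open files' "/proc/$pid/limits")
echo "$limits_line"
```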
You can see how many file descriptors OpenNMS has allocated with:
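A rough sketch on Linux: every entry in /proc/<pid>/fd is one open file descriptor, so counting the entries gives the number in use. As before, OPENNMS_PID is a hypothetical variable for the OpenNMS JVM's PID, with the current shell as a fallback.

```shell
#!/bin/sh
# OPENNMS_PID is an assumed, user-supplied variable; defaults to this shell
pid="${OPENNMS_PID:-$$}"

# Each entry under /proc/<pid>/fd corresponds to one open file descriptor
fd_count=$(ls "/proc/$pid/fd" | wc -l)
echo "process $pid has $fd_count open file descriptors"
```

Reading the fd directory of a process owned by another user generally requires root, so run this as the OpenNMS user or with sudo.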
If you use lsof with the process ID of OpenNMS, you will see a larger number than in /proc/pid/fd. The reason is that memory-mapped .so files are listed by lsof, but they don't count toward the configured limits.
If you want to see how many filesystem handles are used, you can run:
You can see three values:
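On Linux, system-wide file handle usage can be read from /proc/sys/fs/file-nr, which holds exactly three numbers. Assuming that is the file meant here, a small sketch that labels each value could look like this (the variable names are my own):

```shell
#!/bin/sh
# /proc/sys/fs/file-nr contains three whitespace-separated numbers:
#   1. file handles currently allocated
#   2. allocated but unused file handles (0 on 2.6 and later kernels)
#   3. system-wide maximum number of file handles (fs.file-max)
read -r allocated unused maximum < /proc/sys/fs/file-nr

echo "allocated file handles: $allocated"
echo "allocated but unused:   $unused"
echo "system-wide maximum:    $maximum"
```

If the first value is close to the third, the whole system is running out of file handles, not just the OpenNMS process.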
I hope this helps you investigate your file handle issues.