Tracking down cause of "too many open files" in OpenNMS


Recently our OpenNMS GUI has had issues with responsiveness. Symptoms have included a generic "OpenNMS has encountered an error it doesn't know how to handle" error page when accessing, for example, the "All Nodes" page, and when I've viewed top on the server it has shown the java process taking up a lot of CPU. The responsiveness issue has also affected OpenNMS' ability to properly poll the services it's monitoring, and so I've had a lot of false "service down" errors.

ICMP polling doesn't appear to be affected, but service polling (HTTP, SSH, SNMP, etc.) definitely is. I'm getting a lot of outage events in the logs with the explanation "Too many open files".

Does anyone know what "too many open files" means and how I can track down what's causing the problem? I'm not sure where to even start.

Thanks.

Answer from indigo

The default values for soft and hard limits can be checked with

ulimit -a
ulimit -a -H
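
If you only care about the open-files limit, you can ask for it directly (bash):

ulimit -Sn
ulimit -Hn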

These limits are set per user, and each new process inherits them. The default OpenNMS init script raises the open-file limit at startup by running

ulimit -n 20480

Once OpenNMS is started, you can see the limits that apply to the OpenNMS JVM with

cat /proc/$(cat /var/run/opennms.pid)/limits
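
If you only want the relevant row, grep for it:

grep "Max open files" /proc/$(cat /var/run/opennms.pid)/limits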

You can see how many file descriptors OpenNMS has allocated with:

ls -l /proc/$(cat /var/run/opennms.pid)/fd | wc -l

If you use lsof with the OpenNMS process ID, you will see a larger number than in /proc/<pid>/fd:

lsof -p $(cat /var/run/opennms.pid) | wc -l

The reason is that lsof also lists memory-mapped files such as .so libraries, which do not count toward the configured file descriptor limit. The same is true if you grep the full lsof output for the PID:

lsof | grep $(cat /var/run/opennms.pid) | wc -l
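
To get a rough idea of what those descriptors actually are (sockets, regular files, pipes, and so on), you can group the lsof output by its TYPE column. This is just a quick sketch that assumes the default lsof column layout:

lsof -p $(cat /var/run/opennms.pid) | awk 'NR>1 {print $5}' | sort | uniq -c | sort -rn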

If you want to see how many file handles are in use system-wide, you can run:

cat /proc/sys/fs/file-nr

4128    0   262144

You can see three values:

number of allocated file handles:             4128
number of allocated but unused file handles:  0
maximum number of file handles:               262144
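
If you are not sure the OpenNMS JVM is really the process exhausting handles, a quick sweep over /proc shows which processes hold the most descriptors. This is a rough sketch that assumes a Linux /proc filesystem and that you run it as root so every fd directory is readable:

for pid in /proc/[0-9]*; do
    echo "$(ls "$pid/fd" 2>/dev/null | wc -l) ${pid##*/} $(cat "$pid/comm" 2>/dev/null)"
done | sort -rn | head

The first column is the open descriptor count, the second the PID, the third the process name.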

I hope this helps you investigate your file handle issues.

Answer from Sriraag

I have had to deal with this before too. Check the opennms.conf file in /opt/opennms/etc; it should have the maximum file descriptor limit set for OpenNMS, which you can increase. The issue occurs because OpenNMS processes are trying to open more files than that configured limit allows, or at least that is what I found out from my encounter with it.
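
As a rough sketch only (the exact variable name can differ between OpenNMS versions, so check the comments in your own opennms.conf before copying this), raising the limit looks something like the following, after which OpenNMS has to be restarted:

# /opt/opennms/etc/opennms.conf -- hypothetical example, verify the variable name for your version
MAXIMUM_FILE_DESCRIPTORS="65536"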

You can also look into /opt/opennms/logs/ and check output.log and manager.log; output.log will usually have the information on why this is happening.
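
For example, to see which daemons are actually logging the error you can grep the log directory:

grep -ril "too many open files" /opt/opennms/logs/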

You could also increase the verbosity of these logs by changing WARN to DEBUG in log4j2.xml under /opt/opennms/etc/.
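
As an illustration only (the layout of log4j2.xml differs between OpenNMS releases, so match this against your own file), the per-category level entries I have seen look roughly like the line below, and you would flip the value for the category you care about, e.g. the poller:

<!-- hypothetical excerpt from /opt/opennms/etc/log4j2.xml -->
<KeyValuePair key="poller" value="DEBUG"/>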

That should provide you with substantial insight on what could be causing this issue.

Hope this helps.