NetXMS Support Forum

English Support => General Support => Topic started by: pvo on January 22, 2021, 11:15:09 PM

Title: strange Critical interface Status
Post by: pvo on January 22, 2021, 11:15:09 PM
What does the Critical interface status mean?
Title: Re: strange Critical interface Status
Post by: Filipp Sudanov on January 23, 2021, 02:04:58 AM
What device is that? Is it polled by SNMP or agent?
Title: Re: strange Critical interface Status
Post by: pvo on January 23, 2021, 10:49:18 AM
It was reported by a Linux agent 3.7.130 on CentOS 7.
The interface is OpenVPN interface tun0. After agent restart everything was OK.

There were a lot of messages like the following in the agent log:

2021.01.22 22:17:14.368 *E* [                   ] Unable to accept incoming connection (24 Too many open files)


When I compare the number of files the agent has open now with the number right after start, it has grown.
Most of the open files are of these two types (listed by lsof -p; the name of the server with the agent is changed to xxxxx):

nxagentd 2524095 root *360u  IPv4         3489842327      0t0        TCP xxxxx:netxms-agent->192.168.201.1:36936 (CLOSE_WAIT)
nxagentd 2524095 root *361r  FIFO               0,12      0t0 3489847436 pipe


If the agent sends the Status over a different TCP connection than the Oper State, the error message in the log above could be the reason.

The agent is a Zone proxy and a lot of AgentExecuteActionWithOutput calls are made on it. This could explain the open pipes, but the number of actions running at the same time is far too small to exceed the maximum number of open files.
The maximum number of open files is set to 65535. I can set a higher value, but that is only a short-term solution.
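
For anyone who wants to repeat the count, here is a minimal sketch of how the lsof -p output could be tallied by file type and TCP state. It assumes the usual lsof column order (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME) with no empty columns; the function name summarize_lsof is made up for this example, and the sample text is just the two lines quoted above.

```python
from collections import Counter


def summarize_lsof(text):
    """Tally lsof lines by the TYPE column, and TCP lines by connection state.

    Assumes the default lsof column layout, where TYPE is the 5th field
    and a TCP state such as (CLOSE_WAIT) is the last field on the line.
    """
    by_type = Counter()
    by_tcp_state = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or short lines and the lsof header line.
        if len(fields) < 5 or fields[0] == "COMMAND":
            continue
        by_type[fields[4]] += 1
        last = fields[-1]
        if "TCP" in fields and last.startswith("(") and last.endswith(")"):
            by_tcp_state[last.strip("()")] += 1
    return by_type, by_tcp_state


# The two example lines from the post above.
sample = """\
nxagentd 2524095 root *360u  IPv4         3489842327      0t0        TCP xxxxx:netxms-agent->192.168.201.1:36936 (CLOSE_WAIT)
nxagentd 2524095 root *361r  FIFO               0,12      0t0 3489847436 pipe
"""

by_type, by_tcp_state = summarize_lsof(sample)
print(dict(by_type))
print(dict(by_tcp_state))
```

On a live system the text would come from saving the lsof -p output for the nxagentd process to a file and reading that file in.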
Title: Re: strange Critical interface Status
Post by: pvo on January 23, 2021, 11:37:48 AM
Current open files situation on the agent:
2279 connections from the server
12745 pipes
Title: Re: strange Critical interface Status
Post by: pvo on January 23, 2021, 03:24:27 PM
I had to restart the agent once again because there were the same messages in the log.
The problem with the interface status didn't occur again, so it is fairly certain that the main reason for the strange Status was the agent's problem with open files.
Before restart there were:
80 connections from the server
43694 pipes

I don't understand the number of open pipes.
Currently (30 minutes after agent restart) there are 4228 open pipes, and that number is not only rising but also falling.
What is strange is that there were only 214 running processes on the agent at this time, so either these cannot be pipes waiting for output from processes started by AgentExecuteActionWithOutput calls, or the pipes are not being closed by the agent.
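
To illustrate what "pipes not closed" would mean in general, here is a small sketch of the usual pattern for capturing a child's output through a pipe. This is not NetXMS code and says nothing about how the agent is actually implemented; run_with_output is a hypothetical helper. The point is only that the parent has to close both of its pipe descriptors itself, because they stay open after the child process has exited.

```python
import os
import subprocess
import sys


def run_with_output(cmd):
    """Run cmd, capture its stdout through an explicit pipe, close both ends."""
    read_fd, write_fd = os.pipe()
    try:
        proc = subprocess.Popen(cmd, stdout=write_fd)
    except Exception:
        # The child never started, so release the read end here as well.
        os.close(read_fd)
        raise
    finally:
        # The parent must close its copy of the write end. Otherwise the
        # reader never sees EOF and the descriptor outlives the child.
        os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:  # closes read_fd on exit
        output = reader.read()
    proc.wait()
    return output


result = run_with_output([sys.executable, "-c", "print('done')"])
print(result)
```

If either close were skipped on some code path, each action would leave descriptors behind, which would look in lsof like FIFO entries with no matching process.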
Title: Re: strange Critical interface Status
Post by: Victor Kirhenshtein on January 28, 2021, 01:45:40 PM
It looks like pipes are not closed after command execution (maybe only if certain conditions are met). We will investigate it further.

Best regards,
Victor
Title: Re: strange Critical interface Status
Post by: pvo on January 29, 2021, 10:32:20 PM
Can I help with some specific logging?
Title: Re: strange Critical interface Status
Post by: Victor Kirhenshtein on February 01, 2021, 07:31:47 PM
I have been unable to reproduce this issue so far. What kind of actions and/or external parameters are you using? Can you share your agent configuration file?

Best regards,
Victor
Title: Re: strange Critical interface Status
Post by: pvo on February 01, 2021, 07:39:47 PM
I've attached the agent configuration file.
Title: Re: strange Critical interface Status
Post by: Victor Kirhenshtein on February 01, 2021, 07:54:06 PM
Are you using TCP proxy functionality?
Title: Re: strange Critical interface Status
Post by: pvo on February 01, 2021, 07:59:15 PM
No, I'm not (as far as I know). I use only the SNMP proxy, but the agent is a Zone proxy, so I've enabled all proxies.
It is no problem to disable SNMPTrapProxy, SyslogProxy, and TCPProxy.
Title: Re: strange Critical interface Status
Post by: Victor Kirhenshtein on February 01, 2021, 08:05:03 PM
Can you get lsof output before and after action execution and check for possible new entries? If there are new entries, please post them.
Title: Re: strange Critical interface Status
Post by: pvo on February 01, 2021, 08:10:36 PM
OK, I will do it, but the Actions are used in DCIs, so it would be better to disable all DCIs that use the Actions and start the DCI script manually. It will take some time to prepare.
Title: Re: strange Critical interface Status
Post by: pvo on February 06, 2021, 05:41:56 PM
I've set all Nodes behind the proxy to unmanaged and disabled all DCIs of the proxy to avoid false results.
Then I captured the lsof output of the nxagentd process to a file before the action and again a few seconds after the action. The diff of the two files is as follows (server name is changed to xxxxx):
55a56
> nxagentd 100 root   16u  IPv4         1130371273      0t0        TCP xxxxx:netxms-agent->192.168.201.1:51656 (CLOSE_WAIT)
58a60,61
> nxagentd 100 root   20r  FIFO               0,12      0t0 1130371274 pipe
> nxagentd 100 root   21w  FIFO               0,12      0t0 1130371274 pipe


Then I captured the output 1 minute after the action and the lines were still there. Two minutes after the action all 3 lines had disappeared from the lsof output.
This means that closing the pipes takes some time even if the process on the other side of the pipe is no longer running (checked with ps).
I did the test multiple times, each time with the same result.

The question is whether closing the pipes takes longer when a large number of requests arrive, so that the average number of open pipes keeps increasing.
Another question is how to modify the configuration to avoid this. CPU and free memory on the server and proxy are OK all the time, and the action DCIs run every 15 minutes.
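
In case someone wants to repeat the before/after comparison described above without diff, here is a minimal sketch that lists the lsof lines present only in the second snapshot. The function name new_lsof_entries is made up for this example. The "after" sample reuses the three lines from the diff above; the "before" sample contains only the standard lsof header, since the full original snapshot is not shown here.

```python
def new_lsof_entries(before, after):
    """Return the lines that appear in `after` but not in `before`, in order."""
    seen = {line.strip() for line in before.splitlines()}
    return [
        line
        for line in after.splitlines()
        if line.strip() and line.strip() not in seen
    ]


header = "COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME\n"

# "before" holds only the header; "after" adds the three lines from the diff.
before = header
after = header + """\
nxagentd 100 root   16u  IPv4         1130371273      0t0        TCP xxxxx:netxms-agent->192.168.201.1:51656 (CLOSE_WAIT)
nxagentd 100 root   20r  FIFO               0,12      0t0 1130371274 pipe
nxagentd 100 root   21w  FIFO               0,12      0t0 1130371274 pipe
"""

added = new_lsof_entries(before, after)
for line in added:
    print(line)
```

Taking a snapshot every minute and comparing each one against the first would show how long the CLOSE_WAIT socket and the two pipe ends stay open after an action, and whether that time grows under load.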