Messages - pvo

#2
General Support / Re: strange Critical interface Status
February 06, 2021, 05:41:56 PM
I've set all Nodes behind the proxy as unmanaged and disabled all DCIs of the proxy to avoid false results.
Then I captured the lsof output of the nxagentd process to a file before the action and again a few seconds after the action. The diff of the two files follows (server name is changed to xxxxx):
55a56
> nxagentd 100 root   16u  IPv4         1130371273      0t0        TCP xxxxx:netxms-agent->192.168.201.1:51656 (CLOSE_WAIT)
58a60,61
> nxagentd 100 root   20r  FIFO               0,12      0t0 1130371274 pipe
> nxagentd 100 root   21w  FIFO               0,12      0t0 1130371274 pipe


Then I captured the output 1 minute after the action and the lines were still there. 2 minutes after the action all 3 lines had disappeared from the lsof output.
This means that closing the pipes takes some time even if the process on the other side of the pipe is no longer running (checked with ps).
I did the test multiple times, each time with the same result.

The question is whether, when a large number of requests arrive, closing the pipes takes longer and the average number of open pipes therefore increases.
Another question is how to modify the configuration to avoid this. CPU and free memory on the server and proxy are OK all the time, and the DCIs that use the actions are started every 15 minutes.
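
The before/after comparison described in this post can also be scripted. The following is only a minimal sketch, assuming the default lsof column layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME) as it appears in the diff above; the function names are hypothetical and not part of NetXMS.

```python
from collections import Counter


def new_descriptors(before, after):
    """Return the lsof lines that appear in `after` but not in `before`."""
    seen = {line.strip() for line in before.splitlines()}
    return [
        line.strip()
        for line in after.splitlines()
        if line.strip() and line.strip() not in seen
    ]


def classify(line):
    """Classify one lsof line by its TYPE column (fifth field)."""
    fields = line.split()
    ftype = fields[4] if len(fields) > 4 else ""
    if ftype == "FIFO":
        return "pipe"
    if ftype in ("IPv4", "IPv6"):
        # A socket left in CLOSE_WAIT was closed by the peer but not yet locally.
        return "tcp-close-wait" if "(CLOSE_WAIT)" in line else "socket"
    return "other"


def summarize(before, after):
    """Count the newly appeared descriptors by kind."""
    return Counter(classify(line) for line in new_descriptors(before, after))
```

Calling summarize() on two snapshots taken with `lsof -p <pid>` gives one count per kind, so repeated snapshots show whether pipes and CLOSE_WAIT sockets accumulate or drain over time.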

#3
General Support / Re: strange Critical interface Status
February 01, 2021, 08:10:36 PM
OK, I will do it, but the Actions are used in DCIs, so it would be better to disable all DCIs using the Actions and start the DCI script manually. It will take some time to prepare.
#4
General Support / Re: strange Critical interface Status
February 01, 2021, 07:59:15 PM
No, I'm not (as far as I know). I use the SNMP proxy only, but it is a Zone proxy, so I've enabled all proxies.
It is no problem to disable SNMPTrapProxy, SyslogProxy, and TCPProxy.
#5
General Support / Re: strange Critical interface Status
February 01, 2021, 07:39:47 PM
I've attached the agent configuration file.
#7
General Support / Re: strange Critical interface Status
January 29, 2021, 10:32:20 PM
Can I help with some specific logging?
#8
General Support / Re: strange Critical interface Status
January 23, 2021, 03:24:27 PM
I had to restart the agent once again because there were the same messages in the log.
The problem with the interface status didn't occur again, so it is clear that the main reason for the strange Status was the agent's problem with open files.
Before restart there were:
80 connections from the server
43694 pipes

I don't understand the number of open pipes.
Currently (30 minutes after agent restart) there are 4228 open pipes, but that number is not only rising but also falling.
What is strange is that there were only 214 running processes on the agent at this time, so either these are not pipes waiting for output from processes started by AgentExecuteActionWithOutput calls, or the agent is not closing the pipes.
#9
General Support / Re: strange Critical interface Status
January 23, 2021, 11:37:48 AM
Current open files situation on the agent:
2279 connections from the server
12745 pipes
#10
General Support / Re: strange Critical interface Status
January 23, 2021, 10:49:18 AM
It was reported by a Linux agent 3.7.130 on CentOS 7.
The interface is an OpenVPN interface, tun0. After an agent restart everything was OK.

There were a lot of the following messages in the agent log:

2021.01.22 22:17:14.368 *E* [                   ] Unable to accept incoming connection (24 Too many open files)


When I check the number of files opened by the agent now and compare it with the number right after start, it is growing.
Most of the open files are of these types (listed by lsof -p; the name of the server running the agent is changed to xxxxx):

nxagentd 2524095 root *360u  IPv4         3489842327      0t0        TCP xxxxx:netxms-agent->192.168.201.1:36936 (CLOSE_WAIT)
nxagentd 2524095 root *361r  FIFO               0,12      0t0 3489847436 pipe


If the agent sends the Status via a different TCP connection than the Oper State, the above error message in the log could be the reason.

The agent is a Zone proxy and a lot of AgentExecuteActionWithOutput calls are made on the agent. This could be the reason for the open pipes, but not so many actions are started at the same time that they could exceed the maximum number of open files.
The maximum number of open files is set to 65535. I can set a higher value, but that is only a short-term solution.
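
As an alternative to lsof, the descriptor counts can be read directly from /proc on Linux, where each entry in /proc/<pid>/fd is a symlink whose target looks like pipe:[inode] or socket:[inode]. This is only a sketch under that assumption; the function names and the `proc_root` parameter are hypothetical, and the limit of 65535 is simply the value mentioned above.

```python
import os
from collections import Counter


def count_fd_kinds(pid, proc_root="/proc"):
    """Count open descriptors of a process by kind, using /proc/<pid>/fd."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("socket:"):
            kinds["socket"] += 1
        else:
            kinds["other"] += 1
    return kinds


def headroom(kinds, limit=65535):
    """Return how many descriptors remain before `limit` is reached."""
    return limit - sum(kinds.values())
```

Running count_fd_kinds() with the agent's PID (as root or as the agent's user) at regular intervals would show how close the process is to the limit without parsing lsof output.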
#11
General Support / strange Critical interface Status
January 22, 2021, 11:15:09 PM
What does the Critical interface status mean?
#12
Thank you.
According to the description, this setting is common to both the DNS name and the SNMP system name. Is it possible to assign the SNMP name only, or to change the name assigned by DNS if the SNMP agent is enabled later?
#13
My server property UseDNSNameForDiscoveredNodes has the value 0, but the Nodes accessible via a Zone proxy running on an Amazon VPS receive a name like "ip-192-168-213-2.eu-central-1.compute.internal" as the Object name. This name is probably some internal Amazon DNS name.
On Zone proxies which are not running on an Amazon VPS the IP address is assigned, but there are no such DNS names available there.

Is the property UseDNSNameForDiscoveredNodes not working on proxies or did I understand something wrong?
#14
I do notifications via the following NXSL script:

/*
$1 - Unique ID of event source object
$2 - Email subject
$3 - Email body
*/

node = FindObject($1);
if (node != null) {
    emails = node->getCustomAttribute("notify");
    if (emails != null && emails != "") {
        SendMail(emails, $2, $3);
    }
}


The notify Custom Attribute contains email addresses separated by ";". The value can be set differently for each Node or as Inheritable for a whole Container.
If it is missing, no notification is sent.

I handle different thresholds for different nodes in a similar way (a Custom Attribute and an NXSL script checking the threshold).
#15
Did you mean not receiving notifications (for example via email) or not seeing the alarms?

I have a solution for the first case, which even allows notifying different people for different nodes.