SYS_AGENT_UNREACHABLE

Started by Nomis, January 15, 2016, 03:09:55 PM


Nomis

Hello!

One of the servers we're monitoring (Windows) has started raising the SYS_AGENT_UNREACHABLE alarm. There are no other alarms from this server during this period, and it seems OK.

The alarm indicates that the NetXMS server cannot reach the NetXMS agent. But it is intermittent: roughly once every half hour, the alarm comes and goes.

It is not a big problem, but it is quite annoying. I was wondering if there is anything one can do, check, configure, or whatever, to make this stop.

The server is more or less right by the NetXMS server, physically, and we do not have any general network problems. As I said, everything works fine - except these alarms.

Any help appreciated!

tomaskir

Rather than just making the alarms go away, I would suggest investigating the root issue: whether the NetXMS agent really is unreachable during those alarms or not.

You can start with some simple debugging like this:
Set up monitoring of TCP port 4700 from the NetXMS server to the node (with, for example, a 10-second poll time), and see whether the server actually has connectivity issues to the NetXMS agent on that node.
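For a quick cross-check outside NetXMS, a minimal Python sketch can attempt a plain TCP connection to the agent port every 10 seconds and print each failure with a timestamp. It only tests whether something accepts connections on port 4700; it does not speak the NetXMS agent protocol. The function names and the 1.2.3.4 address are placeholders, not part of NetXMS.

```python
import socket
import time
from datetime import datetime

AGENT_PORT = 4700  # default NetXMS agent port, as used in this thread


def agent_port_open(host: str, port: int = AGENT_PORT, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe_forever(host: str, port: int = AGENT_PORT, interval: float = 10.0) -> None:
    """Check the port every `interval` seconds; print a line only when the check fails."""
    while True:
        if not agent_port_open(host, port):
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host}:{port} unreachable")
        time.sleep(interval)
```

Running `probe_forever("1.2.3.4")` (with the node's real IP) from the NetXMS server while the alarms occur shows whether the failures line up with the SYS_AGENT_UNREACHABLE timestamps.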

Nomis

Thank you, you're probably right. Unfortunately I'm not an experienced NetXMS admin. While I have successfully set up monitoring of system services, I can't figure out how to monitor a certain port (I cannot find it in the Admin Guide or via Google either). Would you be so kind as to point me in the right direction?

tomaskir

1) Enable PortCheck subagent
Add this line to the main section of the agent config file (nxagentd.conf) on the NetXMS server:
SubAgent = portcheck.nsm
Restart the agent so it loads the new config and the SubAgent.

2) Create a DCI on the monitored server node.
Origin "NetXMS Agent"
Click "Select" next to "Parameter"
Find "ServiceCheck.Custom" and use it.
Modify the "Parameter" like this (substitute node IP for 1.2.3.4):

ServiceCheck.Custom("1.2.3.4", "4700", "1000")

Set "Polling mode" to "Custom" with an interval of 1 second.
Set "Source node" to your NetXMS server.

3) Set a threshold to create alarms when the NetXMS agent connection is unavailable
Under the DCI, go to the "Thresholds" section.
Click "Add", change only "Operation" to "!= : not equal to", and keep "Value" at 0.
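The threshold in step 3 works because the check returns a status code rather than a yes/no value: 0 when the port answers, something else when it does not, so "!= 0" catches any failure. The Python sketch below mimics that logic; `service_check_custom`, `threshold_fires` and the specific non-zero code are illustrative assumptions, not the portcheck subagent's actual implementation or error table.

```python
import socket

# Hypothetical result codes following the "0 means OK" convention implied by
# the threshold above; the real subagent may use different non-zero values.
CHECK_OK = 0
CHECK_CONNECT_FAILED = 2


def service_check_custom(host: str, port: int, timeout_ms: int) -> int:
    """Mimic a ServiceCheck.Custom-style probe: return 0 if the TCP port accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout_ms / 1000):
            return CHECK_OK
    except OSError:
        return CHECK_CONNECT_FAILED


def threshold_fires(value: int, threshold: int = 0) -> bool:
    """Threshold with operation '!= : not equal to' and value 0."""
    return value != threshold
```

For example, `threshold_fires(service_check_custom("1.2.3.4", 4700, 1000))` is True whenever the connection attempt fails within one second.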

Nomis

Thank you very much, tomaskir. I've created a rule according to your guide. It does not set off any alarm. The previous alarm keeps coming and going as before, but the one I created is silent.

Victor Kirhenshtein

Hi,

Try switching agent logging to a file if that is not done already (by changing the LogFile parameter in nxagentd.conf), set the debug level to 6 (by adding DebugLevel = 6 to nxagentd.conf), and run the agent for some time. After you get some agent unreachable errors, post your log file (or send it to [email protected]).

Best regards,
Victor

Nomis

Thank you. I did as you suggested. Where will the log file end up?

Victor Kirhenshtein

You should set the full path to the log file in the LogFile parameter in nxagentd.conf.

Best regards,
Victor

Nomis

Hm. This is quite weird. If I add the line "LogFile = C:\Program\NetXMS\etc\" (without quotation marks, and there's nothing wrong with the path), I am unable to restart the NetXMS service, or start it when it is stopped. If I delete the line, I can restart/start the service.

I discovered this during my fruitless attempts to generate a log file; I had assumed that the service needed a restart for the file to be created after the parameter was added.

tomaskir

You need a file path; if you give it just a directory, the agent will of course fail to start.

Use for example:
"LogFile = C:\Program\NetXMS\etc\nxagentd.log"

Then restart the Agent service to apply that change.

Nomis

Thank you, tomaskir. When I first wrote [Path]\logfile.txt, NetXMS created a folder called logfile.txt, so I assumed that the last part should be a folder, but obviously the file extension .log is the key. I'll get back with the log file contents!

Nomis

Nothing seems to get logged except the start of the service. Since then there have been several "Native agent is not responding" alarms, but the log has not picked up any of them.

Quote
[25-Jan-2016 11:52:02.347] Log file opened
[25-Jan-2016 11:52:02.347] [INFO ] Additional configs was loaded from F:\Program\NetXMS\etc\nxagentd.conf.d
[25-Jan-2016 11:52:02.347] [INFO ] Debug level set to 0
[25-Jan-2016 11:52:02.550] [INFO ] DB Library: Database driver "sqlite.ddr" loaded and initialized successfully
[25-Jan-2016 11:52:02.659] [INFO ] Subagent "WINNT.NSM" loaded successfully
[25-Jan-2016 11:52:02.659] [INFO ] Subagent "ecs.nsm" loaded successfully
[25-Jan-2016 11:52:02.659] [INFO ] Subagent "ping.nsm" loaded successfully
[25-Jan-2016 11:52:02.659] [INFO ] Subagent "logwatch.nsm" loaded successfully
[25-Jan-2016 11:52:02.659] [INFO ] Subagent "portcheck.nsm" loaded successfully
[25-Jan-2016 11:52:07.573] [INFO ] Subagent "winperf.nsm" loaded successfully
[25-Jan-2016 11:52:07.573] [INFO ] Subagent "wmi.nsm" loaded successfully
[25-Jan-2016 11:52:07.573] [INFO ] Subagent "ups.nsm" loaded successfully
[25-Jan-2016 11:52:08.587] [INFO ] Listening on socket 0.0.0.0:4700
[25-Jan-2016 11:52:08.587] [INFO ] Listening on socket [::]:4700
[25-Jan-2016 11:52:09.601] [INFO ] NetXMS Agent started

tomaskir

Did you set "DebugLevel = 6"?

Nomis

Yes, "DebugLevel = 6".

tomaskir

Quote from: Nomis on January 25, 2016, 02:22:45 PM
Yes, "DebugLevel = 6".
Your log file shows

[25-Jan-2016 11:52:02.347] [INFO ] Debug level set to 0

so the DebugLevel setting is apparently not being applied.
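To confirm whether a DebugLevel change actually took effect after restarting the agent, one can look for the most recent "Debug level set to" line in the agent log. A minimal Python sketch, assuming the log line format shown in the excerpt above; the function name is hypothetical.

```python
import re
from typing import Optional

# Matches agent log lines such as:
# [25-Jan-2016 11:52:02.347] [INFO ] Debug level set to 0
DEBUG_LINE = re.compile(r"Debug level set to (\d+)")


def last_debug_level(log_text: str) -> Optional[int]:
    """Return the level from the last 'Debug level set to N' line, or None if there is none."""
    matches = DEBUG_LINE.findall(log_text)
    return int(matches[-1]) if matches else None
```

Usage: `with open(r"C:\Program\NetXMS\etc\nxagentd.log") as f: print(last_debug_level(f.read()))`. If this still prints 0 after adding DebugLevel = 6 and restarting, the agent did not pick up that setting.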