Hello!
One of the Windows servers we're monitoring has begun raising the SYS_AGENT_UNREACHABLE alarm. There are no other alarms from this server during the same period, and it otherwise seems fine.
The alarm indicates that the NetXMS server cannot reach the NetXMS agent, but it is intermittent: roughly once every half hour the alarm comes and goes.
It is not a big problem, but it is quite annoying. I was wondering if there is anything one can do, check, configure, or whatever, to make this stop.
Physically, the server sits more or less right next to the NetXMS server, and we do not have any general network problems. As I said, everything works fine except these alarms.
Any help appreciated!
			
			
			
				Rather than just making the alarms go away, I would suggest investigating the root cause: is the NetXMS agent really unreachable during those alarms or not?
You can start with some simple debugging like this:
Set up monitoring of TCP port 4700 (the agent's port) from the NetXMS server to the node, with for example a 10-second poll interval, and see whether the server actually has connectivity issues with the NetXMS agent on that node.
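If you also want to cross-check outside NetXMS, here is a minimal Python sketch of the same idea. It opens a TCP connection to the agent port (4700) every 10 seconds and prints a timestamped line whenever reachability changes, so the output can be lined up against the alarm times. The function names and the 1-second connect timeout are assumptions for illustration. Run it from the NetXMS server so that it exercises the same network path.

```python
import socket
import time
from datetime import datetime


def agent_port_open(host: str, port: int = 4700, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts (socket.timeout is an OSError subclass)
        # and name-resolution failures all count as "unreachable".
        return False


def watch(host: str, port: int = 4700, interval: float = 10.0, rounds: int = 360) -> None:
    """Poll host:port every `interval` seconds and print only the state changes."""
    last_state = None
    for _ in range(rounds):
        state = agent_port_open(host, port)
        if state != last_state:
            stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            print(f"[{stamp}] {host}:{port} {'reachable' if state else 'UNREACHABLE'}")
            last_state = state
        time.sleep(interval)
```

For example, `watch("192.0.2.10")` (a placeholder address; use the monitored node's IP) polls for about an hour with the defaults. Printing only transitions keeps the output short enough to compare by eye with the alarm history.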
			
			
			
				Thank you, you're probably right. Unfortunately, I'm not an experienced NetXMS admin. While I have successfully set up monitoring of system services, I can't figure out how to monitor a specific port (I couldn't find it in the Admin Guide or via Google either). Would you be so kind as to point me in the right direction?
			
			
			
				1) Enable PortCheck subagent
Add this line to the main section of the agent config file on the NetXMS server (the check runs from there; see "Source node" in step 2):
SubAgent = portcheck.nsm
Restart the agent so it loads the new config and the SubAgent.
2) Create a DCI on the monitored server node.
Origin "NetXMS Agent"
Click "Select" next to "Parameter"
Find "ServiceCheck.Custom" and use it.
Modify the "Parameter" like this (substitute the node's IP address for 1.2.3.4):
ServiceCheck.Custom("1.2.3.4", "4700", "1000")
Set "Polling mode" to custom, with an interval of 1 second.
Set "Source node" to your NetXMS server.
3) Set threshold to create alarms when NetXMS Agent connection is unavailable
Under the DCI, go to the "Thresholds" section.
Click "Add", then just change "Operation" to "!= : not equal to" and keep "Value" at 0.
			
			
			
				Thank you very much, tomaskir. I've set it up according to your guide, but it does not raise any alarm. The previous alarm keeps coming and going as before, while the one I created stays silent.
			
			
			
				Hi,
try switching agent logging to a file if you haven't already (by setting the LogFile parameter in nxagentd.conf), set the debug level to 6 (by adding DebugLevel = 6 to nxagentd.conf), and run the agent for some time. Once you get some agent unreachable errors, post your log file (or send it to [email protected]).
Best regards,
Victor
			
				Thank you. I did as you suggested. Where will the log file end up?
			
			
			
				You should set the full path to the log file in the LogFile parameter in nxagentd.conf.
Best regards,
Victor
			
			
			
				Hm. This is quite weird. If I add the line "LogFile = C:\Program\NetXMS\etc\" (without quotation marks, and there's nothing wrong with the path), I am unable to restart the NetXMS service, or start it when it is stopped. If I delete the line, I can restart/start the service.
I ran into this during my fruitless attempts to generate a log file; I assumed that the service needed a restart to create the file after the parameter was added.
			
			
			
				You need a file path; if you give it just a directory, the agent will of course fail to start.
Use for example:
"LogFile = C:\Program\NetXMS\etc\nxagentd.log"
Then restart the Agent service to apply that change.
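A small sanity check for the LogFile value could look like the following. It is a sketch that assumes nxagentd.conf consists of simple `Key = Value` lines with `#` comments, as in the config file contents posted in this thread; section handling is left out, and `read_agent_config` and `logfile_problem` are hypothetical names.

```python
import os
from typing import Dict, List, Optional


def read_agent_config(text: str) -> Dict[str, List[str]]:
    """Collect 'Key = Value' pairs; repeated keys (e.g. SubAgent) keep every value in order."""
    config: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not a Key = Value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        config.setdefault(key, []).append(value)
    return config


def logfile_problem(config: Dict[str, List[str]]) -> Optional[str]:
    """Describe an obviously unusable LogFile value, or return None if it looks like a file path."""
    values = config.get("LogFile")
    if not values:
        return "LogFile is not set"
    path = values[-1]  # the last occurrence is the one this sketch treats as effective
    if path.endswith(("\\", "/")):
        return "LogFile ends with a path separator, so it names a directory: " + path
    if os.path.isdir(path):
        return "LogFile names an existing directory: " + path
    return None
```

Reading the real nxagentd.conf, passing its text to `read_agent_config`, and printing `logfile_problem(...)` would flag a value such as `C:\Program\NetXMS\etc\` before the agent service refuses to start.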
			
			
			
				Thank you, tomaskir. When I first wrote [Path]\logfile.txt, NetXMS created a folder called logfile.txt, so I assumed that the last part should be a folder, but obviously the file extension .log is the key. I'll get back with the log file contents!
			
			
			
				Nothing seems to get logged except the start of the service. Since then there have been several "Native agent is not responding" alarms, but the log picked up none of them.
Quote
[25-Jan-2016 11:52:02.347] Log file opened
[25-Jan-2016 11:52:02.347] [INFO ] Additional configs was loaded from F:\Program\NetXMS\etc\nxagentd.conf.d
[25-Jan-2016 11:52:02.347] [INFO ] Debug level set to 0
[25-Jan-2016 11:52:02.550] [INFO ] DB Library: Database driver "sqlite.ddr" loaded and initialized successfully
[25-Jan-2016 11:52:02.659] [INFO ] Subagent "WINNT.NSM" loaded successfully
[25-Jan-2016 11:52:02.659] [INFO ] Subagent "ecs.nsm" loaded successfully
[25-Jan-2016 11:52:02.659] [INFO ] Subagent "ping.nsm" loaded successfully
[25-Jan-2016 11:52:02.659] [INFO ] Subagent "logwatch.nsm" loaded successfully
[25-Jan-2016 11:52:02.659] [INFO ] Subagent "portcheck.nsm" loaded successfully
[25-Jan-2016 11:52:07.573] [INFO ] Subagent "winperf.nsm" loaded successfully
[25-Jan-2016 11:52:07.573] [INFO ] Subagent "wmi.nsm" loaded successfully
[25-Jan-2016 11:52:07.573] [INFO ] Subagent "ups.nsm" loaded successfully
[25-Jan-2016 11:52:08.587] [INFO ] Listening on socket 0.0.0.0:4700
[25-Jan-2016 11:52:08.587] [INFO ] Listening on socket [::]:4700
[25-Jan-2016 11:52:09.601] [INFO ] NetXMS Agent started
			
				Did you set "DebugLevel = 6"?
			
			
			
				Yes, "DebugLevel = 6".
			
			
			
				Quote from: Nomis on January 25, 2016, 02:22:45 PM
Yes, "DebugLevel = 6".
Your log file shows
[25-Jan-2016 11:52:02.347] [INFO ] Debug level set to 0
 
			
			
				Am I missing something obvious, or is this the Twilight Zone? I opened the nxagentd.conf file once more, made sure that DebugLevel is 6, saved and closed it, and restarted the agent, and the log file once again says "0". This is the content of the config file:
Quote
#
# NetXMS agent configuration file
# Created by agent installer at Thu Sep 29 15:41:41 2011
#
MasterServers = 192.168.25.17
FileStore = F:\Program\NetXMS\var\
LogFile = F:\Program\NetXMS\etc\nxagentd.log
DebugLevel = 6
SubAgent = ecs.nsm
SubAgent = ping.nsm
SubAgent = logwatch.nsm
SubAgent = portcheck.nsm
SubAgent = winperf.nsm
SubAgent = wmi.nsm
SubAgent = ups.nsm
			
				Please also check "F:\Program\NetXMS\etc\nxagentd.conf.d" if there is some config there or not.
			
			
			
				That folder is empty.
			
			
			
				Hi,
check whether the service command line contains -D0. If it does, remove it.
Best regards,
Victor
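The -D0 here explains the "Debug level set to 0" line: a debug-level switch on the service command line took precedence over DebugLevel = 6 in nxagentd.conf. The sketch below models that precedence. It assumes the switch is written as -D followed directly by digits, as in this thread; the function names are hypothetical. On Windows, the service command line appears as "Path to executable" in the service's properties.

```python
import shlex
from typing import Optional


def command_line_debug_level(cmdline: str) -> Optional[int]:
    """Return the level forced by a -D<n> switch on the agent's command line, if present."""
    # posix=False leaves Windows backslashes in paths untouched.
    for token in shlex.split(cmdline, posix=False):
        if token.startswith("-D") and token[2:].isdigit():
            return int(token[2:])
    return None


def effective_debug_level(cmdline: str, config_level: int) -> int:
    """A -D<n> switch wins over DebugLevel from nxagentd.conf; otherwise the config value applies."""
    forced = command_line_debug_level(cmdline)
    return config_level if forced is None else forced
```

With a command line ending in `-D0` and `DebugLevel = 6` in the config, `effective_debug_level` returns 0, which matches what the log reported until the switch was removed.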
			
			
			
				That did the trick, Victor. Now there's a lot of logging going on. I'll get back tomorrow with the result.
			
			
			
				I'm attaching an hour's worth of the log file, from 09:00 to 10:00. During that time there were alarms at approximately 09:17, 09:21, 09:23, 09:32, 09:42, and 09:48. Thanks for taking the time!
			
			
			
				Personally, I can't find anything in the logs that matches the alarms. There are "Session disconnected by timeout" messages, but they are quite frequent and don't necessarily occur in the same minutes as the alarms.
If anybody finds anything or has any idea of a next step, please don't hesitate to let me know. :-)
			
			
			
				Activity looks strange, actually. There are multiple occurrences of this pattern:
[27-Jan-2016 09:19:38.747] [DEBUG] [session:1] Session disconnected by timeout (last activity timestamp is 1453882717)
[27-Jan-2016 09:19:38.747] [DEBUG] [session:1] Session with 192.168.25.17 closed
[27-Jan-2016 09:19:42.835] [DEBUG] Incoming connection from 192.168.25.17
[27-Jan-2016 09:19:42.835] [DEBUG] Connection from 192.168.25.17 accepted
then normal session initialization and parameter queries, and then suddenly another disconnect by timeout followed by an almost immediate reconnect:
[27-Jan-2016 09:19:42.850] [DEBUG] [session:1] Requesting parameter "Agent.Uptime"
[27-Jan-2016 09:19:42.850] [DEBUG] [session:1] Sending message CMD_REQUEST_COMPLETED (size 56)
[27-Jan-2016 09:20:43.861] [DEBUG] [session:1] Session disconnected by timeout (last activity timestamp is 1453882782)
[27-Jan-2016 09:20:43.861] [DEBUG] [session:1] Session with 192.168.25.17 closed
The inactivity timeout also seems to be very short. Could it be that you have the agent configuration parameter SessionIdleTimeout set to 0?
Best regards,
Victor
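To measure how long a session was actually idle before the agent closed it, you can compare the "last activity timestamp" (a Unix epoch value) with the log line's own timestamp. The bracketed log times are local to the agent host, and the thread does not state its UTC offset. The quoted log hints at +1 hour: the session shows activity at 09:19:42 local time, and the next disconnect reports last activity 1453882782, which is 08:19:42 UTC. The sketch below assumes that offset; `idle_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def idle_seconds(log_stamp: str, last_activity_epoch: int, utc_offset_hours: float) -> float:
    """Seconds between a session's last activity and the log line reporting its idle disconnect.

    log_stamp: the agent's bracketed timestamp, in the agent host's local time.
    last_activity_epoch: the Unix time printed as "last activity timestamp".
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    logged = datetime.strptime(log_stamp, "%d-%b-%Y %H:%M:%S.%f").replace(tzinfo=local_tz)
    last = datetime.fromtimestamp(last_activity_epoch, tz=timezone.utc)
    return (logged - last).total_seconds()


# The two disconnects quoted above; +1 hour is an ASSUMED offset for the agent host.
for stamp, epoch in [("27-Jan-2016 09:19:38.747", 1453882717),
                     ("27-Jan-2016 09:20:43.861", 1453882782)]:
    print(stamp, round(idle_seconds(stamp, epoch, 1), 1))
# 27-Jan-2016 09:19:38.747 61.7
# 27-Jan-2016 09:20:43.861 61.9
```

Under the +1 hour assumption, both disconnects come about 61 to 62 seconds after the last activity, which is close to a 60-second idle timeout. A different offset would shift the result by whole hours, so it is worth confirming the host's time zone before drawing conclusions.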
			
			
			
				Thank you for taking the time, Victor Kirhenshtein, I really appreciate it. Sorry to have to respond to every suggestion with a question, but where would that parameter be? There's no such parameter in the agent's config file, and I can't find it in the object's properties either.
			
			
			
				It goes in nxagentd.conf, but it has a default value of 60 seconds on Windows, and in the log we see an "idle" disconnect after just one second. Try adding
SessionIdleTimeout = 600
to nxagentd.conf and check whether the idle disconnect messages still appear.
Best regards,
Victor
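The effect of raising SessionIdleTimeout can be illustrated with a toy model. This sketch rests on an assumption, not on the agent's actual code: the agent closes a session once SessionIdleTimeout seconds pass without a request, and the server's next request opens a new one. The function name and request times are hypothetical.

```python
from typing import List


def idle_disconnects(request_times: List[float], timeout: float) -> List[float]:
    """Return the moments at which an idle session would be closed.

    Only gaps between consecutive observed requests are considered; a gap longer
    than `timeout` closes the session `timeout` seconds after the earlier request.
    """
    drops = []
    for prev, nxt in zip(request_times, request_times[1:]):
        if nxt - prev > timeout:
            drops.append(prev + timeout)
    return drops


# Hypothetical request times (seconds) with two gaps slightly longer than a minute.
requests = [0, 30, 95, 125, 190]
print(idle_disconnects(requests, 60))   # [90, 185]
print(idle_disconnects(requests, 600))  # []
```

With a 60-second timeout, every gap slightly longer than a minute produces a disconnect and reconnect; with 600 seconds, the same gaps no longer trip the timeout.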
			
			
			
				Thank you. I'll do that and report back later.
			
			
			
				The annoying SYS_AGENT_UNREACHABLE alarm seems to have disappeared after the SessionIdleTimeout change. Thank you very much, Victor.
			
			
			
				Hi,
I have the same issue and found this topic :).
I made the same change ('SessionIdleTimeout = 600') and hope it solves the problem for me too :)
Thanks a lot! :)