On all the machines that I am monitoring, startup times are quite long
Yesterday I found out that the NetXMS agent is causing this problem, basically the agent service is started but it immediately crashes, after some time it starts up again and everything is fine. The readouts are also fine.
This problem is causing a two minute delay (or longer) on boot times for some of my more slower machines >:(
I am using version 0.2.21, I haven't yet tested on X64 machines but most likely the problem occurs there as well. So far I have only tested windows XP machines for this problem, I will test vista machines as well if you need that infromation.
Config file:
#
# NetXMS agent configuration file
# Created by agent installer at Tue May 13 14:14:16 2008
#
MasterServers = [censored]
LogFile = {syslog}
FileStore = C:\NetXMS\var
SubAgent = winperf.nsm
Here are all the syslog reports (not running an english OS so I translated some things)
1st one
Type of event: Information
Source of event: NetXMS Win32 Agent
Category of event: None
Event-ID: 3
Date 28-5-2008
Time: 9:04:57
User: none
Computer: Censored
Description:
Subagent "WINNT.NSM" loaded successfully
2nd
Type of event: error
Source of event: Service Control Manager
Category of event: None
event-ID: 7022
Date: 28-5-2008
Time: 9:06:32
User: none
Computer: [censored]
Description:
The NeTXMS agent-service has crashed at startup.
20 Second after that 5 other syslog events regarding the agent are created with the following descriptions:
3d:
Counter set B is empty, collector thread for that set will not start
4th:
Subagent "winperf.nsm" loaded successfully
5th:
Counter set C is empty, collector thread for that set will not start
6th:
Listening on socket 0.0.0.0:4700
7th (23 seconds after error or so):
NetXMS Agent started
This is a very big problem as you can understand, if you need any other info please ask.
Hi!
Sounds very strange, I'll check the agent startup code. Just one question so far: do you use host name or IP address in MasterServers parameter?
Best regards,
Victor
Thank you!
I use the IP address, it's impossible to use the host name in my setup (Set the firewall to block DNS to NetXMS section)
Hi, any updates you can give me on this problem are very welcome :)
Hello!
I would really appreciate any updates :)
Sorry for delay, I was quite busy :( Could you please do some additional tests?
1. Comment out winperf.nsm subagent - I suspect that it can be the cause of crash.
2. If agent service crashes without winperf.nsm, run nxagentd.exe from command line with -D switch - it will cause agent to log additional debug information.
Best regards,
Victor
One more question: how counter names \System\System Up Time and \System\Threads looks in your localized Windows?
Best regards,
Victor
Quote from: Victor Kirhenshtein on June 04, 2008, 09:18:38 AM
Sorry for delay, I was quite busy :( Could you please do some additional tests?
1. Comment out winperf.nsm subagent - I suspect that it can be the cause of crash.
2. If agent service crashes without winperf.nsm, run nxagentd.exe from command line with -D switch - it will cause agent to log additional debug information.
Best regards,
Victor
Here are some tests I ran for you:
Config file change:
Quote
#
# NetXMS agent configuration file
# Created by agent installer at Tue May 13 14:14:16 2008
#
MasterServers = 192.168.150.9
LogFile = {syslog}
FileStore = C:\NetXMS\var
# SubAgent = winperf.nsm
Time from login screen to usable system: 2.40 Minutes :(
Agent completely disabled, time: 2.05 Minutes
Standard settings: 3 minutes
Winperf renamed and taken out of config: 2.30 minutes
Placed config in C: root to test nxagentd -D : 3.4 minutes
System.uptime: Gives value of 574 (on for 5 minutes ~ )
system.threads: gives value of 386
NetXMS agent crash is gone after I make sure winperf doesn't start but slow start up isn't fixed by this.
Hope this gives you enough info!
Hello!
I'm still unable to reproduce your situation nor identify possible cause. Could you please change logging to file by placing
LogFile = C:\nxagentd.log
to agent's config, run nxagentd -D and send me nxagentd.log file?
Also, one more test for crash: try to uncomment winperf.nsm and add the following to agent's config:
*WinPerf
EnableDefaultCounters = no
will it crash as well?
Best regards,
Victor
Hi :)
Here is your requested log file, I left a comment for you at the bottom
**************************************************************
[10-Jun-2008 11:28:08] Log file opened
[10-Jun-2008 11:28:08] Debug: Log file opened
[10-Jun-2008 11:28:08] Debug: Subagent API initialized
[10-Jun-2008 11:28:09] Subagent "WINNT.NSM" loaded successfully
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \System\Threads ==> \Systeem\Threads
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \System\System Up Time ==> \Systeem\Systeem ingeschakeld
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \System\Processor Queue Length ==> \Systeem\Lengte van de processorwachtrij
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \System\Processor Queue Length ==> \Systeem\Lengte van de processorwachtrij
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \System\Processor Queue Length ==> \Systeem\Lengte van de processorwachtrij
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \Processor(_Total)\% Processor Time ==> \Processor(_Total)\Percentage processortijd
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \Processor(_Total)\% Processor Time ==> \Processor(_Total)\Percentage processortijd
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \Processor(_Total)\% Processor Time ==> \Processor(_Total)\Percentage processortijd
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \PhysicalDisk(_Total)\Avg. Disk Queue Length ==> \Fysieke schijf(_Total)\Gemiddelde wachtrijlengte voor schijf
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \PhysicalDisk(_Total)\% Disk Time ==> \Fysieke schijf(_Total)\Percentage schijftijd
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \Processor(0)\% Processor Time ==> \Processor(0)\Percentage processortijd
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \Processor(0)\% Processor Time ==> \Processor(0)\Percentage processortijd
[10-jun-2008 11:28:19] Debug: WINPERF: Counter translated: \Processor(0)\% Processor Time ==> \Processor(0)\Percentage processortijd
[10-jun-2008 11:28:19] Subagent "winperf.nsm" loaded successfully
[10-jun-2008 11:28:19] Counter set B is empty, collector thread for that set will not start
[10-jun-2008 11:28:19] Counter set C is empty, collector thread for that set will not start
[10-jun-2008 11:28:20] Unable to bind socket: Elk socketadres (protocol/netwerkadres/poort) kan normaal slechts één keer worden gebruikt.
=========== Victor: Bottom line of this log says: Every socket address (protocol/networkadress/port) can only be used once. ===========
I am rebooting right now to test the default counter config setting :)
Hi!
After changing the config fiel in netxms\etc to your specified settings the crashing problem has been fixed :)
What does this mean?
This means that WINPERF subagent has some problems when translating English counter names to Dutch names. I will check that code. It never runs on English version of Windows (it's why I was unable to reproduce this problem on my systems), and as far as I know runs without problem on Russian version.
Regarding startup time - I see that there are only 12 seconds between agent start and socket binding error (11:28:08 - 11:28:20). Socket error most likely caused by already running nxagentd.exe instance. Could you please check that agent service is stopped and nobody listens on port 4700, and run agent in debug mode again so we have full startup timing?
Best regards,
Victor