Random NetXMS server crashes

Started by bdefloo, September 19, 2012, 10:44:45 AM

bdefloo

Hi,

We've been having problems with our NetXMS environment since growing from ~200 nodes to more than 400 on v1.2.0, and currently 787 on v1.2.3. The server crashes at random times, every 2-10 days. Crashes have occurred both with and without users logged on.

NetXMS runs on a 32-bit Windows 2003 server, with an MS SQL Express 2008 R2 database on the same machine.

At first, we would see network service objects changing status to "unknown" up to an hour before the actual crash. The Windows event log shows the agent reporting:
"Communication session broken: An existing connection was forcibly closed by the remote host."
This continued until the server finally crashed and had to be restarted. The server log shows no errors.

More recently, the v1.2.0 server crashed again but this time without the agent errors. The event log shows there were no SYS_SERVICE_UNKNOWN events prior to this crash either.

I had hoped an upgrade to v1.2.3 would resolve the problem. However, after upgrading on Monday, the server crashed again tonight with no users logged on and no SYS_SERVICE_UNKNOWN events logged.

I'll send the Dr. Watson crash dump to [email protected]. If I can provide any other information, feel free to ask.

bdefloo

Hi,

We haven't had a crash since my last post, so it appears that v1.2.3 is more stable.

I will keep you informed if it does happen again.

bdefloo

It crashed again on Saturday afternoon. The last messages in the log file were:
[06-Oct-2012 16:08:27] Thread "Syncer Thread" does not respond to watchdog thread
[06-Oct-2012 16:08:27] Thread "Poll Manager" does not respond to watchdog thread

[06-Oct-2012 16:09:36] Log file opened
[06-Oct-2012 16:09:37] Database driver "mssql.ddr" loaded and initialized successfully
[06-Oct-2012 16:09:39] Stalled database lock removed
[06-Oct-2012 16:09:40] Unable to load module "": The specified module could not be found.
[06-Oct-2012 16:09:40] Network device driver "BAYSTACK" loaded successfully
[06-Oct-2012 16:09:40] Network device driver "CATALYST-2900XL" loaded successfully
[06-Oct-2012 16:09:40] Network device driver "CATALYST-GENERIC" loaded successfully
[06-Oct-2012 16:09:40] Network device driver "CISCO-ESW" loaded successfully
[06-Oct-2012 16:09:40] Network device driver "DELL-PWC" loaded successfully
[06-Oct-2012 16:09:40] Network device driver "ERS8000" loaded successfully
[06-Oct-2012 16:09:40] Network device driver "NETSCREEN" loaded successfully
[06-Oct-2012 16:09:40] Network device driver "PROCURVE" loaded successfully
[06-Oct-2012 16:12:12] Unable to bind socket to port 58942 in function LocalAdminListener: Only one usage of each socket address (protocol/network address/port) is normally permitted.
[06-Oct-2012 16:12:12] Unable to bind socket to port 162 in function SNMPTrapReceiver: Only one usage of each socket address (protocol/network address/port) is normally permitted.
[06-Oct-2012 16:12:12] NetXMS Server started
[06-Oct-2012 16:12:12] Unable to bind socket to port 4701 in function ClientListener: Only one usage of each socket address (protocol/network address/port) is normally permitted.
[06-Oct-2012 16:12:12] Unable to create socket in function ClientListener
[06-Oct-2012 16:12:52] Thread "Item Poller" does not respond to watchdog thread
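
For anyone wanting to check quickly which threads the watchdog flagged in a long server log, here is a minimal Python sketch. It only assumes the line format seen in the excerpt above; the function name `stalled_threads` and the regular expression are illustrative and not part of NetXMS.

```python
import re
from collections import Counter

# Matches lines such as:
# [06-Oct-2012 16:08:27] Thread "Poll Manager" does not respond to watchdog thread
# The pattern is inferred from the log excerpt in this thread (an assumption,
# not an official NetXMS log format specification).
WATCHDOG_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] Thread "(?P<thread>[^"]+)" '
    r'does not respond to watchdog thread'
)

def stalled_threads(lines):
    """Count watchdog warnings per thread name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.match(line.strip())
        if match:
            counts[match.group("thread")] += 1
    return counts
```

It could be used as `stalled_threads(open(path_to_log))`, where `path_to_log` is wherever your server log is written.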


I had to reboot the server to free up the ports; the server has been running fine since. I also enabled the CreateCrashDumps option in netxmsd.conf, if that still exists, and will send you the crash dump if it occurs again.
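
In case it helps someone hitting the same "Unable to bind socket" errors: a rough Python sketch for checking, while netxmsd is stopped, whether the ports from the log are still held by another process. The port numbers come from the log above. Treating 4701 and 58942 as TCP is my assumption, and `port_is_free` is just an illustrative helper name.

```python
import socket

# Ports taken from the "Unable to bind socket" log lines.
# Assumption: ClientListener (4701) and LocalAdminListener (58942) are TCP;
# SNMP traps (162) are UDP.
TCP_PORTS = [4701, 58942]
UDP_PORTS = [162]

def port_is_free(port, kind=socket.SOCK_STREAM, host="0.0.0.0"):
    """Return True if a socket of the given kind can be bound to host:port.

    Any OSError counts as "not free", which includes the port being in use
    but also missing privileges for low ports on some systems.
    """
    with socket.socket(socket.AF_INET, kind) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True

if __name__ == "__main__":
    for port in TCP_PORTS:
        print("TCP", port, "free" if port_is_free(port) else "in use")
    for port in UDP_PORTS:
        free = port_is_free(port, socket.SOCK_DGRAM)
        print("UDP", port, "free" if free else "in use")
```

If a port shows as in use with the service stopped, a leftover netxmsd process may still be holding it, which would explain why only a reboot released the ports.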

Victor Kirhenshtein

Yes, a crash dump will be extremely helpful.

Best regards,
Victor

bdefloo

Hi,

I've been having more crashes lately (4 in the last 24 hours).

They are probably caused by a large number of tests I added, which poll an external parameter via the NetXMS server node. The crashes I had before may or may not have the same root cause. Either way, I sent the NetXMS crash dump files to [email protected]

Thanks in advance for any help you can offer!

Update:
I just checked the server logs. Only one of the crashes generated a crash dump; in the other three, netxmsd just stopped working.
I also noticed that the crash dump log refers to AgentPolicy::ModifyFromMessage. Isn't this something that's called when users make changes to agent policies? Nobody was logged in to NetXMS at the time.

bdefloo

Hi,

Just had another crash out of the blue. I disabled the large number of tests shortly after my last post, and NetXMS had been stable since then, until now.

Again, the same symptoms: first, network services go to unknown and agents report "Communication session broken", and a while later the netxmsd service crashes. Crash dump info is attached. netxmsd-9352-1351585156.mdmp is an empty file (0 KB), so I didn't include it.