1.2.7 - Syncer Thread Not responding

Started by KjellO, May 28, 2013, 04:33:26 PM


KjellO

Hi,
Recently updated to 1.2.7, but having an issue with Syncer Thread. A few minutes after server start, it stops responding and won't recover again.


Item Poller                                      20       Running
Syncer Thread                                    130      Not responding
Poll Manager                                     60       Running
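
A status listing like the one above can be scanned automatically for threads that are not running. The following is only a minimal Python sketch: the column layout (thread name, a numeric column assumed here to be the watchdog interval, then the status) is inferred from the three lines quoted in this post, and the names `parse_watchdog` and `not_responding` are hypothetical helpers, not part of NetXMS.

```python
import re

# Assumed layout, inferred from the three quoted lines only:
# thread name, a numeric column, then the status, separated by 2+ spaces.
ROW = re.compile(r"^(?P<name>.+?)\s{2,}(?P<interval>\d+)\s{2,}(?P<status>.+?)\s*$")


def parse_watchdog(text):
    """Return a list of (name, interval, status) tuples from the listing."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((match["name"], int(match["interval"]), match["status"]))
    return rows


def not_responding(rows):
    """Names of threads whose status is anything other than 'Running'."""
    return [name for name, _interval, status in rows if status != "Running"]


SAMPLE = """\
Item Poller                                      20       Running
Syncer Thread                                    130      Not responding
Poll Manager                                     60       Running
"""

if __name__ == "__main__":
    print(not_responding(parse_watchdog(SAMPLE)))
```

Feeding it the captured console output on a schedule would show whether the Syncer Thread ever leaves the Not responding state, which is the question raised later in this thread.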


The nxadm command show pollers indicates that some pollers seem to be stuck, in particular topology pollers. show queues reports a large number for the Topology poller queue; the other queues are fine.
Is it possible to completely disable topology polling at the server level?

Data collection and alarms seem to work. What is the impact of a non-responding Syncer Thread? Any ideas on how to fix this?

Thanks in advance!

Victor Kirhenshtein

Hi!

Due to changes in 1.2.7, you can occasionally get a "syncer thread not responding" message. That's OK, and you can safely ignore it, unless it's stuck in the not responding state forever. Can you double-check that the syncer thread really never comes back from the not responding state?

Best regards,
Victor

KjellO

Unfortunately it seems stuck. It has been 9 hours since the last restart of netxmsd, and it has been in the Not responding state ever since. Besides showing up in the alarm browser, this is also logged to syslog. There have been no more occurrences since the initial one right after server start, so I'm quite sure that it has been stuck all day.

Victor Kirhenshtein

You mentioned that some pollers seem to be stuck. Please send me a list of these pollers and information about the nodes they are trying to process.

Best regards,
Victor

KjellO

Further investigation: I started a backup on a cold standby server. Now the Syncer thread seems fine and no pollers are stuck. But when the server is doing configuration polls, there are lots of Node down alarms and "Unable to create raw socket for ICMP protocol" messages in syslog. The Node down alarms do recover, though, until the next configuration poll cycle.

OK, the usual file descriptor limit. ulimit -n showed 1024, so I increased it and restarted the server. No node downs, no errors in syslog, but... the stuck pollers and the Syncer thread problem are back...

Reverted to 1024. Pollers/Syncer OK but lots of node downs. Now that the Syncer thread is alive, I will try to disable routing/topology polling on a bunch of nodes to see if I can get the best of both worlds.
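
Since ulimit -n only reports the limit of the shell it is run in, it can help to compare the limit with the number of descriptors a process really has open. The following is a rough Python sketch under stated assumptions: it uses the standard `resource` module and the Linux `/proc/<pid>/fd` directory, the helper names `fd_limits`, `open_fd_count` and `headroom` are made up for illustration, and nothing here is specific to NetXMS.

```python
import os
import resource


def fd_limits():
    """Soft and hard limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open descriptors of a process via /proc (Linux only).

    Returns None when /proc/<pid>/fd is missing or not readable,
    for example when inspecting another user's process without privileges.
    """
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


def headroom(used, soft_limit):
    """Descriptors left before the soft limit; None if the limit is unlimited."""
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - used


if __name__ == "__main__":
    soft, hard = fd_limits()
    used = open_fd_count()
    print(f"soft={soft} hard={hard} open={used}")
    if used is not None:
        print(f"headroom={headroom(used, soft)}")
```

To look at the server instead of the script itself, pass the PID of the netxmsd process to `open_fd_count`; the limits that apply to that process are listed in `/proc/<pid>/limits` on Linux.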


Victor Kirhenshtein

Hi,

Very interesting observation. Looks like something breaks when a large number of sockets can be open in parallel. I'll look in this direction.

Best regards,
Victor

Victor Kirhenshtein

Btw, how many nodes are you monitoring? And what kind of nodes (more switches/routers or servers)? Could it be that you have network devices with very large MAC tables or routing tables?

Best regards,
Victor

KjellO

Hi, sorry for the late reply.
Reverted to 1.2.6. No Syncer Thread problems, however some pollers might still get stuck. I have a feeling that this has been the case in previous versions as well, but it is not a big problem.

For your question about number of nodes, output from nxadmc:

netxmsd: sh stat
Total number of objects:     5988
Number of monitored nodes:   1530
Number of collectable DCIs:  45318


Almost 900 nodes with agents, but no large core routers/switches.

Best regards,
Kjell

Sympology

Hi, I'm also having this issue since upgrading.
W2003K server monitoring about 40 Windows servers and about 20 other devices.
I often get timeout errors when applying templates and polling configurations. This has only been happening since the upgrade.

Sympology

Pictures speak a thousand words....

Victor Kirhenshtein

Hi!

This issue should be fixed in 1.2.8 release.

Best regards,
Victor

Sympology

Thanks guys, keep up the good work...

anotherspot

Great to hear that I was not alone in this issue.
Any E.T.A. for 1.2.8? Any solution until then?

Victor Kirhenshtein

E.T.A. is today :) Most components are already packed, and I'll put everything on the web site during the day.

Best regards,
Victor

yshiro