NetXMS Support Forum

English Support => General Support => Topic started by: KjellO on May 28, 2013, 04:33:26 PM

Title: 1.2.7 - Syncer Thread Not responding
Post by: KjellO on May 28, 2013, 04:33:26 PM
Hi,
Recently updated to 1.2.7, but having an issue with Syncer Thread. A few minutes after server start, it stops responding and won't recover again.


Item Poller                                      20       Running
Syncer Thread                                    130      Not responding
Poll Manager                                     60       Running
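
A listing like the one above can be scanned for stuck threads with a short script. This is only a sketch in Python: it assumes the columns are separated by runs of two or more spaces, it treats the middle column as an opaque number (its meaning is not stated in the post), and the names parse_threads and not_responding are made up for illustration.

```python
import re

# The three lines quoted in the post above.
SAMPLE = """\
Item Poller                                      20       Running
Syncer Thread                                    130      Not responding
Poll Manager                                     60       Running
"""


def parse_threads(text):
    """Split each line into (name, number, state); skip lines that do not fit."""
    rows = []
    for line in text.splitlines():
        # Assumption: columns are separated by two or more spaces, so
        # single spaces inside "Syncer Thread" or "Not responding" survive.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) != 3 or not parts[1].isdigit():
            continue
        name, number, state = parts
        rows.append((name, int(number), state))
    return rows


def not_responding(text):
    """Return the names of threads whose state is 'Not responding'."""
    return [name for name, _, state in parse_threads(text)
            if state.lower() == "not responding"]


if __name__ == "__main__":
    for name in not_responding(SAMPLE):
        print("stuck:", name)
```

Run against the sample, this reports only the Syncer Thread.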


The nxadm command "show pollers" indicates that some pollers seem to be stuck, in particular topology pollers. "show queues" shows a large number for the topology poller queue; the other queues are fine.
Is it possible to completely disable topology polling at the server level?

Data collection and alarms seem to work. What is the impact of a non-responding Syncer Thread? Any ideas on how to fix this?

Thanks in advance!
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: Victor Kirhenshtein on May 28, 2013, 06:45:38 PM
Hi!

Due to changes in 1.2.7, you can occasionally get a "syncer thread not responding" message. That's ok, and you can safely ignore it, unless it's stuck in the not responding state forever. Can you double check that the syncer thread really never comes back from the not responding state?

Best regards,
Victor
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: KjellO on May 28, 2013, 07:57:38 PM
Unfortunately it seems stuck. It is 9 hours now since the last restart of netxmsd, and it has been in the Not responding state ever since. Besides showing up in the alarm browser, this is also logged to syslog. But there have been no more occurrences since the initial one right after server start, so I'm quite sure that it has been stuck all day.
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: Victor Kirhenshtein on May 29, 2013, 06:05:23 PM
You mention that some pollers seem to be stuck. Please send me a list of these pollers and information about the nodes they are trying to process.

Best regards,
Victor
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: KjellO on May 30, 2013, 06:37:57 PM
Further investigations... started a backup on a cold standby server. Now the Syncer thread seems fine and no pollers are stuck. But when the server is doing configuration polls, there are a lot of "Node down" alarms and "Unable to create raw socket for ICMP protocol" messages in syslog. The Node down alarms do recover, though, until the next configuration poll cycle.

Ok, the usual file descriptor limit. ulimit -n showed 1024, so I increased it and restarted the server. No node downs, no errors in syslog, but... the stuck pollers and Syncer thread problem are back...

Reverted to 1024. Pollers/Syncer OK but lots of node downs. Now that the Syncer thread is alive, I will try to disable routing/topology polling on a bunch of nodes to see if I can get the best of both worlds.
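
The file descriptor limit discussed above can also be read from a script. The following is a rough sketch in Python, not anything NetXMS itself does: it uses the standard resource module (Unix only) to read the same limit that ulimit -n reports, and the helper sockets_fit with its reserved allowance is hypothetical, as is the idea of budgeting one socket per polled node.

```python
import resource


def nofile_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def sockets_fit(soft_limit, sockets_needed, reserved=64):
    """Rough check: do `sockets_needed` sockets fit under the soft limit?

    `reserved` is a hypothetical allowance for log files, database
    connections and other descriptors the process already holds.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return sockets_needed + reserved <= soft_limit


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard}")
    # Example figure only: 1500 parallel sockets against the current limit.
    print("1500 sockets fit:", sockets_fit(soft, 1500))
```

With the default limit of 1024 quoted above, sockets_fit(1024, 1500) is False, which matches the raw socket errors seen during configuration polls; it says nothing about why raising the limit brought the stuck pollers back.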

Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: Victor Kirhenshtein on June 03, 2013, 01:13:24 PM
Hi,

Very interesting observation. Looks like something breaks when a large number of sockets can be open in parallel. I'll look in this direction.

Best regards,
Victor
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: Victor Kirhenshtein on June 04, 2013, 10:05:46 PM
Btw, how many nodes are you monitoring? And what kind of nodes (more switches/routers or servers)? Could it be that you have network devices with very large MAC tables or routing tables?

Best regards,
Victor
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: KjellO on June 20, 2013, 11:19:41 AM
Hi, sorry for the late reply.
Reverted to 1.2.6. No Syncer Thread problems, however some pollers still might get stuck. I have a feeling that this has been the case in previous versions as well, but it is not a big problem.

For your question about number of nodes, output from nxadmc:

netxmsd: sh stat
Total number of objects:     5988
Number of monitored nodes:   1530
Number of collectable DCIs:  45318


Almost 900 nodes with agents, but no large core routers/switches.

Best regards,
Kjell
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: Sympology on July 15, 2013, 05:24:03 PM
Hi, I've also been having this issue since upgrading.
Windows Server 2003 machine monitoring about 40 Windows servers and about 20 other devices.
I often get timeout errors when applying templates and polling configurations. This has only been happening since the upgrade.
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: Sympology on July 17, 2013, 01:47:21 PM
Pictures speak a thousand words....
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: Victor Kirhenshtein on July 17, 2013, 04:37:00 PM
Hi!

This issue should be fixed in the 1.2.8 release.

Best regards,
Victor
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: Sympology on July 18, 2013, 12:55:43 PM
Thanks guys, keep up the good work...
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: anotherspot on July 19, 2013, 06:39:27 AM
Great to hear that I was not alone in this issue.
Any E.T.A. for 1.2.8? Any solution until then?
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: Victor Kirhenshtein on July 19, 2013, 09:04:31 AM
E.T.A. is today :) Most components are already packaged; I'll put everything on the web site during the day.

Best regards,
Victor
Title: Re: 1.2.7 - Syncer Thread Not responding
Post by: yshiro on July 19, 2013, 11:15:42 AM
 ;D Wooot nice news!