NetXMS Server Crashing

Started by Netvoid, November 05, 2010, 05:15:49 PM

Victor Kirhenshtein

A few additional questions:

Is Windows Firewall running on the server machine?
Are there any error messages in the netxmsd log?

Best regards,
Victor

Netvoid

Yes, Windows Firewall is running: disabled for the domain profile, enabled for private and public. It is not currently logging.
I don't see any netxmsd log or error.

Victor Kirhenshtein

Could you please try installing the updated build: https://www.netxms.org/download/rc/netxms-1.0.8-rc2.exe. If that does not solve the problem, please enable logging to a file (by setting LogFile = some_file in netxmsd.conf) and run netxmsd.exe with the -D 6 command line option (this enables debug output). It will generate a lot of messages in the log file - I'm interested in messages containing the words "agent unreachable, error=".

Best regards,
Victor
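
A debug log at level 6 can be very large, so it may help to pull out only the requested lines. The following is a minimal sketch, not part of NetXMS; the function name unreachable_lines and the file name in the usage note are assumptions made for illustration.

```python
from typing import Iterable, Iterator

# Text that the requested log messages contain (quoted from the post above).
MARKER = "agent unreachable, error="


def unreachable_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that contain MARKER, without trailing newlines."""
    for line in lines:
        if MARKER in line:
            yield line.rstrip("\r\n")
```

Assuming LogFile points to a file called netxmsd.log (a hypothetical name), it could be used as: with open("netxmsd.log", errors="replace") as f: print("\n".join(unreachable_lines(f))).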

Netvoid

Okay, I performed the update and pretty much don't see a difference.

Here are a variety of entries for the error you were asking about.

This is the most common one; 98% of the entries are this.

agent unreachable, error=910, socketError=0

Then we have some of these:

agent unreachable, error=500, socketError=0

And a few of these:

agent unreachable, error=500, socketError=10053
agent unreachable, error=500, socketError=10054
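
To put numbers on the mix of entries above, the log can be tallied by (error, socketError) pair. This is a minimal sketch under the assumption that every entry follows the exact format quoted above; the names tally and WINSOCK_NAMES are made up for the example. The two non-zero socketError values are standard Winsock codes: 10053 is WSAECONNABORTED and 10054 is WSAECONNRESET. The meaning of the NetXMS error values 910 and 500 is not stated in this thread, so they are left undecoded.

```python
import re
from collections import Counter
from typing import Iterable

# Matches entries of the form quoted in the post above.
PATTERN = re.compile(r"agent unreachable, error=(\d+), socketError=(\d+)")

# Standard Winsock error names for the socketError values seen in this thread.
WINSOCK_NAMES = {
    0: "no socket error reported",
    10053: "WSAECONNABORTED",
    10054: "WSAECONNRESET",
}


def tally(lines: Iterable[str]) -> Counter:
    """Count occurrences of each (error, socketError) pair in the given lines."""
    counts: Counter = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts
```

Calling tally(...).most_common() on the filtered log lists the pairs from most to least frequent, and WINSOCK_NAMES.get(code, "unknown") gives a readable name for the socket error.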


Victor Kirhenshtein

Hi!

I'm still unable to reproduce this issue or find any clue to the source of the problem. Please try replacing libnetxms.dll with the attached one - it has further changes in the communication code. At the least, it may give more meaningful error codes in the "agent unreachable" messages in the log.

Also, have you experienced this issue from the beginning, or did it appear after the latest upgrades?

Best regards,
Victor

Netvoid

It all seemed to start after we went from about 150-200 agents to about 400+ agents.

I'm going to be moving the system to a 32-bit server today or tomorrow to see if that eliminates the connection issue. I don't think it is related, but it's worth a try.


Victor Kirhenshtein

It could also be a Windows 2008 issue. We have an installation in Riga with 700+ agents, and I have never seen issues like that. They are running on Windows Server 2003 x86.

Best regards,
Victor

Sumit Pandya

Do you guys have any objection to the suggestions I made about:
1. Microsoft Loopback adapter
2. Registry values for MaxUserPort and TcpTimedWaitDelay
3. Using SO_REUSEADDR by setsockopt()
Please give them a try. I understand that some setups handle a much higher load than the current setup, but you need to understand that every setup/network is different!
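
For readers unfamiliar with the third suggestion, here is a minimal sketch of setting SO_REUSEADDR through setsockopt(). It uses Python's standard socket module purely for illustration; it is not the NetXMS code, and the helper name make_reusable_socket is an assumption.

```python
import socket


def make_reusable_socket() -> socket.socket:
    """Create a TCP socket with the SO_REUSEADDR option enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Enable SO_REUSEADDR before the socket is bound or connected.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock
```

The option has to be set before bind() or connect() to have any effect, and its exact semantics differ between Windows and other platforms.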

Victor Kirhenshtein

Hi!

The last special build uses the SO_REUSEADDR option - it makes no difference. As for MaxUserPort, things are different on Windows 2008 - http://support.microsoft.com/kb/929851. So by default we should have about 16000 ports for outgoing connections, which is much more than required.

Best regards,
Victor
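
The "about 16000 ports" figure can be checked with simple arithmetic. This is a sketch assuming the default dynamic port range described in the linked KB article for Windows Vista and Windows Server 2008 (49152 through 65535), compared with the older default of 1025 through 5000 that MaxUserPort used to extend. The function name is made up for the example.

```python
# Default dynamic (ephemeral) port range on Windows Vista / Server 2008.
NEW_START, NEW_END = 49152, 65535

# Default range on earlier Windows versions (upper bound set by MaxUserPort).
OLD_START, OLD_END = 1025, 5000


def dynamic_port_count(start: int = NEW_START, end: int = NEW_END) -> int:
    """Return how many ports an inclusive range [start, end] contains."""
    return end - start + 1


print(dynamic_port_count())                    # 16384
print(dynamic_port_count(OLD_START, OLD_END))  # 3976
```

On the server itself, the range actually in effect can be displayed with netsh int ipv4 show dynamicport tcp.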