Messages - Netvoid

#1
General Support / Re: NetXMS Server Crashing
November 16, 2010, 10:49:52 PM
It all seemed to start after we went from about 150-200 agents to about 400+ agents.

I'm going to move the system to a 32-bit server today or tomorrow to see if that eliminates the connection issue. I don't think it is related, but it is worth a try.

#2
General Support / Re: NetXMS Server Crashing
November 16, 2010, 05:54:45 PM
Okay, I performed the update and I pretty much don't see a difference.

Here are a variety of entries for the error you were asking about.

This is the most common one; about 98% of the entries look like this.

agent unreachable, error=910, socketError=0

Then we have some of these,

agent unreachable, error=500, socketError=0

A few of these,

agent unreachable, error=500, socketError=10053
agent unreachable, error=500, socketError=10054
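
To see how entries like these split across error codes, the log can be tallied with a short script. This is only a sketch in Python: the regular expression matches just the fragment quoted above (the layout of the rest of each netxmsd log line is not shown here, so nothing else is assumed), and the sample lines are illustrative, not real counts. The two non-zero socketError values are standard Winsock codes: 10053 is WSAECONNABORTED and 10054 is WSAECONNRESET.

```python
import re
from collections import Counter

# Matches only the fragment quoted in the post; the rest of the log line
# format is unknown, so the pattern is searched anywhere in the line.
ENTRY_RE = re.compile(r"agent unreachable, error=(\d+), socketError=(\d+)")

# Standard Winsock meanings for the socket codes seen in the post.
WINSOCK_NAMES = {
    0: "no socket error",
    10053: "WSAECONNABORTED",
    10054: "WSAECONNRESET",
}


def tally_unreachable(lines):
    """Count (error, socketError) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample, not the real log.
    sample = [
        "agent unreachable, error=910, socketError=0",
        "agent unreachable, error=910, socketError=0",
        "agent unreachable, error=500, socketError=0",
        "agent unreachable, error=500, socketError=10054",
    ]
    for (error, sock), n in tally_unreachable(sample).most_common():
        name = WINSOCK_NAMES.get(sock, "unknown")
        print(f"error={error} socketError={sock} ({name}): {n}")
```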

#3
General Support / Re: NetXMS Server Crashing
November 15, 2010, 10:42:45 PM
Yes, Windows Firewall is running: disabled for domain, enabled for private and public. Not currently logging.
I don't see any netxmsd log / error.
#4
General Support / Re: NetXMS Server Crashing
November 15, 2010, 05:12:47 PM
Still getting hundreds of agent timeouts per minute with negligible amounts of visible system resource utilization.

Also, timeouts are very common even with the console and the command-line nxadm -i, even when running the console locally.

For example, the "show queues" command at times never responds on the server. If I close it and run it a few more times, it eventually gives results. According to the results, nothing is queued.

The netxmsd process is at about 300 MB and holding; the server has been running for at least 4 days now without a crash. It takes about 5-10 timeouts and retries to get the console app connected.
#5
General Support / Re: NetXMS Server Crashing
November 12, 2010, 07:21:33 PM
292 total for port 4700, about 90 are in TIME_WAIT or SYN_SENT...

After a fresh reboot and a few minutes time, I see about 950 with 600 in TIME_WAIT.
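
Counts like these can be reproduced by grouping netstat rows by state. The sketch below is in Python and assumes the usual Windows `netstat -an` column order (protocol, local address, foreign address, state); the sample rows and addresses are made up for illustration.

```python
from collections import Counter


def count_states(netstat_text, port=4700):
    """Count TCP connection states for rows that involve the given port."""
    suffix = ":" + str(port)
    states = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        # Expected row: TCP <local addr:port> <foreign addr:port> <STATE>
        if len(parts) < 4 or parts[0].upper() != "TCP":
            continue
        local, remote, state = parts[1], parts[2], parts[3]
        if local.endswith(suffix) or remote.endswith(suffix):
            states[state] += 1
    return states


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    sample = """\
  TCP    10.0.0.5:50112    10.0.1.20:4700    ESTABLISHED
  TCP    10.0.0.5:50113    10.0.1.21:4700    TIME_WAIT
  TCP    10.0.0.5:50114    10.0.1.22:4700    SYN_SENT
  TCP    10.0.0.5:50115    10.0.1.23:445     ESTABLISHED
"""
    states = count_states(sample)
    print("total for port 4700:", sum(states.values()))
    for state, n in states.most_common():
        print(state, n)
```

In practice the text would come from saving the output of `netstat -an` to a file and reading it in.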
#6
General Support / Re: NetXMS Server Crashing
November 12, 2010, 05:20:33 PM
Server has not crashed in almost 24 hours. The only remaining issue seems to be the agents repeatedly losing and restoring connectivity.
#7
General Support / Re: NetXMS Server Crashing
November 12, 2010, 02:01:26 AM
After a couple of minutes the event viewer started responding more quickly again.

The queues went to all zero.

And just as a heads up this is what my SQL server activity is looking like over the last while...
#8
General Support / Re: NetXMS Server Crashing
November 12, 2010, 01:53:11 AM
Server is responding very slowly now. For example, the event log takes a minute to load, with a progress bar I had not seen before. The log is riddled with agent connection loss/restored messages. 50-100 clients are losing and restoring connections per minute.


The queues are currently,

netxmsd: show queues
Condition poller                 : 0
Configuration poller             : 0
Data collector                   : 0
Database writer                  : 0
Event processor                  : 0
Network discovery poller         : 0
Node poller                      : 0
Routing table poller             : 0
Status poller                    : 132


netxmsd: show mutex
Mutex status:
  g_hMutexIdIndex: unlocked
  g_hMutexNodeIndex: unlocked
  g_hMutexSubnetIndex: unlocked
  g_hMutexInterfaceIndex: unlocked

netxmsd: show flags
Flags: 0x43300257
  AF_DAEMON                        = 1
  AF_USE_SYSLOG                    = 1
  AF_ENABLE_NETWORK_DISCOVERY      = 1
  AF_ACTIVE_NETWORK_DISCOVERY      = 0
  AF_LOG_SQL_ERRORS                = 1
  AF_DELETE_EMPTY_SUBNETS          = 0
  AF_ENABLE_SNMP_TRAPD             = 1
  AF_ENABLE_ZONING                 = 0
  AF_SYNC_NODE_NAMES_WITH_DNS      = 0
  AF_CHECK_TRUSTED_NODES           = 1
  AF_WRITE_FULL_DUMP               = 0
  AF_RESOLVE_NODE_NAMES            = 1
  AF_CATCH_EXCEPTIONS              = 1
  AF_INTERNAL_CA                   = 0
  AF_DB_LOCKED                     = 1
  AF_ENABLE_MULTIPLE_DB_CONN       = 1
  AF_DB_CONNECTION_LOST            = 0
  AF_NO_NETWORK_CONNECTIVITY       = 0
  AF_EVENT_STORM_DETECTED          = 0
  AF_SERVER_INITIALIZED            = 1
  AF_SHUTDOWN                      = 0

The memory utilization is holding at about 121mb.
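
As a quick sanity check on a "show flags" dump, the number of bits set in the hexadecimal flags word can be compared with the number of flags printed as 1. The Python sketch below does only that count comparison; it does not assume which bit belongs to which AF_ name, since that mapping is not shown here. For the listing above both counts come out to 11.

```python
import re

# "show flags" output as pasted in the post above.
SHOW_FLAGS = """\
Flags: 0x43300257
  AF_DAEMON                        = 1
  AF_USE_SYSLOG                    = 1
  AF_ENABLE_NETWORK_DISCOVERY      = 1
  AF_ACTIVE_NETWORK_DISCOVERY      = 0
  AF_LOG_SQL_ERRORS                = 1
  AF_DELETE_EMPTY_SUBNETS          = 0
  AF_ENABLE_SNMP_TRAPD             = 1
  AF_ENABLE_ZONING                 = 0
  AF_SYNC_NODE_NAMES_WITH_DNS      = 0
  AF_CHECK_TRUSTED_NODES           = 1
  AF_WRITE_FULL_DUMP               = 0
  AF_RESOLVE_NODE_NAMES            = 1
  AF_CATCH_EXCEPTIONS              = 1
  AF_INTERNAL_CA                   = 0
  AF_DB_LOCKED                     = 1
  AF_ENABLE_MULTIPLE_DB_CONN       = 1
  AF_DB_CONNECTION_LOST            = 0
  AF_NO_NETWORK_CONNECTIVITY       = 0
  AF_EVENT_STORM_DETECTED          = 0
  AF_SERVER_INITIALIZED            = 1
  AF_SHUTDOWN                      = 0
"""


def check_flags(output):
    """Return (flags word, bits set in the word, flags listed as 1)."""
    word = int(re.search(r"Flags:\s*(0x[0-9A-Fa-f]+)", output).group(1), 16)
    bits_set = bin(word).count("1")
    flags_on = len(re.findall(r"^\s*AF_\w+\s*=\s*1\s*$", output, re.MULTILINE))
    return word, bits_set, flags_on


if __name__ == "__main__":
    word, bits_set, flags_on = check_flags(SHOW_FLAGS)
    print(f"word=0x{word:08X} bits set={bits_set} flags shown as 1={flags_on}")
```

If the two counts differ, the listing may simply not print every bit, so a mismatch is a hint rather than proof of a problem.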
#9
General Support / Re: NetXMS Server Crashing
November 12, 2010, 12:14:25 AM
The client connectivity loss/restore entries are still quite high. Dozens every few minutes, as shown by the attached event log.

#10
General Support / Re: NetXMS Server Crashing
November 11, 2010, 10:43:08 PM
I am noticing a tremendous number of these disconnects and reconnects in the event log since I revised the settings. I attached a screenshot. I put the status poller interval back down to 90 from 120 in case that is the cause.

Memory is sitting at 110 MB after being up for an hour or so.

Queues look like this now,

netxmsd: show queues
Condition poller                 : 0
Configuration poller             : 523
Data collector                   : 5343
Database writer                  : 0
Event processor                  : 0
Network discovery poller         : 0
Node poller                      : 0
Routing table poller             : 0
Status poller                    : 190
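
For comparing hourly snapshots like this one, the "show queues" text can be turned into a dictionary and filtered for non-empty queues. This is a small Python sketch that assumes only the "name : number" layout visible in the paste above; the function names are my own.

```python
SHOW_QUEUES = """\
netxmsd: show queues
Condition poller                 : 0
Configuration poller             : 523
Data collector                   : 5343
Database writer                  : 0
Event processor                  : 0
Network discovery poller         : 0
Node poller                      : 0
Routing table poller             : 0
Status poller                    : 190
"""


def parse_queues(output):
    """Map queue name to queue size for every 'name : number' line."""
    queues = {}
    for line in output.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # Lines such as the 'netxmsd: show queues' prompt have no number
        # after the colon and are skipped.
        if sep and value.isdigit():
            queues[name.strip()] = int(value)
    return queues


def backlog(queues):
    """Return only the queues that currently hold items."""
    return {name: size for name, size in queues.items() if size > 0}


if __name__ == "__main__":
    for name, size in backlog(parse_queues(SHOW_QUEUES)).items():
        print(f"{name}: {size}")
```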

#11
General Support / Re: NetXMS Server Crashing
November 11, 2010, 09:16:58 PM
Memory usage of the netxmsd before making all these changes and restarting was holding at about 130mb.

The database should be healthy. I'll watch the performance logs on that after these changes, but the SQL server is low usage and on a higher-performance segment of our SAN.

These suggestions,

StatusPollingInterval = 90
ConfigurationPollingInterval = 14400

I had already moved the status interval to 90, so I bumped it to 120. I set the configuration interval to 14400 as you suggested; it was 3600.

Just restarted with the revised settings and removed the processor affinity. I'll check and post the stats in an hour.
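
For a rough feel of what these intervals mean, the average poll rate can be estimated from the node count. The Python sketch below uses the 3649 monitored nodes reported by "show stats" in this thread and assumes each node is polled once per interval with polls spread evenly; how netxmsd actually schedules polls is not described here, so treat the numbers as a back-of-the-envelope estimate.

```python
MONITORED_NODES = 3649  # from "show stats" in this thread


def polls_per_second(nodes, interval_seconds):
    """Average polls per second if every node is polled once per interval."""
    return nodes / interval_seconds


if __name__ == "__main__":
    settings = [
        ("StatusPollingInterval", 90),
        ("StatusPollingInterval", 120),
        ("ConfigurationPollingInterval", 3600),
        ("ConfigurationPollingInterval", 14400),
    ]
    for name, interval in settings:
        rate = polls_per_second(MONITORED_NODES, interval)
        print(f"{name} = {interval}: about {rate:.2f} polls/s")
```

Under that assumption, a status interval of 90 works out to roughly 40 status polls per second, and 120 to roughly 30.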



#12
General Support / Re: NetXMS Server Crashing
November 11, 2010, 08:58:29 PM
This hour's stats:


netxmsd: show queues
Condition poller                 : 0
Configuration poller             : 2978
Data collector                   : 0
Database writer                  : 181423
Event processor                  : 0
Network discovery poller         : 0
Node poller                      : 0
Routing table poller             : 0
Status poller                    : 2316

netxmsd: show mutex
Mutex status:
  g_hMutexIdIndex: unlocked
  g_hMutexNodeIndex: unlocked
  g_hMutexSubnetIndex: unlocked
  g_hMutexInterfaceIndex: unlocked

netxmsd: show flags
Flags: 0x43300257
  AF_DAEMON                        = 1
  AF_USE_SYSLOG                    = 1
  AF_ENABLE_NETWORK_DISCOVERY      = 1
  AF_ACTIVE_NETWORK_DISCOVERY      = 0
  AF_LOG_SQL_ERRORS                = 1
  AF_DELETE_EMPTY_SUBNETS          = 0
  AF_ENABLE_SNMP_TRAPD             = 1
  AF_ENABLE_ZONING                 = 0
  AF_SYNC_NODE_NAMES_WITH_DNS      = 0
  AF_CHECK_TRUSTED_NODES           = 1
  AF_WRITE_FULL_DUMP               = 0
  AF_RESOLVE_NODE_NAMES            = 1
  AF_CATCH_EXCEPTIONS              = 1
  AF_INTERNAL_CA                   = 0
  AF_DB_LOCKED                     = 1
  AF_ENABLE_MULTIPLE_DB_CONN       = 1
  AF_DB_CONNECTION_LOST            = 0
  AF_NO_NETWORK_CONNECTIVITY       = 0
  AF_EVENT_STORM_DETECTED          = 0
  AF_SERVER_INITIALIZED            = 1
  AF_SHUTDOWN                      = 0

netxmsd: show stats
Total number of objects:     9309
Number of monitored nodes:   3649
Number of collectable DCIs:  8147
#13
General Support / Re: NetXMS Server Crashing
November 11, 2010, 07:48:58 PM
Attaching results from,

show pollers
show queues
show flags
show mutex

These were taken after the server had been up for about an hour, rather than right after startup. I expect a server crash in 2-4 hours. I will try to run these commands every hour while the server is up to help pinpoint...

Also ran a show stats,

netxmsd: show stats
Total number of objects:     9309
Number of monitored nodes:   3649
Number of collectable DCIs:  8147
#14
General Support / Re: NetXMS Server Crashing
November 11, 2010, 01:19:27 AM
Yeah, I tried this one and it also failed to start properly. The agent and core on the server give a similar error on startup attempts.



#15
General Support / Re: NetXMS Server Crashing
November 10, 2010, 10:53:54 PM
Victor,

I replaced these files in the bin folder but had no luck: the service wouldn't start. I rebooted just in case and still no luck; the service wouldn't start. I restored the files and the system was able to come up again.

Regards,

Clark