Server crashes after a few minutes

Started by Tom, May 21, 2008, 05:29:34 PM


Tom

Hi,

Now I have a big problem with NetXMS...

I made some changes to devices: I deleted a few, changed the SNMP community on some others, and changed the default community for discovery.
Suddenly the server crashed, meaning the service stopped.

Now when I start it again, it runs for a few minutes and then crashes again.
I am now testing with active discovery disabled.


Any ideas what I can do?

NetXMS 0.2.21
75MB MySQL Database

netxmsd: show stats
Total number of objects:     5207
Number of monitored nodes:   142
Number of collectable DCIs:  724


netxmsd: sh queues
Condition poller                 : 0
Configuration poller             : 142
Data collector                   : 0
Database writer                  : 0
Event processor                  : 0
Network discovery poller         : 0
Node poller                      : 1424
Routing table poller             : 0
Status poller                    : 0

Thanks,

Tom

Tom

It seems OK with discovery disabled... It now runs without crashing; before, it crashed after 3 minutes...

netxmsd: sh mutex
Mutex status:
  g_hMutexIdIndex: unlocked
  g_hMutexNodeIndex: unlocked
  g_hMutexSubnetIndex: unlocked
  g_hMutexInterfaceIndex: unlocked

---------------------------------------------------------
netxmsd: sh queues
Condition poller                 : 0
Configuration poller             : 0
Data collector                   : 0
Database writer                  : 0
Event processor                  : 0
Network discovery poller         : 0
Node poller                      : 0
Routing table poller             : 0
Status poller                    : 0


Tom

I enabled autodiscovery again after it had run stably for a while.

No crash during the last hour now...

Strange, but OK ;)

Tom

It just happened again...

I deleted about 10 devices and changed the discovery parameters. After restarting the NetXMS core, the server crashes after about 3 minutes.

Server restart: crash after a few minutes.

Server restart, disable discovery, server restart: crash after a few minutes.

Server restart: now it has been running for about 10 minutes without problems.

Any ideas? I think it happened because I deleted some devices...

Victor Kirhenshtein

What operating system do you run the NetXMS server on? If it is Windows, please add the following lines to your netxmsd.conf:


CreateCrashDumps = yes
DumpDirectory = C:\


This will cause the server to create crash dumps in the root of drive C:. You can, of course, set any other directory for dumps.
When it crashes again, please send the dumps to [email protected].

Best regards,
Victor

Tom

Yes, Windows.

OK, I will try to reproduce the error.

Thanks

Tom

Hi,

The server runs normally now after enabling the crash dumps...

But I think there is now a problem with the pollers or with discovery. Discovery is enabled, but no new devices have been found so far.

Here is the output:
netxmsd: sh qu
Condition poller                 : 0
Configuration poller             : 0
Data collector                   : 0
Database writer                  : 0
Event processor                  : 0
Network discovery poller         : 0
Node poller                      : 3469
Routing table poller             : 0
Status poller                    : 0

netxmsd: sh poll
PT  TIME                   STATE
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:41:01   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:41:01   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:41:01   wait
S   29/May/2008 13:41:01   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:41:01   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:41:01   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:41:21   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:41:01   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:41:01   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:41:16   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:40:56   wait
C   29/May/2008 13:15:22   wait
C   29/May/2008 13:12:51   wait
C   29/May/2008 13:15:17   wait
C   29/May/2008 13:14:47   wait
R   29/May/2008 13:38:32   wait
R   29/May/2008 13:38:32   wait
R   29/May/2008 13:38:39   wait
R   29/May/2008 13:38:32   wait
R   29/May/2008 13:38:32   wait
D   29/May/2008 13:06:46   wait
N   29/May/2008 11:05:49   wait
N   29/May/2008 11:05:49   wait
N   29/May/2008 11:05:49   wait
N   29/May/2008 11:05:49   wait
N   29/May/2008 11:05:49   wait
N   29/May/2008 11:05:49   wait
N   29/May/2008 11:05:49   wait
N   29/May/2008 11:05:49   wait
N   29/May/2008 11:05:49   wait
N   29/May/2008 11:05:49   wait
A   29/May/2008 12:45:49   wait


Is this normal behavior?

DiscoverInterval is set to 6000, and discovery is active.

Greetings
Tom
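A long `sh poll` listing like the one above is easier to read when summarized. The following is a minimal Python sketch (`summarize_pollers` is a hypothetical helper, not part of NetXMS) that counts the rows per PT code and reports the most recent TIME value seen for each code. It assumes the four-column `PT  TIME  STATE` layout shown in this thread; the thread does not define what the single-letter codes mean. Applied to the full listing above, it would report 100 S rows and show that all ten N rows carry the same time, 11:05:49, roughly two and a half hours older than the S rows.

```python
from collections import Counter
from datetime import datetime


def summarize_pollers(text):
    """Count 'sh poll' rows per PT code and track the latest TIME per code.

    Assumes rows shaped like 'S   29/May/2008 13:40:56   wait' (layout taken
    from the console output in this thread; hypothetical helper).
    """
    counts = Counter()
    latest = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header row and anything that is not a 4-field poller row.
        if len(parts) != 4 or parts[0] == "PT":
            continue
        ptype, date, clock, _state = parts
        # %b expects English month abbreviations (default C locale).
        ts = datetime.strptime(f"{date} {clock}", "%d/%b/%Y %H:%M:%S")
        counts[ptype] += 1
        if ptype not in latest or ts > latest[ptype]:
            latest[ptype] = ts
    return counts, latest


# A few rows copied from the listing above.
sample = """PT  TIME                   STATE
S   29/May/2008 13:40:56   wait
S   29/May/2008 13:41:21   wait
C   29/May/2008 13:15:22   wait
N   29/May/2008 11:05:49   wait
N   29/May/2008 11:05:49   wait
A   29/May/2008 12:45:49   wait"""

counts, latest = summarize_pollers(sample)
for ptype in sorted(counts):
    print(ptype, counts[ptype], latest[ptype].strftime("%H:%M:%S"))
```

Comparing the latest time per code against the current clock is one quick way to spot a poller group whose timestamps have stopped moving.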

Tom

The node poller queue is now growing rapidly...

netxmsd: sh que
Condition poller                 : 0
Configuration poller             : 0
Data collector                   : 0
Database writer                  : 0
Event processor                  : 0
Network discovery poller         : 0
Node poller                      : 10422
Routing table poller             : 0
Status poller                    : 0

netxmsd:

Victor Kirhenshtein

Could you please type


dump


on the server console to create a core dump of the server process, and send it to us for analysis?

Best regards,
Victor

Tom

Good morning,

Update:

Discovery found the new devices at about 8 pm.

Queues at the moment (still increasing):
netxmsd: sh queue
Condition poller                 : 0
Configuration poller             : 0
Data collector                   : 0
Database writer                  : 0
Event processor                  : 0
Network discovery poller         : 0
Node poller                      : 77701
Routing table poller             : 0
Status poller                    : 0

I created the dump, but I think all the devices and their community strings are visible in it. For security reasons I renamed all community strings in the file; I hope it is still readable for you.

I sent you the link to the dump by PM.

Thanks for the help,

Greetings
Tom
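Across Tom's last three `sh queues` snapshots the Node poller queue goes from 3469 to 10422 to 77701, while every other queue stays at 0. A minimal Python sketch for spotting that pattern in saved console output follows; `parse_queues` and `growing_queues` are hypothetical helpers, not NetXMS functionality, and they assume the `name : count` layout shown in this thread.

```python
def parse_queues(text):
    """Parse 'name : count' rows from saved 'sh queues' output into a dict."""
    queues = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Skip prompts (e.g. 'netxmsd: sh queues') and blank lines;
        # keep only rows whose value is a plain integer.
        if not sep or not value.strip().isdigit():
            continue
        queues[name.strip()] = int(value)
    return queues


def growing_queues(snapshots):
    """Return (name, values) for queues that strictly grow in every snapshot."""
    names = set(snapshots[0])
    for snap in snapshots[1:]:
        names &= set(snap)  # only compare queues present in all snapshots
    result = []
    for name in sorted(names):
        values = [snap[name] for snap in snapshots]
        if all(a < b for a, b in zip(values, values[1:])):
            result.append((name, values))
    return result


# Values taken from the last three snapshots in this thread (trimmed).
snapshots = [
    parse_queues("Node poller                      : 3469\nStatus poller                    : 0"),
    parse_queues("Node poller                      : 10422\nStatus poller                    : 0"),
    parse_queues("Node poller                      : 77701\nStatus poller                    : 0"),
]
print(growing_queues(snapshots))
```

With these inputs only the Node poller queue is reported, since the Status poller queue stays at 0.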