Messages - prichardson

#1
Overnight we had an object that suddenly started flooding NetXMS with severity changes, due to what I can only describe as event flapping. The timeline:
 
12/13 21:00 - Device goes unreachable due to network failure.
12/13 21:01 - Device returns to normal.
12/13 22:04 - Device goes unreachable again.
12/13 22:14 - Device comes back up and returns to normal.
12/13 22:22 - Object severity starts bouncing between CRITICAL and NORMAL multiple times per second; the Event column shows flapping between SYS_NODE_CRITICAL and SYS_NODE_UNKNOWN, 20+ times per second. A network unreachable alert is logged at this time.
12/13 22:22 to 12/14 08:47 - Flapping continues unabated; two more network unreachable entries are logged at 22:28 and 22:54, and none after that.
12/14 08:47 - Placed in Maintenance Mode.
12/14 09:20 - Taken out of Maintenance Mode.  No further bouncing.
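
To put a number on the flapping, here is a rough Python sketch for counting event-code changes per second and estimating the total volume over the window above. The (timestamp, event code) row layout, the function names, and the 20-per-second threshold are assumptions made for illustration; this is not a NetXMS export format or API, so the rows would have to be pulled from the event log by whatever means is available.

```python
from collections import Counter
from datetime import datetime, timedelta


def transitions_per_second(rows):
    """Count event-code changes in each one-second bucket.

    rows: iterable of (timestamp, event_code) tuples, sorted by time.
    This layout is hypothetical, not a NetXMS format.
    """
    counts = Counter()
    previous = None
    for ts, code in rows:
        if previous is not None and code != previous:
            counts[ts.replace(microsecond=0)] += 1
        previous = code
    return counts


def flapping_seconds(rows, threshold=20):
    """Return the seconds in which the code changed at least `threshold` times."""
    per_second = transitions_per_second(rows)
    return sorted(sec for sec, n in per_second.items() if n >= threshold)


def estimated_events(duration, rate_per_second=20):
    """Lower-bound event count for a flapping window at a fixed rate."""
    return int(duration.total_seconds()) * rate_per_second


# 22:22 on 12/13 to 08:47 on 12/14 is 10 hours 25 minutes.
window = timedelta(hours=10, minutes=25)
print(estimated_events(window))  # 750000
```

At 20 changes per second, that window works out to at least 750,000 event log entries, which is in line with the flood we saw.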

We confirmed through other sources there were no issues with the device or the connection to it.

We've seen this happen a couple of times in the past, but it occurs so infrequently that this is the first time we've had a chance to look into it. Searching the forum turned up nothing useful. The closest we could find was this thread: https://www.netxms.org/forum/general-support/too-much-alarm/ but the cause noted there did not apply here.

So we're at a loss as to what could have caused this flood of Event log entries. Any pointers on where to look for further clues would be appreciated.