Status polling and alerting

Started by curruscanis, June 26, 2015, 08:42:28 PM

curruscanis

I am having issues with nodes not showing the proper status.  I have about 100 nodes being monitored on this system.

The server is Version 2.0-M4, on CentOS.

If I disconnect the network from some of my nodes, their status never changes. If I disconnect it from other nodes, the status changes in about 1-2 minutes and returns to normal within 1-2 minutes of being reconnected.

It also affects whole subnets: when I disconnect a router interface that has multiple monitored nodes behind it, some of the nodes on that subnet show as down, but not all of them.

I cannot find any reason why it affects some nodes and not others. I am monitoring both Windows systems with agents and network equipment with SNMP, but there is no apparent correlation.

The output of "show queues" at the console shows 0 for every queue.

Can someone help me with instructions on how to troubleshoot this?

curruscanis

Here is more information. I am still no closer to understanding why this is happening, but here is the output of a "Poll Status" on a node that is known to be down. At the end it says the node status is normal, yet it should be critical.


------------------------
[26.06.2015 14:08:09] **** Poll request sent to server ****
[26.06.2015 14:08:09] Poll request accepted
[26.06.2015 14:08:09] Starting status poll for node Baker SSG
[26.06.2015 14:08:09] Checking SNMP agent connectivity
[26.06.2015 14:08:17] SNMP agent unreachable
[26.06.2015 14:08:17]    Starting status poll on interface serial0/0
[26.06.2015 14:08:17]       Current interface status is UNKNOWN
[26.06.2015 14:08:17]       Interface status cannot be determined
[26.06.2015 14:08:17]       Interface is UNKNOWN for 4 polls (1 poll required for status change)
[26.06.2015 14:08:17]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:17]    Finished status poll on interface serial0/0
[26.06.2015 14:08:17]    Starting status poll on interface ethernet0/0
[26.06.2015 14:08:17]       Current interface status is NORMAL
[26.06.2015 14:08:17]       Starting ICMP ping
[26.06.2015 14:08:23]       Interface is NORMAL for 9 polls (1 poll required for status change)
[26.06.2015 14:08:23]       Interface status after poll is NORMAL
[26.06.2015 14:08:23]    Finished status poll on interface ethernet0/0
[26.06.2015 14:08:23]    Starting status poll on interface ethernet0/1
[26.06.2015 14:08:23]       Current interface status is UNKNOWN
[26.06.2015 14:08:23]       Interface status cannot be determined
[26.06.2015 14:08:23]       Interface is UNKNOWN for 4 polls (1 poll required for status change)
[26.06.2015 14:08:23]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:23]    Finished status poll on interface ethernet0/1
[26.06.2015 14:08:23]    Starting status poll on interface ethernet0/2
[26.06.2015 14:08:23]       Current interface status is UNKNOWN
[26.06.2015 14:08:23]       Interface status cannot be determined
[26.06.2015 14:08:23]       Interface is UNKNOWN for 4 polls (1 poll required for status change)
[26.06.2015 14:08:23]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:23]    Finished status poll on interface ethernet0/2
[26.06.2015 14:08:23]    Starting status poll on interface ethernet0/3
[26.06.2015 14:08:23]       Current interface status is UNKNOWN
[26.06.2015 14:08:23]       Interface status cannot be determined
[26.06.2015 14:08:23]       Interface is UNKNOWN for 4 polls (1 poll required for status change)
[26.06.2015 14:08:23]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:23]    Finished status poll on interface ethernet0/3
[26.06.2015 14:08:23]    Starting status poll on interface ethernet0/4
[26.06.2015 14:08:23]       Current interface status is UNKNOWN
[26.06.2015 14:08:23]       Interface status cannot be determined
[26.06.2015 14:08:23]       Interface is UNKNOWN for 4 polls (1 poll required for status change)
[26.06.2015 14:08:23]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:23]    Finished status poll on interface ethernet0/4
[26.06.2015 14:08:23]    Starting status poll on interface ethernet0/5
[26.06.2015 14:08:23]       Current interface status is UNKNOWN
[26.06.2015 14:08:23]       Interface status cannot be determined
[26.06.2015 14:08:23]       Interface is UNKNOWN for 4 polls (1 poll required for status change)
[26.06.2015 14:08:23]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:23]    Finished status poll on interface ethernet0/5
[26.06.2015 14:08:23]    Starting status poll on interface ethernet0/6
[26.06.2015 14:08:23]       Current interface status is UNKNOWN
[26.06.2015 14:08:23]       Interface status cannot be determined
[26.06.2015 14:08:23]       Interface is UNKNOWN for 4 polls (1 poll required for status change)
[26.06.2015 14:08:23]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:23]    Finished status poll on interface ethernet0/6
[26.06.2015 14:08:23]    Starting status poll on interface bgroup0
[26.06.2015 14:08:23]       Current interface status is NORMAL
[26.06.2015 14:08:23]       Starting ICMP ping
[26.06.2015 14:08:29]       Interface is NORMAL for 9 polls (1 poll required for status change)
[26.06.2015 14:08:29]       Interface status after poll is NORMAL
[26.06.2015 14:08:29]    Finished status poll on interface bgroup0
[26.06.2015 14:08:29]    Starting status poll on interface bgroup1
[26.06.2015 14:08:29]       Current interface status is NORMAL
[26.06.2015 14:08:29]       Starting ICMP ping
[26.06.2015 14:08:35]       Interface is NORMAL for 9 polls (1 poll required for status change)
[26.06.2015 14:08:35]       Interface status after poll is NORMAL
[26.06.2015 14:08:35]    Finished status poll on interface bgroup1
[26.06.2015 14:08:35]    Starting status poll on interface bgroup2
[26.06.2015 14:08:35]       Current interface status is UNKNOWN
[26.06.2015 14:08:35]       Interface status cannot be determined
[26.06.2015 14:08:35]       Interface is UNKNOWN for 4 polls (1 poll required for status change)
[26.06.2015 14:08:35]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:35]    Finished status poll on interface bgroup2
[26.06.2015 14:08:35]    Starting status poll on interface bgroup3
[26.06.2015 14:08:35]       Current interface status is UNKNOWN
[26.06.2015 14:08:35]       Interface status cannot be determined
[26.06.2015 14:08:35]       Interface is UNKNOWN for 4 polls (1 poll required for status change)
[26.06.2015 14:08:35]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:35]    Finished status poll on interface bgroup3
[26.06.2015 14:08:35]    Starting status poll on interface tunnel.1
[26.06.2015 14:08:35]       Current interface status is UNKNOWN
[26.06.2015 14:08:35]       Interface status cannot be determined
[26.06.2015 14:08:35]       Interface is UNKNOWN for 4 polls (1 poll required for status change)
[26.06.2015 14:08:35]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:35]    Finished status poll on interface tunnel.1
[26.06.2015 14:08:35]    Starting status poll on interface vlan1
[26.06.2015 14:08:35]       Current interface status is UNKNOWN
[26.06.2015 14:08:35]       Interface status cannot be determined
[26.06.2015 14:08:35]       Interface is UNKNOWN for 4 polls (1 poll required for status change)
[26.06.2015 14:08:35]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:35]    Finished status poll on interface vlan1
[26.06.2015 14:08:35] Node is connected
[26.06.2015 14:08:43] Finished status poll for node Baker SSG
[26.06.2015 14:08:43] Node status after poll is NORMAL
[26.06.2015 14:08:43] **** Poll completed successfully ****
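
A poll log this long is easier to read when condensed to one line per interface. The following is a minimal sketch, assuming only the log wording shown above; `summarize_poll` and the `sample` excerpt are illustrative names, not part of the server or its console. It records the final status of each interface and whether an ICMP ping was started for it.

```python
import re

# Patterns match the wording of the poll output pasted above.
START = re.compile(r"Starting status poll on interface (\S+)")
AFTER = re.compile(r"Interface status after poll is (\w+)")
PING = re.compile(r"Starting ICMP ping")


def summarize_poll(log_text):
    """Return {interface: {"status": str or None, "pinged": bool}}."""
    result = {}
    current = None  # interface whose poll section we are currently inside
    for line in log_text.splitlines():
        match = START.search(line)
        if match:
            current = match.group(1)
            result[current] = {"status": None, "pinged": False}
            continue
        if current is None:
            continue  # node-level lines before the first interface
        if PING.search(line):
            result[current]["pinged"] = True
        match = AFTER.search(line)
        if match:
            result[current]["status"] = match.group(1)
    return result


# Short excerpt copied from the log above.
sample = """\
[26.06.2015 14:08:17]    Starting status poll on interface serial0/0
[26.06.2015 14:08:17]       Current interface status is UNKNOWN
[26.06.2015 14:08:17]       Interface status after poll is UNKNOWN
[26.06.2015 14:08:17]    Finished status poll on interface serial0/0
[26.06.2015 14:08:17]    Starting status poll on interface ethernet0/0
[26.06.2015 14:08:17]       Current interface status is NORMAL
[26.06.2015 14:08:17]       Starting ICMP ping
[26.06.2015 14:08:23]       Interface status after poll is NORMAL
[26.06.2015 14:08:23]    Finished status poll on interface ethernet0/0
"""

summary = summarize_poll(sample)
for name, info in summary.items():
    print(name, info["status"], "pinged" if info["pinged"] else "not pinged")
```

Run against the full log, this makes it quick to see which interfaces were actually probed with ICMP and which were left as UNKNOWN without any check.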


curruscanis

OK, I don't know if this is by design, but I think I have found the answer.

I had set the interface state to "Ignore" on many of my nodes, as they have many interfaces that are erroneous.

If the interfaces are set this way, alerts and alarms do not fire.


Victor Kirhenshtein

Hi,

if an interface is set to the "ignored" state, it is ignored (as the name implies). However, the node should still be marked as down if it cannot be reached by any means (SNMP, ICMP, or agent). Do you have at least one interface set to the expected state "up"?

Best regards,
Victor
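
The rule described in the reply above can be sketched in a few lines. This is only an illustration under stated assumptions, not the server's actual code: the `Interface` class, the state labels, and both function names are hypothetical, and treating any mismatch between observed and expected state as an alarm is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    expected_state: str  # "up", "down", or "ignore" (labels assumed for illustration)
    observed_state: str  # "up" or "down"


def interface_alarms(interfaces):
    """Names of interfaces whose observed state differs from the expected one.

    Interfaces with expected state "ignore" never contribute an alarm.
    """
    return [
        iface.name
        for iface in interfaces
        if iface.expected_state != "ignore"
        and iface.observed_state != iface.expected_state
    ]


def node_is_down(snmp_ok, icmp_ok, agent_ok):
    """The node counts as down only if no access method reaches it."""
    return not (snmp_ok or icmp_ok or agent_ok)


interfaces = [
    Interface("ethernet0/0", expected_state="up", observed_state="down"),
    Interface("ethernet0/1", expected_state="ignore", observed_state="down"),
]

print(interface_alarms(interfaces))
print(node_is_down(snmp_ok=False, icmp_ok=False, agent_ok=False))
```

In this sketch an ignored interface is silent on its own, but the node-level check is independent of it, which is why the question about having at least one interface expected "up" matters.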