Node down vs Node is unreachable by ICMP

Egert143 · February 22, 2022, 08:30:07 AM

Hello

Could someone please explain how it is decided which error is displayed when a node is down?

For example, when a node is really down (no network connectivity, etc.), NetXMS sometimes shows the status as Node down, other times as unreachable by ICMP, and sometimes both. What is the deciding factor?

Also, the ICMP check seems very sensitive: maybe one packet is lost and the status is already unreachable.

Egert

Victor Kirhenshtein · March 02, 2022, 09:38:33 AM

Hi,

there are two independent polls that produce those two events. One is the status poll, which checks node connectivity using the agent, SNMP, and finally ICMP if neither the SNMP agent nor the NetXMS agent is responding. This poll can generate the events SYS_SNMP_UNREACHABLE, SYS_AGENT_UNREACHABLE, SYS_NODE_UNREACHABLE, and SYS_NODE_DOWN. The difference between SYS_NODE_UNREACHABLE and SYS_NODE_DOWN is that "unreachable" is generated when the server can detect a network failure between itself and the target node.
The other is the ICMP poll, which was added in version 4.0. Its main use is to do regular ICMP pings and collect response time and packet loss statistics. In addition, it generates the event SYS_ICMP_UNREACHABLE when the node is not responding to ICMP. If that happens after SYS_NODE_DOWN, the ICMP unreachable event is correlated to the node down event. However, because the two types of poll run asynchronously, the node may actually become unreachable after the last status poll run and before the next ICMP poll run, which generates an ICMP unreachable event that cannot be correlated to a node down event.
You can effectively hide those ICMP unreachable events by disabling or removing the rules that generate alarms from them, and by unchecking the "write to event log" option in the event template configuration for SYS_ICMP_UNREACHABLE and SYS_ICMP_OK.

Best regards,
Victor
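
As a rough illustration of the explanation above, the two polls could be modelled as follows. This is only a simplified sketch, not NetXMS server code: the names ProbeResults, status_poll_events and icmp_poll_event are hypothetical, and it assumes a node that has both a NetXMS agent and SNMP configured. Only the event names come from the post.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProbeResults:
    """Hypothetical summary of what one status poll observed."""
    agent_ok: bool      # NetXMS agent responded
    snmp_ok: bool       # SNMP agent responded
    icmp_ok: bool       # ping responded (matters only if both agents failed)
    path_failure: bool  # server detected a network failure on the way to the node


def status_poll_events(r: ProbeResults) -> List[str]:
    """Events a status poll might raise in this simplified model."""
    events = []
    if not r.agent_ok:
        events.append("SYS_AGENT_UNREACHABLE")
    if not r.snmp_ok:
        events.append("SYS_SNMP_UNREACHABLE")
    # ICMP is the last resort: it decides the outcome only when both agents fail.
    if not r.agent_ok and not r.snmp_ok and not r.icmp_ok:
        events.append("SYS_NODE_UNREACHABLE" if r.path_failure else "SYS_NODE_DOWN")
    return events


def icmp_poll_event(ping_ok: bool, node_down_reported: bool) -> Optional[Tuple[str, bool]]:
    """Return (event, correlated) for a failed ICMP poll, or None if the ping succeeded.

    The event can be correlated only if the status poll has already
    reported the node as down. The ICMP poll runs on its own schedule,
    so it may notice the outage first.
    """
    if ping_ok:
        return None
    return ("SYS_ICMP_UNREACHABLE", node_down_reported)


if __name__ == "__main__":
    # Node fails between two status polls: the ICMP poll fires first, uncorrelated.
    print(icmp_poll_event(ping_ok=False, node_down_reported=False))
    # The next status poll then reports the node as down.
    print(status_poll_events(ProbeResults(False, False, False, False)))
```

In this model, the mix of "Node down" and "unreachable by ICMP" depends only on which poll happens to run first after the node fails.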

Egert143 · March 02, 2022, 10:40:28 AM

Thanks for the explanation.

Is it possible to adjust the ICMP polling parameters to tolerate more ping loss before reporting ICMP unreachable? I adjusted PingTimeout from 1500 to 3000, but it doesn't change much.

Egert

Storm-Donovan · March 02, 2022, 02:59:56 PM

Hi Egert,

I had the same issue with nodes unreachable by ICMP. Victor and I determined that the ICMP poll was ignoring the server setting PollCountForStatusChange (mine is set to 5, and it alarmed on one missed ping). I believe they are working on a fix. Until that happens, I have just disabled the Event Processing Policy.

Cheers,
Donovan.
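
For readers unfamiliar with the setting, here is a minimal sketch of what a "poll count for status change" threshold is meant to do: the state flips only after that many consecutive polls disagree with the current state. The class IcmpStateTracker is hypothetical and is not the NetXMS implementation; in particular, applying the same threshold to recovery is an assumption.

```python
from typing import Optional


class IcmpStateTracker:
    """Hypothetical debounce of ping results by a consecutive-poll threshold."""

    def __init__(self, poll_count_for_status_change: int = 1):
        self.required = poll_count_for_status_change
        self.reachable = True
        self._pending = 0  # consecutive polls that contradict the current state

    def record(self, ping_ok: bool) -> Optional[str]:
        """Feed one poll result; return an event name if the state changed."""
        if ping_ok == self.reachable:
            self._pending = 0  # any agreeing poll resets the counter
            return None
        self._pending += 1
        if self._pending < self.required:
            return None  # not enough consecutive evidence yet
        self.reachable = ping_ok
        self._pending = 0
        return "SYS_ICMP_OK" if ping_ok else "SYS_ICMP_UNREACHABLE"


if __name__ == "__main__":
    polls = [True, False, False, True, True]
    for threshold in (1, 2, 3):
        tracker = IcmpStateTracker(threshold)
        print(threshold, [tracker.record(p) for p in polls])
```

With a threshold of 1, a single lost ping is enough to raise the event, which is the behaviour described in this post. With a threshold of 3, the two failed polls in the example are absorbed.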

Egert143 · March 02, 2022, 03:24:36 PM

Thanks for the info! I will be waiting for the fix then.

Egert

dreamscape · May 12, 2023, 11:47:21 AM

Can I ask if this is still an issue, please? For most devices, our system regularly reports events in this order, for example:

09:41:32 - SYS_ICMP_UNREACHABLE
09:41:32 - SYS_NODE_MAJOR
09:42:34 - SYS_ICMP_OK
09:42:34 - SYS_NODE_NORMAL

dreamscape · May 18, 2023, 03:20:35 PM

Hi, does anyone know, please?

Victor Kirhenshtein · May 22, 2023, 09:21:45 AM

Hi,

the problem with ICMP polls ignoring "poll count for status change" was fixed. Do you have the poll count set to more than 1 and still get regular ICMP unreachable events?

Best regards,
Victor

dreamscape · May 26, 2023, 01:44:37 PM

Hi Victor,

Sorry I didn't see you had commented.

I've upped my poll count from the default of 1 to 2, but I'm still getting these messages.

See below

Victor Kirhenshtein · May 29, 2023, 02:25:39 PM

Can you try setting the debug level to 7 for the tag poll.icmp and, when the problem repeats, send me an extract from the server log filtered by object name?

Best regards,
Victor

dreamscape · May 31, 2023, 12:20:58 PM

Hi Victor,

Please see below for one client: it went unreachable (major) at 09:58:23 and was back to normal at 10:00:42.

Thanks
Nick

2023.05.31 09:56:11.912 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: calling IcmpPing(192.168.1.85,3,1500,46)

2023.05.31 09:56:11.912 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: ping status=0 RTT=0

2023.05.31 09:57:16.994 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: calling IcmpPing(192.168.1.85,3,1500,46)

2023.05.31 09:57:18.292 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: ping status=2 RTT=0

2023.05.31 09:58:22.612 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: calling IcmpPing(192.168.1.85,3,1500,46)

2023.05.31 09:58:23.800 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: ping status=2 RTT=0

2023.05.31 09:59:33.326 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: calling IcmpPing(192.168.1.85,3,1500,46)

2023.05.31 09:59:33.326 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: ping status=0 RTT=2

2023.05.31 10:00:42.308 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: calling IcmpPing(192.168.1.85,3,1500,46)

2023.05.31 10:00:42.308 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: ping status=0 RTT=0

2023.05.31 10:01:52.415 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: calling IcmpPing(192.168.1.85,3,1500,46)

2023.05.31 10:01:52.415 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: ping status=0 RTT=1

2023.05.31 10:02:57.539 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: calling IcmpPing(192.168.1.85,3,1500,46)

2023.05.31 10:02:57.539 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: ping status=0 RTT=0

2023.05.31 10:04:02.641 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: calling IcmpPing(192.168.1.85,3,1500,46)

2023.05.31 10:04:02.641 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: ping status=0 RTT=0

2023.05.31 10:05:07.675 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: calling IcmpPing(192.168.1.85,3,1500,46)

2023.05.31 10:05:07.675 *D* [poll.icmp ] Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85):: ping status=0 RTT=0
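
An extract like the one above can be summarised with a short script. The following is a sketch that assumes the line format shown in this post; the names RESULT_RE, parse_ping_results and longest_timeout_run are made up for illustration. It treats status=2 as a timeout, as stated in the reply below, and reports the longest run of consecutive timeouts for one node.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple

TIMEOUT = 2  # status=2 means timeout, per the reply that follows in this thread


class PingResult(NamedTuple):
    time: datetime
    status: int
    rtt: int


# Matches only the "ping status=... RTT=..." result lines of the poll.icmp tag.
RESULT_RE = re.compile(
    r"^(?P<ts>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"\[poll\.icmp\s*\] Node::icmpPollAddress\((?P<node>.+?) \[\d+\].*"
    r"ping status=(?P<status>\d+) RTT=(?P<rtt>\d+)"
)


def parse_ping_results(lines: Iterable[str], node: str) -> List[PingResult]:
    """Collect ping results for one node; all other lines are skipped."""
    results = []
    for line in lines:
        m = RESULT_RE.match(line.strip())
        if m and m.group("node") == node:
            ts = datetime.strptime(m.group("ts"), "%Y.%m.%d %H:%M:%S.%f")
            results.append(PingResult(ts, int(m.group("status")), int(m.group("rtt"))))
    return results


def longest_timeout_run(results: List[PingResult]) -> int:
    """Length of the longest streak of consecutive timed-out polls."""
    longest = current = 0
    for r in results:
        current = current + 1 if r.status == TIMEOUT else 0
        longest = max(longest, current)
    return longest


PREFIX = "Node::icmpPollAddress(GARDENW10-1 [14138], PRI, 192.168.1.85)::"
SAMPLE = [
    f"2023.05.31 09:56:11.912 *D* [poll.icmp ] {PREFIX} ping status=0 RTT=0",
    f"2023.05.31 09:57:16.994 *D* [poll.icmp ] {PREFIX} calling IcmpPing(192.168.1.85,3,1500,46)",
    f"2023.05.31 09:57:18.292 *D* [poll.icmp ] {PREFIX} ping status=2 RTT=0",
    f"2023.05.31 09:58:23.800 *D* [poll.icmp ] {PREFIX} ping status=2 RTT=0",
    f"2023.05.31 09:59:33.326 *D* [poll.icmp ] {PREFIX} ping status=0 RTT=2",
]

if __name__ == "__main__":
    parsed = parse_ping_results(SAMPLE, "GARDENW10-1")
    for r in parsed:
        print(r.time.time(), "timeout" if r.status == TIMEOUT else "ok", r.rtt)
    print("longest run of timeouts:", longest_timeout_run(parsed))
```

Comparing the longest run with the configured poll count shows whether an unreachable event is expected for a given extract.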

Victor Kirhenshtein · June 13, 2023, 06:46:01 PM

According to the log, the server got two timeouts, at 09:57:18 and 09:58:23 (status=2 means timeout). The next ping was successful again. Do you think there could have been network issues at that time that caused loss of ICMP packets? Also, if you change the poll count to 3 or 4, does it fix the situation?

Best regards,
Victor

dreamscape · June 16, 2023, 10:49:40 AM

Hi Victor,

Thanks for the reply. I'm unsure about network issues; the network is small, around 50 switches.

I will increase the poll count to 3 and see.

Thanks
Nick

NetXMS Support Forum
