Netxms claims a node down when it can't understand SNMP response

Started by cyril, October 13, 2017, 02:02:16 PM

Previous topic - Next topic

cyril

Hi! We noticed that NetXMS generates SYS_NODE_DOWN event followed by SYS_NODE_UP with 1 min delay from time to time polling our equipment, in particular Juniper switches and Ubiquiti access points.
Further inspection revealed a common pattern.


For Ubiquiti AP the pattern is:
Events:

"ID","Time","Source","DCI","Event","Severity","Message","User tag","Root ID"
"2127109","13.10.2017 10:28:39","wifi-ap","0","SYS_NODE_NORMAL","Normal","Node status changed to NORMAL","","0"
"2127104","13.10.2017 10:28:39","wifi-ap","0","SYS_IF_UP","Normal","Interface ""wifi0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 5)","","0"
"2127107","13.10.2017 10:28:39","wifi-ap","0","SYS_IF_UP","Normal","Interface ""br0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 2)","","0"
"2127106","13.10.2017 10:28:39","wifi-ap","0","SYS_IF_UP","Normal","Interface ""ath0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 6)","","0"
"2127108","13.10.2017 10:28:39","wifi-ap","0","SYS_NODE_UP","Normal","Node up","","0"
"2127105","13.10.2017 10:28:39","wifi-ap","0","SYS_IF_UP","Normal","Interface ""br0.9"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 8)","","0"
"2127103","13.10.2017 10:28:38","wifi-ap","0","SYS_IF_UP","Normal","Interface ""eth0.9"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 7)","","0"
"2127102","13.10.2017 10:28:38","wifi-ap","0","SYS_IF_UP","Normal","Interface ""eth0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 3)","","0"
"2127100","13.10.2017 10:28:38","wifi-ap","0","SYS_IF_UP","Normal","Interface ""lo"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 1)","","0"
"2127101","13.10.2017 10:28:38","wifi-ap","0","SYS_IF_DISABLED","Normal","Interface ""eth1"" disabled (IP Addr: UNSPEC/0, IfIndex: 4)","","0"
"2127099","13.10.2017 10:28:37","wifi-ap","0","SYS_SNMP_OK","Normal","Connectivity with SNMP agent restored","","0"
"2127097","13.10.2017 10:27:32","wifi-ap","0","SYS_NODE_CRITICAL","Critical","Node status changed to CRITICAL","","0"
"2127095","13.10.2017 10:27:32","wifi-ap","0","SYS_NODE_DOWN","Critical","Node down","","0"
"2127096","13.10.2017 10:27:32","wifi-ap","0","SYS_NODE_UNKNOWN","Normal","Node status changed to UNKNOWN","","0"

Network:

"No.","Time","Source","Destination","Protocol","Length","Info"
"1","2017-10-13 10:24:07.407391","NETXMS_IP","UBIQUITI_IP","SNMP","93","get-next-request 1.3.6.1.2.1.4.21.1.1"
"2","2017-10-13 10:24:07.612367","UBIQUITI_IP","NETXMS_IP","SNMP","95","get-response 1.3.6.1.2.1.11.1.0"
"3","2017-10-13 10:27:32.882671","NETXMS_IP","UBIQUITI_IP","SNMP","96","get-next-request 1.3.6.1.2.1.17.7.1.4.3.1.1"
"4","2017-10-13 10:27:33.229215","UBIQUITI_IP","NETXMS_IP","SNMP","100","get-response 1.3.6.1.4.1.10002.1.1.1.1.1.0"
"5","2017-10-13 10:27:33.229282","NETXMS_IP","UBIQUITI_IP","SNMP","96","get-next-request 1.3.6.1.2.1.17.7.1.4.2.1.4"
"6","2017-10-13 10:27:33.361641","UBIQUITI_IP","NETXMS_IP","SNMP","100","get-response 1.3.6.1.4.1.10002.1.1.1.1.1.0"
"7","2017-10-13 10:27:33.361698","NETXMS_IP","UBIQUITI_IP","SNMP","96","get-next-request 1.3.6.1.2.1.17.7.1.4.3.1.2"
"8","2017-10-13 10:27:33.504172","UBIQUITI_IP","NETXMS_IP","SNMP","100","get-response 1.3.6.1.4.1.10002.1.1.1.1.1.0"
"9","2017-10-13 10:27:33.504390","NETXMS_IP","UBIQUITI_IP","SNMP","97","get-next-request 1.3.6.1.4.1.14988.1.1.1.2.1.3"
"10","2017-10-13 10:27:33.797663","UBIQUITI_IP","NETXMS_IP","SNMP","104","get-response 1.3.6.1.4.1.41112.1.6.1.1.1.2.0"
"11","2017-10-13 10:28:37.498911","NETXMS_IP","UBIQUITI_IP","SNMP","92","get-request 1.3.6.1.2.1.1.2.0"
"12","2017-10-13 10:28:37.631093","UBIQUITI_IP","NETXMS_IP","SNMP","100","get-response 1.3.6.1.2.1.1.2.0"

Every response has error-status: noError. We were capturing icmp too, but it seems that NetXMS did not fall back to icmp ping.


For juniper:

"ID","Time","Source","DCI","Event","Severity","Message","User tag","Root ID"
"2127333","13.10.2017 11:29:56","jun-switch","0","SYS_NODE_UP","Normal","Node up","","0"
"2127331","13.10.2017 11:29:56","jun-switch","0","SYS_IF_UP","Normal","Interface ""pime"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 10)","","0"
"2127330","13.10.2017 11:29:56","jun-switch","0","SYS_IF_UP","Normal","Interface ""tap"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 7)","","0"
"2127332","13.10.2017 11:29:56","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/1 "" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 503)","","0"
"2127334","13.10.2017 11:29:56","jun-switch","0","SYS_NODE_NORMAL","Normal","Node status changed to NORMAL","","0"
"2127328","13.10.2017 11:29:55","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/23 "" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 533)","","0"
"2127326","13.10.2017 11:29:55","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/14.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 514)","","0"
"2127329","13.10.2017 11:29:55","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/8.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 546)","","0"
"2127327","13.10.2017 11:29:55","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/19 "" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 523)","","0"
"2127325","13.10.2017 11:29:54","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/22 "" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 531)","","0"
"2127324","13.10.2017 11:29:54","jun-switch","0","SYS_IF_UP","Normal","Interface ""lo0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 6)","","0"
"2127323","13.10.2017 11:29:54","jun-switch","0","SYS_IF_UP","Normal","Interface ""bme0.32768"" changed state to UP (IP Addr: 128.0.0.1/2, IfIndex: 38)","","0"
"2127321","13.10.2017 11:29:53","jun-switch","0","SYS_IF_UP","Normal","Interface ""pimd"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 11)","","0"
"2127322","13.10.2017 11:29:53","jun-switch","0","SYS_IF_UP","Normal","Interface ""ipip"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 9)","","0"
"2127319","13.10.2017 11:29:53","jun-switch","0","SYS_IF_UP","Normal","Interface ""vlan.6"" changed state to UP (IP Addr: 10.201.6.2/24, IfIndex: 551)","","0"
"2127320","13.10.2017 11:29:53","jun-switch","0","SYS_IF_UP","Normal","Interface ""bme0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 37)","","0"
"2127318","13.10.2017 11:29:52","jun-switch","0","SYS_IF_UP","Normal","Interface ""mtun"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 12)","","0"
"2127316","13.10.2017 11:29:52","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/23.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 534)","","0"
"2127317","13.10.2017 11:29:52","jun-switch","0","SYS_IF_UP","Normal","Interface ""gre"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 8)","","0"
"2127315","13.10.2017 11:29:52","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/20 "" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 527)","","0"
"2127312","13.10.2017 11:29:51","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/20.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 528)","","0"
"2127313","13.10.2017 11:29:51","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/21.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 530)","","0"
"2127314","13.10.2017 11:29:51","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/5.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 540)","","0"
"2127308","13.10.2017 11:29:50","jun-switch","0","SYS_IF_UP","Normal","Interface ""vlan"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 549)","","0"
"2127309","13.10.2017 11:29:50","jun-switch","0","SYS_IF_UP","Normal","Interface ""lsi"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 4)","","0"
"2127310","13.10.2017 11:29:50","jun-switch","0","SYS_IF_UP","Normal","Interface ""dsc"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 5)","","0"
"2127311","13.10.2017 11:29:50","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/21 "" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 529)","","0"
"2127305","13.10.2017 11:29:49","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/2"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 525)","","0"
"2127306","13.10.2017 11:29:49","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/8"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 545)","","0"
"2127304","13.10.2017 11:29:49","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 501)","","0"
"2127307","13.10.2017 11:29:49","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/1.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 504)","","0"
"2127300","13.10.2017 11:29:48","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/22.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 532)","","0"
"2127302","13.10.2017 11:29:48","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/19.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 524)","","0"
"2127303","13.10.2017 11:29:48","jun-switch","0","SYS_IF_UP","Normal","Interface ""lo0.16384"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 21)","","0"
"2127301","13.10.2017 11:29:48","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/5"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 539)","","0"
"2127297","13.10.2017 11:29:47","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/17.0 "" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 520)","","0"
"2127299","13.10.2017 11:29:47","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/16 "" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 517)","","0"
"2127298","13.10.2017 11:29:47","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/14 "" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 513)","","0"
"2127296","13.10.2017 11:29:46","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/17"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 519)","","0"
"2127293","13.10.2017 11:29:46","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/0.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 502)","","0"
"2127294","13.10.2017 11:29:46","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/18.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 522)","","0"
"2127295","13.10.2017 11:29:46","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/2.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 526)","","0"
"2127291","13.10.2017 11:29:45","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/18 "" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 521)","","0"
"2127292","13.10.2017 11:29:45","jun-switch","0","SYS_IF_UP","Normal","Interface ""ge-0/0/16.0"" changed state to UP (IP Addr: UNSPEC/0, IfIndex: 518)","","0"
"2127290","13.10.2017 11:29:45","jun-switch","0","SYS_SNMP_OK","Normal","Connectivity with SNMP agent restored","","0"
"2127285","13.10.2017 11:28:41","jun-switch","0","SYS_NODE_CRITICAL","Critical","Node status changed to CRITICAL","","0"
"2127284","13.10.2017 11:28:41","jun-switch","0","SYS_NODE_DOWN","Critical","Node down","","0"

Network:

"No.","Time","Source","Destination","Protocol","Length","Info"
"1","2017-10-13 11:23:54.816540","netxms_ip","jun_ip","SNMP","93","get-next-request 1.3.6.1.2.1.4.21.1.1"
"2","2017-10-13 11:23:54.953011","jun_ip","netxms_ip","SNMP","100","get-response 1.3.6.1.2.1.4.22.1.1.38.128.0.0.1"
"3","2017-10-13 11:28:41.391314","netxms_ip","jun_ip","SNMP","98","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.5.1"
"4","2017-10-13 11:28:41.544835","jun_ip","netxms_ip","SNMP","114","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.2"
"5","2017-10-13 11:28:41.545043","netxms_ip","jun_ip","SNMP","100","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.2"
"6","2017-10-13 11:28:41.681306","jun_ip","netxms_ip","SNMP","114","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.3"
"7","2017-10-13 11:28:41.681604","netxms_ip","jun_ip","SNMP","100","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.3"
"8","2017-10-13 11:28:41.824843","jun_ip","netxms_ip","SNMP","108","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.4"
"9","2017-10-13 11:28:41.824936","netxms_ip","jun_ip","SNMP","100","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.4"
"10","2017-10-13 11:28:41.962373","jun_ip","netxms_ip","SNMP","112","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.5"
"11","2017-10-13 11:28:41.962455","netxms_ip","jun_ip","SNMP","100","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.5"
"12","2017-10-13 11:28:42.098320","jun_ip","netxms_ip","SNMP","111","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.6"
"13","2017-10-13 11:28:42.098420","netxms_ip","jun_ip","SNMP","100","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.6"
"14","2017-10-13 11:28:42.235346","jun_ip","netxms_ip","SNMP","107","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.8"
"15","2017-10-13 11:28:42.235420","netxms_ip","jun_ip","SNMP","100","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.2.8"
"16","2017-10-13 11:28:42.379286","jun_ip","netxms_ip","SNMP","101","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.3.2"
"17","2017-10-13 11:28:42.379498","netxms_ip","jun_ip","SNMP","100","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.5.1.3.2"
...
"387","2017-10-13 11:29:07.878230","netxms_ip","jun_ip","SNMP","102","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.7.1.5.6.531"
"388","2017-10-13 11:29:08.014139","jun_ip","netxms_ip","SNMP","103","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.7.1.5.6.532"
"389","2017-10-13 11:29:08.014320","netxms_ip","jun_ip","SNMP","102","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.7.1.5.6.532"
"390","2017-10-13 11:29:08.150575","jun_ip","netxms_ip","SNMP","103","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.7.1.5.6.533"
"391","2017-10-13 11:29:08.150677","netxms_ip","jun_ip","SNMP","102","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.7.1.5.6.533"
"392","2017-10-13 11:29:08.305570","jun_ip","netxms_ip","SNMP","103","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.7.1.5.6.534"
"393","2017-10-13 11:29:08.305660","netxms_ip","jun_ip","SNMP","102","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.7.1.5.6.534"
"394","2017-10-13 11:29:08.443077","jun_ip","netxms_ip","SNMP","103","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.7.1.5.6.535"
"395","2017-10-13 11:29:08.443153","netxms_ip","jun_ip","SNMP","102","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.7.1.5.6.535"
"396","2017-10-13 11:29:08.580558","jun_ip","netxms_ip","SNMP","103","get-response 1.3.6.1.4.1.2636.3.40.1.5.1.7.1.5.6.536"
"397","2017-10-13 11:29:08.580629","netxms_ip","jun_ip","SNMP","102","get-next-request 1.3.6.1.4.1.2636.3.40.1.5.1.7.1.5.6.536"
"398","2017-10-13 11:29:08.725585","jun_ip","netxms_ip","SNMP","101","get-response 1.3.6.1.4.1.2636.3.40.1.7.1.1.1.1.0"
"399","2017-10-13 11:29:45.176593","netxms_ip","jun_ip","SNMP","92","get-request 1.3.6.1.2.1.1.2.0"
"400","2017-10-13 11:29:45.306845","jun_ip","netxms_ip","SNMP","104","get-response 1.3.6.1.2.1.1.2.0"

Every response has error-status: noError. We were capturing icmp too, but it seems that NetXMS did not fall back to icmp ping.


It seems that NetXMS believes that node is down when it gets unexpected snmp response.

Victor Kirhenshtein

Hi,

please check that you don't have ICMP disabled and that netxmsd process can send ICMP packets (runs as root or given necessary access). I will check why SNMP responses are not understood.

Best regards,
Victor

cyril

Hi, Victor.
ICMP ping for status polling is not disabled and netxmsd runs from root and successfully sends echo requests when nxagent on some node goes down.