Crashing after upgrade to 2.2.15

Started by troffasky, May 31, 2019, 08:04:07 PM

troffasky

Upgraded from 2.2.13 to 2.2.15 and since then the service isn't staying up. If I start it with netxmsd -D 9, these are the dying gasps:


2019.05.31 18:03:13.434 *D* StatusPoll(PDU): finished child object poll
2019.05.31 18:03:13.434 *D* StatusPoll(PDU): allDown=false, dynFlags=0x00001001
2019.05.31 18:03:13.441 *D* Node(PDU)->GetItemFromSNMP(.1.3.6.1.2.1.1.3.0): dwResult=0
2019.05.31 18:03:13.441 *D* StatusPoll(PDU [1458]): boot time set to 1554531149 from SNMP
2019.05.31 18:03:13.441 *D* Finished status poll for node PDU (ID: 1458)
2019.05.31 18:03:13.441 *D* ConfigReadStr: (cached) name=DeleteUnreachableNodesPeriod value="0"
2019.05.31 18:03:13.441 *D* [poll.conf          ] Starting configuration poll for node Big-Rack-PDU (ID: 1588)
2019.05.31 18:03:13.441 *D* Node is marked as unreachable, configuration poll aborted
2019.05.31 18:03:13.441 *D* Finished configuration poll for node Big-Rack-PDU (ID: 1588)
2019.05.31 18:03:13.441 *D* [poll.status        ] Starting status poll for node Big-Rack-PDU (ID: 1588)
2019.05.31 18:03:13.441 *D* ConfigReadStr: (cached) name=CapabilityExpirationTime value="604800"
2019.05.31 18:03:13.441 *D* [poll.status        ] StatusPoll(Big-Rack-PDU): check SNMP
2019.05.31 18:03:13.448 *D* [snmp.entity        ] Building component tree for BarkS-Core3-POE [1054
Segmentation fault (core dumped)



The output varies, but in more than half of about 10 attempts the last line before the crash is that same "Building component tree" line. That node is a Cisco SB switch, which I see mentioned in the release notes for 2.2.15, so I don't think this is a coincidence. How can I troubleshoot this?
nxdbmgr check comes back clear.

troffasky

I have worked around this temporarily by moving cisco.ndd out of the way. I have no idea what effect this is going to have on NetXMS but at least it's staying up!

Victor Kirhenshtein

Hi,

Could you please run netxmsd under gdb (with the Cisco driver back in place) and, when it crashes, show the output of the bt command? You'll need to install the -dbg packages if you installed NetXMS from deb packages.

Best regards,
Victor
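
For anyone following along, the request above can also be run non-interactively. This is a minimal sketch, not an official NetXMS procedure: it only builds a gdb command line from gdb's standard `-batch`, `-ex` and `--args` options and the `netxmsd -D 9` invocation from the first post. The helper names `build_gdb_command` and `capture_backtrace` are made up for illustration.

```python
import shlex
import subprocess


def build_gdb_command(program, program_args, gdb_commands):
    """Build an argv list that runs `program` under gdb in batch mode."""
    argv = ["gdb", "-batch"]
    for command in gdb_commands:
        # Each -ex option is executed in order once gdb has started.
        argv += ["-ex", command]
    # Everything after --args is the debugged program and its own arguments.
    argv += ["--args", program] + list(program_args)
    return argv


def capture_backtrace(program, program_args):
    """Hypothetical helper: run the program under gdb and return gdb's output."""
    argv = build_gdb_command(program, program_args, ["run", "bt"])
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout + result.stderr


# Only print the command here; nothing is executed until capture_backtrace() is called.
command = build_gdb_command("netxmsd", ["-D", "9"], ["run", "bt"])
print(shlex.join(command))
```

Calling `capture_backtrace("netxmsd", ["-D", "9"])` would start the server under gdb and return the text gdb prints once the process stops. The interactive session shown in the posts below gives the same information and lets you issue further `print` commands afterwards.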

troffasky


2019.06.03 22:12:53.762 *D* [poll.conf          ] ConfPoll(BarkP2P-1): node is wireless controller, reading access point information
2019.06.03 22:12:53.778 *D* [snmp.entity        ] Building component tree for BarkS-Core3-POE [1054

Thread 74 "$POLLERS/WRK" received signal SIGBUS, Bus error.
[Switching to Thread 0x7fffe30ee700 (LWP 593)]
0x00007fffea4777ce in HandlerPhysicalPorts (var=<optimised out>, snmp=snmp@entry=0x7fffebc6e500, arg=arg@entry=0x7fffe30dad50) at sb.cpp:113
113     sb.cpp: No such file or directory.
(gdb) bt
#0  0x00007fffea4777ce in HandlerPhysicalPorts (var=<optimised out>, snmp=snmp@entry=0x7fffebc6e500, arg=arg@entry=0x7fffe30dad50) at sb.cpp:113
#1  0x00007ffff6871da7 in SnmpWalk (transport=transport@entry=0x7fffebc6e500, rootOid=rootOid@entry=0x7fffe30daae0, rootOidLen=14,
    handler=handler@entry=0x7fffea477600 <HandlerPhysicalPorts(SNMP_Variable*, SNMP_Transport*, void*)>, userArg=userArg@entry=0x7fffe30dad50, logErrors=<optimised out>, failOnShutdown=false) at util.cpp:334
#2  0x00007ffff6871eea in SnmpWalk (transport=transport@entry=0x7fffebc6e500, rootOid=rootOid@entry=0x7fffea478310 L".1.3.6.1.4.1.9.6.1.101.53.3.1.5",
    handler=handler@entry=0x7fffea477600 <HandlerPhysicalPorts(SNMP_Variable*, SNMP_Transport*, void*)>, userArg=userArg@entry=0x7fffe30dad50, logErrors=logErrors@entry=false, failOnShutdown=failOnShutdown@entry=false)
    at util.cpp:265
#3  0x00007fffea477a3a in CiscoSbDriver::getPhysicalPortLayout (this=<optimised out>, snmp=0x7fffebc6e500, layout=0x7fffe30dad50) at sb.cpp:126
#4  0x00007fffea477aa2 in CiscoSbDriver::getModuleLayout (this=<optimised out>, snmp=<optimised out>, attributes=<optimised out>, driverData=<optimised out>, module=1, layout=0x7fffe30df340) at sb.cpp:217
#5  0x00007ffff7a22b5c in Node::confPollSnmp (this=0x7fffe671b000, rqId=0) at node.cpp:3486
#6  0x00007ffff7a2cfe4 in Node::configurationPoll (this=this@entry=0x7fffe671b000, pSession=pSession@entry=0x0, rqId=rqId@entry=0, poller=poller@entry=0x7fffec8ee500, maskBits=maskBits@entry=0) at node.cpp:2724
#7  0x00007ffff7a2da93 in Node::configurationPoll (this=0x7fffe671b000, poller=0x7fffec8ee500) at node.cpp:2652
#8  0x00007ffff79b9e41 in __ThreadPoolExecute_Wrapper_1<Node, PollerInfo*> (arg=0x7fffec80e160) at ../../../include/nms_threads.h:1215
#9  0x00007ffff74eb398 in ProcessSerializedRequests (data=0x7fffec80e180) at tp.cpp:466
#10 0x00007ffff74eb19b in WorkerThread (arg=0x7fffec803660) at tp.cpp:186
#11 0x00007ffff6e706db in start_thread (arg=0x7fffe30ee700) at pthread_create.c:463
#12 0x00007ffff6b9988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95



troffasky

Here's a walk from that OID, for what it's worth:

CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.49 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.50 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.51 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.52 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.53 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.54 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.55 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.56 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.57 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.58 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.59 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.60 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.61 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.62 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.63 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.64 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.65 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.66 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.67 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.68 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.69 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.70 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.71 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.72 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.107 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.108 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.109 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.110 = INTEGER: 1

Victor Kirhenshtein

Could you please also show walk output for .1.3.6.1.4.1.9.6.1.101.53.3.1.6 and .1.3.6.1.4.1.9.6.1.101.53.3.1.7?

Best regards,
Victor

troffasky

.1.3.6.1.4.1.9.6.1.101.53.3.1.6

CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.49 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.50 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.51 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.52 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.53 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.54 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.55 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.56 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.57 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.58 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.59 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.60 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.61 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.62 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.63 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.64 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.65 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.66 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.67 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.68 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.69 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.70 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.71 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.72 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.107 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.108 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.109 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.110 = INTEGER: 2

.1.3.6.1.4.1.9.6.1.101.53.3.1.7

CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.49 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.50 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.51 = INTEGER: 3
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.52 = INTEGER: 4
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.53 = INTEGER: 5
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.54 = INTEGER: 6
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.55 = INTEGER: 7
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.56 = INTEGER: 8
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.57 = INTEGER: 9
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.58 = INTEGER: 10
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.59 = INTEGER: 11
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.60 = INTEGER: 12
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.61 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.62 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.63 = INTEGER: 3
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.64 = INTEGER: 4
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.65 = INTEGER: 5
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.66 = INTEGER: 6
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.67 = INTEGER: 7
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.68 = INTEGER: 8
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.69 = INTEGER: 9
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.70 = INTEGER: 10
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.71 = INTEGER: 11
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.72 = INTEGER: 12
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.107 = INTEGER: 13
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.108 = INTEGER: 13
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.109 = INTEGER: 14
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.110 = INTEGER: 14
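
The three walks are easier to sanity-check when joined per ifIndex. Below is a rough sketch that does this. It assumes, purely from the values in this thread and not from the MIB or from sb.cpp, that sub-OID .5 is the module, .6 the row and .7 the column. The function names are hypothetical, and the sample is a handful of lines copied from the walks above.

```python
import re

# Matches net-snmp style lines such as
# "CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.61 = INTEGER: 2"
LINE_RE = re.compile(r"rlPhysicalDescription\.3\.1\.(\d+)\.(\d+) = INTEGER: (\d+)")

# Assumed meaning of the three sub-OIDs (inferred from the values, not from the MIB).
FIELD_NAMES = {5: "module", 6: "row", 7: "column"}


def parse_walk(text):
    """Return {ifIndex: {"module": m, "row": r, "column": c}} from walk output."""
    ports = {}
    for match in LINE_RE.finditer(text):
        field, if_index, value = (int(group) for group in match.groups())
        name = FIELD_NAMES.get(field)
        if name is not None:
            ports.setdefault(if_index, {})[name] = value
    return ports


def grid_size(ports):
    """Return (highest row, highest column) over ports that have all three fields."""
    complete = [p for p in ports.values() if len(p) == 3]
    return (max(p["row"] for p in complete), max(p["column"] for p in complete))


def shared_slots(ports):
    """Return slots (module, row, column) claimed by more than one ifIndex."""
    slots = {}
    for if_index, p in sorted(ports.items()):
        if len(p) == 3:
            slots.setdefault((p["module"], p["row"], p["column"]), []).append(if_index)
    return {slot: ids for slot, ids in slots.items() if len(ids) > 1}


SAMPLE = """\
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.72 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.107 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.5.108 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.72 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.107 = INTEGER: 1
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.6.108 = INTEGER: 2
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.72 = INTEGER: 12
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.107 = INTEGER: 13
CISCOSB-Physicaldescription-MIB::rlPhysicalDescription.3.1.7.108 = INTEGER: 13
"""

ports = parse_walk(SAMPLE)
for if_index in sorted(ports):
    print(if_index, ports[if_index])
print("rows, columns:", grid_size(ports))
print("shared slots:", shared_slots(ports))
```

Fed the complete walks from the posts above instead of the sample, the same functions should report 2 rows, 14 columns and no slot claimed twice.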

Victor Kirhenshtein

Quite strange, I don't see anything wrong with the data. Could you please make it crash under the debugger again, and then run the following commands:

bt
print module
print *module
print row
print column
print *var
print *request
print *response

Best regards,
Victor

troffasky

Not sure whether "optimised out" means I'm not using the -dbg packages properly, but they are installed.


2019.06.10 14:00:38.335 *D* [poll.conf          ] Starting configuration poll for node analyzer.example.co.uk (ID: 1586)
2019.06.10 14:00:38.335 *D* [poll.conf          ] ConfPoll(analyzer.example.co.uk): checking for NetXMS agent Flags={02000000} DynamicFlags={00000003}
2019.06.10 14:00:38.335 *D* [poll.conf          ] ConfPoll(analyzer.example.co.uk): calling SnmpCheckCommSettings()

Thread 70 "$POLLERS/WRK" received signal SIGBUS, Bus error.
[Switching to Thread 0x7fffe46bd700 (LWP 13158)]
0x00007fffea4777ce in HandlerPhysicalPorts (var=<optimised out>, snmp=snmp@entry=0x7fffe2309500, arg=arg@entry=0x7fffe46a9d50) at sb.cpp:113
113     sb.cpp: No such file or directory.
(gdb) bt
#0  0x00007fffea4777ce in HandlerPhysicalPorts (var=<optimised out>, snmp=snmp@entry=0x7fffe2309500, arg=arg@entry=0x7fffe46a9d50) at sb.cpp:113
#1  0x00007ffff6871da7 in SnmpWalk (transport=transport@entry=0x7fffe2309500, rootOid=rootOid@entry=0x7fffe46a9ae0, rootOidLen=14,
    handler=handler@entry=0x7fffea477600 <HandlerPhysicalPorts(SNMP_Variable*, SNMP_Transport*, void*)>, userArg=userArg@entry=0x7fffe46a9d50, logErrors=<optimised out>, failOnShutdown=false) at util.cpp:334
#2  0x00007ffff6871eea in SnmpWalk (transport=transport@entry=0x7fffe2309500, rootOid=rootOid@entry=0x7fffea478310 L".1.3.6.1.4.1.9.6.1.101.53.3.1.5",
    handler=handler@entry=0x7fffea477600 <HandlerPhysicalPorts(SNMP_Variable*, SNMP_Transport*, void*)>, userArg=userArg@entry=0x7fffe46a9d50, logErrors=logErrors@entry=false, failOnShutdown=failOnShutdown@entry=false)
    at util.cpp:265
#3  0x00007fffea477a3a in CiscoSbDriver::getPhysicalPortLayout (this=<optimised out>, snmp=0x7fffe2309500, layout=0x7fffe46a9d50) at sb.cpp:126
#4  0x00007fffea477aa2 in CiscoSbDriver::getModuleLayout (this=<optimised out>, snmp=<optimised out>, attributes=<optimised out>, driverData=<optimised out>, module=1, layout=0x7fffe46ae340) at sb.cpp:217
#5  0x00007ffff7a22b5c in Node::confPollSnmp (this=0x7fffe6729000, rqId=0) at node.cpp:3486
#6  0x00007ffff7a2cfe4 in Node::configurationPoll (this=this@entry=0x7fffe6729000, pSession=pSession@entry=0x0, rqId=rqId@entry=0, poller=poller@entry=0x7fffec83c500, maskBits=maskBits@entry=0) at node.cpp:2724
#7  0x00007ffff7a2da93 in Node::configurationPoll (this=0x7fffe6729000, poller=0x7fffec83c500) at node.cpp:2652
#8  0x00007ffff79b9e41 in __ThreadPoolExecute_Wrapper_1<Node, PollerInfo*> (arg=0x7fffec80e160) at ../../../include/nms_threads.h:1215
#9  0x00007ffff74eb398 in ProcessSerializedRequests (data=0x7fffec80e180) at tp.cpp:466
#10 0x00007ffff74eb19b in WorkerThread (arg=0x7fffec803640) at tp.cpp:186
#11 0x00007ffff6e706db in start_thread (arg=0x7fffe46bd700) at pthread_create.c:463
#12 0x00007ffff6b9988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) print module
$1 = (SB_MODULE_LAYOUT *) 0x7fffe46a9d50
(gdb) print *module
$2 = {index = 1, minIfIndex = 49, maxIfIndex = 107, rows = 2, columns = 12, interfaces = {49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 0 <repeats 20 times>, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
    0 <repeats 20 times>}}
(gdb) print row
$3 = <optimised out>
(gdb) print column
$4 = <optimised out>
(gdb) print *var
value has been optimised out
(gdb) print *requestprint *request
No symbol "requestprint" in current context.
(gdb) print *request
No symbol "operator*" in current context.
(gdb) print *response
$5 = {m_version = 1, m_command = 2, m_variables = 0x7fffe2323780, m_pEnterprise = 0x0, m_trapType = 0, m_specificTrap = 0, m_dwTimeStamp = 0, m_dwAgentAddr = 0, m_dwRqId = 1044, m_dwErrorCode = 0, m_dwErrorIndex = 0,
  m_msgId = 0, m_msgMaxSize = 65536, m_contextEngineId = '\000' <repeats 255 times>, m_contextEngineIdLen = 0, m_contextName = '\000' <repeats 255 times>, m_salt = "\000\000\000\000\000\000\000", m_reportable = true,
  m_flags = 0 '\000', m_authObject = 0x7fffe2301f30 "8eareWa7ching", m_authoritativeEngine = {m_id = '\000' <repeats 255 times>, m_idLen = 0, m_engineBoots = 0, m_engineTime = 0}, m_securityModel = 1,
  m_signature = '\000' <repeats 11 times>, m_signatureOffset = 0}
(gdb) quit
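
The `interfaces` array in the `print *module` output is hard to read with the `<repeats N times>` runs in it. Here is a small sketch that expands gdb's notation and splits the result into rows. The 32 slots per row is an assumption, inferred only from the pattern of 12 values followed by 20 zeros in the dump, not from sb.cpp. `expand_gdb_array` is a made-up helper name.

```python
import re

# Assumption: each row holds 32 slots (12 values + 20 zeros in the dump above).
SLOTS_PER_ROW = 32


def expand_gdb_array(text):
    """Expand gdb array output, including "<repeats N times>" runs, into a list of ints."""
    values = []
    # Each item is a number, optionally followed by "<repeats N times>".
    for number, repeats in re.findall(r"(-?\d+)(?:\s*<repeats (\d+) times>)?", text):
        values.extend([int(number)] * (int(repeats) if repeats else 1))
    return values


# The contents of "interfaces = {...}" copied from the gdb session above.
dump = (
    "49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 0 <repeats 20 times>, "
    "61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 0 <repeats 20 times>"
)

values = expand_gdb_array(dump)
rows = [values[i:i + SLOTS_PER_ROW] for i in range(0, len(values), SLOTS_PER_ROW)]
# Drop the empty (zero) slots to see which ifIndex values were actually placed.
occupied = [[v for v in row if v != 0] for row in rows]

print("total slots:", len(values))
for number, row in enumerate(occupied, start=1):
    print("row", number, row)
```

Read this way, the dump holds 64 slots with ifIndex 49 to 60 in the first row and 61 to 72 in the second. maxIfIndex is already 107, but 107 does not appear in the array. That may mean the fault hit while 107 was being placed, though this is only a reading of the dump and not a confirmed diagnosis.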

troffasky

Version 3 seems to have fixed this. Been up for 45 minutes now.