netxmsd 3.1 crashing on startup

Started by jermudgeon, December 04, 2019, 05:59:16 PM


jermudgeon

I'm getting a segfault on startup:

<snip>
DCObject::filterInstanceList(.1.3.6.1.4.1.2636.3.60.1.1.1.1.7.{instance} [242469]): instance "548" removed by filtering script
2019.12.04 06:56:55.606 *D* [                   ] DCObject::filterInstanceList(.1.3.6.1.4.1.2636.3.60.1.1.1.1.8.{instance} [243303]): instance "507" name set to "0"
2019.12.04 06:56:55.606 *D* [                   ] DCObject::filterInstanceList(.1.3.6.1.4.1.2636.3.60.1.1.1.1.8.{instance} [243303]): instance "507" removed by filtering script
2019.12.04 06:56:55.609 *D* [                   ] DataCollectionTarget::doInstanceDiscovery(js2.jber7079.mxu.acsalaska.net [18199]): read 25 values
2019.12.04 06:56:55.610 *D* [                   ] DCObject::filterInstanceList(.1.3.6.1.4.1.2636.3.60.1.1.1.1.1.{instance} [243281]): instance "514" name set to "???"
2019.12.04 06:56:55.610 *D* [                   ] DCObject::filterInstanceList(.1.3.6.1.4.1.2636.3.60.1.1.1.1.1.{instance} [243281]): instance "514" removed by filtering script

Thread 372 "$POLLERS/WRK" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff74e2e700 (LWP 6249)]
0x00007ffff79bfbfe in Node::reconcileWithDuplicateNode(Node*) () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
(gdb) bt
#0  0x00007ffff79bfbfe in Node::reconcileWithDuplicateNode(Node*) () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
#1  0x00007ffff79d73c4 in Node::configurationPoll(PollerInfo*, ClientSession*, unsigned int) ()
   from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
#2  0x00007ffff796e3d2 in DataCollectionTarget::configurationPollWorkerEntry(PollerInfo*, ClientSession*, unsigned int) ()
   from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
#3  0x00007ffff793f621 in ?? () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
#4  0x00007ffff698d337 in ?? () from /usr/lib/x86_64-linux-gnu/libnetxms.so.31
#5  0x00007ffff698d10e in ?? () from /usr/lib/x86_64-linux-gnu/libnetxms.so.31
#6  0x00007ffff4ec04a4 in start_thread (arg=0x7fff74e2e700) at pthread_create.c:456
#7  0x00007ffff3a15d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Filipp Sudanov

What version are you using? Is it from packages, downloaded from the NetXMS website, or did you compile it yourself? Which Linux version are you running?

Does it segfault every time, or did it happen only once?

If the segfaults repeat, can you set DebugLevel=9 in the server configuration, launch the server, and attach the resulting log file?
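A minimal sketch of that change, assuming the default location used by the Debian packages (/etc/netxmsd.conf); adjust the path if your server reads its configuration from somewhere else:

```
# /etc/netxmsd.conf (assumed default path for package installs)
# Maximum debug verbosity; level 9 produces very large logs,
# so lower it again once the crash has been captured.
DebugLevel = 9
```

The configuration file is read when netxmsd starts, so the server needs to be restarted for the new level to take effect.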



jermudgeon

From packages:

netxms-agent/stretch,now 3.1.242-2 amd64 [installed]
netxms-agent-java/stretch,now 3.1.242-2 amd64 [installed,automatic]
netxms-base/stretch,now 3.1.242-2 amd64 [installed,automatic]
netxms-client/stretch,now 3.1.242-2 amd64 [installed]
netxms-console/now 3.0.2258-1 amd64 [installed,local]
netxms-dbdrv-pgsql/stretch,now 3.1.242-2 amd64 [installed]
netxms-dbdrv-sqlite3/stretch,now 3.1.242-2 amd64 [installed,automatic]
netxms-release/now 1.5 all [installed,local]
netxms-server/stretch,now 3.1.242-2 amd64 [installed]

Segfaults whenever merge detection is enabled.

bt full:

Thread 523 "$POLLERS/WRK" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff7282e700 (LWP 19692)]
0x00007ffff79bfbfe in Node::reconcileWithDuplicateNode(Node*) () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
(gdb) bt full
#0  0x00007ffff79bfbfe in Node::reconcileWithDuplicateNode(Node*) () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
No symbol table info available.
#1  0x00007ffff79d73c4 in Node::configurationPoll(PollerInfo*, ClientSession*, unsigned int) ()
   from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
No symbol table info available.
#2  0x00007ffff796e3d2 in DataCollectionTarget::configurationPollWorkerEntry(PollerInfo*, ClientSession*, unsigned int) ()
   from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
No symbol table info available.
#3  0x00007ffff793f621 in ?? () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
No symbol table info available.
#4  0x00007ffff698d337 in ?? () from /usr/lib/x86_64-linux-gnu/libnetxms.so.31
No symbol table info available.
#5  0x00007ffff698d10e in ?? () from /usr/lib/x86_64-linux-gnu/libnetxms.so.31
No symbol table info available.
#6  0x00007ffff4ec04a4 in start_thread (arg=0x7fff7282e700) at pthread_create.c:456
        __res = <optimized out>
        pd = 0x7fff7282e700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140735114569472, 6749483827744397886, 140736350368030, 140736350368031,
                140735114305536, 3, -6749736120991163842, -6749468538011329986}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0,
              0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#7  0x00007ffff3a15d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
No locals.
(gdb)

jermudgeon

2019.12.04 07:52:28.425 *D* [                   ] Started topology poll for node <snip> [12506]

Thread 495 "$POLLERS/WRK" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff7258a700 (LWP 25764)]
Node::reconcileWithDuplicateNode (this=this@entry=0x7fffb038a000, node=0x7fffa56ed000) at node.cpp:3512
3512   node.cpp: No such file or directory.
(gdb) bt full
#0  Node::reconcileWithDuplicateNode (this=this@entry=0x7fffb038a000, node=0x7fffa56ed000) at node.cpp:3512
        i = 6
#1  0x00007ffff79d73c4 in Node::configurationPoll (this=0x7fffb038a000, poller=0x7fff7968bf80, session=0x0, rqId=0) at node.cpp:3230
        duplicateNode = 0x7fffa56ed000
        reason = L"Primary IP address 23.235.105.154 of node js1.northpointe01.mxu.acsalaska.net [43621] found on interface vlan.1997 of node js1.northpointe.cpe.lec.mgt [45588]", '\000' <repeats 152 times>...
        dcr = <optimized out>
        hypervisorType = L'\000' <repeats 31 times>
        hypervisorInfo = L"Primary IP address 23.235.105.154 of node js1.northpointe01.mxu.acsalaska.net [43621] found on interface vlan.1997 of node js1.northpointe.cpe.lec.mgt [45588]", '\000' <repeats 97 times>
        type = <optimized out>
        oldCapabilities = <optimized out>
        szBuffer = L'\000' <repeats 3494 times>...
        modified = 0
        __pollStartTime = 1575478327685
        __pollState = 0x7fffb038a768
#2  0x00007ffff796e3d2 in DataCollectionTarget::configurationPollWorkerEntry (this=0x7fffb038a000, poller=0x7fff7968bf80, session=0x0, rqId=0)
    at dctarget.cpp:1560
No locals.
#3  0x00007ffff793f621 in __ThreadPoolExecute_Wrapper_1<DataCollectionTarget, PollerInfo*> (arg=0x7fff79691420)
    at ../../../include/nms_threads.h:1346
        wd = 0x7fff79691420
#4  0x00007ffff698d337 in ProcessSerializedRequests (data=0x7fff796913c0) at tp.cpp:466
        rq = 0x7fff79691440
#5  0x00007ffff698d10e in WorkerThread (arg=0x7fff7a780390) at tp.cpp:186
        rq = 0x7fff796913e0
        waitTime = 256899072
        p = 0x7fff7dc00000
        q = 0x7fff7dc06000
        threadName = "$POLLERS/WRK\000\000\000"
#6  0x00007ffff4ec04a4 in start_thread (arg=0x7fff7258a700) at pthread_create.c:456
        __res = <optimized out>
        pd = 0x7fff7258a700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140735111800576, 7851966113022693045, 140736350368030, 140736350368031, 140735111536640, 3,
                -7851694838165582155, -7851990439467594059}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0,
              cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#7  0x00007ffff3a15d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
No locals.
(gdb)

Tatjana Dubrovica

The problem is fixed; the fix will be included in the next patch release.