NetXMS 4.4.1 server failed

Started by lhpaladin, August 30, 2023, 05:38:57 AM

lhpaladin

The server started crashing: the netxms process starts and then stops shortly afterwards, and users get a "connection refused" message.
nxdbmgr check no longer reports errors, but the log file contains SQL errors with lines similar to this:

SQL query failed (Query = "SELECT tdata_value,tdata_timestamp FROM tdata_25925 WHERE item_id=9696 ORDER BY tdata_timestamp DESC LIMIT 1"): 42P01 ERROR: relation "tdata_25925" does not exist
LINE 1: SELECT tdata_value,tdata_timestamp FROM tdata_25925 WHERE it...

Could this be the cause of the service execution crash?
I'm running the latest version.


Thanks,

Filipp Sudanov

nxdbmgr check should be able to fix this specific SQL error. In any case, this error should not crash the server.

Please try starting the NetXMS server in the foreground:
systemctl stop netxmsd
netxmsd -D 6

This way the server will print its log to the screen. When it crashes, you should see some information about the reason. Try running it this way a couple of times to see whether it crashes on the same log line.
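
To compare several foreground runs, it may help to keep a copy of each session in a file. The following is only a sketch: the log file names and the helper function last_log_lines are hypothetical, and only netxmsd -D 6 comes from the instructions above.

```shell
# Hypothetical helper: print the last non-empty line of each captured log,
# so it is easy to see whether every run stops at the same place.
last_log_lines() {
    for log_file in "$@"; do
        last_line=$(grep -v '^[[:space:]]*$' "$log_file" | tail -n 1)
        printf '%s: %s\n' "$log_file" "$last_line"
    done
}

# Assumed usage (file names are examples only):
#   netxmsd -D 6 2>&1 | tee /tmp/netxmsd-run1.log
#   netxmsd -D 6 2>&1 | tee /tmp/netxmsd-run2.log
#   last_log_lines /tmp/netxmsd-run1.log /tmp/netxmsd-run2.log
```

If the last lines match across runs, the crash is probably tied to one specific operation rather than to timing.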

lhpaladin

I tested several times in debug mode. It shows an error, but unfortunately its meaning is not clear to me:


*D* [ndd.common         ] NetworkDeviceDriver::getInterfaces(0x7f60e65fcec0): completed, ifList=0x7f60e65ba100
*** stack smashing detected ***: terminated




Thanks,

lhpaladin

Can anyone explain this error? Is it a problem in the NetXMS source code?


Thanks,

Victor Kirhenshtein

This looks like a bug in the server. Please try upgrading to 4.4.2. If that does not help, please run netxmsd under a debugger and provide the stack trace after the crash.

Best regards,
Victor

Malutki_27

Hi Guys,

I'm facing the same issue on version 4.4.2.

The NetXMS server is restarting all the time.

This is what I get from gdb:

Thread 47 "$POLLERS/WRK" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff1f55640 (LWP 10295)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737252775488) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.



BR
Marcin

lhpaladin

Hi,


In my case, I set all nodes to unmanaged and switched them back to managed a few at a time until I found the node that caused the error. I then noticed that this node crashes the server during the configuration poll.

uasy

I have this problem too. My environment:
Docker, Ubuntu 18.04, PG 12.5, NetXMS 4.4.2

#0  __GI_abort () at abort.c:107
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {18446744073709551615 <repeats 16 times>}}, sa_flags = 0,
          sa_restorer = 0x0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
        __cnt = <optimized out>
        __set = <optimized out>
        __cnt = <optimized out>
        __set = <optimized out>
#1  0x00007131ffe4d837 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7131fff7a869 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:181
        ap = {{gp_offset = 32, fp_offset = 0, overflow_arg_area = 0x7131dee08160, reg_save_area = 0x7131dee080f0}}
        fd = <optimized out>
        list = <optimized out>
        nlist = <optimized out>
        cp = <optimized out>
        written = <optimized out>
        on_2 = <optimized out>
        next = <optimized out>
        str = <optimized out>
        len = <optimized out>
        newp = <optimized out>
        iov = <optimized out>
        total = <optimized out>
        cnt = <optimized out>
        buf = <optimized out>
        wp = <optimized out>
        old = <optimized out>
        cnt = <optimized out>
        result = <optimized out>
#2  0x00007131ffef8b31 in __GI___fortify_fail_abort (need_backtrace=need_backtrace@entry=false, msg=msg@entry=0x7131fff7a847 "stack smashing detected")
    at fortify_fail.c:33
No locals.
#3  0x00007131ffef8af2 in __stack_chk_fail () at stack_chk_fail.c:29
No locals.
#4  0x000071320127ac1a in NetworkDeviceDriver::getInterfaces(SNMP_Transport*, NObject*, DriverData*, bool) () from /usr/lib/x86_64-linux-gnu/libnxsrv.so.44
No symbol table info available.
#5  0x00007132016410f2 in Node::getInterfaceList() () from /usr/lib/x86_64-linux-gnu/libnxcore.so.44
No symbol table info available.
#6  0x0000713201658f8b in Node::updateInterfaceConfiguration(unsigned int) () from /usr/lib/x86_64-linux-gnu/libnxcore.so.44
No symbol table info available.
#7  0x000071320165b8a5 in Node::configurationPoll(PollerInfo*, ClientSession*, unsigned int) () from /usr/lib/x86_64-linux-gnu/libnxcore.so.44
No symbol table info available.
#8  0x00007132016bb3cd in Pollable::doConfigurationPoll(PollerInfo*) () from /usr/lib/x86_64-linux-gnu/libnxcore.so.44
No symbol table info available.
#9  0x00007132015634a1 in ?? () from /usr/lib/x86_64-linux-gnu/libnxcore.so.44
No symbol table info available.
#10 0x0000713200df7dce in ?? () from /usr/lib/x86_64-linux-gnu/libnetxms.so.44
No symbol table info available.
#11 0x0000713200df7c16 in ?? () from /usr/lib/x86_64-linux-gnu/libnetxms.so.44
No symbol table info available.
#12 0x0000713200df9b9a in ?? () from /usr/lib/x86_64-linux-gnu/libnetxms.so.44
No symbol table info available.
#13 0x000071320075d6db in start_thread (arg=0x7131dee0b700) at pthread_create.c:463
        pd = 0x7131dee0b700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {124459006605056, 1317378752416278104, 124459006599488, 0, 124459395576720, 124459381890880, -1140544028988971432,
                -1140461746747644328}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#14 0x00007131ffee561f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Filipp Sudanov

Quote: I have this problem too. My environment:
Docker, Ubuntu 18.04, PG 12.5, NetXMS 4.4.2

Do you have the netxms-server-dbg package installed on your system? If not, please install it and produce the backtrace again.
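
One possible way to capture such a backtrace non-interactively is gdb's batch mode. This is only a sketch: it assumes gdb is installed and that the server is started as in the foreground test earlier in the thread (netxmsd -D 6). The function name and the output file are made up for the example.

```shell
# Hypothetical helper: run netxmsd under gdb and save a full backtrace
# once the process stops (for example on SIGABRT).
netxmsd_backtrace() {
    output_file="${1:-/tmp/netxmsd-backtrace.txt}"
    # -batch exits gdb after the -ex commands have run;
    # --args passes everything after it to the debugged program.
    gdb -batch -ex run -ex 'bt full' --args netxmsd -D 6 > "$output_file" 2>&1
}

# Assumed usage, after installing the debug symbols:
#   apt-get -y install netxms-server-dbg
#   systemctl stop netxmsd
#   netxmsd_backtrace /tmp/netxmsd-backtrace.txt
```

With the debug package installed, the frames inside the NetXMS libraries should show source files and line numbers instead of "No symbol table info available".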

Alex Kirhenshtein

Quote from: uasy on October 02, 2023, 10:59:16 AM
I have this problem too. My environment:
Docker, Ubuntu 18.04, PG 12.5, NetXMS 4.4.2

Try installing 4.4.2.30 from the unstable repo.

Add this to your sources:

deb http://packages.netxms.org/ubuntu bionic unstable

uasy

Here is the backtrace with the netxms-server-dbg package installed:

#0  0x00007f032e9128e0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x00007f032e95b837 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#2  0x00007f032ea06b31 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#3  0x00007f032ea06af2 in __stack_chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#4  0x00007f032fd88c1a in NetworkDeviceDriver::getInterfaces (this=0x7f0330652720 <s_defaultDriver>, snmp=0x7f031f86b6e0, node=0x7f0319568000, driverData=<optimized out>, useIfXTable=true)
    at ndd.cpp:658
        success = <optimized out>
        interfaceCount = 8
        ifList = 0x7f031f1537e0
#5  0x00007f033014f0f2 in Node::getInterfaceList (this=this@entry=0x7f0319568000) at node.cpp:1616
        useIfXTable = true
        snmpTransport = 0x7f031f86b6e0
        ifList = <optimized out>
#6  0x00007f0330166f8b in Node::updateInterfaceConfiguration (this=this@entry=0x7f0319568000, requestId=requestId@entry=0) at node.cpp:6258
        hasChanges = false
        ifList = <optimized out>
#7  0x00007f03301698a5 in Node::configurationPoll (this=0x7f0319568000, poller=0x7f031f9a9400, session=0x0, rqId=0) at node.cpp:4582
        buffer = L"Hardware: Intel64 Family 6 Model 62 Stepping 4 AT/AT COMPATIBLE - Software: Windows Version 6.3 (Build 14393 Multiprocessor Free)", '\000' <repeats 126 times>
        hypervisorType = L"\x26c089c0缃\xfb6aac00\x98305d7PO\x3032ac98缃@\000\000\000\x195683b8缃\x19568b10缃\x3063d840缃\x2e9907d0缃\xffffffff\x1fffffff\xefc4b90缃\000\000\000缃\xefc4b90缃\a"
        hypervisorInfo = L"\000\070\000\000\000\000N 00° 00' 00.000\"\000.2E 00° 00' 00.000\"\000\000\000\001", '\000' <repeats 31 times>, "\x2f90829b缃\000\000\xfb6aac00\x98305d7\062\060\062\063.10.02 14:41:39.009\000d.%d.%d\000mpPollAddress(%S [%u], %S, %S):", '\000' <repeats 108 times>
        type = <optimized out>
        __pollStartTime = 1696246899094
        __pollState = 0x7f0319568ca8
        oldCapabilities = <optimized out>
        modified = 0
#8  0x00007f03301c93cd in Pollable::doConfigurationPoll (this=0x7f0319568af8, poller=0x7f031f9a9400) at pollable.cpp:116
No locals.
#9  0x00007f03300714a1 in __ThreadPoolExecute_Wrapper_1<Pollable, PollerInfo*> (arg=0x7f031f8b0f20) at ../../../include/nms_threads.h:1191
        wd = 0x7f031f8b0f20
#10 0x00007f032f905dce in ProcessSerializedRequests (data=0x7f031f9226c0) at tp.cpp:487
        rq = 0x7f0324fad8d0
#11 0x00007f032f905c16 in WorkerThread (threadInfo=0x7f0324aed710) at tp.cpp:199
        rq = <optimized out>
        waitTime = 69515
        p = 0x7f0324fa8600
        threadName = "$POLLERS/WRK\000\000\000"
#12 0x00007f032f907b9a in ThreadCreate_Wrapper_1<WorkerThreadInfo*> (context=0x7f0324aed720) at ../../include/nms_threads.h:539
        wd = 0x7f0324aed720
#13 0x00007f032f26b6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#14 0x00007f032e9f361f in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.

uasy

I can't build from the unstable repo:

Step 5/14 : RUN echo "deb http://packages.netxms.org/ubuntu bionic unstable" > /etc/apt/sources.list.d/netxms.list
Step 6/14 : RUN wget -q -O - http://packages.netxms.org/netxms.gpg | apt-key add -
Warning: apt-key output should not be parsed (stdout is not a terminal)
OK
Step 7/14 : RUN apt-get update
Get:1 http://packages.netxms.org/ubuntu bionic InRelease [362 kB]
Hit:2 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:3 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:5 http://packages.netxms.org/ubuntu bionic/unstable amd64 Packages [11.7 kB]
Get:6 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [83.3 kB]
Fetched 634 kB in 1s (637 kB/s)
Reading package lists...
Step 8/14 : RUN apt-get -y install netxms-server-dbg netxms-server netxms-dbdrv-pgsql netxms-agent &&     apt-get clean

Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
 netxms-agent : Depends: libnxmodbus but it is not installable
 netxms-server : Depends: libnxmodbus but it is not installable
E: Unable to correct problems, you have held broken packages.

Alex Kirhenshtein

Quote from: uasy on October 02, 2023, 02:53:21 PM
I can't build from the unstable repo

You need both the main and unstable repos.
We don't push dependencies, such as our fork of libmodbus, to the unstable repo.
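
As a rough sketch, the sources file could list both repos side by side. The "main" component name is an assumption based on the wording above (only the unstable line was given in this thread), and the function name is hypothetical.

```shell
# Hypothetical helper: write an apt sources file with both NetXMS repos.
# "bionic unstable" is from this thread; "bionic main" is an assumption.
write_netxms_sources() {
    target="$1"
    {
        echo "deb http://packages.netxms.org/ubuntu bionic main"
        echo "deb http://packages.netxms.org/ubuntu bionic unstable"
    } > "$target"
}

# Assumed usage (run as root, then refresh the package lists):
#   write_netxms_sources /etc/apt/sources.list.d/netxms.list
#   apt-get update
```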

uasy

4.4.2-30 works fine, without crashes.