Issues after upgrade to NetXMS 1.2.17

Started by aron, December 03, 2014, 11:15:26 AM


aron

Hello

We are experiencing NetXMS crashes after the upgrade to 1.2.17. The crash generally happens after about 10 minutes of running. Enabling CrashDumpLog on the server does not produce a log file for the crash, and running the server with debug level 9 gives no clear indication of what happens before NetXMS stops. We have tried this on a Windows instance and on a Linux instance, and both appear to have the same issue. The only indication of a problem so far is the following log output:

[03-Dec-2014 08:57:48.626] [ERROR] SQL query failed (Query = "INSERT INTO alarm_events (alarm_id,event_id,event_code,event_name,severity,source_object_id,event_timestamp,message) VALUES (?,?,?,?,?,?,?,?)"): Duplicate entry '405363-30146854' for key 'PRIMARY'
[03-Dec-2014 08:57:48.639] [ERROR] SQL query failed (Query = "INSERT INTO alarm_events (alarm_id,event_id,event_code,event_name,severity,source_object_id,event_timestamp,message) VALUES (?,?,?,?,?,?,?,?)"): Duplicate entry '405363-30146855' for key 'PRIMARY'
[03-Dec-2014 08:57:48.668] [ERROR] SQL query failed (Query = "INSERT INTO alarm_events (alarm_id,event_id,event_code,event_name,severity,source_object_id,event_timestamp,message) VALUES (?,?,?,?,?,?,?,?)"): Duplicate entry '405941-30146859' for key 'PRIMARY'
[03-Dec-2014 08:57:48.695] [ERROR] SQL query failed (Query = "INSERT INTO alarm_events (alarm_id,event_id,event_code,event_name,severity,source_object_id,event_timestamp,message) VALUES (?,?,?,?,?,?,?,?)"): Duplicate entry '405941-30146862' for key 'PRIMARY'
[03-Dec-2014 08:57:48.714] [ERROR] SQL query failed (Query = "INSERT INTO alarm_events (alarm_id,event_id,event_code,event_name,severity,source_object_id,event_timestamp,message) VALUES (?,?,?,?,?,?,?,?)"): Duplicate entry '405363-30146864' for key 'PRIMARY'
[03-Dec-2014 08:57:48.725] [ERROR] SQL query failed (Query = "INSERT INTO alarm_events (alarm_id,event_id,event_code,event_name,severity,source_object_id,event_timestamp,message) VALUES (?,?,?,?,?,?,?,?)"): Duplicate entry '406032-30146865' for key 'PRIMARY'
[03-Dec-2014 08:57:48.737] [ERROR] SQL query failed (Query = "INSERT INTO alarm_events (alarm_id,event_id,event_code,event_name,severity,source_object_id,event_timestamp,message) VALUES (?,?,?,?,?,?,?,?)"): Duplicate entry '405363-30146866' for key 'PRIMARY'

We have made no changes to the database structure. Completely deleting the contents of the alarm_events table makes no difference, and 'nxdbmgr check' passes all of its current checks.

Not too sure what the next step should be. It is also not clear to me why the BIGINT key is shown as a hyphenated number.

Regards

Aron

Victor Kirhenshtein

Hi,

if you have a Linux installation, can you please run the server under gdb?

Commands would be like this:

gdb /path/to/netxmsd

This will show the (gdb) prompt. Then enter

run -D5

The server will run in the foreground. When the crash happens, the (gdb) prompt will be shown again. Type

bt

and send me the output.

Best regards,
Victor

aron

Hello

I found a core dump and have a backtrace from it. I am still waiting to run the gdb version so I can provide a live backtrace:

(gdb) bt
#0  Node::topologyPoll (this=0xcbd8090, pSession=0x0, dwRqId=0, nPoller=115) at node.cpp:5329
#1  0xb7666ba1 in TopologyPoller (arg=0x73) at poll.cpp:586
#2  0xb7335d4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#3  0xb6f999de in clone () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt full
#0  Node::topologyPoll (this=0xcbd8090, pSession=0x0, dwRqId=0, nPoller=115) at node.cpp:5329
        peerNode = 0x0
        ifaceFound = <optimized out>
        iface = 0x18eb61d8
        i = <optimized out>
        fdb = <optimized out>
        nbs = 0xa26a7f60
#1  0xb7666ba1 in TopologyPoller (arg=0x73) at poll.cpp:586
        node = 0xcbd8090
        szBuffer = L"poll: BLC-SW1 [110]\000]\000 [50574]\000]\000\000\060]\000]", '\000' <repeats 89 times>
#2  0xb7335d4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0xb6f999de in clone () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
(gdb) thread apply all bt

A quick look at the node does not suggest any particular issue.
The crash seems to be unrelated to the SQL errors, which still appear when running with -D5:

[03-Dec-2014 11:40:44.198] [ERROR] SQL query failed (Query = "INSERT INTO alarm_events (alarm_id,event_id,event_code,event_name,severity,source_object_id,event_timestamp,message) VALUES (?,?,?,?,?,?,?,?)"): Duplicate entry '405363-30157884' for key 'PRIMARY'
[Thread 0xa5107b40 (LWP 28681) exited]
[03-Dec-2014 11:40:44.272] [DEBUG] EVENT 52 (ID:30157885 F:0x0001 S:4 TAG:"") FROM Ldn-NetXMS: Database query failed (Query: INSERT INTO alarm_events (alarm_id,event_id,event_code,event_name,severity,source_object_id,event_timestamp,message) VALUES (?,?,?,?,?,?,?,?); Error: Duplicate entry '405363-30157075' for key 'PRIMARY')
[Thread 0x824fcb40 (LWP 27788) exited]

Regards

Aron

Victor Kirhenshtein

Hi,

it's a bug that is already fixed in the development branch (it will be released as 2.0-M1 soon). In the meantime, you can patch the server code manually if you are building it from source:

In the file src/server/core/node.cpp, find the line


             Node *peerNode = (Node *)FindObjectById(iface->getPeerNodeId(), OBJECT_NODE);


(it should be around line 5329). Immediately after that line, add the following code:


            if (peerNode == NULL)
            {
               iface->clearPeer();
               continue;
            }


Best regards,
Victor

aron

Thank you,

Will apply the fix now.

Regards

Aron