NetXMS Support Forum

English Support => General Support => Topic started by: MarcusH on September 14, 2023, 05:09:31 PM

Title: 4.4.2 Crash
Post by: MarcusH on September 14, 2023, 05:09:31 PM
Hi,

I upgraded from 4.2.395 to 4.4.2, and now the server crashes on a strange SQL query. The same query also showed up on 4.2.395, but it never caused a crash.

Any idea what this query is?

2023.09.14 16:04:19.220 *E* [db.drv             ] SQL query failed (Query = "UPDATE nodes SET primary_ip=?,primary_name=?,snmp_port=?,capabilities=?,snmp_version=?,community=?,agent_port=?,secret=?,snmp_oid=?,uname=?,agent_version=?,platform_name=?,poller_node_id=?,zone_guid=?,proxy_node=?,snmp_proxy=?,icmp_proxy=?,required_polls=?,use_ifxtable=?,usm_auth_password=?,usm_priv_password=?,usm_methods=?,snmp_sys_name=?,bridge_base_addr=?,down_since=?,driver_name=?,rack_image_front=?,rack_position=?,rack_height=?,physical_container_id=?,boot_time=?,agent_cache_mode=?,snmp_sys_contact=?,snmp_sys_location=?,last_agent_comm_time=?,syslog_msg_count=?,snmp_trap_count=?,node_type=?,node_subtype=?,ssh_login=?,ssh_password=?,ssh_key_id=?,ssh_port=?,ssh_proxy=?,port_rows=?,port_numbering_scheme=?,agent_comp_mode=?,tunnel_id=?,lldp_id=?,fail_time_snmp=?,fail_time_agent=?,fail_time_ssh=?,rack_orientation=?,rack_image_rear=?,agent_id=?,agent_cert_subject=?,hypervisor_type=?,hypervisor_info=?,icmp_poll_mode=?,chassis_placement_config=?,vendor=?,product_code=?,product_name=?,product_version=?,serial_number=?,cip_device_type=?,cip_status=?,cip_state=?,eip_proxy=?,eip_port=?,hardware_id=?,cip_vendor_code=?,agent_cert_mapping_method=?,agent_cert_mapping_data=?,snmp_engine_id=?,snmp_context_engine_id=?,syslog_codepage=?,snmp_codepage=?,ospf_router_id=?,mqtt_proxy=?,modbus_proxy=?,modbus_tcp_port=?,modbus_unit_id=? WHERE id=?"): [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]String or binary data would be truncated.
2023.09.14 16:04:19.220 *D* [event.proc         ] EVENT SYS_DB_QUERY_FAILED [52] at {0} (ID:57617035 F:0x0001 S:4 TAGS:"") FROM netxmsd: Database query failed (Query: UPDATE nodes SET primary_ip=?,primary_name=?,snmp_port=?,capabilities=?,snmp_version=?,community=?,agent_port=?,secret=?,snmp_oid=?,uname=?,agent_version=?,platform_name=?,poller_node_id=?,zone_guid=?,proxy_node=?,snmp_proxy=?,icmp_proxy=?,required_polls=?,use_ifxtable=?,usm_auth_password=?,usm_priv_password=?,usm_methods=?,snmp_sys_name=?,bridge_base_addr=?,down_since=?,driver_name=?,rack_image_front=?,rack_position=?,rack_height=?,physical_container_id=?,boot_time=?,agent_cache_mode=?,snmp_sys_contact=?,snmp_sys_location=?,last_agent_comm_time=?,syslog_msg_count=?,snmp_trap_count=?,node_type=?,node_subtype=?,ssh_login=?,ssh_password=?,ssh_key_id=?,ssh_port=?,ssh_proxy=?,port_rows=?,port_numbering_scheme=?,agent_comp_mode=?,tunnel_id=?,lldp_id=?,fail_time_snmp=?,fail_time_agent=?,fail_time_ssh=?,rack_orientation=?,rack_image_rear=?,agent_id=?,agent_cert_subject=?,hypervisor_type=?,hypervisor_info=?,icmp_poll_mode=?,chassis_placement_config=?,vendor=?,product_code=?,product_name=?,product_version=?,serial_number=?,cip_device_type=?,cip_status=?,cip_state=?,eip_proxy=?,eip_port=?,hardware_id=?,cip_vendor_code=?,agent_cert_mapping_method=?,agent_cert_mapping_data=?,snmp_engine_id=?,snmp_context_engine_id=?,syslog_codepage=?,snmp_codepage=?,ospf_router_id=?,mqtt_proxy=?,modbus_proxy=?,modbus_tcp_port=?,modbus_unit_id=? WHERE id=?; Error: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]String or binary data would be truncated.)
Title: Re: 4.4.2 Crash
Post by: Alex Kirhenshtein on September 14, 2023, 10:19:09 PM
Did the server crash, or did the query just fail? If it crashed, please share the stack trace from the core file, or just send us the core file and we'll check it.
Title: Re: 4.4.2 Crash
Post by: MarcusH on September 15, 2023, 10:03:20 AM
Running the server in Docker makes getting a core dump a challenge.

I was about to install the server on a clean temporary VM and test it without Docker, and then I noticed an issue.

Why does netxms still depend on libssl1.1 and not libssl3?
Title: Re: 4.4.2 Crash
Post by: Alex Kirhenshtein on September 15, 2023, 10:20:53 AM
Quote from: MarcusH on September 15, 2023, 10:03:20 AMWhy does netxms still depend on libssl1.1 and not libssl3?

Because Debian 11 ships with OpenSSL 1.1:

root@da539131bae5:/# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
root@da539131bae5:/# apt-cache search libssl
libssl-ocaml - OCaml bindings for OpenSSL (runtime)
libssl-ocaml-dev - OCaml bindings for OpenSSL
libssl-dev - Secure Sockets Layer toolkit - development files
libssl-doc - Secure Sockets Layer toolkit - development documentation
libssl1.1 - Secure Sockets Layer toolkit - shared libraries
libssl-utils-clojure - library for SSL certificate management on the JVM
root@da539131bae5:/# apt-cache show libssl-dev|grep Version
Version: 1.1.1n-0+deb11u5
Version: 1.1.1n-0+deb11u4


There are no references to libssl1.1 in the official packages for Debian 12:

root@d483efbfc136:~# ldd /usr/bin/netxmsd | grep ssl
        libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007ffb9ae88000)
root@d483efbfc136:~# dpkg -l netxms-server
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                Version      Architecture Description
+++-===================-============-============-=================================
ii  netxms-server:amd64 4.4.2-1      amd64        meta package
Title: Re: 4.4.2 Crash
Post by: MarcusH on September 15, 2023, 10:30:30 AM
Thanks, this was my mistake: I had imported an old sources list for NetXMS.

On the original issue: I reverted to 4.2.395, which at least makes the server stable. This strange query is still printed to the server console, and it is the only thing printed there. Is there any way to figure out what creates this query?

Title: Re: 4.4.2 Crash
Post by: Alex Kirhenshtein on September 15, 2023, 10:31:49 AM
Quote from: MarcusH on September 15, 2023, 10:03:20 AMRunning server on docker making core dump a challange.
It's rather straightforward if you can control the host's core pattern:

sysctl -w kernel.core_pattern='/core/core.%e.%p.%t'
docker volume create core_vol
docker run --ulimit core=-1 --mount source=core_vol,target=/core container
sysctl -w kernel.core_pattern=core # reset it back to the default
Title: Re: 4.4.2 Crash
Post by: MarcusH on September 15, 2023, 10:36:51 AM
Quote from: Alex Kirhenshtein on September 15, 2023, 10:31:49 AM
Quote from: MarcusH on September 15, 2023, 10:03:20 AMRunning server on docker making core dump a challange.
it's rather straighforward if you can control host's core pattern

sysctl -w kernel.core_pattern='/core/core.%e.%p.%t'
docker volume create core_vol
docker run --ulimit core=-1 --mount source=core_vol,target=/core container
sysctl -w kernel.core_pattern=core # reset it back to the default

I have a core dump, but it gives a lot of reference errors and shows no stack trace.
Title: Re: 4.4.2 Crash
Post by: Alex Kirhenshtein on September 15, 2023, 10:38:19 AM
Quote from: MarcusH on September 15, 2023, 10:36:51 AMI have a core dump but it gives a lot of reference error and shows no stack trace.

Have you installed the netxms-dbg package? It contains all debug symbols for the product.
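
Once the debug symbols are installed, a backtrace can be pulled from the core file non-interactively. The following is only a sketch: it builds a standard gdb batch command, and the core file name is a hypothetical example following the core.%e.%p.%t pattern suggested earlier in the thread. The binary path is the one shown in the ldd output above.

```python
import shlex


def build_gdb_command(binary, core):
    """Build a gdb batch invocation that prints a backtrace for every thread."""
    return [
        "gdb", "--batch",             # run the given commands, then exit
        "-ex", "thread apply all bt",  # backtrace of all threads
        binary, core,
    ]


# Hypothetical core file name; substitute the file found in the core volume.
cmd = build_gdb_command("/usr/bin/netxmsd", "/core/core.netxmsd.1234.1695200000")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell inside the container or VM where both the netxmsd binary and the netxms-dbg symbols are present.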
Title: Re: 4.4.2 Crash
Post by: Alex Kirhenshtein on September 15, 2023, 10:40:07 AM
Quote from: MarcusH on September 15, 2023, 10:30:30 AMAny way to figure out what creates this query?
It's executed by the object syncer thread, which saves node changes back to the database.

From the error message it's unclear which field SQL Server is rejecting. This link might help with tracing it: https://stackoverflow.com/a/62905763
Title: Re: 4.4.2 Crash
Post by: MarcusH on September 15, 2023, 10:43:43 AM
Quote from: Alex Kirhenshtein on September 15, 2023, 10:38:19 AM
Quote from: MarcusH on September 15, 2023, 10:36:51 AMI have a core dump but it gives a lot of reference error and shows no stack trace.

Have you installed netxms-dbg package? It contains all debug symbols for the product.


I have not. I will have a look at this on the test VM.
Title: Re: 4.4.2 Crash
Post by: MarcusH on September 15, 2023, 04:45:21 PM
Quote from: Alex Kirhenshtein on September 15, 2023, 10:40:07 AM
Quote from: MarcusH on September 15, 2023, 10:30:30 AMAny way to figure out what creates this query?
It's executed by object syncer thread, which saves node changes back into the database.

From the error message it's unclear which field is not accepted by the SQL server, this link might help with tracing it: https://stackoverflow.com/a/62905763
I thought all the =? values were there for log obfuscation, but even the trace on the SQL Server only shows =?. I guess that would cause this issue, since a lot of the columns are int and it tries to update them with ?.

Any idea what could generate this type of node update?
Title: Re: 4.4.2 Crash
Post by: MarcusH on September 15, 2023, 04:49:33 PM
Ah, what I see in the trace is the "INSERT INTO event_log" for the issue, which explains the "=?".
Title: Re: 4.4.2 Crash
Post by: MarcusH on September 15, 2023, 05:06:06 PM
I think I found it:

exec sp_prepexec @p1 output,N'@P1 varchar(15),@P2 varchar(15),@P3 int,@P4 int,@P5 int,@P6 varchar(7),@P7 int,@P8 varchar(1),@P9 varchar(567)
The declaration has @P9 varchar(567). P9 is snmp_oid, and that column holds at most 255 characters.

This is strange. I removed the node and will see if the issue is gone.
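
The manual check above can be sketched in Python: pair each @Pn in the sp_prepexec declaration with the column at the same position in the UPDATE statement, then flag any varchar that is declared longer than the column allows. This assumes the declared varchar length reflects the length of the bound value, as the trace suggests. Only the snmp_oid limit (255) comes from this thread; limits for other columns would have to be added from the actual schema.

```python
import re

# Column order copied from the start of the UPDATE statement in the first post.
COLUMNS = ["primary_ip", "primary_name", "snmp_port", "capabilities",
           "snmp_version", "community", "agent_port", "secret", "snmp_oid"]

# Only this limit is stated in the thread; extend it from your own schema.
COLUMN_LIMITS = {"snmp_oid": 255}

PARAM_RE = re.compile(r"@P(\d+)\s+n?varchar\((\d+)\)", re.IGNORECASE)


def oversized_params(declaration, columns, limits):
    """Return (column, declared_length, limit) for every over-long parameter."""
    findings = []
    for number, length in PARAM_RE.findall(declaration):
        index = int(number) - 1          # @P1 maps to the first column
        if index >= len(columns):
            continue                     # parameter beyond the known columns
        column = columns[index]
        limit = limits.get(column)
        if limit is not None and int(length) > limit:
            findings.append((column, int(length), limit))
    return findings


declaration = ("@P1 varchar(15),@P2 varchar(15),@P3 int,@P4 int,@P5 int,"
               "@P6 varchar(7),@P7 int,@P8 varchar(1),@P9 varchar(567)")
print(oversized_params(declaration, COLUMNS, COLUMN_LIMITS))
# prints [('snmp_oid', 567, 255)]
```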
Title: Re: 4.4.2 Crash
Post by: MarcusH on September 20, 2023, 09:18:43 AM
I removed the node that caused the faulty query on poll, and now the 4.4.2 server is stable.

I noticed there is another thread where the server also restarted on "SQL query failed". Is this intended behavior now, or a bug?
Title: Re: 4.4.2 Crash
Post by: MarcusH on September 20, 2023, 09:29:54 AM
Scratch that, the server still crashes, now without any obvious error.
I might not have time to trace this error and may revert to 4.2.395 again.
Title: Re: 4.4.2 Crash
Post by: MarcusH on September 20, 2023, 02:27:56 PM
I had some time today to look at this issue, and it is very elusive.
My debugging knowledge is limited, and the saved core dump shows nothing useful, just a few addresses that point to ??.

I tried running with debug level 6 to see if anything shows up in the logs. There are no errors, but I see a trend:

The line before "Log file opened" is always "NetworkDeviceDriver::getInterfaces".

Example:
2023.09.20 09:45:19.570 *D* [ndd.common         ] NetworkDeviceDriver::getInterfaces(0x7f0dae84c740): completed, ifList=0x7f0db5c81300

2023.09.20 09:45:21.466 *I* [logger             ] Log file opened (rotation policy 2, max size 16777216)
2023.09.20 09:45:21.466 *I* [startup            ] Starting NetXMS server version 4.4.2 build tag 4.4-568-g3a9a8aa557

If I search for 0x7f0dae84c740 in the log, I can find which node it was:

2023.09.20 09:45:18.722 *D* [node.iface         ] Node::getInterfaceList(node=TPFIBSW02 [10402]): calling driver (useIfXTable=true)
2023.09.20 09:45:18.722 *D* [ndd.common         ] NetworkDeviceDriver::getInterfaces(0x7f0dae84c740,true)

I started the server and quickly unmanaged TPFIBSW02, and the server has now been running for a while without crashing.
Since nothing is written to the log even at level 6, I guess a core dump is needed to find out why the server crashes when polling this node for interfaces, but there I am at a loss.
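
The manual log search described above could be automated roughly like this. It is a sketch that relies on the line formats shown in this post, and it assumes the "calling driver" line is immediately followed by the matching getInterfaces call, which may not hold on a busy server where poller threads interleave their output.

```python
import re

POINTER_RE = re.compile(r"NetworkDeviceDriver::getInterfaces\((0x[0-9a-f]+)")
NODE_RE = re.compile(r"Node::getInterfaceList\(node=(.+?) \[(\d+)\]")


def suspects_before_restart(lines):
    """Name the node whose interface poll was the last entry before each restart."""
    suspects = []
    pointer_to_node = {}   # pointer argument of getInterfaces -> node name
    previous = ""          # last non-empty line seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        call = POINTER_RE.search(line)
        node = NODE_RE.search(previous)
        if call and node:
            # Driver call directly after "calling driver": remember the owner.
            pointer_to_node[call.group(1)] = node.group(1)
        if "Log file opened" in line:
            last_call = POINTER_RE.search(previous)
            if last_call:
                suspects.append(pointer_to_node.get(last_call.group(1), "unknown"))
        previous = line
    return suspects


# Lines taken from this post, in chronological order.
sample = [
    "2023.09.20 09:45:18.722 *D* [node.iface         ] Node::getInterfaceList(node=TPFIBSW02 [10402]): calling driver (useIfXTable=true)",
    "2023.09.20 09:45:18.722 *D* [ndd.common         ] NetworkDeviceDriver::getInterfaces(0x7f0dae84c740,true)",
    "2023.09.20 09:45:19.570 *D* [ndd.common         ] NetworkDeviceDriver::getInterfaces(0x7f0dae84c740): completed, ifList=0x7f0db5c81300",
    "",
    "2023.09.20 09:45:21.466 *I* [logger             ] Log file opened (rotation policy 2, max size 16777216)",
]
print(suspects_before_restart(sample))
# prints ['TPFIBSW02']
```

Running it over a full log file (for example with open(path) as the lines argument) would list one suspect per restart, which makes it easier to see whether the same node is always involved.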
Title: Re: 4.4.2 Crash
Post by: Alex Kirhenshtein on September 20, 2023, 02:29:27 PM
Do you have any non-Ethernet interfaces there with MAC address longer than 6 bytes?
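
A quick way to check this from exported interface data is sketched below. The interface names and addresses are made-up examples, not values from this node; the second one has the 20-byte length used by InfiniBand hardware addresses.

```python
def mac_length(mac_text):
    """Return the number of bytes in a MAC written as hex with optional separators."""
    hex_digits = "".join(ch for ch in mac_text if ch not in ":-. ")
    return len(bytes.fromhex(hex_digits))


def long_macs(interfaces):
    """Return names of interfaces whose MAC address is longer than 6 bytes."""
    return [name for name, mac in interfaces.items() if mac_length(mac) > 6]


# Hypothetical sample data for illustration only.
interfaces = {
    "eth0": "00:11:22:33:44:55",
    "ib0": "80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:0a:6b:1d",
}
print(long_macs(interfaces))
# prints ['ib0']
```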
Title: Re: 4.4.2 Crash
Post by: MarcusH on September 20, 2023, 02:43:09 PM

Quote from: Alex Kirhenshtein on September 20, 2023, 02:29:27 PMDo you have any non-Ethernet interfaces there with MAC address longer than 6 bytes?
Not that I can see.