4.4.2 Crash

Started by MarcusH, September 14, 2023, 05:09:31 PM

MarcusH

Hi,

I upgraded from 4.2.395 to 4.4.2 and now the server crashes on a strange SQL query. We had this query on 4.2.395 too, but it never caused a crash.

Any idea what this query is?

2023.09.14 16:04:19.220 *E* [db.drv             ] SQL query failed (Query = "UPDATE nodes SET primary_ip=?,primary_name=?,snmp_port=?,capabilities=?,snmp_version=?,community=?,agent_port=?,secret=?,snmp_oid=?,uname=?,agent_version=?,platform_name=?,poller_node_id=?,zone_guid=?,proxy_node=?,snmp_proxy=?,icmp_proxy=?,required_polls=?,use_ifxtable=?,usm_auth_password=?,usm_priv_password=?,usm_methods=?,snmp_sys_name=?,bridge_base_addr=?,down_since=?,driver_name=?,rack_image_front=?,rack_position=?,rack_height=?,physical_container_id=?,boot_time=?,agent_cache_mode=?,snmp_sys_contact=?,snmp_sys_location=?,last_agent_comm_time=?,syslog_msg_count=?,snmp_trap_count=?,node_type=?,node_subtype=?,ssh_login=?,ssh_password=?,ssh_key_id=?,ssh_port=?,ssh_proxy=?,port_rows=?,port_numbering_scheme=?,agent_comp_mode=?,tunnel_id=?,lldp_id=?,fail_time_snmp=?,fail_time_agent=?,fail_time_ssh=?,rack_orientation=?,rack_image_rear=?,agent_id=?,agent_cert_subject=?,hypervisor_type=?,hypervisor_info=?,icmp_poll_mode=?,chassis_placement_config=?,vendor=?,product_code=?,product_name=?,product_version=?,serial_number=?,cip_device_type=?,cip_status=?,cip_state=?,eip_proxy=?,eip_port=?,hardware_id=?,cip_vendor_code=?,agent_cert_mapping_method=?,agent_cert_mapping_data=?,snmp_engine_id=?,snmp_context_engine_id=?,syslog_codepage=?,snmp_codepage=?,ospf_router_id=?,mqtt_proxy=?,modbus_proxy=?,modbus_tcp_port=?,modbus_unit_id=? WHERE id=?"): [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]String or binary data would be truncated.
2023.09.14 16:04:19.220 *D* [event.proc         ] EVENT SYS_DB_QUERY_FAILED [52] at {0} (ID:57617035 F:0x0001 S:4 TAGS:"") FROM netxmsd: Database query failed (Query: UPDATE nodes SET primary_ip=?,primary_name=?,snmp_port=?,capabilities=?,snmp_version=?,community=?,agent_port=?,secret=?,snmp_oid=?,uname=?,agent_version=?,platform_name=?,poller_node_id=?,zone_guid=?,proxy_node=?,snmp_proxy=?,icmp_proxy=?,required_polls=?,use_ifxtable=?,usm_auth_password=?,usm_priv_password=?,usm_methods=?,snmp_sys_name=?,bridge_base_addr=?,down_since=?,driver_name=?,rack_image_front=?,rack_position=?,rack_height=?,physical_container_id=?,boot_time=?,agent_cache_mode=?,snmp_sys_contact=?,snmp_sys_location=?,last_agent_comm_time=?,syslog_msg_count=?,snmp_trap_count=?,node_type=?,node_subtype=?,ssh_login=?,ssh_password=?,ssh_key_id=?,ssh_port=?,ssh_proxy=?,port_rows=?,port_numbering_scheme=?,agent_comp_mode=?,tunnel_id=?,lldp_id=?,fail_time_snmp=?,fail_time_agent=?,fail_time_ssh=?,rack_orientation=?,rack_image_rear=?,agent_id=?,agent_cert_subject=?,hypervisor_type=?,hypervisor_info=?,icmp_poll_mode=?,chassis_placement_config=?,vendor=?,product_code=?,product_name=?,product_version=?,serial_number=?,cip_device_type=?,cip_status=?,cip_state=?,eip_proxy=?,eip_port=?,hardware_id=?,cip_vendor_code=?,agent_cert_mapping_method=?,agent_cert_mapping_data=?,snmp_engine_id=?,snmp_context_engine_id=?,syslog_codepage=?,snmp_codepage=?,ospf_router_id=?,mqtt_proxy=?,modbus_proxy=?,modbus_tcp_port=?,modbus_unit_id=? WHERE id=?; Error: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]String or binary data would be truncated.)

Alex Kirhenshtein

Did the server crash, or did just the query fail? If it crashed, please share a stack trace from the core file, or just send us the core file and we'll check it.

MarcusH

I'm running the server in Docker, which makes getting a core dump a challenge.

I was about to install the server on a clean temporary VM to test it without Docker, and then I noticed an issue.

Why does NetXMS still depend on libssl1.1 and not libssl3?

Alex Kirhenshtein

Quote from: MarcusH on September 15, 2023, 10:03:20 AM
Why does netxms still depend on libssl1.1 and not libssl3?

Because Debian 11 ships with OpenSSL 1.1:

root@da539131bae5:/# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
root@da539131bae5:/# apt-cache search libssl
libssl-ocaml - OCaml bindings for OpenSSL (runtime)
libssl-ocaml-dev - OCaml bindings for OpenSSL
libssl-dev - Secure Sockets Layer toolkit - development files
libssl-doc - Secure Sockets Layer toolkit - development documentation
libssl1.1 - Secure Sockets Layer toolkit - shared libraries
libssl-utils-clojure - library for SSL certificate management on the JVM
root@da539131bae5:/# apt-cache show libssl-dev|grep Version
Version: 1.1.1n-0+deb11u5
Version: 1.1.1n-0+deb11u4


There is no reference to libssl1.1 in the official packages for Debian 12:

root@d483efbfc136:~# ldd /usr/bin/netxmsd | grep ssl
        libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007ffb9ae88000)
root@d483efbfc136:~# dpkg -l netxms-server
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                Version      Architecture Description
+++-===================-============-============-=================================
ii  netxms-server:amd64 4.4.2-1      amd64        meta package

MarcusH

Thanks, this was my bad: I imported an old sources list for NetXMS.

As for the issue, I reverted to 4.2.395, which at least keeps the server stable. This strange query is still output to the server console, and it is the only thing output there. Is there any way to figure out what creates this query?


Alex Kirhenshtein

Quote from: MarcusH on September 15, 2023, 10:03:20 AM
Running server on docker making core dump a challenge.

It's rather straightforward if you can control the host's core pattern:

sysctl -w kernel.core_pattern='/core/core.%e.%p.%t'
docker volume create core_vol
docker run --ulimit core=-1 --mount source=core_vol,target=/core container
sysctl -w kernel.core_pattern=core # reset it back to the default

MarcusH

I have a core dump, but it gives a lot of reference errors and shows no stack trace.

Alex Kirhenshtein

Quote from: MarcusH on September 15, 2023, 10:36:51 AM
I have a core dump but it gives a lot of reference error and shows no stack trace.

Have you installed the netxms-dbg package? It contains all debug symbols for the product.
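
For reference, one way to pull a stack trace out of the core file once the symbols are installed. This is only a minimal sketch: it assumes gdb is available where the core is read, that the binary is /usr/bin/netxmsd (the path shown in the ldd output earlier in this topic), and that cores are named core.* in /core as in the Docker recipe above. The helper names newest_core and print_backtrace are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Assumptions: gdb is installed, netxms-dbg provides the symbols,
# the server binary is /usr/bin/netxmsd, and core files are named core.* as
# produced by the core_pattern used earlier in this topic.
BINARY="${BINARY:-/usr/bin/netxmsd}"

# Hypothetical helper: print the most recently modified core file in a directory.
newest_core() {
    ls -t "$1"/core.* 2>/dev/null | head -n 1
}

# Hypothetical helper: dump backtraces of all threads, then exit (-batch).
print_backtrace() {
    gdb -batch -ex 'thread apply all bt' "$BINARY" "$1"
}

# Usage (run where the binary and symbols match the ones that produced the core):
#   print_backtrace "$(newest_core /core)" > backtrace.txt
```

The backtrace is only meaningful if the binary and debug symbols are the same build as the one that crashed, so it is probably easiest to run this inside the same container image.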

Alex Kirhenshtein

Quote from: MarcusH on September 15, 2023, 10:30:30 AM
Any way to figure out what creates this query?

It's executed by the object syncer thread, which saves node changes back to the database.

From the error message it's unclear which field is not accepted by the SQL server. This link might help with tracing it: https://stackoverflow.com/a/62905763

MarcusH

Quote from: Alex Kirhenshtein on September 15, 2023, 10:38:19 AM
Have you installed netxms-dbg package? It contains all debug symbols for the product.

I have not. I will have a look at this on the test VM.

MarcusH

Quote from: Alex Kirhenshtein on September 15, 2023, 10:40:07 AM
It's executed by object syncer thread, which saves node changes back into the database.
From the error message it's unclear which field is not accepted by the SQL server, this link might help with tracing it: https://stackoverflow.com/a/62905763

I thought all the =? values were there for log obfuscation, but even the trace on the SQL server only shows =?. I guess that would cause this issue, since a lot of the columns are int and it tries to update ? into them.

Any idea what could generate this type of node update?

MarcusH

Ah, what I see in the trace is the "INSERT INTO event_log" for the issue, which explains the "=?".

MarcusH

I think I found it:

exec sp_prepexec @p1 output,N'@P1 varchar(15),@P2 varchar(15),@P3 int,@P4 int,@P5 int,@P6 varchar(7),@P7 int,@P8 varchar(1),@P9 varchar(567)
Note @P9 varchar(567): P9 is snmp_oid, and that column is limited to 255 characters.

This is strange. I removed it and will see if the issue is gone.
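
For anyone tracing the same error, the check above can be done mechanically: list the varchar parameters in the sp_prepexec declaration that are wider than a given limit, then map the parameter number to a column by its position in the SET clause. This is a rough sketch, not an official tool. The function names are made up, the query is shortened to its first ten columns, the declaration is the truncated one from the trace above, and the limit of 255 applies only to snmp_oid; other limits would have to be read from the actual schema.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the 255 limit is the one
# quoted above for snmp_oid, not a general rule for every column.

# List varchar parameters in an sp_prepexec declaration wider than a limit.
wide_params() {   # $1 = declaration string, $2 = maximum width
    printf '%s\n' "$1" | tr ',' '\n' |
        sed -n 's/^ *@P\([0-9][0-9]*\) varchar(\([0-9][0-9]*\)).*$/\1 \2/p' |
        awk -v limit="$2" '$2 + 0 > limit + 0 { print "P" $1 " varchar(" $2 ")" }'
}

# Return the Nth column named in the SET clause of a parameterized UPDATE.
nth_column() {    # $1 = query text, $2 = 1-based parameter index
    printf '%s\n' "$1" |
        sed -e 's/^.* SET //' -e 's/ WHERE .*$//' |
        tr ',' '\n' | sed -n "${2}p" | sed 's/=?$//'
}

# Declaration as captured in the SQL Server trace (truncated after @P9).
decl='@P1 varchar(15),@P2 varchar(15),@P3 int,@P4 int,@P5 int,@P6 varchar(7),@P7 int,@P8 varchar(1),@P9 varchar(567)'
# First ten columns of the failing UPDATE, shortened for readability.
query='UPDATE nodes SET primary_ip=?,primary_name=?,snmp_port=?,capabilities=?,snmp_version=?,community=?,agent_port=?,secret=?,snmp_oid=?,uname=? WHERE id=?'

wide_params "$decl" 255    # prints: P9 varchar(567)
nth_column "$query" 9      # prints: snmp_oid
```

The position-based mapping works here because the parameters are bound in the same order as the columns appear in the SET clause.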

MarcusH

I removed the node that caused the faulty query on poll, and now the 4.4.2 server is stable.

I noticed there is another thread in which the server also restarted on "SQL query failed". Is this intended behavior now, or a bug?

MarcusH

Scratch that: the server still crashes, now without any obvious error.
I might not have time to trace this error and may have to revert to 4.2.395 again.