Client timeouts and Alarms not processing in a timely manner

Started by twparker, April 17, 2026, 05:02:06 PM


twparker

I am currently using NetXMS 6.1, and I've noticed that I do not always get alarms when nodes become unreachable, or at least not immediately. I have also started seeing timeouts in the client during simple activities such as creating a new node, which I had not encountered before.

I took a look at the server log and noticed these entries:

2026.04.17 09:16:50.436 *I* [watchdog           ] Thread "Recurrent scheduler" returned to running state
2026.04.17 09:17:48.752 *I* [watchdog           ] Thread "Syncer Thread" returned to running state
2026.04.17 09:19:36.335 *E* [watchdog           ] Thread "Syncer Thread" does not respond to watchdog thread
2026.04.17 09:21:56.337 *E* [watchdog           ] Thread "Recurrent scheduler" does not respond to watchdog thread
2026.04.17 09:24:24.322 *I* [watchdog           ] Thread "Recurrent scheduler" returned to running state
2026.04.17 09:26:36.342 *E* [watchdog           ] Thread "Recurrent scheduler" does not respond to watchdog thread
2026.04.17 09:32:38.323 *I* [watchdog           ] Thread "Recurrent scheduler" returned to running state
2026.04.17 09:33:06.359 *I* [watchdog           ] Thread "Syncer Thread" returned to running state
2026.04.17 09:34:56.347 *E* [watchdog           ] Thread "Syncer Thread" does not respond to watchdog thread
2026.04.17 09:35:56.348 *E* [watchdog           ] Thread "Recurrent scheduler" does not respond to watchdog thread

I'd appreciate any suggestions on how to fix this.




Filipp Sudanov

Please show the output of the following commands from the debug console:

sh st
sh th
sh qu

What are the overall specs of the server? What have CPU and memory usage looked like over the past few days?

twparker

NetXMS Server Remote Console V6.1.0 Ready

 sh st
Objects............: 25145
   Nodes...........: 3177
   Interfaces......: 19762
   Access points...: 651
   Sensors.........: 0
Collectible DCIs...: 44734
Active alarms......: 3924
Uptime.............: 2 days, 19:42:54

 sh th
MAIN
   Threads.............. 8 (8/600)
   Load average......... 0.00 0.00 0.00
   Current load......... 0%
   Usage................ 1%
   Active requests...... 0
   Scheduled requests... 3
   Total requests....... 4624
   Thread starts........ 0
   Thread stops......... 0
   Wait time EMA........ 0 ms
   Wait time SMA........ 0 ms
   Wait time SD......... 0 ms
   Queue size EMA....... 0
   Queue size SMA....... 0
   Queue size SD........ 0

AGENT
   Threads.............. 32 (32/256)
   Load average......... 0.00 0.00 0.00
   Current load......... 0%
   Usage................ 12%
   Active requests...... 0
   Scheduled requests... 0
   Total requests....... 9455
   Thread starts........ 0
   Thread stops......... 0
   Wait time EMA........ 0 ms
   Wait time SMA........ 0 ms
   Wait time SD......... 0 ms
   Queue size EMA....... 0
   Queue size SMA....... 0
   Queue size SD........ 0

POLLERS
   Threads.............. 484 (10/600)
   Load average......... 195.17 240.33 250.33
   Current load......... 58%
   Usage................ 80%
   Active requests...... 284
   Scheduled requests... 0
   Total requests....... 5389134
   Thread starts........ 749
   Thread stops......... 275
   Wait time EMA........ 0 ms
   Wait time SMA........ 0 ms
   Wait time SD......... 0 ms
   Queue size EMA....... 0
   Queue size SMA....... 0
   Queue size SD........ 0

FILE-TRANSFER
   Threads.............. 2 (2/16)
   Load average......... 0.00 0.00 0.00
   Current load......... 0%
   Usage................ 12%
   Active requests...... 0
   Scheduled requests... 0
   Total requests....... 0
   Thread starts........ 0
   Thread stops......... 0
   Wait time EMA........ 0 ms
   Wait time SMA........ 0 ms
   Wait time SD......... 0 ms
   Queue size EMA....... 0
   Queue size SMA....... 0
   Queue size SD........ 0

DATACOLL
   Threads.............. 10 (10/600)
   Load average......... 2.51 1.65 1.38
   Current load......... 0%
   Usage................ 1%
   Active requests...... 0
   Scheduled requests... 0
   Total requests....... 17275228
   Thread starts........ 66
   Thread stops......... 66
   Wait time EMA........ 29 ms
   Wait time SMA........ 28 ms
   Wait time SD......... 37 ms
   Queue size EMA....... 0
   Queue size SMA....... 0
   Queue size SD........ 0

THREVT
   Threads.............. 2 (2/4)
   Load average......... 0.00 0.00 0.00
   Current load......... 0%
   Usage................ 50%
   Active requests...... 0
   Scheduled requests... 1
   Total requests....... 75
   Thread starts........ 0
   Thread stops......... 0
   Wait time EMA........ 0 ms
   Wait time SMA........ 0 ms
   Wait time SD......... 0 ms
   Queue size EMA....... 0
   Queue size SMA....... 0
   Queue size SD........ 0

CLIENT
   Threads.............. 16 (16/2048)
   Load average......... 0.03 0.01 0.00
   Current load......... 12%
   Usage................ 0%
   Active requests...... 2
   Scheduled requests... 0
   Total requests....... 25173
   Thread starts........ 0
   Thread stops......... 0
   Wait time EMA........ 0 ms
   Wait time SMA........ 0 ms
   Wait time SD......... 0 ms
   Queue size EMA....... 0
   Queue size SMA....... 0
   Queue size SD........ 0

DISCOVERY
   Threads.............. 8 (8/64)
   Load average......... 0.00 0.00 0.00
   Current load......... 0%
   Usage................ 12%
   Active requests...... 0
   Scheduled requests... 0
   Total requests....... 0
   Thread starts........ 0
   Thread stops......... 0
   Wait time EMA........ 0 ms
   Wait time SMA........ 0 ms
   Wait time SD......... 0 ms
   Queue size EMA....... 0
   Queue size SMA....... 0
   Queue size SD........ 0

SCHEDULER
   Threads.............. 6 (1/64)
   Load average......... 0.00 0.00 0.00
   Current load......... 0%
   Usage................ 9%
   Active requests...... 0
   Scheduled requests... 1
   Total requests....... 1026
   Thread starts........ 5
   Thread stops......... 0
   Wait time EMA........ 643 ms
   Wait time SMA........ 0 ms
   Wait time SD......... 0 ms
   Queue size EMA....... 0
   Queue size SMA....... 0
   Queue size SD........ 0

MOBILE
   Threads.............. 4 (4/256)
   Load average......... 0.00 0.00 0.00
   Current load......... 0%
   Usage................ 1%
   Active requests...... 0
   Scheduled requests... 0
   Total requests....... 0
   Thread starts........ 0
   Thread stops......... 0
   Wait time EMA........ 0 ms
   Wait time SMA........ 0 ms
   Wait time SD......... 0 ms
   Queue size EMA....... 0
   Queue size SMA....... 0
   Queue size SD........ 0

PACKAGE-MANAGER
   Threads.............. 2 (2/25)
   Load average......... 0.00 0.00 0.00
   Current load......... 0%
   Usage................ 8%
   Active requests...... 0
   Scheduled requests... 0
   Total requests....... 0
   Thread starts........ 0
   Thread stops......... 0
   Wait time EMA........ 0 ms
   Wait time SMA........ 0 ms
   Wait time SD......... 0 ms
   Queue size EMA....... 0
   Queue size SMA....... 0
   Queue size SD........ 0



 sh qu
Data collector                   : 0
DCI cache loader                 : 0
Template updater                 : 0
Database writer                  : 0
Database writer (alarms)         : 0
Database writer (IData)          : 0
Database writer (raw DCI values) : 1934
Event processor                  : 0
Event log writer                 : 0
Poller                           : 0
Node discovery poller            : 0
SNMP trap processor              : 0
SNMP trap writer                 : 0
Syslog processor                 : 0
Syslog writer                    : 0
Scheduler                        : 0
Windows event processor          : 0
Windows event writer             : 0

Ubuntu 22.04 VM, Intel Xeon Skylake processor with 16 cores, 32 GB RAM, 500 GB virtual drive running on SSD.

Currently using 15 GB of RAM, and CPU usage stays under 10%.
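
Since the sh th output is long, it may help to reduce it to one line per pool before comparing captures. The following is only a sketch: it assumes the layout shown in this thread (an unindented pool name followed by indented "Key..... value" lines), and the function names parse_thread_pools and load_averages are made up for illustration. It does not assume any particular threshold for what counts as a healthy load average.

```python
import re

# An indented statistic line: a key, a run of dots, then the value.
STAT_RE = re.compile(r'^\s+([A-Za-z][A-Za-z ]*?)\.+\s+(.+)$')


def parse_thread_pools(text):
    """Parse pasted 'sh th' output into {pool name: {statistic: raw value}}."""
    pools = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        match = STAT_RE.match(line)
        if match and current is not None:
            pools[current][match.group(1)] = match.group(2).strip()
        elif not line[0].isspace():
            # An unindented line starts a new pool section, e.g. "POLLERS".
            current = line.strip()
            pools[current] = {}
    return pools


def load_averages(pools):
    """Return {pool name: tuple of the three load average figures as floats}."""
    result = {}
    for name, stats in pools.items():
        if "Load average" in stats:
            result[name] = tuple(float(v) for v in stats["Load average"].split())
    return result


# Two pools copied from the output above, shortened to the relevant lines.
SAMPLE = """POLLERS
   Threads.............. 484 (10/600)
   Load average......... 195.17 240.33 250.33
DATACOLL
   Threads.............. 10 (10/600)
   Load average......... 2.51 1.65 1.38
"""

if __name__ == "__main__":
    averages = load_averages(parse_thread_pools(SAMPLE))
    # Print pools sorted by the first load average figure, highest first.
    for name, values in sorted(averages.items(), key=lambda kv: kv[1][0], reverse=True):
        print(name, values)
```

Sorting by the first figure puts POLLERS at the top for the output in this thread, which matches what stands out when reading the full dump by eye.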









Filipp Sudanov

A possible cause is that some server thread is getting locked. You can capture the server threads while you are seeing delays when working with the server from the GUI; here's the script: https://github.com/netxms/netxms/blob/master/tools/capture_netxmsd_threads.sh
The script should be executed 3 times at 20-30 second intervals. It needs the gdb and netxms-dbg packages installed on the system.
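
If it is easier than timing the runs by hand, the three captures can be driven from a small wrapper. This is a minimal sketch, not part of NetXMS: it assumes the script has been downloaded to the current directory, made executable, and can be started without arguments (check the script itself for its actual usage and where it writes its output). The function name capture_repeatedly and its parameters are hypothetical.

```python
import subprocess
import time


def capture_repeatedly(command, runs=3, interval=25.0,
                       sleep=time.sleep, run=subprocess.run):
    """Run `command` `runs` times, pausing `interval` seconds between runs.

    Returns the list of exit codes, one per run. `sleep` and `run` are
    injectable so the loop can be exercised without real waiting.
    """
    return_codes = []
    for i in range(runs):
        result = run(command, check=False)
        return_codes.append(result.returncode)
        # No pause is needed after the final capture.
        if i < runs - 1:
            sleep(interval)
    return return_codes


# Example call (assumption: script is in the current directory and takes
# no arguments; it most likely needs to run as root so gdb can attach):
# capture_repeatedly(["./capture_netxmsd_threads.sh"], runs=3, interval=25.0)
```

The 25 second default simply sits in the middle of the 20-30 second range mentioned above.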

Another thing to check: on the NetXMS server node there should be a DCI called Server: Number of Processed Events (based on the internal Server.TotalEventsProcessed metric). Use it to verify that there were no sudden spikes in event volume.
Overall there are quite a few metrics there, e.g. the Server QueueSize ones; it's worth checking them for any big spikes.
If you don't have Server.ImportConfigurationOnStartup set to Always, you might not have the most recent template with the above-mentioned DCIs; you may want to import it from /usr/share/netxms/templates/netxms_server.xml
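
To see how long each thread was actually stalled, the watchdog entries from the first post can be paired up: each "does not respond" line with the next "returned to running state" line for the same thread. The sketch below assumes each log entry is on a single unwrapped line in the format shown in this thread; the function name stall_intervals is made up for illustration.

```python
import re
from datetime import datetime

# Matches watchdog entries such as:
# 2026.04.17 09:19:36.335 *E* [watchdog           ] Thread "Syncer Thread" does not respond to watchdog thread
LINE_RE = re.compile(
    r'^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \*[A-Z]\* \[watchdog\s*\] '
    r'Thread "([^"]+)" (does not respond to watchdog thread|returned to running state)$'
)


def stall_intervals(lines):
    """Return (thread name, stall start, stall length in seconds) tuples.

    A stall starts at a 'does not respond' entry and ends at the next
    'returned to running state' entry for the same thread. Stalls that
    have not ended by the last line are not reported.
    """
    pending = {}
    stalls = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y.%m.%d %H:%M:%S.%f")
        name, state = match.group(2), match.group(3)
        if state.startswith("does not"):
            # Keep the earliest unanswered report for this thread.
            pending.setdefault(name, stamp)
        elif name in pending:
            start = pending.pop(name)
            stalls.append((name, start, (stamp - start).total_seconds()))
    return stalls


# Four entries taken from the log in the first post.
SAMPLE = [
    '2026.04.17 09:19:36.335 *E* [watchdog           ] Thread "Syncer Thread" does not respond to watchdog thread',
    '2026.04.17 09:21:56.337 *E* [watchdog           ] Thread "Recurrent scheduler" does not respond to watchdog thread',
    '2026.04.17 09:24:24.322 *I* [watchdog           ] Thread "Recurrent scheduler" returned to running state',
    '2026.04.17 09:33:06.359 *I* [watchdog           ] Thread "Syncer Thread" returned to running state',
]

if __name__ == "__main__":
    for name, start, seconds in stall_intervals(SAMPLE):
        print(f"{name}: stalled from {start} for {seconds:.0f} s")
```

For these four entries the pairing gives roughly two and a half minutes for "Recurrent scheduler" and thirteen and a half minutes for "Syncer Thread", which is in line with alarms arriving late.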