Hi, everyone!
NetXMS becomes unresponsive, both via the web interface and via the Client, about a couple of hours after being started, for no apparent reason and without any interaction with it.
OS: Microsoft Windows Server 2025 Version 24H2 (OS Build 26100.4946)
I found no related events in the Windows Event Log.
NetXMS log:
2025.11.16 18:14:21.830 *I* [logger ] Log file opened (rotation policy 2, max size 16777216)
2025.11.16 18:14:21.830 *I* [startup ] Starting NetXMS server version 5.2.7 build tag 5.2-483-g0fe7ba4e30
2025.11.16 18:14:21.830 *I* [startup ] System time zone is RTZ+03RTZDT
2025.11.16 18:14:21.830 *I* [logger ] Debug level set to 0
2025.11.16 18:14:21.830 *I* [config ] Main configuration file: C:\NetXMS\etc\netxmsd.conf
2025.11.16 18:14:21.830 *I* [config ] Configuration tree:
2025.11.16 18:14:21.830 *I* [config ] config
2025.11.16 18:14:21.830 *I* [config ] +- server
2025.11.16 18:14:21.830 *I* [config ] +- DBDriver
2025.11.16 18:14:21.831 *I* [config ] | value: sqlite.ddr
2025.11.16 18:14:21.831 *I* [config ] +- DBName
2025.11.16 18:14:21.831 *I* [config ] | value: C:\NetXMS\database\netxms.db
2025.11.16 18:14:21.831 *I* [config ] +- LogFile
2025.11.16 18:14:21.831 *I* [config ] value: C:\NetXMS\log\netxmsd.log
2025.11.16 18:14:21.835 *I* [startup ] System hardware ID 17F0E422E9CDBD3F36E0B15735E86B5B448A3BE0
2025.11.16 18:14:21.839 *I* [db.drv ] Database driver "sqlite.ddr" loaded and initialized successfully
2025.11.16 18:14:21.842 *I* [comm.listener ] SocketListener/LocalAdmin: listening on 127.0.0.1:21784
2025.11.16 18:14:21.842 *I* [comm.listener ] SocketListener/LocalAdmin: listening on [127.0.0.1]:21784
2025.11.16 18:14:21.858 *I* [startup ] Server ID 348CF52680000029
2025.11.16 18:14:21.867 *I* [db.writer ] DBWriter/Housekeeper interlock is OFF
2025.11.16 18:14:21.869 *I* [startup ] Data directory "C:\NetXMS\var\crl" successfully created
2025.11.16 18:14:21.869 *I* [startup ] Data directory "C:\NetXMS\var\mibs" successfully created
2025.11.16 18:14:21.873 *I* [crypto ] Crypto library initialized (OpenSSL 3.0.17 1 Jul 2025)
2025.11.16 18:14:21.873 *I* [crypto.cert ] Server certificate not set
2025.11.16 18:14:21.873 *I* [crypto.cert ] Internal CA certificate not set
2025.11.16 18:14:27.038 *I* [macdb ] OUI-24 database loaded (32548 entries)
2025.11.16 18:14:27.063 *I* [macdb ] OUI-28 database loaded (4403 entries)
2025.11.16 18:14:27.091 *I* [macdb ] OUI-36 database loaded (5046 entries)
2025.11.16 18:14:27.093 *I* [backup ] Network device backup interface is not available
2025.11.16 18:14:27.093 *I* [2fa ] 0 two-factor authentication methods loaded, 0 successfully initialized
2025.11.16 18:14:27.114 *I* [ndd ] Network device driver AT loaded successfully
... (many more "Network device driver *** loaded successfully" lines) ...
2025.11.16 18:14:27.149 *I* [ndd ] Network device driver WESTERSTRAND loaded successfully
2025.11.16 18:14:27.151 *E* [obj.init ] NetObj::loadCommonProperties() failed for Zone object Default [4]
2025.11.16 18:14:30.349 *I* [event.proc ] Parallel event processing disabled
2025.11.16 18:14:30.349 *I* [snmp.agent ] Build-in SNMP agent is disabled
2025.11.16 18:14:30.351 *I* [snmp.trap ] Local SNMP engine ID set to 8000DF4B0520100804020100
2025.11.16 18:14:30.351 *I* [snmp.trap ] Listening for SNMP traps on UDP socket 0.0.0.0:162
2025.11.16 18:14:30.351 *I* [snmp.trap ] Listening for SNMP traps on UDP socket :::162
2025.11.16 18:14:30.351 *I* [snmp.trap ] SNMP trap receiver started on port 162
2025.11.16 18:14:30.354 *I* [beacon ] Beacon poller will not start because beacon host list is empty
2025.11.16 18:14:30.355 *I* [ldap ] LDAP synchronization thread will not start because LDAP synchronization is disabled
2025.11.16 18:14:30.365 *E* [agent.tunnel ] Tunnel listener cannot start because server certificate is not loaded
2025.11.16 18:14:30.365 *I* [comm.listener ] SocketListener/Clients: listening on 0.0.0.0:4701
2025.11.16 18:14:30.366 *I* [comm.listener ] SocketListener/Clients: listening on [0.0.0.0]:4701
2025.11.16 18:14:30.366 *I* [comm.listener ] SocketListener/MobileDevices: listening on 0.0.0.0:4747
2025.11.16 18:14:30.366 *I* [comm.listener ] SocketListener/MobileDevices: listening on [0.0.0.0]:4747
2025.11.16 18:14:30.368 *I* [licensing ] Number of managed nodes restricted to 250
2025.11.16 18:14:30.368 *I* [startup ] Server initialization completed in 8539 milliseconds
2025.11.16 18:14:30.368 *I* [startup ] NetXMS Server started
2025.11.16 20:08:07.734 *E* [watchdog ] Thread "Syncer Thread" does not respond to watchdog thread
2025.11.16 20:11:47.775 *E* [watchdog ] Thread "Recurrent scheduler" does not respond to watchdog thread
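To put a number on "about a couple of hours", here is a minimal Python sketch that parses log lines in the layout shown above and measures how long after "NetXMS Server started" each watchdog message appeared. The regular expressions are assumptions based only on the lines quoted in this thread, not on a documented log format.

```python
import re
from datetime import datetime

# Layout of the netxmsd log lines quoted in this thread (an assumption,
# not a documented format): "YYYY.MM.DD HH:MM:SS.mmm *S* [tag ] message"
LINE_RE = re.compile(
    r"^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \*(\w)\* \[([^\]]*)\]\s+(.*)$"
)
HANG_RE = re.compile(r'Thread "([^"]+)" does not respond to watchdog thread')


def parse_line(line):
    """Return (timestamp, severity, tag, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y.%m.%d %H:%M:%S.%f")
    return ts, m.group(2), m.group(3).strip(), m.group(4)


def hangs_after_start(lines):
    """Return (thread name, time since "NetXMS Server started") per watchdog message."""
    started = None
    hangs = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _severity, tag, message = parsed
        if tag == "startup" and message == "NetXMS Server started":
            started = ts  # a restart resets the reference point
        elif tag == "watchdog" and started is not None:
            hang = HANG_RE.search(message)
            if hang:
                hangs.append((hang.group(1), ts - started))
    return hangs


log = [
    '2025.11.16 18:14:30.368 *I* [startup ] NetXMS Server started',
    '2025.11.16 20:08:07.734 *E* [watchdog ] Thread "Syncer Thread" does not respond to watchdog thread',
    '2025.11.16 20:11:47.775 *E* [watchdog ] Thread "Recurrent scheduler" does not respond to watchdog thread',
]
for name, delay in hangs_after_start(log):
    print(f"{name}: {delay}")
# Syncer Thread: 1:53:37.366000
# Recurrent scheduler: 1:57:17.407000
```

For this startup, both threads stopped responding a little under two hours after the server finished initializing.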
I know that's not much information.
Is there anything else I could provide to help resolve the issue?
Yes, the messages
Thread "Syncer Thread" does not respond to watchdog thread
Thread "Recurrent scheduler" does not respond to watchdog thread
mean that something is wrong with the server process (netxmsd).
The first thing to check is the memory consumption of netxmsd: maybe it keeps growing and the system starts swapping.
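One way to watch this on Windows without extra tools is to sample the built-in `tasklist` command. The following is a minimal Python sketch, assuming the server process is named `netxmsd.exe`; it reads the "Mem Usage" column of the CSV output, and the parser is kept separate so it can be checked offline.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(text):
    """Map PID -> "Mem Usage" in KB from `tasklist /FO CSV /NH` output."""
    usage = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 5:
            continue  # e.g. the "INFO: No tasks are running ..." line
        # "1,234,567 K" -> 1234567; keep only the digits because the
        # thousands separator depends on the Windows locale
        usage[int(row[1])] = int("".join(ch for ch in row[4] if ch.isdigit()))
    return usage


def netxmsd_memory_kb():
    """Ask tasklist about netxmsd.exe (Windows only)."""
    out = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq netxmsd.exe", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_tasklist_csv(out)


# Offline check against a sample line (values are made up):
sample = '"netxmsd.exe","4312","Services","0","1,234,567 K"\n'
print(parse_tasklist_csv(sample))  # {4312: 1234567}
```

Calling `netxmsd_memory_kb()` every few minutes until the next hang would show whether the number keeps climbing; a flat value would argue against the swapping theory.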
What are the specs of the machine? What DB do you use? Is the DB on the same machine?
From another thread it seems that you've installed with SQLite. This could be the reason for the server locking up. I'd recommend migrating to MySQL, MSSQL or Postgres (I personally prefer the last option, but it's whatever you are comfortable with).
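Background on why SQLite can stall a multi-threaded server: SQLite allows only one writer per database file at a time, and every other writer either waits (up to its busy timeout) or fails with "database is locked". The Python sketch below, using only the standard `sqlite3` module, shows this with two connections to the same file. It illustrates general SQLite behavior only; whether this is what stops netxmsd's threads here is an assumption, not something confirmed in the thread.

```python
import os
import sqlite3
import tempfile


def second_writer_outcome():
    """Open two connections to one file and let both try to write at once."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: transactions are controlled explicitly below
        a = sqlite3.connect(path, isolation_level=None)
        b = sqlite3.connect(path, isolation_level=None, timeout=0)  # no waiting
        try:
            a.execute("CREATE TABLE t (x INTEGER)")
            a.execute("BEGIN IMMEDIATE")  # A takes the write lock and keeps it
            a.execute("INSERT INTO t VALUES (1)")
            try:
                b.execute("INSERT INTO t VALUES (2)")  # B also wants to write
                outcome = "second writer succeeded"
            except sqlite3.OperationalError as exc:
                outcome = str(exc)
            a.execute("COMMIT")
        finally:
            a.close()
            b.close()
    return outcome


print(second_writer_outcome())  # database is locked
```

With a non-zero `timeout`, connection B would block instead of failing, which from the outside looks like a thread that has simply stopped making progress.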
Once you've configured the parameters for the new DB in netxmsd.conf and run "nxdbmgr init" to initialize the new database's tables, you can use
nxdbmgr import <path-to-your-sqlite-db>
to migrate your existing SQLite database into the new one.
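A minimal Python sketch of the configuration change and the two nxdbmgr steps described above. `DBDriver`, `DBName` and `LogFile` appear in the configuration posted earlier; `DBServer`, `DBLogin`, `DBPassword` and the `pgsql.ddr` driver name are the PostgreSQL settings as I understand the NetXMS documentation, so treat them as assumptions to verify. The host, database name, role and password are hypothetical placeholders.

```python
from pathlib import Path


def pgsql_config(log_file, server, database, login, password):
    """Render netxmsd.conf lines that point the server at PostgreSQL."""
    return "\n".join([
        "DBDriver = pgsql.ddr",       # assumed PostgreSQL driver name
        f"DBServer = {server}",
        f"DBName = {database}",
        f"DBLogin = {login}",
        f"DBPassword = {password}",
        f"LogFile = {log_file}",
    ]) + "\n"


def migration_commands(sqlite_db):
    """The two nxdbmgr steps from the thread: create tables, then import SQLite data."""
    return [
        ["nxdbmgr", "init"],
        ["nxdbmgr", "import", str(sqlite_db)],
    ]


conf = pgsql_config(
    log_file=r"C:\NetXMS\log\netxmsd.log",  # from the original configuration
    server="127.0.0.1",     # hypothetical: PostgreSQL on the same machine
    database="netxms",      # hypothetical database name
    login="netxms",         # hypothetical role
    password="change-me",   # placeholder
)
print(conf, end="")
for cmd in migration_commands(Path(r"C:\NetXMS\database\netxms.db")):
    print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)`, presumably with the netxmsd service stopped while nxdbmgr works on the databases.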
Quote from: Filipp Sudanov on November 19, 2025, 06:58:24 PM
Yes, the messages
Thread "Syncer Thread" does not respond to watchdog thread
Thread "Recurrent scheduler" does not respond to watchdog thread
mean that something is wrong with the server process (netxmsd).
Exactly. I'll try to check the thread stacks via Process Manager if/when it hangs again.
Quote from: Filipp Sudanov on November 19, 2025, 06:58:24 PM
The first thing to check is the memory consumption of netxmsd: maybe it keeps growing and the system starts swapping.
What are the specs of the machine? What DB do you use?
Supermicro X10DRi-T, 64 GB RAM, 2x Xeon E5-2683 v3, 1 TB NVMe SSD holding the paging file.
It doesn't look like a lack of memory, but I'll keep an eye on that too.
Quote from: Filipp Sudanov on November 19, 2025, 06:58:24 PM
Is the DB on the same machine?
Yes
Quote from: Filipp Sudanov on November 19, 2025, 07:03:12 PM
From another thread it seems that you've installed with SQLite. This could be the reason for the server locking up.
I'm aware that SQLite is not made for heavy loads, and I'm going to monitor fewer than ten devices (home server, dad's PC, my desktop, a few routers), mostly so I don't have to worry about the HDDs in RAID and to be sure WireGuard is OK.
But the hang happened when NetXMS was freshly installed and was monitoring only itself; then I added one MikroTik router monitored via SNMP with default settings, which is hardly any load at all.
Quote from: Filipp Sudanov on November 19, 2025, 07:03:12 PM
I'd recommend migrating to MySQL, MSSQL or Postgres (I personally prefer the last option, but it's whatever you are comfortable with).
Once you've configured the parameters for the new DB in netxmsd.conf and run "nxdbmgr init" to initialize the new database's tables, you can use
nxdbmgr import <path-to-your-sqlite-db>
to migrate your existing SQLite database into the new one.
I'll keep that in mind, thank you. I really appreciate your help :)
We'll see how it works ... 8)
Quote from: Filipp Sudanov on November 19, 2025, 06:58:24 PM
mean that something is wrong with the server process (netxmsd).
So far, running the NetXMS service with the -D 9 argument has produced the following:
Thread "Recurrent scheduler" does not respond to watchdog thread
The log doesn't seem to contain much useful information... Should I upload a mini or full dump somewhere?
2025.11.21 05:21:44.306 *D* [dc.poller ] ItemPoller: wakeup
2025.11.21 05:21:44.306 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object Server [100]
2025.11.21 05:21:44.306 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object Server-Backup [186]
2025.11.21 05:21:44.306 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object MikroTik hAP ac2 [190]
2025.11.21 05:21:44.306 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object Server-Backup BMC [209]
2025.11.21 05:21:44.306 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object MikroTik L009UiGS-2HaxD [216]
2025.11.21 05:21:45.313 *D* [dc.poller ] ItemPoller: wakeup
2025.11.21 05:21:45.314 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object Server [100]
2025.11.21 05:21:45.314 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object Server-Backup [186]
2025.11.21 05:21:45.314 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object MikroTik hAP ac2 [190]
2025.11.21 05:21:45.314 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object Server-Backup BMC [209]
2025.11.21 05:21:45.314 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object MikroTik L009UiGS-2HaxD [216]
2025.11.21 05:21:46.325 *D* [dc.poller ] ItemPoller: wakeup
2025.11.21 05:21:46.325 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object Server [100]
2025.11.21 05:21:46.325 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object Server-Backup [186]
2025.11.21 05:21:46.325 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object MikroTik hAP ac2 [190]
2025.11.21 05:21:46.325 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object Server-Backup BMC [209]
2025.11.21 05:21:46.325 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object MikroTik L009UiGS-2HaxD [216]
2025.11.21 05:21:46.535 *E* [watchdog ] Thread "Recurrent scheduler" does not respond to watchdog thread
2025.11.21 05:21:46.535 *D* [obj.macro ] NetObj::expandText(sourceObject=100 template='Thread "%1" is not responding' alarm=0 event=3268 instance='(null)')
2025.11.21 05:21:46.535 *D* [event.corr ] CorrelateEvent: event SYS_THREAD_HANG id 3268 source Server [100]
2025.11.21 05:21:46.535 *D* [event.corr ] CorrelateEvent: finished, rootId=0
2025.11.21 05:21:46.535 *D* [event.proc ] EVENT SYS_THREAD_HANG [20] at {0} (ID:3268 F:0x0001 S:4 TAGS:"") FROM Server: Thread "Recurrent scheduler" is not responding
2025.11.21 05:21:46.535 *D* [event.policy ] EPP: processing event 3268
2025.11.21 05:21:46.535 *D* [event.policy ] Event 3268 match EPP rule 24
2025.11.21 05:21:46.535 *D* [obj.macro ] NetObj::expandText(sourceObject=100 template='SYS_THREAD_HANG_%1' alarm=0 event=3268 instance='(null)')
2025.11.21 05:21:46.535 *D* [obj.macro ] NetObj::expandText(sourceObject=100 template='%m' alarm=0 event=3268 instance='(null)')
2025.11.21 05:21:46.535 *D* [obj.macro ] NetObj::expandText(sourceObject=100 template='' alarm=0 event=3268 instance='(null)')
2025.11.21 05:21:46.535 *D* [alarm ] AlarmManager: adding new active alarm, current alarm count 0
2025.11.21 05:21:46.535 *D* [db.cpool ] Handle 0000013967CF42E0 acquired (call from c:\jenkins\workspace\release-windows\src\server\core\alarm.cpp:676)
2025.11.21 05:21:46.536 *D* [db.cpool ] Handle 0000013967CF42E0 released
2025.11.21 05:21:46.536 *D* [obj.notify ] Sending object change notification for Server [100] (flags=0x00000000)
2025.11.21 05:21:46.536 *D* [obj.notify ] Sending object change notification for LAN 10G (10.10.10.0/24) [103] (flags=0x00000000)
2025.11.21 05:21:46.536 *D* [obj.notify ] Sending object change notification for Default [4] (flags=0x00000000)
2025.11.21 05:21:46.536 *D* [obj.notify ] Sending object change notification for Entire Network [1] (flags=0x00000000)
2025.11.21 05:21:46.536 *D* [obj.notify ] Sending object change notification for LAN (192.168.128.0/24) [105] (flags=0x00000000)
2025.11.21 05:21:46.536 *D* [obj.notify ] Sending object change notification for Infrastructure Services [2] (flags=0x00000000)
2025.11.21 05:21:46.536 *D* [client.session.0 ] Sending compressed message CMD_ALARM_UPDATE (344 bytes)
2025.11.21 05:21:46.536 *D* [db.writer ] SQL request queued: INSERT INTO alarm_events (alarm_id,event_id,event_code,event_name,severity,source_object_id,event_timestamp,message) VALUES (?,?,?,?,?,?,?,?)
2025.11.21 05:21:46.536 *D* [event.proc ] Event 3268 with code 20 passed event processing policy
2025.11.21 05:21:46.536 *D* [obj.macro ] NetObj::expandText(sourceObject=100 template='Node status changed to CRITICAL' alarm=0 event=3269 instance='(null)')
2025.11.21 05:21:46.536 *D* [db.cpool ] Handle 0000013967CF42E0 acquired (call from c:\jenkins\workspace\release-windows\src\server\core\dbwrite.cpp:321)
2025.11.21 05:21:46.536 *D* [event.corr ] CorrelateEvent: event SYS_NODE_CRITICAL id 3269 source Server [100]
2025.11.21 05:21:46.536 *D* [client.session.0 ] Message dump:
...
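A -D 9 log grows quickly, so it may help to cut out only the lines logged just before each watchdog message and post those. A minimal Python sketch using a sliding window (`collections.deque` with `maxlen`); the `keep` window size is arbitrary, and the log path you pass in (e.g. the `LogFile` value from netxmsd.conf) is up to you. Note that these lines show what the rest of the server was doing at that moment, which may or may not be related to the thread that hung.

```python
from collections import deque


def context_before_hangs(lines, keep=50):
    """Pair each watchdog message with the `keep` lines logged just before it."""
    recent = deque(maxlen=keep)  # sliding window over the last `keep` lines
    found = []
    for line in lines:
        line = line.rstrip("\r\n")
        if "does not respond to watchdog thread" in line:
            found.append((line, list(recent)))
        recent.append(line)
    return found


def context_from_file(path, keep=50):
    """Same, streaming a (possibly huge) log file line by line."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return context_before_hangs(f, keep)


sample = [
    "2025.11.21 05:21:46.325 *D* [dc.poller ] ItemPoller: wakeup",
    "2025.11.21 05:21:46.325 *D* [dc.poller ] ItemPoller: calling DataCollectionTarget::queueItemsForPolling for object Server [100]",
    '2025.11.21 05:21:46.535 *E* [watchdog ] Thread "Recurrent scheduler" does not respond to watchdog thread',
]
for hang, before in context_before_hangs(sample, keep=2):
    print(hang)
    print("\n".join("    " + b for b in before))
```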
Is this still with SQLite?
Quote from: Filipp Sudanov on November 21, 2025, 03:43:38 PM
Is this still with SQLite?
Yes. I remember the advice to use another DB, and I might switch later, although that would be overkill for my needs: I prefer the "the simpler, the better" approach, i.e. as little extra software as possible. But generally speaking, if SQLite is provided as an option, it has to work correctly.
Well, we always mention that it's only for tests and extremely simple setups. SQLite is great for some use cases, but not this one.
Consider it not available.
Quote from: cwl on November 22, 2025, 11:10:02 AM
Quote from: Filipp Sudanov on November 21, 2025, 03:43:38 PM
Is this still with SQLite?
But generally speaking, if SQLite is provided as an option, it has to work correctly.
Quote from: Filipp Sudanov on November 19, 2025, 07:03:12 PM
I'd recommend migrating to MySQL, MSSQL or Postgres (I personally prefer the last option, but it's whatever you are comfortable with).
Confirmed: it works with PostgreSQL (default settings) without issues. And once again, thank you :)