daily server crash - Thread "Item Poller" does not respond to watchdog thread

Started by MarkusO, February 08, 2016, 05:02:57 PM


MarkusO

Situation:
In a test environment I run NetXMS version 2.0RC2.

netxmsd: sh sta
Total number of objects:     489
Number of monitored nodes:   68
Number of collectable DCIs:  272

4 dashboards have been configured and are up and running. During most of the day everything is working properly. DCI values are polled and displayed nicely.

Issue:
On average, the server seems to crash once a day. The Windows Event Viewer shows the error: Thread "Item Poller" does not respond to watchdog thread.

From that moment on, the dashboards are empty and show an error message. Likewise, when I open 'last values', an error message is shown and no values are displayed. See attached screenshots.

When I look at the Windows services, the NetXMS core service appears to be still running. So far the only way to temporarily fix the problem is to stop and start the NetXMS core service, which leaves the dashboard graphs without history.
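One possible reading of these symptoms (an interpretation, not something confirmed in this thread) is that the watchdog message only reports that a single thread has stopped checking in, while the process itself stays alive, which would explain why the service still shows as running. The Python sketch below illustrates that general watchdog pattern. It is not NetXMS source code; the class, its methods, the 30-second timeout, and the second thread name "Syncer" are all made up for illustration.

```python
import time


class Watchdog:
    """Tracks the last time each registered thread reported in (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._threads = {}  # name -> (timeout_seconds, last_heartbeat)

    def register(self, name, timeout):
        self._threads[name] = (timeout, self._clock())

    def notify(self, name):
        # A healthy thread calls this regularly from its main loop.
        timeout, _ = self._threads[name]
        self._threads[name] = (timeout, self._clock())

    def unresponsive(self):
        # Threads whose last heartbeat is older than their timeout.
        now = self._clock()
        return [name for name, (timeout, last) in self._threads.items()
                if now - last > timeout]


# Simulated clock so the example runs instantly.
fake_now = [0.0]
wd = Watchdog(clock=lambda: fake_now[0])
wd.register("Item Poller", timeout=30)
wd.register("Syncer", timeout=30)  # hypothetical second thread

fake_now[0] = 20.0
wd.notify("Syncer")                # "Syncer" keeps reporting in
fake_now[0] = 40.0                 # "Item Poller" has been silent for 40 s

for name in wd.unresponsive():
    print(f'Thread "{name}" does not respond to watchdog thread')
```

In this sketch only "Item Poller" is reported, and nothing stops the process: the watchdog merely notices the silence. If the real server behaves similarly, a blocked poller thread would leave the service listed as running while no new values arrive.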


Any suggestions for fully solving this issue?

Thanks in advance.
Kindest regards.

Markus




tomaskir

Definitely upgrade to 2.0.2.
There were many fixes between 2.0-RC2 and 2.0.2.

If the problem still persists on 2.0.2, let us know :)

MarkusO

Ok thanks for the advice and support.

I will upgrade to version 2.0.2 and then further evaluate if this issue is resolved.

Again thanks for your help.




MarkusO

Yesterday I successfully upgraded to 2.0.2 at around 14:00h.

Unfortunately this issue, "daily server crash - Thread "Item Poller" does not respond to watchdog thread", occurred again 5 hours after the successful upgrade to version 2.0.2.

* Application and services still running, but no DCI polling values
* Error when viewing last values for each node (screenshot attached to this post)
* Empty dashboard graphs

Any advice on fixing this issue is more than welcome.

Victor Kirhenshtein

Hi,

can you please add the following to netxmsd.conf:

DebugLevel = 7
CreateCrashDumps = yes

Also ensure that LogFile points to a file, not {syslog}, and restart the server. When the server hangs, run

nxadm -c "raise access"

The server process will crash and a dump will be generated. Send the dump file and log file to us for analysis.
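Put together, the relevant part of netxmsd.conf might look like the fragment below. The LogFile path is only a placeholder assumption for a Windows installation; any regular file should do. The other two lines are the settings requested above.

```
# netxmsd.conf (fragment)
# The LogFile path is a placeholder assumption; it must be a file, not {syslog}
LogFile = C:\NetXMS\log\netxmsd.log
DebugLevel = 7
CreateCrashDumps = yes
```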

Best regards,
Victor

MarkusO

Followed instructions.

The server is running again. I am waiting for the next crash and will then run: nxadm -c "raise access"

I will send the log and dump files afterwards.

MarkusO

Server crashed again, graphs empty and no values. NetXMS core service still appears to be running.

Executed command line: nxadm -c "raise access"
Dump files were created.

Attached to this reply is the most recent (zipped) log file.
The 2 dump files are attached as well.

Awaiting your response.
Thanks in advance.
Kind regards,
Mark


MarkusO

Server crashed again, graphs empty and no values. NetXMS core service still appears to be running.

Executed command line: nxadm -c "raise access"
Dump files were created.

Attached to this reply is the most recent (zipped) log file.
The 2 dump files are attached as well.

I hope the team can solve this issue with the new information (logs and dump files).

Thanks in advance.
Kind regards,
Mark

MarkusO

Attached to this reply: (zipped) log file and dump files from a more recent crash.

Hope this addition to the troubleshooting log and dump files can help in solving the issue.

MarkusO

Another crash.

More log and dump files are attached to this reply: (zipped) log file and dump files.

Again I hope this addition to the troubleshooting log and dump files can help in solving the issue.

Thanks in advance.

Kind regards,

Mark

MarkusO

Different location, different network, different NetXMS server (running the new version 2.0.2).

Same issue: daily server crash - Thread "Item Poller" does not respond to watchdog thread, empty graphs, no polling values.

I created a dump file and log files; they are attached to this reply.

Hope you can find some leads within the new information regarding this issue.

MarkusO

This issue has not been solved.

I have done some more troubleshooting:

- Enabled MySQL logging and reviewed the log. Maybe someone can tell me what to look for besides the usual error search.
- Changed some server configuration parameters. In one troubleshooting run I increased StatusPollingInterval from 60 seconds to 300 seconds. In another run I increased PollerThreadPoolMaxSize from 250 to 500, PollerThreadPoolBaseSize from 10 to 20, and NumberOfDataCollectors from 25 to 50.

Unfortunately, the server still keeps 'crashing' daily.

Hope there is a solution for solving this issue.


MarkusO

Still crashing.

I have attached some screenshots to this post showing the situation after the issue has occurred.

MarkusO

FYI
For one location I configured the following settings on a NetXMS server:

StatusPollingInterval        600
DefaultDCIPollingInterval    600
ConditionPollingInterval     600

Although these are unrealistic values for daily operations, the server has now been running for more than one day without crashing.
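To put these intervals in perspective, here is a rough back-of-the-envelope estimate of the average DCI poll rate, written as a small Python sketch. It rests on several assumptions: that the 272 collectable DCIs reported in the first post are representative (this location may differ), that every DCI uses the default interval, and that the earlier interval was 60 seconds (the thread states 60 seconds only for the status polling interval). The function name is made up for illustration.

```python
def average_polls_per_second(dci_count, polling_interval_seconds):
    """Average number of DCI polls per second if polls are spread evenly."""
    return dci_count / polling_interval_seconds


# 272 is the "collectable DCIs" figure from the first post; this location may differ.
DCI_COUNT = 272

# 60 s is an assumed earlier value; 600 s is the setting listed above.
for interval in (60, 600):
    rate = average_polls_per_second(DCI_COUNT, interval)
    print(f"interval {interval:>3} s: about {rate:.2f} polls per second")
```

Under those assumptions the load is small in both cases (roughly 4.5 versus 0.45 polls per second), so a tenfold longer interval would mainly make a hang less frequent rather than address its cause.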

MarkusO

Unfortunately, the configuration mentioned in my last post (StatusPollingInterval 600, DefaultDCIPollingInterval 600, ConditionPollingInterval 600) did not last.

The server just logged the error "Thread "Item Poller" does not respond to watchdog thread"; it no longer polls new values and displays empty graphs again.
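To correlate the times of these hangs with other events (MySQL log entries, backups, scheduled tasks), it may help to pull every occurrence of the watchdog message out of the server log. The following is a minimal Python sketch, not a NetXMS tool. It assumes only that the message text quoted above appears verbatim in the log file; the function name and the command-line handling are made up for illustration.

```python
import sys

# Message text as quoted in this thread.
WATCHDOG_MESSAGE = "does not respond to watchdog thread"


def find_watchdog_lines(lines):
    """Return (line_number, text) for every line containing the watchdog message."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if WATCHDOG_MESSAGE in line
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_watchdog.py <path to the netxmsd log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for number, text in find_watchdog_lines(log):
            print(f"{number}: {text}")
```

The matching is a plain substring search, so it does not depend on the exact timestamp or prefix format of the log lines.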