Unable to restart nxagentd

Started by mulder, October 14, 2009, 02:09:40 PM

mulder

Hello,

We are monitoring /var/adm/messages on Solaris 10 using nxagentd.
After the messages file is rotated every Sunday at 3 AM, we need to restart nxagentd.
When we restart it with "nxagentd restart", the restart fails with the following log:

[13-Oct-2009 13:39:15] Log file opened
[13-Oct-2009 13:39:15] Subagent "/usr/local/lib/libnsm_sunos.so" loaded successfully
[13-Oct-2009 13:39:15] Subagent "/usr/local/lib/libnsm_logwatch.so" loaded successfully
[13-Oct-2009 13:39:16] Unable to bind socket: Address already in use

It doesn't respond to "kill -HUP" either.

So we stop it and start it again 10 seconds later instead. The log for the stop and start is:

[25-Sep-2009 18:10:14] Log file opened
[25-Sep-2009 18:10:14] Subagent "/usr/local/lib/libnsm_sunos.so" loaded successfully
[25-Sep-2009 18:10:14] Subagent "/usr/local/lib/libnsm_logwatch.so" loaded successfully
[25-Sep-2009 18:10:15] Listening on socket 0.0.0.0:4700
[25-Sep-2009 18:10:16] NetXMS Agent started

But in this case the NetXMS manager detects the agent as not responding.

We just want to restart the daemon smoothly.
Does anyone have any suggestions?

Best regards,
mulder

Victor Kirhenshtein

Hi!

It looks like the sockets opened by the agent are closed more slowly than the agent restarts. You can try adding a startup delay by adding this parameter to the agent's configuration file:


StartupDelay = <seconds>


and try different delay values. However, it is still possible that the NetXMS server will detect a non-responding agent and generate the corresponding event.
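
As a rough illustration of what the delay is working around, the sketch below polls until the agent's listening port (4700 in the log above) can be bound again, instead of sleeping for a fixed number of seconds. The helper name wait_until_bindable and its parameters are made up for this example and are not part of NetXMS; it only shows the general idea and assumes the failure is the "Address already in use" condition from the log.

```python
import socket
import time


def wait_until_bindable(port, host="0.0.0.0", timeout=30.0, interval=0.5):
    """Return True once (host, port) can be bound, False if the timeout expires.

    Hypothetical helper for a restart script; not a NetXMS function.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Use a throwaway socket so the port is released again right away.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
                return True
            except OSError:
                # Typically "Address already in use": the old socket is still open.
                pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A restart script could stop the agent, call wait_until_bindable(4700), and start the agent only when it returns True. A fixed StartupDelay is simpler, but it has to be tuned by trial and error, as described above.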

But why do you need to restart the agent after log rotation? Does it stop working, or does it prevent log rotation? The agent is supposed to detect log rotation and handle it, and if it doesn't, that's a bug which needs to be fixed.

Best regards,
Victor

mulder

Hi Victor,

Thank you for your suggestion.
We have seen that it stopped working after log rotation, and found that nxagentd has to be restarted after log rotation.
The NetXMS version is 0.2.26.

Is it likely a bug?

Best regards,
mulder

Victor Kirhenshtein

Yes, it looks like a bug. I'll check it in a few days.

Best regards,
Victor

Victor Kirhenshtein

I have fixed it. The agent will now correctly handle a rename of the monitored file (which usually happens during log rotation). I plan to release version 0.2.31 in a few days, and it will contain this fix.
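
For readers wondering what handling a rename involves in general, here is a minimal sketch of one common technique: compare the inode of the file that is currently open with the inode of whatever now lives at the monitored path, and reopen when they differ. This is not the agent's actual code; the class name RotationAwareReader and its methods are invented for the example, and it assumes a POSIX file system where a rename keeps the inode.

```python
import os


class RotationAwareReader:
    """Follow a log file by path and keep reading after it is renamed.

    Illustrative only; this is not how nxagentd is implemented.
    """

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "r")
        self._fh.seek(0, os.SEEK_END)  # start at the end, like "tail -f"

    def _rotated(self):
        # True when the path now refers to a different file than the open one.
        try:
            on_disk = os.stat(self.path)
        except FileNotFoundError:
            return False  # renamed away, replacement not created yet
        opened = os.fstat(self._fh.fileno())
        return (on_disk.st_dev, on_disk.st_ino) != (opened.st_dev, opened.st_ino)

    def read_new_lines(self):
        # Drain whatever was appended to the file we still hold open.
        lines = self._fh.readlines()
        if self._rotated():
            # Switch to the new file and read it from the beginning.
            self._fh.close()
            self._fh = open(self.path, "r")
            lines += self._fh.readlines()
        return lines

    def close(self):
        self._fh.close()
```

Calling read_new_lines() periodically returns the lines appended since the previous call, including the tail of the renamed file and the start of its replacement.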

Best regards,
Victor

mulder

Hi Victor,

Perfect ;)
We're waiting for the next release.

Does an agent work fine with a server whose core version differs from the agent's?

Best regards,
mulder

Victor Kirhenshtein

Yes, the agent's version can be different. So far, any server version in the 0.2.x branch can work with any agent version, including very old 0.1.x agents.

Best regards,
Victor