ExternalParameter DCI triggering issue

Started by DanG, February 20, 2012, 01:06:03 PM

DanG

Hi,

I wrote a program that processes mail messages after fetching them from a mail server. It follows the Nagios plugin conventions.

In nxagentd.conf I've added the following to the #ExternalParameter section:
ExternalParameterShellExec = company.ParameterChecker:"C:\Program Files\...\Check.cmd"

Check.cmd :
@echo off
"C:\Program Files\...\text.exe" > NUL
@echo %ERRORLEVEL%

The DCI uses this parameter and has a custom schedule: 30 5 * * *

The above should cause the DCI to run ONCE A DAY at 5:30 am; however, it gets triggered twice, both times within the same second. My program expects to find mails, and once they've been processed it moves them to another folder. Because it is called a second time, there are no mails to be found and it returns an error.

I cannot find any error message in either the core or the agent logs.

Any idea what is going wrong here?

Regards,
Dan

Victor Kirhenshtein

Hi!

Do you get two values in the DCI history or only one? This could happen if the command runs too long on the first attempt: the server gets a timeout error and retries. But then there should be at least two seconds between runs.

Best regards,
Victor

DanG

Hi Victor,

In the old console, using "Show History", I see a single value per day. The time stamp is always between 5:30:02 and 5:30:06. This is as it should be, I presume.
(By the way, is the history functionality present in the new console interface?)
Is there a log where the execution of external DCIs can be traced?

Regards,
Dan

Victor Kirhenshtein

As the seconds part of the timestamp is in the range 02-06, and never 00, I assume that it's an execution timeout problem. There are two places where additional information may appear:

1. Server log. If you turn debug up to level 7, you should see something like this in the server log:

Node(node_name)->GetItemFromAgent(parameter_name): timeout; resetting connection to agent...

But turning on debug level 7 on the server will produce a really huge number of records in the log.

2. Agent log. If you run the agent with debug level 4, you should see debug messages related to command execution. These messages will be prefixed with the text "H_ExternalParameter".

I would start with the agent log. Please note that you should switch logging to a file before running the agent in debug mode, otherwise it will flood the system log.

Best regards,
Victor


Jamie

With the example above: ExternalParameterShellExec = company.ParameterChecker:"C:\Program Files\...\Check.cmd"

What should be entered in the Data Collection Configuration when the origin is NetXMS agent?

Does the Parameter field contain company.ParameterChecker?




Victor Kirhenshtein

Origin should be set to "NetXMS agent". And yes, the parameter name should be company.ParameterChecker.

Best regards,
Victor

DanG

Hi Victor,

This is what I get for the Agent log in debug level 4:

[23-Feb-2012 15:57:01] H_ExternalParameter called for "Company.ParameterChecker" "S"C:\Program Files\...\Check.cmd""
[23-Feb-2012 15:57:01] H_ExternalParameter: command line is ""C:\Program Files\...\Check.cmd""
[23-Feb-2012 15:57:01] H_ExternalParameter (shell exec): worker thread created
[23-Feb-2012 15:57:03] H_ExternalParameter (shell exec): execution status 2
[23-Feb-2012 15:57:03] H_ExternalParameter called for "Company.ParameterChecker" "S"C:\Program Files\...\Check.cmd""
[23-Feb-2012 15:57:03] H_ExternalParameter: command line is ""C:\Program Files\...\Check.cmd""
[23-Feb-2012 15:57:03] H_ExternalParameter (shell exec): worker thread created
[23-Feb-2012 15:57:05] H_ExternalParameter (shell exec): execution status 2
[23-Feb-2012 15:57:05] H_ExternalParameter called for "Company.ParameterChecker" "S"C:\Program Files\...\Check.cmd""
[23-Feb-2012 15:57:05] H_ExternalParameter: command line is ""C:\Program Files\...\Check.cmd""
[23-Feb-2012 15:57:05] H_ExternalParameter (shell exec): worker thread created
[23-Feb-2012 15:57:06] H_ExternalParameter/POpenWorker: worker thread pipe read result: 0000000001397230
[23-Feb-2012 15:57:06] H_ExternalParameter/POpenWorker: worker thread pipe read result: 00000000012876B0
[23-Feb-2012 15:57:06] H_ExternalParameter/POpenWorker: worker thread pipe read result: 000000000128AC10
[23-Feb-2012 15:57:06] H_ExternalParameter (shell exec): execution status 0

Can you deduce the cause of the cmd being called 3 times?

Regards,
Dan

Victor Kirhenshtein

Hi!

From this log I can see that the command runs for too long, so the command execution timeout expires first. The default timeout is 2 seconds. You should increase both the ExecTimeout parameter on the agent and the AgentCommandTimeout configuration variable on the server.
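
To check this reading against the log posted above, here is a minimal Python sketch (not part of NetXMS) that pairs each "called for" entry with the next "execution status" entry and reports how long each attempt took. The function name `attempts` is made up for illustration, and the assumption that status 2 marks a timed-out run is an interpretation of the log, not something the log itself states.

```python
import re
from datetime import datetime

# Lines copied from the agent log above (only the "called for" and
# "execution status" entries; the worker-thread lines are left out).
LOG = r'''
[23-Feb-2012 15:57:01] H_ExternalParameter called for "Company.ParameterChecker" "S"C:\Program Files\...\Check.cmd""
[23-Feb-2012 15:57:03] H_ExternalParameter (shell exec): execution status 2
[23-Feb-2012 15:57:03] H_ExternalParameter called for "Company.ParameterChecker" "S"C:\Program Files\...\Check.cmd""
[23-Feb-2012 15:57:05] H_ExternalParameter (shell exec): execution status 2
[23-Feb-2012 15:57:05] H_ExternalParameter called for "Company.ParameterChecker" "S"C:\Program Files\...\Check.cmd""
[23-Feb-2012 15:57:06] H_ExternalParameter (shell exec): execution status 0
'''

# Captures the time of day and the message text after "H_ExternalParameter ".
LINE_RE = re.compile(
    r"^\[\d{2}-\w{3}-\d{4} (\d{2}:\d{2}:\d{2})\] H_ExternalParameter (.*)$"
)


def attempts(log_text):
    """Return a list of (duration_seconds, status), one per execution attempt."""
    result = []
    started = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        if message.startswith("called for"):
            started = stamp
        elif "execution status" in message and started is not None:
            status = int(message.rsplit(" ", 1)[1])
            duration = int((stamp - started).total_seconds())
            result.append((duration, status))
            started = None
    return result


if __name__ == "__main__":
    for duration, status in attempts(LOG):
        print(f"attempt took {duration}s, execution status {status}")
```

For this log the sketch reports two attempts of 2 seconds each with status 2, followed by one attempt of 1 second with status 0, which matches a 2-second limit being hit twice before the third run finishes in time.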

Best regards,
Victor

DanG

Hi Victor,

I had no ExecTimeout parameter in my agent config (therefore it defaulted to 2000). I've added it and set it to 10000.
I've increased the server's AgentCommandTimeout from 5000 to 11000.

In the agent log I can now see that the external action is called only once. Problem solved. Thank you, Victor.

I wonder, however, why NetXMS calls the external action multiple times when it times out; I would rather expect that no value is returned.
Could you clarify this?

Regards,
Dan

Victor Kirhenshtein

The NetXMS server does 3 retries after a timeout error, resetting the agent connection before each retry, because the most common cause of a timeout when getting a parameter is a communication problem. Long-running external parameters are bad anyway, because the server uses one connection for reading data from the agent, and parameters are read in order. So, if one parameter request takes too long, collection intervals for the others will shift.
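
The interaction between retries and a command with side effects can be illustrated with a small simulation. This is only a sketch: `FakeMailCheck` and `poll` are hypothetical names, the durations are plain numbers rather than real waiting, and the attempt count of 4 assumes one initial try plus the 3 retries mentioned above. It does not reproduce the server's actual code.

```python
class FakeMailCheck:
    """Hypothetical stand-in for a mail-processing check with side effects."""

    def __init__(self, mails):
        self.mails = list(mails)
        self.runs = 0

    def run(self):
        """Return (duration_seconds, exit_code) and move away any mails found."""
        self.runs += 1
        had_mail = bool(self.mails)
        self.mails.clear()                    # side effect: mails are moved
        duration = 3 if had_mail else 1       # processing mail is the slow part
        exit_code = 0 if had_mail else 2      # nothing to process is an error
        return duration, exit_code


def poll(check, timeout_seconds, max_attempts):
    """Re-run the check after every timeout; return the exit code finally seen.

    Returns None if every attempt exceeds the timeout.
    """
    for _ in range(max_attempts):
        duration, exit_code = check.run()
        if duration <= timeout_seconds:
            return exit_code
    return None


if __name__ == "__main__":
    # Timeout shorter than the first run: the check is started twice and the
    # poller only sees the error from the second, empty run.
    short = FakeMailCheck(["mail-1", "mail-2"])
    print(poll(short, timeout_seconds=2, max_attempts=4), short.runs)

    # Timeout longer than the run: a single start, success is reported.
    generous = FakeMailCheck(["mail-1", "mail-2"])
    print(poll(generous, timeout_seconds=10, max_attempts=4), generous.runs)
```

With the short timeout the first run does the real work but its result is discarded, which is the same pattern as the error described at the start of this thread.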

Best regards,
Victor

DanG

Victor,

The program I wrote does take some time to run, as connecting to the mail server and fetching and moving mails takes time.
From what you're saying, I should avoid using such actions as they impair performance and accuracy.
Is there a (recommended) way of running long-running external tests in NetXMS?

Regards,
Dan


Victor Kirhenshtein

I can think of the following ways:

1. Schedule the external script using the system scheduler and have it write its result to a file; a DCI can then collect the content of that file, which can hold the execution result of the external script, as well as the fact of the file's presence and/or its last modification time.

2. If you have a long-running script which should be executed at fixed intervals, you can use an external parameters provider. It works by running the given script periodically and caching the results. Then, when the server requests the value of a provided parameter, the result from the cache is returned immediately.
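
As an illustration of the first option, here is a minimal Python sketch of a wrapper that the system scheduler could start. It runs the slow check and records the exit code together with a timestamp in a small file, so that reading the result later is fast. The file format, the helper names `run_and_record` and `read_result`, and the demo command are all assumptions made for this example; how the agent is configured to read the file is not shown.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_and_record(command, result_path):
    """Run `command` and write '<exit code> <unix time>' to result_path."""
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    line = f"{completed.returncode} {int(time.time())}\n"

    # Write to a temporary file in the same directory, then rename it over
    # the target so a reader never sees a half-written result.
    directory = os.path.dirname(os.path.abspath(result_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(line)
    os.replace(tmp_path, result_path)
    return completed.returncode


def read_result(result_path):
    """Cheap read side: return (exit_code, unix_time) from the result file."""
    with open(result_path) as handle:
        code, stamp = handle.read().split()
    return int(code), int(stamp)


if __name__ == "__main__":
    # Demo with a stand-in command that simply exits with code 3.
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "check.result")
        run_and_record([sys.executable, "-c", "import sys; sys.exit(3)"], path)
        print(read_result(path))
```

Storing the timestamp next to the exit code lets the reading side notice a stale result, for example when the scheduled job has stopped running.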

There are also a lot of possible variations - like starting external script from server using actions, and so on.

Best regards,
Victor

P.S. I'm starting to think that having some kind of scheduled jobs in the server could be useful in situations like this.

DanG

Before trying to use my script with NetXMS I was making use of the scheduler services of the underlying operating system. It works, but has the major disadvantage that things are hard to keep in sync. You end up with different programs on different schedules, each testing the results of the programs that ran before it. Because there is no formal relation between the different programs, there is no central place to see which program depends on which. Over time, changing the schedule of one program can easily break the whole chain. My idea was to use a single program to set the schedule and test the result; however, it's now clear that NetXMS expects results to be readily available for its DCIs.

A scheduler within NetXMS for long-running tasks could be a solution. At the moment I think I'll revert to using the Windows scheduler and (search to see how I can) modify the script to push an event to NetXMS informing it of the success or failure of the script.

Regards,
Dan

Victor Kirhenshtein

Yes, pushing results is another option. You can push events using the nxevent tool, or push DCI values using nxpush.

Best regards,
Victor

Jamie

What do you set up on the server side to receive an nxevent or nxpush?

Is there an API for nxevent and/or nxpush in Java?

I have a user interface script running and I want to send events such as timing thresholds or up/down alerts...