Hi,
I just set up NetXMS in a test lab, and I must say that it is a wonderful product. The problem I face is that when I add a service check for the World Wide Web Publishing service with the parameter System.ServiceState("W3SVC"), I do not get the proper result.
Can anyone help or suggest what status codes need to be used?
Thanks,
Maxknight
Hello!
System.ServiceState can return the following codes:
0 - service running;
1 - service paused;
2 - service starting (start pending);
3 - service pausing (pause pending);
4 - service starting after pause (continue pending);
5 - service stopping (stop pending);
6 - service stopped;
255 - unable to get current service state.
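For anyone scripting around this DCI, the codes above can be kept in a small lookup table. This is only a sketch: the names SERVICE_STATES and describe_service_state are made up for illustration and are not part of NetXMS.

```python
# Codes returned by System.ServiceState, copied from the list above.
# SERVICE_STATES and describe_service_state are hypothetical helper names.
SERVICE_STATES = {
    0: "running",
    1: "paused",
    2: "starting (start pending)",
    3: "pausing (pause pending)",
    4: "starting after pause (continue pending)",
    5: "stopping (stop pending)",
    6: "stopped",
    255: "unable to get current service state",
}


def describe_service_state(code: int) -> str:
    """Return a readable name for a System.ServiceState code."""
    return SERVICE_STATES.get(code, f"unexpected code {code}")


print(describe_service_state(0))  # running
print(describe_service_state(6))  # stopped
```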
Hope this helps!
Best regards,
Victor
Hi Victor,
Can't thank you enough. :)
Wish you a Happy New Year.
Wish a Happy New Year!
How would I create alarms for a monitored service?
If I use "get" with the ServiceState, I get back 0...which means running.
So I create a threshold that says when not equal to 0.
That's all good: when it's true it triggers an event I created, COP_SERVICE_DOWN. But actually I guess it would be better to say "unknown" instead of down, because if 1 was returned the service would actually be paused and not down.
Recommendations?
Hello!
If you wish to distinguish between different states, you can create multiple thresholds and events. For example, create a separate event SERVICE_PAUSED, then add a threshold "when equal 1" to the service status DCI and generate SERVICE_PAUSED from it. It depends on your environment and requirements - there are no universal tips. From my experience, in most cases it is ok to consider a service down if it's not running, because it cannot process requests anyway, whatever state it is in. That is the most important information - the actual state is not so important.
Best regards,
Victor
I guess for the most part, if it's not "started" then an alert should be generated.
Let's say I did want the detailed monitoring capability, and I create events for the different conditions. What would be appropriate to set the "when false" event to?
So let's say I have "when equal to 2" generating event COP_SERVICE_STARTING.
What would false be, COP_SERVICE_UNKNOWN?
But then it would report unknown for 0, which means started?
Are all of the thresholds evaluated (I know there's a check box, but I didn't really understand what it changed) and the one that matches best triggered?
I am making this difficult aren't I?
The only reason I ask is because some of the VMware Services on the Virtual Infrastructure Server I have had trouble with getting stuck in "Starting" and I would like to see that logged.
I suggest the following scheme:
condition | event when true | event when false |
equal 2 | COP_SERVICE_STARTING | COP_SERVICE_OK |
not equal 0 | COP_SERVICE_DOWN | COP_SERVICE_OK |
Uncheck "always process all thresholds". Threshold order is important.
Then you will get COP_SERVICE_STARTING if the service is in the starting state, COP_SERVICE_DOWN if the service is in any state other than running or starting, and COP_SERVICE_OK when it returns to the running state.
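The effect of threshold order with "always process all thresholds" unchecked can be sketched roughly as below. This is a simplified model of the behaviour described in this post, not the server's actual algorithm; the function name process is made up, and code 2 (start pending) is used for the starting state, per the System.ServiceState code list earlier in the thread.

```python
from typing import Callable, List, Optional, Tuple

# (condition, event when true, event when false), in threshold order.
Threshold = Tuple[Callable[[int], bool], str, str]

THRESHOLDS: List[Threshold] = [
    (lambda v: v == 2, "COP_SERVICE_STARTING", "COP_SERVICE_OK"),
    (lambda v: v != 0, "COP_SERVICE_DOWN", "COP_SERVICE_OK"),
]


def process(values, thresholds=THRESHOLDS):
    """Return the events produced for a sequence of polled values.

    Simplified assumption: only the first matching threshold counts,
    and an event is produced only when the active threshold changes.
    """
    events = []
    active: Optional[int] = None  # index of the currently active threshold
    for value in values:
        matched = None
        for idx, (condition, _, _) in enumerate(thresholds):
            if condition(value):
                matched = idx
                break  # stop at the first match, so order matters
        if matched == active:
            continue  # no state change, no new event
        if matched is not None:
            events.append(thresholds[matched][1])  # "event when true"
        else:
            events.append(thresholds[active][2])   # "event when false"
        active = matched
    return events


# running -> starting -> starting -> stopped -> running
print(process([0, 2, 2, 6, 0]))
# ['COP_SERVICE_STARTING', 'COP_SERVICE_DOWN', 'COP_SERVICE_OK']
```

If the two thresholds were swapped, "not equal 0" would match first for code 2 as well, and COP_SERVICE_STARTING would never be generated in this model.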
If you wish to create and terminate appropriate alarms accordingly, you can add the following rules to event processing policy:
Rule 1
Event: COP_SERVICE_STARTING, COP_SERVICE_DOWN
Alarm: Generate alarm, text: %m, key: SERVICE_PROBLEM_%i_%5
Rule 2
Event: COP_SERVICE_OK
Alarm: terminate alarm with key SERVICE_PROBLEM_%i_%3
Then you will have an active alarm with appropriate text when the service is not running, and it will be automatically terminated when the service goes back online. If the service goes from starting to, for example, the stopped state, the text of the alarm generated by the previous COP_SERVICE_STARTING event will be replaced by the message text of the COP_SERVICE_DOWN event, so you will have up-to-date information in your alarm browser.
Hope this helps!
Best regards,
Victor
P.S. Also there is a description of threshold checking algorithm in NetXMS user manual, in section 5.2.3.3.
Thanks
Couple more questions
in Rule 1 there is "SERVICE_PROBLEM_%i_%5" but in Rule 2 that can clear the alarm generated by Rule 1 it is "SERVICE_PROBLEM_%i_%3"
What are the %5 and %3 doing? Why are they different?
In the built-in event processing, for example service down and the event that clears it both end in %i_%1.
I understand %i is the unique ID of the event, so isn't that all that is really needed?
Hello!
%i is a unique identifier of the event's source object (usually a node). If you plan to monitor only one service per node with these events, then using just %i is ok, but if you monitor more than one service running on the same node and generate the same events for different services, then you have to distinguish alarms not only by node ID, but also by service. %3 and %5 are event-specific parameters, number 3 and 5 respectively. For events generated when a threshold condition becomes true, the parameters are as follows:
1) Parameter name
2) Item description
3) Threshold value
4) Actual value
5) Data collection item ID
6) Instance
For events generated when a threshold condition returns to false, the parameters are as follows:
1) Parameter name
2) Item description
3) Data collection item ID
4) Instance
So in my example I construct alarm key from node id and DCI id.
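To see why %5 in Rule 1 and %3 in Rule 2 end up producing the same key, here is a toy expansion of the two templates. It is a sketch only: the expand function, the node ID 123, the DCI ID 42 and the sample parameter values are all invented for illustration, it handles just %i and %1..%9, and the way the real server formats %i may differ.

```python
import re


def expand(template: str, source_id: int, params: list) -> str:
    """Expand %i and %1..%9 in a template (toy subset of the macros)."""
    def replace(match):
        token = match.group(1)
        if token == "i":
            return str(source_id)            # ID of the event's source object
        return str(params[int(token) - 1])   # event parameters are 1-based
    return re.sub(r"%(i|[1-9])", replace, template)


# Parameters of the "threshold became true" event:
# name, description, threshold value, actual value, DCI ID, instance
true_params = ['System.ServiceState("W3SVC")', "W3SVC state", "0", "6", "42", ""]

# Parameters of the "threshold returned to false" event:
# name, description, DCI ID, instance
false_params = ['System.ServiceState("W3SVC")', "W3SVC state", "42", ""]

raise_key = expand("SERVICE_PROBLEM_%i_%5", 123, true_params)
clear_key = expand("SERVICE_PROBLEM_%i_%3", 123, false_params)

print(raise_key)               # SERVICE_PROBLEM_123_42
print(raise_key == clear_key)  # True
```

Both keys carry the DCI ID, which just sits at a different position in the two events' parameter lists.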
You can find list of parameters for any predefined event by opening Control Panel -> Events -> Edit appropriate event record and looking at the description field.
All possible macros for event processing policy can be found in the NetXMS user manual or here: https://www.netxms.org/documentation/macros.shtml
Best regards,
Victor
Everything is starting to make more sense now and you already answered my next question about the variables when a threshold is false, I didn't remember seeing that in the manual (I'm guessing I just missed it).
Thanks for the quick response.
I am currently evaluating network management solutions to implement. I have currently installed and configured HypericHQ, OpenNMS (and the two aforementioned integrated with one another), Zabbix and ZenOSS. I actually only stumbled upon NetXMS while looking for how to do something in OpenNMS.
Quote from: StarryTripper on March 14, 2008, 02:05:32 PM
I am currently evaluating network management solutions to implement. I have currently installed and configured HypericHQ, OpenNMS (and the two aforementioned integrated with one another), Zabbix and ZenOSS. I actually only stumbled upon NetXMS while looking for how to do something in OpenNMS.
And what's your impression for now? What is good or bad?
The need for an agent is certainly a turn-off, especially since it seems that many of the values the agent allows you to retrieve are available through SNMP (CPU utilization, network utilization, disk space), while I understand others are not (service status).
The lack of a robust Web interface and a Win32-only management console are also downsides. Though, I am happy with the speed of the client-server model as opposed to an AJAX GUI.
The ability to easily visualize the chain of events is nice. The alarm features in OpenNMS for example are cumbersome.
I will continue to work with it in my spare time as well as OpenNMS. I hope to reach a conclusion by July and purchase a support agreement with whomever I choose.
Quote from: StarryTripper on March 14, 2008, 09:45:03 PM
The need for an agent is certainly a turn-off, especially since it seems that many of the values the agent allows you to retrieve are available through SNMP (CPU utilization, network utilization, disk space), while I understand others are not (service status).
Usage of NetXMS agents is not mandatory - you can use NetXMS in an SNMP-only environment as long as the installed SNMP agents provide you with all required information. We create our own agents because it is usually easier to configure data collection from an agent, and usage of an agent gives you some additional benefits:
- Strong encryption of connection between server and agents if needed (using AES-256, IDEA, Blowfish or 3DES);
- Proxy functionality - you can access the agent on host A via the agent on host B, if host A is not directly reachable from the NetXMS server (firewalled, NATed, etc.);
- SNMP proxy functionality - access remote SNMP devices not directly, but via a NetXMS agent - can be useful if these SNMP devices are not accessible directly or you wish to improve security;
- Execute commands on remote servers in reaction to events;
- You can extend agents easily;
- You can have centralized agent configs if you need.
Best regards,
Victor
Quote from: Victor Kirhenshtein on March 16, 2008, 09:05:15 PM
... and usage of an agent gives you some additional benefits:
Also, I should note that you need to deploy the agents by hand only once. Later, when an agent is up and running, you can upgrade it from the management console in a few clicks.
Hi Victor!
I need more info about this: Data Collection Items -> Data -> Parameter
Specifically, the
"Status" parameter.
What codes does this parameter return?
Something similar to this?
Quote from: Victor Kirhenshtein
Hello!
System.ServiceState can return the following codes:
0 - service running;
1 - service paused;
2 - service starting (start pending);
3 - service pausing (pause pending);
4 - service starting after pause (continue pending);
5 - service stopping (stop pending);
6 - service stopped;
255 - unable to get current service state.
I now get state "code 4" from a few servers, and I don't understand what it means.
It's an internal parameter (i.e. it represents information existing inside the NetXMS server, not on the target node). The internal parameter Status represents the node's current status in NetXMS, encoded as follows:
0 = Normal
1 = Warning
2 = Minor
3 = Major
4 = Critical
5 = Unknown
6 = Unmanaged
There is also an internal parameter ChildStatus(), which represents the status of a given child object of the node. In addition to the status codes listed above, interface objects can have the following additional status codes:
7 = Disabled
8 = Testing
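If you process these values in a script, an enumeration keeps the codes readable. A minimal sketch; the class name ObjectStatus is invented for illustration and is not a NetXMS API.

```python
from enum import IntEnum


class ObjectStatus(IntEnum):
    """Status codes of the internal Status / ChildStatus() parameters."""
    NORMAL = 0
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4
    UNKNOWN = 5
    UNMANAGED = 6
    DISABLED = 7  # interface objects only
    TESTING = 8   # interface objects only


print(ObjectStatus(4).name)  # CRITICAL
```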
Best regards,
Victor
thanks! =)
i understand now.
great support!
Quote from: Victor Kirhenshtein on July 02, 2008, 10:46:53 AM
It's an internal parameter (i.e. it represents information existing inside the NetXMS server, not on the target node). The internal parameter Status represents the node's current status in NetXMS, encoded as follows:
0 = Normal
1 = Warning
2 = Minor
3 = Major
4 = Critical
5 = Unknown
6 = Unmanaged
There is also an internal parameter ChildStatus(), which represents the status of a given child object of the node. In addition to the status codes listed above, interface objects can have the following additional status codes:
7 = Disabled
8 = Testing
Best regards,
Victor
I've noticed that I get status 4 when I unplug the network cable of a managed object. I was wondering if there is any condition, other than not being able to communicate with an object (because it's unplugged or turned off), that can lead to a status of "critical". I.e., if I see this status, can I safely assume that my server cannot talk to the managed device?
Thanks.
Tony
Hi!
It depends on your configuration. For node objects, status is calculated from two sources: the statuses of child objects (interfaces and network services), and active alarms for the node. So if you have, for example, an outstanding critical alarm, node status will be critical even if the node is reachable by the management server. Also, the default status calculation algorithm takes the most critical status of the child objects, so if you have a managed device with multiple interfaces, one interface in the down state will cause critical status for the entire device, although it is still reachable via other interfaces.
You can safely assume that critical status means "node unreachable or down" only if you have configured the system that way. In the default configuration this is not the case.
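A rough illustration of the "most critical wins" idea: the sketch below is a simplification, not the server's real code. The function name node_status is made up, alarm severities are assumed to use the same 0..4 scale as the status codes, and codes above 4 (Unknown, Unmanaged and so on) are simply ignored because they are not ordered by severity.

```python
def node_status(child_statuses, alarm_severities):
    """Return the most critical status among children and active alarms.

    Simplified assumption: only codes 0..4 (Normal..Critical) take part,
    and the result is 5 (Unknown) when there is nothing to go on.
    """
    candidates = [
        s for s in list(child_statuses) + list(alarm_severities)
        if 0 <= s <= 4
    ]
    return max(candidates) if candidates else 5


# Three interfaces, one of them down: the node is critical
# even though it is still reachable via the other two.
print(node_status([0, 0, 4], []))  # 4

# All interfaces normal, but an outstanding critical alarm exists.
print(node_status([0, 0], [4]))    # 4
```

In both cases the node shows as critical while the management server can still talk to it.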
Best regards,
Victor