Questions about NetXMS Event Processing Policy and Mail Notifications

Started by mauro.cmcc, April 14, 2021, 02:04:03 AM


mauro.cmcc

Dear Users,

this is my first post here. I have just started using NetXMS 3.8.250, so you could say I'm a newbie.
I would like to monitor the status of many switches, servers, storage systems and other devices running in our data center.

Because the devices are so varied, I decided to use SNMP, ICMP and ARP discovery settings (no agent).
Discovery went well: I added some nodes (servers and switches) and enabled some event processing policies. I can now see whether the nodes are up, and I'm receiving some notification emails.

Now I have a couple of questions:

- I can see only 40 default event processing policies listed in the EPP window. Do I need to manually create a policy for every standard hardware check? Is there a way to get more default policies (e.g. an external EPP bundle to download)?
- When an EPP rule is enabled, it applies to all defined nodes. Is there a way to exclude a single node from a specific EPP check?

For now, I only need to know whether a node is up and whether it is healthy (no hardware problems).
In the near future, I would also like to print an inventory of the elements and the topology/map.

Thank you very much in advance.
Regards,
Mauro

Filipp Sudanov

Hi!

These 40 EPP rules are there just for demonstration; in a real production setup there may be many more rules.

The system's logic is as follows:

1) Events are created when something happens. There are many sources of events: DCI threshold violation/restore, node status change, log parser, SNMP traps, etc.
Each event stores a number of parameters, e.g. DCI parameter name, threshold and actual values, etc. The number of parameters differs between events.
There are "event prototypes" where you configure the event name and the user message, which can contain macros (they are expanded when the event is created).

2) Events are processed by the EPP. Some events may not be configured in the EPP at all; in that case the event is just recorded in the event log and nothing else happens.
EPP rules can be very flexible: e.g. one rule can handle multiple events, and a rule can apply only to nodes in a specific container (or to everything except some nodes/containers). Check the conditions in the EPP rule properties.

3) The EPP can create an alarm; alarms are a sort of bug tracker built into NetXMS that shows the list of current problems. Alarms can also be terminated automatically when some other event arrives. For this to work, each alarm has an alarm key, made of some text and macros. The macros can include, e.g., the ID of the node and the ID of a DCI; see default rules 19 and 20.
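To make the three steps above more concrete, here is a minimal Python model of the flow: an event carries parameters, its prototype message is expanded when the event is created, EPP rules are matched by event name and source node, and a matching rule can create an alarm or terminate the alarm with the same key. Everything here is a simplified assumption for illustration, not NetXMS server code or API: the `%n` (node name), `%I` (node ID) and `%1`..`%9` (parameter) macros are only loosely modeled on NetXMS macros, the event names `DC_SERVER_TEMPERATURE` / `DC_SERVER_TEMPERATURE_OK` and the `TEMP_%I_%2` key are hypothetical, and the `exclude_nodes` set is a hypothetical stand-in for the "except for some nodes/containers" condition in real rule properties.

```python
import re
from dataclasses import dataclass, field

# Hypothetical event prototypes: event name -> message template with macros.
# Parameters in this sketch: %1 = current value, %2 = DCI ID, %3 = threshold.
PROTOTYPES = {
    "DC_SERVER_TEMPERATURE": "Temperature on %n is %1 C (threshold %3)",
    "DC_SERVER_TEMPERATURE_OK": "Temperature on %n back to normal (%1 C)",
}

@dataclass
class Event:
    name: str
    node_id: int
    node_name: str
    params: list = field(default_factory=list)
    message: str = ""

def expand_macros(template: str, event: Event) -> str:
    """Expand a simplified macro subset: %n, %I and %1..%9 (assumed syntax)."""
    def repl(match: re.Match) -> str:
        token = match.group(1)
        if token == "n":
            return event.node_name
        if token == "I":
            return str(event.node_id)
        idx = int(token) - 1
        return str(event.params[idx]) if idx < len(event.params) else ""
    return re.sub(r"%([nI1-9])", repl, template)

def create_event(name, node_id, node_name, params):
    """Create an event; the prototype message is expanded at creation time."""
    event = Event(name, node_id, node_name, list(params))
    event.message = expand_macros(PROTOTYPES.get(name, name), event)
    return event

@dataclass
class Rule:
    events: set                                       # one rule may handle several events
    exclude_nodes: set = field(default_factory=set)   # hypothetical: nodes this rule skips
    action: str = "none"                              # "create_alarm", "terminate_alarm" or "none"
    alarm_key: str = ""                               # template expanded with the same macros

def process(event, rules, alarms, event_log):
    event_log.append(event)  # every event is logged, even if no rule matches
    for rule in rules:
        if event.name not in rule.events or event.node_id in rule.exclude_nodes:
            continue
        key = expand_macros(rule.alarm_key, event)
        if rule.action == "create_alarm":
            alarms[key] = event.message   # same key -> the existing alarm is reused
        elif rule.action == "terminate_alarm":
            alarms.pop(key, None)         # terminate the alarm with a matching key

rules = [
    Rule({"DC_SERVER_TEMPERATURE"}, exclude_nodes={42},
         action="create_alarm", alarm_key="TEMP_%I_%2"),
    Rule({"DC_SERVER_TEMPERATURE_OK"}, action="terminate_alarm", alarm_key="TEMP_%I_%2"),
]
alarms, log = {}, []
process(create_event("DC_SERVER_TEMPERATURE", 17, "srv-01", [75, 301, 70]), rules, alarms, log)
process(create_event("DC_SERVER_TEMPERATURE", 42, "srv-skip", [80, 301, 70]), rules, alarms, log)
print(alarms)  # {'TEMP_17_301': 'Temperature on srv-01 is 75 C (threshold 70)'}
process(create_event("DC_SERVER_TEMPERATURE_OK", 17, "srv-01", [60, 301, 70]), rules, alarms, log)
print(alarms, len(log))  # {} 3
```

Because the create and terminate rules build the same key from the node ID and DCI ID, the "OK" event closes exactly the alarm raised for that node and DCI, while node 42 is skipped by the first rule but still appears in the event log.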


Now, it depends on exactly which hardware checks you need. Node connectivity and the status of network interfaces are checked once per minute via the status poll; if something is down, the corresponding event is generated. Other checks are usually configured as DCIs, e.g. CPU temperature: the DCI has a threshold, and the event specified in the DCI properties is generated when the threshold is violated. You can use the generic SYS_THRESHOLD_REACHED event, but it is usually more convenient to have separate events for various groups of thresholds, e.g. something like DC_SERVER_TEMPERATURE.
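The threshold behaviour described above can be sketched in Python as a small state machine: the threshold remembers whether it is currently violated, so it produces its event once when the value first crosses the limit and a "back to normal" event once when the value drops below it again, rather than one event per poll. This is an illustrative assumption, not NetXMS code: the `>=` comparison, the sample values, and the event names `DC_SERVER_TEMPERATURE` / `DC_SERVER_TEMPERATURE_OK` are hypothetical, and in NetXMS the events to generate are chosen per threshold in the DCI properties.

```python
from dataclasses import dataclass

@dataclass
class Threshold:
    limit: float
    activation_event: str   # generated when the threshold becomes violated
    rearm_event: str        # generated when the value returns to normal
    active: bool = False    # remembered state, so events fire only on transitions

    def check(self, value: float):
        """Return the event name to generate for this sample, or None."""
        violated = value >= self.limit
        if violated and not self.active:
            self.active = True
            return self.activation_event
        if not violated and self.active:
            self.active = False
            return self.rearm_event
        return None

t = Threshold(70, "DC_SERVER_TEMPERATURE", "DC_SERVER_TEMPERATURE_OK")
samples = [65, 72, 75, 71, 68, 66]  # hypothetical collected temperature values
events = []
for value in samples:
    event = t.check(value)
    if event:
        events.append((value, event))
print(events)  # [(72, 'DC_SERVER_TEMPERATURE'), (68, 'DC_SERVER_TEMPERATURE_OK')]
```

Six samples produce only two events: one when 72 first crosses the limit and one when 68 falls back below it; the samples 75 and 71 generate nothing because the threshold is already active.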