Dependency script based on infrastructure [SOLVED]

Started by sperlm, November 12, 2013, 10:10:44 AM


sperlm

Hi,

Ever since I saw the script for filtering out consequent alarms on the wiki page "Situations Overview", I have been trying to find a way to avoid specifying by hand what depends on what, and to base it on the actual topology instead.

I came up with the idea of filtering out alarms based on topology availability. For example, when a dependent node triggers an alarm, the filter first checks whether the node it topologically depends on is down.
Sorting nodes in the infrastructure tree this way is logical anyway...

The infrastructure looks like this:
-Infrastructure Services
|- X01-locality
||- N-nodeA
||- D-dependency_node
||- X11-sub
|||- N-nodeB
|- X02-locality
...

The problem with the script might be trivial: I need some kind of string comparison on the names of containers and nodes.

What I am trying to do is list node parents of "N-nodeB" with
nparents = GetNodeParents(FindObject($1));
foreach(i : nparents)
{
    println "Parent name='" . i->name . "'";
}

... I get a bunch of parents like this:

Parent name='10.0.5.0/24'
Parent name...
...
Parent name='X11-sub' <=Infrastructure Services Folder

At this point I cannot find the right command to filter out just the infrastructure folder "X11-sub", because I need to compare only part of the name, not the whole string. I only need to check whether the name starts with "X", and that cannot be done with if (i->name == "X").

After that I will look up the object parents of that folder, then their object children, find those with the dependency prefix "D", and trigger the alarm based on the availability of that node.

The only part I cannot get past is this "compare based on part of the string"... any hints?

With regards

Milan Sperl


Victor Kirhenshtein

Hi!

The first filtering you can do is by object class; that way you'll filter out subnets, etc. Then you can use the like or match operations to filter objects by part of the name. For example, to match only containers with names starting with "X-", you can use the following script:


nparents = GetNodeParents(FindObject($1));
foreach(i : nparents)
{
    if ((i->type == 5) && (i->name like "X-*"))
    {
        println "Parent name='" . i->name . "'";
    }
}


Note that the like operation is case sensitive; if you need a case-insensitive comparison, use ilike.
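
With the naming scheme from the first post (X01-locality, X11-sub) there is no dash directly after the "X", so the pattern would need to be "X*" rather than "X-*". A minimal sketch of both the case-sensitive and the case-insensitive form, assuming the script is started with the node as $1 as in the original snippet:

```nxsl
nparents = GetNodeParents(FindObject($1));
foreach(i : nparents)
{
    if (i->type == 5)   // containers only, as in the script above
    {
        if (i->name like "X*")    // case sensitive: name starts with "X"
        {
            println "like:  '" . i->name . "'";
        }
        if (i->name ilike "x*")   // case insensitive: also matches "X..."
        {
            println "ilike: '" . i->name . "'";
        }
    }
}
```

For a container named X11-sub both branches should print; a subnet such as 10.0.5.0/24 is already dropped by the type check.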

Best regards,
Victor

sperlm

Thanks for the help. This is what I came up with:

The idea is to filter alerts and prioritize those that are not secondary in terms of node-down status.
If a node is down and the node above it in the infrastructure is also down, chances are high that the problem is only at the uppermost node, and all the other node-down events are only flooding the supervisor's attention.
This assumes the nodes are sorted in the infrastructure according to their dependencies.

There was one thing to decide first - whether to filter the events or just actions based on the events.

a) events start only for nodes that pass the filter criteria
pro: the event log and/or actions based on all events contain only the uppermost alerts
con: if other nodes have problems of their own (or a problem starts in the meantime), those problems are not shown in the log or acted upon after the uppermost node is resolved (up)

b) events start and show for each node that is down, but actions (e-mails) start only for events that pass the filter
pro: solves the problem of variant a)
con: the event log contains all down nodes for the duration of the uppermost node's downtime, effectively making the view unusable (if there are a lot of dependent nodes)

I went with variant b) (and tag the events to keep some sort of control in the event log).

Event Processing Policy
- create a "copy" of the default rule 1. (Show alarm when node is down)
Filter:
- IF: SYS_NODE_DOWN
- AND:
nparents = GetNodeParents($node); // get parent objects of tested node
foreach(i : nparents)
{
    if (i->type == 5) // if it is infrastructure object (container)...
    {
        oparents = GetObjectParents(i); //... get parents of the object too...
        foreach(j : oparents)
        {
            if (j->type == 5) //... and if it is infrastructure object (container) again...
            {
                children = GetObjectChildren(j); //... get object children...
                foreach(k : children)
                {
                    //... and if the status of any of the children objects is "critical"
                    // (bypassing non-node objects) do not take action.
                    if ((k->status == 4) && (classof(k) != "NetObj")) return 0;
                }
            }
        }
    }
}
return 1; // else allow action

Action: send e-mail (for example)

I used the same script as a macro in the SYS_NODE_DOWN event to tag whether the e-mail was sent or muted.
Basically, I added the script to the script library as well and added the %[Dependency] macro to SYS_NODE_DOWN. The library script is the same as above, with these exceptions:
- "return 0" is changed to: return " (e-mail muted)"
- "return 1" is changed to: return " (e-mail sent)"
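
Written out, the library script behind %[Dependency] would look roughly like this. It is only a sketch of the two substitutions described above, and it assumes $node is available to a macro script in the same way as in the filter:

```nxsl
// Library script "Dependency": same traversal as the filter,
// but it returns a text tag instead of 0/1.
nparents = GetNodeParents($node);
foreach(i : nparents)
{
    if (i->type == 5)   // container holding the node
    {
        oparents = GetObjectParents(i);
        foreach(j : oparents)
        {
            if (j->type == 5)   // container one level up
            {
                children = GetObjectChildren(j);
                foreach(k : children)
                {
                    // a critical child one level up (non-node objects bypassed): action is muted
                    if ((k->status == 4) && (classof(k) != "NetObj")) return " (e-mail muted)";
                }
            }
        }
    }
}
return " (e-mail sent)";
```

Because the filter and the macro walk the tree separately, the two copies have to be kept in sync by hand when the logic changes.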

I decided not to differentiate which node further up the chain is down (if any of the upper ones is down, everything below will go down as well anyway), but a specific one can be picked with the like operation if needed.
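
If only the designated dependency nodes should mute the action, the innermost check could also test the name prefix. A sketch, assuming the "D-" naming from the first post; the helper name HasDownDependency is made up for illustration:

```nxsl
// Hypothetical helper: returns 1 if the given container directly holds
// a critical object whose name starts with "D-", otherwise 0.
sub HasDownDependency(container)
{
    children = GetObjectChildren(container);
    foreach(k : children)
    {
        if ((k->status == 4) && (classof(k) != "NetObj") && (k->name like "D-*"))
        {
            return 1;
        }
    }
    return 0;
}
```

In the filter, the innermost foreach would then be replaced by: if (HasDownDependency(j)) return 0;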

Any thoughts on drawbacks?