Critical Paths

Started by bulwer, January 09, 2013, 02:35:48 PM


bulwer

Sorry, me again! I have got our system set up to monitor our 35 remote sites and send an email if a node is down for more than 5 minutes, which works perfectly thanks to your earlier help. However, when the connection to our head office went down the other night, we came in to around 200 emails telling us that every node was down and then back up again.

Can you suggest an easy way of changing the setup so we don't get emails if the main line goes down? Before settling on NetXMS, I tried OpenNMS, which had a critical path feature: you specify the node connections each node relies on, so you don't get notifications when the critical path is down. I'm sure this can be done using individual rules, but with a router and at least one, if not several, servers at each site, I don't really want to create a separate rule for each node.

I have also set up a check for our website, which is hosted elsewhere. I am wondering about adding a condition to the rule so that if the website is down as well, no email is sent, but that would stop monitoring from working properly if the website is down for any length of time.

Any advice would be welcome.

Victor Kirhenshtein

Hi!

In 1.2.5 we improved event correlation based on topology, so if NetXMS knows the full path between itself and a remote node, it should correlate events automatically and only pass the NODE_DOWN event for the router. You can check this by selecting the management server node object in the object tree, then selecting "IP Route" from the context menu, selecting the remote node, and checking that the correct IP path is shown.
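Conceptually, topology-based correlation of this kind works like this: for each node reported down, look at the hops on the path from the management server to that node, and suppress its event if an earlier hop is also down. The Python sketch below illustrates only that idea; it is not NetXMS's implementation, and the node names and the `root_cause_events` function are hypothetical.

```python
from typing import Dict, List, Set


def root_cause_events(paths: Dict[str, List[str]], down: Set[str]) -> List[str]:
    """Return the nodes whose NODE_DOWN event should be passed on.

    `paths` maps each monitored node to the ordered list of hops the
    management server uses to reach it, ending with the node itself.
    A down node is suppressed when any earlier hop on its path is also down.
    """
    passed = []
    for node, path in paths.items():
        if node not in down:
            continue
        upstream = path[:-1]  # every hop before the node itself
        if not any(hop in down for hop in upstream):
            passed.append(node)  # nothing upstream is down: this is a root cause
    return passed


# Hypothetical topology: head office router -> site 5 router -> site 5 server
paths = {
    "ho-router": ["ho-router"],
    "site5-router": ["ho-router", "site5-router"],
    "site5-server": ["ho-router", "site5-router", "site5-server"],
}
```

With `down = {"site5-router", "site5-server"}`, only `site5-router` is returned; the server's event is suppressed because its upstream router is already down.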

Sorry, I didn't fully understand the website question. Can you give a simple scenario, please?

Best regards,
Victor

bulwer

Quote from: Victor Kirhenshtein on January 09, 2013, 02:44:18 PM

Sorry, I wasn't clear. The website check was an idea I had for suppressing an email. For example, if a remote node is down, check whether the website is down as well; if so, don't send an email. It did occur to me that I could do this using Google (which, in theory, should never go down!): before emailing, check whether Google is down, and if so, don't email, as it is very likely that the link from head office is down.
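The "sentinel host" idea above can be sketched as a small decision function. This is a hypothetical Python illustration, not a NetXMS feature: `tcp_reachable`, `should_email`, and the use of www.google.com on port 443 as the sentinel are all assumptions.

```python
import socket
from typing import Callable


def tcp_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals and timeouts
        return False


def should_email(node_is_down: bool, sentinel_up: Callable[[], bool]) -> bool:
    """Email about a down node only while the external sentinel is reachable.

    If the sentinel is unreachable too, assume the head office line is the
    real problem and suppress the per-node email.
    """
    if not node_is_down:
        return False
    return sentinel_up()
```

For example, `should_email(True, lambda: tcp_reachable("www.google.com"))` emails only while the sentinel answers. The drawback mentioned earlier still applies: a long sentinel outage silences every alert, so this trades possible missed alerts for fewer false ones.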

However, I hadn't seen that 1.2.5 had been released. I will upgrade and try IP Route. Thanks again for your prompt support and for your excellent software!

bulwer

Is there a way I can manually specify a link between nodes? It looks like, by setting up a VPN connector between, say, 192.168.1.1 and 192.168.5.1, IP Route knows that it gets from 192.168.1.10 to 192.168.5.1 via 192.168.1.1, so hopefully this means that it will only pass a NODE_DOWN event for 192.168.1.1 if the line goes down. However, we also have a lot of virtual servers. Is there a way I can set up a link between 192.168.1.12 and 192.168.1.10? If 1.10 (the physical server) is down, then it follows that 192.168.1.12 (the virtual server) is down, so I want to suppress the NODE_DOWN event for 1.12.

As I added these nodes manually, I need to somehow let NetXMS know there is an extra step in the IP route. Can I do this without adding a VPN link between the two? (I'm guessing this would work, but it would be wrong, as it is a VM, not a VPN connection.)

Victor Kirhenshtein

Hi!

It's not possible to specify such a relation manually. Anyway, this is a layer 2 connectivity issue, not a layer 3 one: both nodes (physical host and VM) are in the same subnet, so 192.168.1.10 is not involved in the routing process but acts as an Ethernet bridge. I will add a feature request for including layer 2 topology in the event correlation process.
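To illustrate why routing-based correlation cannot help here: two addresses in the same subnet reach each other directly, so there is no router hop on the IP path to attribute the failure to. A minimal Python check using the standard `ipaddress` module, assuming a /24 mask for these sites (the actual masks are not stated in the thread):

```python
import ipaddress


def same_subnet(a: str, b: str, prefix: int = 24) -> bool:
    """True if both IPv4 addresses fall in the same network for the given prefix."""
    net_a = ipaddress.ip_interface(f"{a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{b}/{prefix}").network
    return net_a == net_b
```

Under that assumption, `same_subnet("192.168.1.10", "192.168.1.12")` is true (no layer 3 hop between host and VM), while `same_subnet("192.168.1.10", "192.168.5.1")` is false (traffic must go through the router at 192.168.1.1).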

As a workaround, you can check the status of the host node in a script in the event processing policy, but this may not work as expected if the VM is polled first (so the server doesn't yet know that the host is down).
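The logic of that workaround, written in Python purely for illustration: the real filter would be an NXSL script in the event processing policy, which is not shown here. The VM-to-host mapping, the status dictionary, and `suppress_vm_down` are hypothetical stand-ins for what such a script would look up.

```python
from typing import Dict, Optional


def suppress_vm_down(vm: str, host_of: Dict[str, str], status: Dict[str, str]) -> bool:
    """Return True if the NODE_DOWN alert for `vm` should be suppressed.

    `host_of` maps a VM to its physical host (maintained by hand, since the
    relation cannot be specified in NetXMS). `status` holds the server's
    current view of each node, e.g. "up" or "down".
    """
    host: Optional[str] = host_of.get(vm)
    if host is None:
        return False  # no known host: alert as usual
    return status.get(host) == "down"


host_of = {"192.168.1.12": "192.168.1.10"}  # VM -> physical host
```

Note the caveat above: if the host has not been polled yet, its status still reads "up" and the VM alert goes through.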

Best regards,
Victor

bulwer

Quote from: Victor Kirhenshtein on January 09, 2013, 05:30:51 PM

Thanks, I might give that a go, as the event I am using waits for 5 minutes before emailing, so by then the server would have to be down at the same time as the VM.
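A small simulation of why the delay should help, assuming the host check runs when the 5-minute timer expires rather than when the VM's event first arrives (the thread does not confirm when NetXMS evaluates it). Event times, node names, and both functions are hypothetical; this only models the timing argument.

```python
from typing import List, Tuple

# (time in seconds, node name, "down" or "up"), in the order the server saw them
Event = Tuple[int, str, str]


def status_at(events: List[Event], node: str, t: int) -> str:
    """Last known status of `node` at time t ("up" if never reported down)."""
    status = "up"
    for when, name, state in events:
        if name == node and when <= t:
            status = state
    return status


def vm_alert_sent(events: List[Event], vm: str, host: str, delay: int = 300) -> bool:
    """Decide at alert time (first VM-down time + delay), not at event time."""
    down_times = [when for when, name, state in events if name == vm and state == "down"]
    if not down_times:
        return False
    fire_at = down_times[0] + delay
    if status_at(events, vm, fire_at) != "down":
        return False  # VM recovered within the delay window
    return status_at(events, host, fire_at) != "down"  # suppress if host is down


# The VM is polled first; the host is only detected down 40 s later.
events = [(0, "vm-1.12", "down"), (40, "host-1.10", "down")]
```

Here a check at event time (t=0) would still see the host as up, but the check at t=300 sees it down and suppresses the VM email.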