Replacing custom solution (elasticsearch and riemann) with NetXMS

Started by mgiammarco, January 01, 2021, 07:00:09 PM

mgiammarco

Hello,
I have (like many) a complex situation to monitor with information scattered in:
- syslog
- snmp
- graphite/collectd sensors

I was building a custom solution with elasticsearch, riemann, influxdb and so on
But looking for a snmp input solution I have found NetXMS and now I am rebuilding my monitoring on it.
But now I have several questions (yes I am reading and reading and reading docs):

- I am trying syslog parsing and my (simple) regular expression does not work. How can I debug it? Is there a way to try it on already received syslog messages? I cannot easily reproduce the syslog message to test it.
- when I receive a syslog line "backup finished" I must check whether the time is past 8:00 (slow backup). How can I do it?
- suppose I need to verify that backups are executed, so I need to check that a syslog "finished backup" message is present at least once a day. Can I do it with repeatCount?

Thanks in advance for any help!
Mario

Filipp Sudanov

Hi!

1) As discussed in another thread, on Linux the logger utility can be helpful for generating test syslog messages.
2) A matched syslog line generates an event, so the logic has to go into the Event Processing Policy (EPP). Each rule there has a filtering script - if it's not empty, the event is processed further only if that script returns true. You can just take the current time in that script and check it:
now = localtime();
return (now->hour >= 8);

3) I am not exactly sure how to use repeatCount in syslog processing, but in any case a reaction to syslog only happens when a new syslog message is received. Here we need something elsewhere that is executed regularly, even if there are no syslog messages from a given node. A possible approach could be:
- when you receive the backup successful message, store the current Unix time in a custom attribute of the node. You can do this in the filter script in EPP:
$node->setCustomAttribute("lastBackupTime", time());
- create a DCI with origin "Internal" and parameter "Dummy".
- in the transformation script add the following:
return time() - $node->getCustomAttribute("lastBackupTime");
As a result, this DCI will hold the time in seconds since the last backup.
- add a threshold so that an event is generated when the time since the last backup is too long.
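
The arithmetic behind the DCI in point 3 has one edge case worth thinking through: a node for which no backup message has ever been recorded. The following is only a Python model of that logic, not NXSL and not something NetXMS runs; the function names and the one-day limit are made up for illustration, and it assumes the stored attribute comes back either missing or as a string of digits.

```python
def seconds_since_last_backup(last_backup_attr, now):
    """Elapsed seconds since the stored timestamp, or None if none was stored."""
    if last_backup_attr is None or last_backup_attr == "":
        return None
    return int(now) - int(last_backup_attr)


def backup_overdue(last_backup_attr, now, max_age=86400):
    """True when no backup was ever recorded or the last one is older than max_age seconds."""
    age = seconds_since_last_backup(last_backup_attr, now)
    return age is None or age > max_age
```

Treating "never recorded" as overdue is a design choice in this sketch; the alternative is to stay silent until the first backup message arrives.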
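
On point 1, if logger is not available on the machine you test from, a hand-built message over UDP does much the same job. Below is a minimal Python sketch, assuming the server accepts BSD-style (RFC 3164) syslog on UDP port 514; the hostname "testhost", the tag "backup" and the server name in the usage comment are placeholders, not values from this thread.

```python
import socket
import time

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def build_syslog_message(text, tag="backup", hostname="testhost",
                         facility=1, severity=6, now=None):
    """Format a BSD-style syslog line: <PRI>Mmm dd hh:mm:ss host tag: text."""
    pri = facility * 8 + severity  # facility 1 (user), severity 6 (info) -> 14
    t = time.localtime(now)        # now=None means the current time
    timestamp = "%s %2d %02d:%02d:%02d" % (
        MONTHS[t.tm_mon - 1], t.tm_mday, t.tm_hour, t.tm_min, t.tm_sec)
    return "<%d>%s %s %s: %s" % (pri, timestamp, hostname, tag, text)


def send_syslog(text, server, port=514, **kwargs):
    """Send one test message over UDP and return the bytes that were sent."""
    payload = build_syslog_message(text, **kwargs).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
    return payload

# Example call (placeholder server name):
# send_syslog("backup finished", "netxms.example.com")
```

Sending the exact text you expect from the real device lets you re-run the same line against the parser as often as needed while adjusting the regular expression.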