Topics - blazarov

#1
Hello,

During yesterday's Q&A Webex session I got a great idea for solving a challenge I have had for years: keeping EPP rules and email notification actions generic, while selecting email recipients based on users' email addresses and group membership.

Apparently this can be implemented elegantly using the new Responsible Users feature.

User and group objects can be set as Responsible Users on nodes and containers.
NXSL then has the functionality needed to write a script that finds the responsible users for a given node and returns a list of their email addresses.
This script can then be run through a macro in the recipient address field of the email notification action, hopefully with a $node parameter, so that it always returns the full list of email recipients for that particular node.
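
To make the intended logic concrete, here is a minimal sketch of the resolution step. It is a plain Python model, not NXSL and not the NetXMS API: the `User` and `Group` classes and the `responsible` and `parents` mappings are hypothetical stand-ins for whatever the server actually exposes, and inheriting responsible users from parent containers is an assumption. It only shows the behaviour I am after: collect the responsible users of a node and its containers, expand groups into their members, and return a de-duplicated recipient list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    email: str


@dataclass
class Group:
    name: str
    members: list = field(default_factory=list)  # Users or nested Groups


def expand(principal, seen=None):
    """Yield every User behind a User or a (possibly nested) Group."""
    seen = set() if seen is None else seen
    if id(principal) in seen:  # guard against cyclic group membership
        return
    seen.add(id(principal))
    if isinstance(principal, User):
        yield principal
    else:
        for member in principal.members:
            yield from expand(member, seen)


def recipients(obj, responsible, parents):
    """Return unique emails of responsible users for obj and its containers.

    responsible: hypothetical mapping object name -> list of User/Group
    parents:     hypothetical mapping object name -> list of container names
    """
    emails, queue, visited = [], [obj], set()
    while queue:
        current = queue.pop(0)
        if current in visited:
            continue
        visited.add(current)
        for principal in responsible.get(current, []):
            for user in expand(principal):
                if user.email and user.email not in emails:
                    emails.append(user.email)
        queue.extend(parents.get(current, []))  # walk up to the containers
    return emails


# Example with made-up objects
alice = User("alice", "alice@example.com")
bob = User("bob", "bob@example.com")
netops = Group("netops", [alice, bob])
responsible = {"router1": [alice], "Core Network": [netops]}
parents = {"router1": ["Core Network"]}
print(";".join(recipients("router1", responsible, parents)))
```

The real script would have to do the same walk with whatever attributes NXSL provides for responsible users and group membership.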

Combined, all of the above will deliver exactly what I need, if I understand it correctly.

I spent a few hours trying to develop such an NXSL script, but failed miserably. Can someone help with some hints? I am sure this will be beneficial for the community, not just me.

Thanks!
#2
Hello,
we recently upgraded to v3.7 and noticed some very weird and unfortunately very destructive behavior.

The nodes keep changing their primary hostname without human intervention. The new value is one of the node's own addresses, but the change completely breaks our monitoring.

Is that expected behaviour and how can we disable it? It is completely ruining our monitoring and other support processes.

Obviously we have set "Prevent automatic SNMP configuration changes", but that does not help.
#3
Hello,
shortly after the upgrade to 3.7 we decided to give network discovery a try.
Initially it seemed to work well, but we decided not to use it for multiple reasons.
However, it seems to me that it never stops, even though it is disabled under Configuration -> Network Discovery -> General settings (Disabled).
Even after a netxmsd restart, devices keep appearing in the Zones' subnets.

There is also a significant increase in load on the NetXMS server and the agent proxies, which I believe is caused by the continuous network discovery.

Is there a way to check whether discovery is actually running, and a way to forcefully disable it?
#4
Hi,
I just noticed that the server debug console is missing from NXMC (both the Windows app and the web version) after the upgrade to 3.7.

- is that expected?
- what are the alternatives?
#5
Hello,
I have always wondered what the deal is with the duplicate actions in the console tools->info->.. menu.

I can understand that some actions can be executed using either Agent or SNMP, but there are a few that say exactly the same thing and simply appear twice.

Is there any actual function or is it just a bug?

Thanks!
#6
Hello,
For certain cases it would be very useful if we could specify the source IP address as a DCI parameter for the Service Monitoring connections (ServiceCheck.*).

Our use case: we monitor 2 different paths to the same destination, and the path is selected by the source IP address used for the connection.

thanks in advance!
#7
General Support / Performance tab graphs - Auto color
January 17, 2020, 12:56:53 PM
Hello,
We are running version 3.1-300

We are heavily using instance discovery and the performance tab for network devices. It works very well, and the way it is developed gives us the flexibility we need to cover a wide variety of scenarios.

We often use the grouping function, which is also really nice, but I am wondering if there is a way to use automatic color selection, so that DCIs automatically created and grouped by instance discovery end up with distinct colors in the same graph?
#8
Hello,
recently we have noticed this occurring more and more: without any human intervention, some nodes change their SNMP version from v2c to v1.
Unfortunately that breaks most of the data collection, since I suppose some of the OIDs, or maybe the 64-bit counters, are not supported over v1.

My question is: are there any normal circumstances where this change is expected to occur, or does it sound like a bug?

On some devices it is easy to reproduce: every configuration poll changes v2c to v1, even though polling over v2c works fine.

On others the issue is not reproducible, but at some point we still find them changed to v1.

Thanks in advance!
Regards,
#9
Hello,
we are monitoring hundreds of network devices with many interfaces each.
Interface expected state is a neat and useful feature, but the default behavior of propagate as "Critical" does not fit our use case.

When a device is completely unreachable, that is critical for me, whereas when an interface that is expected to be UP goes DOWN, that is probably WARNING or MINOR.
That covers 100% of our nodes and use cases.
So my question is: what would be the simplest and most elegant way to change the behavior globally?
I know how to change it on a per-port basis - properties on port; set propagate as->fixed value->warning
I suppose I could write a script that regularly iterates over all interfaces on all nodes to set that, or better a script on a Hook, but I was wondering if there is a simple global way to configure it?
#10
Hello,

In the past few weeks we have noticed a brand new issue:
one or a few of the agent tunnels appear as unbound without anyone changing anything. Of course, data collection then stops until you manually bind the tunnel to the appropriate node again.

We use the agents as proxies, so lots of nodes are affected on each occurrence.

Any ideas are welcome on how to approach the troubleshooting of this issue!

Thanks!
#11
Hello,
I am trying to set up a nice template for monitoring F5 load balancers and their services with instance discovery.
I stumbled over the way they encode their object names in SNMP OIDs. I would like to use the decoded, human-readable string in the NetXMS DCI instead of the long encoded form. Here is an example:

root@nxagent:~/.snmp/mibs# snmpwalk -v2c -c XXXXX 192.168.120.YYY iso.3.6.1.4.1.3375.2.2.5.4.3.1.11.31.47.67.111.109.109.111.110.47.98.111.107.95.102.105.110.98.114.105.100.103.101.95.112.114.111.100.95.112.111.111.108.40.47.67.111.109.109.111.110.47.98.111.107.95.112.114.111.100.95.49.48.46.49.48.50.46.49.52.57.46.52.49.95.102.105.110.98.114.105.100.103.101.35500
iso.3.6.1.4.1.3375.2.2.5.4.3.1.11.31.47.67.111.109.109.111.110.47.98.111.107.95.102.105.110.98.114.105.100.103.101.95.112.114.111.100.95.112.111.111.108.40.47.67.111.109.109.111.110.47.98.111.107.95.112.114.111.100.95.49.48.46.49.48.50.46.49.52.57.46.52.49.95.102.105.110.98.114.105.100.103.101.35500 = Gauge32: 10

root@nxagent:~/.snmp/mibs# snmpwalk -m F5-BIGIP-LOCAL-MIB -v2c -c XXXX 192.168.120.YYY iso.3.6.1.4.1.3375.2.2.5.4.3.1.11.31.47.67.111.109.109.111.110.47.98.111.107.95.102.105.110.98.114.105.100.103.101.95.112.114.111.100.95.112.111.111.108.40.47.67.111.109.109.111.110.47.98.111.107.95.112.114.111.100.95.49.48.46.49.48.50.46.49.52.57.46.52.49.95.102.105.110.98.114.105.100.103.101.35500
F5-BIGIP-LOCAL-MIB::ltmPoolMemberStatServerCurConns."/Common/bok_finbridge_prod_pool"."/Common/bok_prod_10.102.149.41_finbridge".35500 = Gauge32: 10

root@nxagent:~/.snmp/mibs# snmptranslate -m F5-BIGIP-LOCAL-MIB iso.3.6.1.4.1.3375.2.2.5.4.3.1.11.31.47.67.111.109.109.111.110.47.98.111.107.95.102.105.110.98.114.105.100.103.101.95.112.114.111.100.95.112.111.111.108.40.47.67.111.109.109.111.110.47.98.111.107.95.112.114.111.100.95.49.48.46.49.48.50.46.49.52.57.46.52.49.95.102.105.110.98.114.105.100.103.101.35500
F5-BIGIP-LOCAL-MIB::ltmPoolMemberStatServerCurConns."/Common/bok_finbridge_prod_pool"."/Common/bok_prod_10.102.149.41_finbridge".35500

As seen above, when I load the MIBs into the standard Linux net-snmp tools the decoding happens automatically, but I couldn't find a way to do the same in NetXMS DCI/instance discovery, apart from writing the translation code myself in the instance discovery filtering script.
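
For reference, the encoding in the example above looks like length-prefixed ASCII: the index starts with 31, followed by the 31 character codes of "/Common/bok_finbridge_prod_pool", then 40 and the 40 codes of the member name, then the port. A minimal sketch of the translation, written in Python only to illustrate the idea (a filter script would need the same logic in NXSL; the function name `decode_index` is my own):

```python
def decode_index(suffix):
    """Split an OID index into length-prefixed strings plus leftover sub-identifiers.

    suffix is the dotted part of the OID that follows the column OID.
    Returns (list_of_strings, remaining_integers).
    """
    parts = [int(p) for p in suffix.strip(".").split(".")]
    strings, pos = [], 0
    while pos < len(parts):
        length = parts[pos]
        end = pos + 1 + length
        chunk = parts[pos + 1:end]
        # Stop when the remainder is not a plausible length-prefixed printable string
        if length == 0 or end > len(parts) or not all(32 <= c < 127 for c in chunk):
            break
        strings.append("".join(chr(c) for c in chunk))
        pos = end
    return strings, parts[pos:]


# Pool name portion of the index from the snmpwalk output above
pool = ("31.47.67.111.109.109.111.110.47.98.111.107.95.102.105.110.98.114."
        "105.100.103.101.95.112.114.111.100.95.112.111.111.108")
print(decode_index(pool))
```

Any trailing sub-identifiers that do not form a string (such as the port number at the end of the full index) are returned unchanged in the second element.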

Any ideas?
#12
Hello,
For some time now (probably since the last server upgrade) I have noticed that "Force DCI Poll" does not work, neither from Last Values nor from the DCI configuration.
Anybody else having the same issue?
I am wondering if this is a general bug, or something specific to my environment.

Thanks!
#13
Hello,
Recently I noticed that changes we make to templates somehow get lost after a netxmsd restart.
This is extremely annoying, mostly because of the wasted time in fine tuning stuff.  >:(

Anyone faced that?
Any ideas on how to approach troubleshooting it?
#14
Hello all,

I have an idea that would greatly help me in day-to-day activities, but I am not sure whether it can be implemented, or how.

I am using DCI summary tables, which are very cool.

What I need is to somehow count the total number of entries in a DCI summary table, along with the number of entries matching different patterns, and store the counts in separate DCIs so I can graph them.

For example i have now a perfectly working DCI summary table called Backup Status looking like this:


Node    Backup Status
srv1    OK; blablabla details
srv2    FAILED; blablabla details
srv3    WARNING; blablabla details
srv4    OK; blablabla details
srv5    OK; blablabla details
srv6    OK; blablabla details

Ideally i would like to configure 4 DCIs that execute the DCI summary table and store the counts of:
- count of total DCI summary table entries (6)
- count of DCI summary table entries matching BackupStatus like OK* (4)
- count of DCI summary table entries matching BackupStatus like WARNING* (1)
- count of DCI summary table entries matching BackupStatus like FAILED* (1)

That way i can have a nice history of the success rates of my backups over time.
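
To show exactly which numbers I am after, here is the counting step as a small Python sketch over the example table above. This is not NetXMS code; how to obtain the summary table rows inside a script DCI is the part I am asking about, and the `rows` list and the `status_counts` helper are only illustrative.

```python
from collections import Counter

# Rows of the example "Backup Status" summary table: (node, backup status)
rows = [
    ("srv1", "OK; blablabla details"),
    ("srv2", "FAILED; blablabla details"),
    ("srv3", "WARNING; blablabla details"),
    ("srv4", "OK; blablabla details"),
    ("srv5", "OK; blablabla details"),
    ("srv6", "OK; blablabla details"),
]


def status_counts(rows):
    """Count rows per status keyword (the text before the first ';')."""
    counts = Counter(status.split(";", 1)[0].strip() for _, status in rows)
    counts["TOTAL"] = len(rows)
    return dict(counts)


print(status_counts(rows))
```

Each of the four values would then be stored by its own DCI.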

So any ideas how to achieve that?
#15
General Support / Monitoring disk latency in Linux
October 17, 2017, 03:26:16 PM
Hi,

I need to monitor disk I/O for some of our Linux machines, specifically:
1. IOPS
2. disk throughput
3. avg latency

1 and 2 are easy since they are natively supported in nxagentd - System.IO.ReadRate, System.IO.WriteRate, System.IO.BytesReadRate, System.IO.BytesWriteRate

About latency: so far it seems my best option is parsing the output of iostat. I can imagine a way to make it work, but it would be very ugly and probably fragile, mostly because I would have to run iostat on a regular basis, write its output to a local file, and then have the agent parse that file.

Another idea would be to parse /proc/diskstats directly in NetXMS, but I don't think there are latency counters there..?

Does anyone have an idea on how to realize it in a more clean and robust way?

Also, I don't entirely understand whether DiskTime has a direct relation to latency?
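
On the /proc/diskstats idea: the file has no direct latency field, but it does carry cumulative counters for completed reads and writes and for the milliseconds spent on them, so an average latency over an interval can be derived from two snapshots (total I/O time delta divided by completed operations delta). A minimal Python sketch of that calculation, assuming the standard field layout of /proc/diskstats; the function names are my own and the sample lines are made up:

```python
def parse_diskstats(text):
    """Parse /proc/diskstats content into {device: counters}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        stats[f[2]] = {
            "reads": int(f[3]),      # reads completed
            "read_ms": int(f[6]),    # time spent reading, ms
            "writes": int(f[7]),     # writes completed
            "write_ms": int(f[10]),  # time spent writing, ms
        }
    return stats


def avg_latency_ms(before, after):
    """Average time per completed I/O between two snapshots of one device."""
    ops = (after["reads"] - before["reads"]) + (after["writes"] - before["writes"])
    ms = (after["read_ms"] - before["read_ms"]) + (after["write_ms"] - before["write_ms"])
    return ms / ops if ops > 0 else 0.0


# Two made-up snapshots of the same device, taken some seconds apart
snap1 = parse_diskstats("8 0 sda 100 0 800 50 200 0 1600 150 0 120 200")
snap2 = parse_diskstats("8 0 sda 150 0 1200 150 250 0 2000 350 0 200 500")
print(avg_latency_ms(snap1["sda"], snap2["sda"]))
```

On a live system the two snapshots would come from reading /proc/diskstats twice, one polling interval apart, which still requires keeping the previous sample somewhere between polls.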
#16
Hello team,
we make extensive use of the NXAgent local caching (data reconciliation) function, which is great.
Recently we found a nasty limitation of the SQLite-based implementation.

When a busy agent with lots of DCIs loses connectivity to the server for a relatively long period (several hours or more), its local SQLite cache database (the dc_queue table in particular) quickly grows large. The larger it gets, the slower the select queries become, which results in longer intervals between reconciliation operations between the agent and the server. This leads to a snowball effect: it gets worse and worse over time, and the agent may never catch up, sync all cached data with the server and start sending "fresh data" again.
We now have an agent with 7+ million rows in the dc_queue table in SQLite, and every select query takes around 40-50 seconds to complete:

[02-Jun-2017 10:22:31.816] [DEBUG] Long running query: "SELECT server_id,dci_id,dci_type,dci_origin,status_code,snmp_target_guid,timestamp,value FROM dc_queue WHERE server_id=3095327485888869043 ORDER BY timestamp LIMIT 1024" [43475 ms]
[02-Jun-2017 10:22:32.638] [DEBUG] ReconciliationThread: 1024 records sent
[02-Jun-2017 10:23:12.913] [DEBUG] Long running query: "SELECT server_id,dci_id,dci_type,dci_origin,status_code,snmp_target_guid,timestamp,value FROM dc_queue WHERE server_id=3095327485888869043 ORDER BY timestamp LIMIT 1024" [40224 ms]
[02-Jun-2017 10:23:14.085] [DEBUG] ReconciliationThread: 1024 records sent
[02-Jun-2017 10:23:58.584] [DEBUG] Long running query: "SELECT server_id,dci_id,dci_type,dci_origin,status_code,snmp_target_guid,timestamp,value FROM dc_queue WHERE server_id=3095327485888869043 ORDER BY timestamp LIMIT 1024" [44448 ms]
[02-Jun-2017 10:23:59.432] [DEBUG] ReconciliationThread: 1024 records sent
[02-Jun-2017 10:24:43.598] [DEBUG] Long running query: "SELECT server_id,dci_id,dci_type,dci_origin,status_code,snmp_target_guid,timestamp,value FROM dc_queue WHERE server_id=3095327485888869043 ORDER BY timestamp LIMIT 1024" [44115 ms]
[02-Jun-2017 10:25:13.633] [DEBUG] ReconciliationThread: timeout on bulk send
[02-Jun-2017 10:26:03.604] [DEBUG] Long running query: "SELECT server_id,dci_id,dci_type,dci_origin,status_code,snmp_target_guid,timestamp,value FROM dc_queue WHERE server_id=3095327485888869043 ORDER BY timestamp LIMIT 1024" [49917 ms]
[02-Jun-2017 10:26:04.483] [DEBUG] ReconciliationThread: 1024 records sent
[02-Jun-2017 10:26:49.148] [DEBUG] Long running query: "SELECT server_id,dci_id,dci_type,dci_origin,status_code,snmp_target_guid,timestamp,value FROM dc_queue WHERE server_id=3095327485888869043 ORDER BY timestamp LIMIT 1024" [44614 ms]
[02-Jun-2017 10:26:50.456] [DEBUG] ReconciliationThread: 1024 records sent
[02-Jun-2017 10:27:35.038] [DEBUG] Long running query: "SELECT server_id,dci_id,dci_type,dci_origin,status_code,snmp_target_guid,timestamp,value FROM dc_queue WHERE server_id=3095327485888869043 ORDER BY timestamp LIMIT 1024" [44006 ms]
[02-Jun-2017 10:27:38.335] [DEBUG] ReconciliationThread: 1024 records sent
[02-Jun-2017 10:28:30.247] [DEBUG] Long running query: "SELECT server_id,dci_id,dci_type,dci_origin,status_code,snmp_target_guid,timestamp,value FROM dc_queue WHERE server_id=3095327485888869043 ORDER BY timestamp LIMIT 1024" [51862 ms]
[02-Jun-2017 10:28:34.130] [DEBUG] ReconciliationThread: 1024 records sent
[02-Jun-2017 10:29:26.638] [DEBUG] Long running query: "SELECT server_id,dci_id,dci_type,dci_origin,status_code,snmp_target_guid,timestamp,value FROM dc_queue WHERE server_id=3095327485888869043 ORDER BY timestamp LIMIT 1024" [52457 ms]
[02-Jun-2017 10:29:27.455] [DEBUG] ReconciliationThread: 1024 records sent

This results in a reconciliation rate of around 1500 records per minute, which is much lower than the rate at which new data arrives, so we are already in a snowball situation that keeps getting worse and will never catch up.
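
The arithmetic behind that estimate can be sketched as follows. The batch size of 1024 comes from the log above; the cycle time of roughly 45 seconds is read off the query durations and is an approximation, and the ingest rate in the example is purely hypothetical since I have not measured it precisely.

```python
def drain_rate_per_minute(batch_size, cycle_seconds):
    """Records reconciled per minute when one batch is sent per query cycle."""
    return batch_size * 60.0 / cycle_seconds


def minutes_to_catch_up(backlog_rows, drain_per_min, ingest_per_min):
    """Minutes until the queue is empty, or None if it never drains."""
    net = drain_per_min - ingest_per_min
    if net <= 0:
        return None  # new data arrives at least as fast as it is sent
    return backlog_rows / net


drain = drain_rate_per_minute(1024, 45)           # roughly 1365 records/min
print(drain)
print(minutes_to_catch_up(7_000_000, drain, 2000))  # hypothetical ingest rate
print(minutes_to_catch_up(7_000_000, drain, 0))     # even with no new data at all
```

Even with no new data arriving, a 7-million-row backlog at this rate would take several days to drain.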
So far the only solution we have is to delete the SQLite database. That immediately fixes the situation, but costs us valuable monitoring data. Unfortunately the database format is very different between the agent and the server, and so far we haven't found a working way to "manually" dump the SQLite data and import it into the server database. That would be a nice option for manually fixing such situations.

The hardware that runs the NXAgent is pretty decent and unfortunately giving the VM more CPU/RAM or putting it to faster storage (even tried all-flash storage) does not help significantly. It seems to me that SQLite is capped to using just one core.

So after this long introduction i have several questions:

  • Is there an option to use a real database such as MySQL or PostgreSQL for the local agent caching DB? Since this is very critical for us, we can live with an advanced installation or configuration just to make it work
  • Does my understanding and analysis of the issue and the cause make sense?
  • Any other ideas how we can solve our problem or maybe workaround it?
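
One workaround idea that might be worth testing is an index on dc_queue that matches the reconciliation query. The sketch below uses Python's sqlite3 module on a throwaway in-memory table to compare the query plan before and after adding an index on (server_id, timestamp). The column types and the index name are assumptions, the table is reduced to four columns, and I do not know whether the agent's real schema already has such an index or whether the agent would tolerate one being added.

```python
import sqlite3

# Same shape as the slow query from the log, reduced to a few columns
QUERY = ("SELECT server_id,dci_id,timestamp,value FROM dc_queue "
         "WHERE server_id=? ORDER BY timestamp LIMIT 1024")


def query_plan(conn):
    """Return SQLite's query plan for QUERY as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dc_queue (server_id INTEGER, dci_id INTEGER, "
             "timestamp INTEGER, value TEXT)")
conn.executemany("INSERT INTO dc_queue VALUES (?,?,?,?)",
                 [(1, i % 50, i, "0") for i in range(10000)])

plan_before = query_plan(conn)  # full scan plus a sort for ORDER BY
conn.execute("CREATE INDEX idx_dc_queue_srv_ts ON dc_queue(server_id, timestamp)")
plan_after = query_plan(conn)   # index lookup, rows already in timestamp order

print(plan_before)
print(plan_after)
```

If the plan on a copy of the real cache database shows a full scan with a temporary sort, that would support the analysis above.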
#17
Hi,
I was thinking that it would be very good if there were a way to access particular graphs outside of the NetXMS console.
Is there such a thing available right now? For example, if I need to show a graph on a standard web site?

I was thinking of two possible solutions:
1) Provide access to a "dynamic graph web API", which is just a web service that you can call with GET/POST attributes for the graph (size, DCIs, time period, etc...) and which returns a rendered PNG on the fly. Or maybe you create the graph you want as a predefined graph and just "call" it through the web API to get the rendered PNG.
2) Set up scheduled creation of a PNG file in a specified location.

Any ideas on how to achieve that?
#18
Hi,
I have developed and been using an NXSL script for interface instance discovery using SNMP for over a year now.
Although it is working fine, it keeps throwing alarms for script execution errors on the server node. I am pretty sure that in 99% of the cases the cause of those errors is just an SNMP timeout. Many of my monitored nodes have slow and unreliable connections, so this is expected and more or less inevitable.

So now I am looking for a way to enhance my script so that it catches and handles such problems and just quietly aborts until the next execution. I have looked in the documentation and in the forum, but couldn't find a solution.
Ideas, anyone?


Here is my instance discovery script:
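// $1 is the instance passed in by instance discovery (used here as the interface index)
snmp = CreateSNMPTransport($node);
// Instance name = ifDescr followed by ifAlias
ifName = SNMPGetValue(snmp, ".1.3.6.1.2.1.2.2.1.2." . $1);
ifName .= " ";
ifName .= SNMPGetValue(snmp, ".1.3.6.1.2.1.31.1.1.1.18." . $1);
// Reject loopback interfaces, accept everything else
if (ifName ~= "Loopback.*") {
    return %(false, $1, ifName);
} else {
    return %(true, $1, ifName);
}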


Screenshot of the alarms is attached.

Thanks in advance!
#19
Hi,

These days I am experimenting with the Performance tab and I am pretty happy with the results. The only thing I am missing is a user-editable time range, but I have already registered a feature request for that. :)

I noticed something weird that might be a bug, or a feature?

There is no legend at the bottom of the graph unless a second DCI is attached. I've tried all the options, but cannot get the legend to appear in a single-DCI graph.
So, is this a bug or am I missing something? :)
Example in the included screenshot.
#20
Hi,
I would like to suggest a new feature that would allow setting a grace period before the DCIs of missing instances are deleted after instance discovery.

This would be useful when monitoring devices whose interfaces change often. It would keep the collected data for a given period before permanently deleting it.

Thanks!