Topics - Tursiops

#41
General Support / Table Thresholds not working(?)
July 25, 2016, 04:32:48 AM
Hi,

We are making extensive use of SNMP tables for monitoring things like CPU, RAM, PSU, fan and HDD/SSD health.
Threshold triggering in tables appears to be a bit hit and miss though.

For example right now I have a system with a disk that reports "Predictive Failure" in the HP System Management Homepage on the server itself.
Our NetXMS picks this up, transforms the numerical value reported into a string to match "Predictive Failure" - and should then alert on this.
The latter part is not working. I have seen this before with string matching and "equal to" and sometimes switching to "like" worked. In this particular case that didn't work either. Changing to "not like" just for testing triggered an immediate alarm, so it sure looks like what I am looking for does not match what NetXMS finds in the database?

See attached images showing the collected data (note: Display Name matches Column Name) as well as the configured Thresholds.
I also attached the template - nothing special really.

Now I am not sure if there may be other items in an unhealthy state right now, where NetXMS just doesn't create an alarm. The only reason I picked this one up was because I just added the system as a node to NetXMS while it was in that state already.

Thanks
#42
General Support / Templates creating duplicate DCIs
July 22, 2016, 05:52:00 AM
Hi,

I seem to have a lot of systems with duplicate DCIs.
Pretty much all DCIs on our installation are created via templates.
It appears that the system sometimes decides to create the same DCI a second time, although it already exists. I have not seen this happen live; I only see the results when the very same thing(s) show up twice on the same node. It doesn't seem to be tied to a specific node, template or DCI, but happens across all nodes.

Not sure if this is due to a race condition or deadlock at the time of a regular node configuration poll?

Cheers
#43
Hi,

I just encountered a rather odd issue when I manually removed a node from a template and asked it to remove the DCIs.
The result was that not all DCIs from the template were actually removed - but two out of three DCIs from a completely different template were.

Not sure if anyone here has ever seen such a behaviour before?

Cheers
#44
Hi,

I noticed that when an instanced DCI triggers an alarm and the instance itself is then later removed, the alarm remains in the system.
Guess it would make sense if alarms were removed from the system together with DCI data when an instance is removed?

Along the same lines, it would also be nice to be able to tell the system to keep an instance "active" for a (per instance DCI configurable) period, just in case the same instance comes back later. In our case we have some systems that have USB drives which we are monitoring for disk space. The drives are sometimes detached for some time, but do come back. Once the disk is detached though, the instance will eventually be removed, which wipes the DCI data but keeps any potential "zombie alarms", and a new DCI is then created later when the disk is reattached.

I could think of other scenarios where keeping the "old" DCI data for a while would make sense, guess it all depends on how much one actually utilises instance discovery and instance discovery filters.
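To make the grace period idea a bit more concrete, here is a minimal sketch in Python. It is not NetXMS code and does not reflect how instance discovery is actually implemented; the function name, the `last_seen` mapping and the timestamps are all made up for illustration. The idea is simply that an instance is only removed once it has been missing for longer than a configurable period.

```python
# Hypothetical sketch: none of these names come from NetXMS.
def expired_instances(last_seen, now, grace_seconds):
    """Return instance names that have been missing for longer than the grace period.

    last_seen maps instance name -> timestamp (seconds) of the last discovery
    poll in which the instance was still present.
    """
    return sorted(
        name for name, seen in last_seen.items()
        if now - seen > grace_seconds
    )


# A USB disk last seen 2 hours ago survives a 24-hour grace period,
# while a volume missing for 3 days is due for removal.
now = 1_000_000
last_seen = {
    "usb-backup": now - 2 * 3600,
    "old-volume": now - 3 * 86400,
}
to_remove = expired_instances(last_seen, now, grace_seconds=24 * 3600)
print(to_remove)
```

Alarms tied to an instance could be cleaned up at the same point where the instance is finally removed, which would also deal with the "zombie alarms".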

Cheers
#45
Hi,

I am trying to filter out interfaces which are adminState down as well as expectedState down via instance filter scripting.
I'm using the Net.InterfaceNames Agent List to get a proper name for the DCI itself and am having issues retrieving those interface states in the filter script.
As $1 is only the interface name, I used the FindObject function hoping to get the interface object to then allow me to check for adminState and expectedState.
Unfortunately that didn't work out, so I tried GetInterfaceObject, but that requires an interface index - which I don't get with the Net.InterfaceNames list.
Switching to Net.InterfaceList didn't help me either.

Does anyone have a code snippet to point me in the right direction?

Thanks
#46
Hi,

Not quite sure if these figures actually qualify as "lots".

I am trying to modify a template that (now) has 21 DCIs (8 of them are for instances) and is applied to around 300-400 systems at present.
The idea was to remove four DCIs (i.e. it used to have 25 DCIs) as they are not required.

Pretty much as soon as I close the template's DCI configuration, NetXMS doesn't react to anything (throws an error if I want to look at things) and after that the GUI says the server is not responding and disconnects. On the server itself I can see that NetXMS indeed simply crashed.
Restarting NetXMS leaves the template at now 21 DCIs, but the nodes assigned to the template still have all 25. Manually removing them works, but is very tedious.

For testing purposes I changed DebugLog to 6, then simply renamed one of the DCIs in the template (I also tested a different template with a similar number of nodes).

The result is that the logs are full of entries like these:
Node::onDataCollectionChange[..]: executing data collection sync
Node::onDataCollectionChange[..]: executing data collection sync for SNMP proxy [..]
ApplyTemplateThread: template=[..] updateType=0 target=[..] removeDci=false
Apply 21 items from template "[..]" to target "[..]"
Applying DCO "[..]" to target "[..]"

With the last one repeating once for each DCI.

After around 300 "executing data collection sync (for SNMP Proxy)" lines, logging simply stops and the NetXMS process is gone.
It looks like the updates for each node are not put into some queue/pool to be worked through, but the system tries to update everything at once and fails at doing so?
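Just to illustrate what I mean by a queue/pool, here is a rough Python sketch. I have no idea how the server handles this internally, so this is an assumption-laden illustration only: `apply_template` is a made-up placeholder for the per-node update, and the worker count of 8 is arbitrary. The point is that at most a fixed number of node updates run at the same time while the rest wait their turn.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_template(node_id):
    """Hypothetical placeholder for applying template changes to one node."""
    return f"node {node_id}: template applied"


def apply_to_all(node_ids, max_workers=8):
    """Apply the template to every node using a bounded pool of workers.

    Only max_workers updates are in flight at once; the remaining nodes
    are queued inside the executor until a worker becomes free.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of node_ids in the results.
        return list(pool.map(apply_template, node_ids))


results = apply_to_all(range(1, 401))
print(len(results), "nodes updated")
```

With something along those lines, a template change affecting 300-400 nodes would take longer to roll out, but it would not try to do everything at once.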

#47
General Support / Dashboards - Line Charts
June 24, 2016, 04:40:23 AM
Looks like Dashboards completely ignore the "Invert Values" checkbox as well as the difference between Line and Area chart.
All I see is a non-inverted line chart, no matter what I configure. Running NetXMS 2.0.4 server and console.

Any Ideas?
#48
General Support / SNMP agent is not responding
June 22, 2016, 12:58:00 AM
Hi,

We keep seeing systems reporting "SNMP agent is not responding", even though SNMP polling is working fine.
It appears this happens when a Windows server that acts as its own proxy for SNMP connectivity reboots.

Any ideas?
Thanks
#49
General Support / Syncer Thread flapping
June 21, 2016, 01:27:15 AM
Hi,

I keep receiving alerts on 'Thread "Syncer thread" is not responding'. This happens every few minutes, and a few seconds later it's back to 'Thread "Syncer Thread" was returned to running state'.

Not quite sure if it's related or not, but we also see systems that alert on "SNMP agent is not responding" and even when SNMP is back up and it is happily collecting data, the alert just stays until it is manually resolved. That doesn't happen for all SNMP alerts, but enough to be noticeable. Basically every morning I go through the alerts and remove those false positives.

We also have some systems that sometimes report that the NetXMS agent is not responding. However the agent is in fact up and running.
Running a "Status Poll" returns an "UNKNOWN" state. The only thing that brings such systems back is a full configuration poll.
It feels like the system is "stuck" in some state which is not reset unless a full poll is run.

Not sure if these are related to the Syncer Thread problem or if they are all independent issues?
(Still running NetXMS 2.0.3)
#50
General Support / NetXMS 2.0.4 - Ubuntu Repository
June 15, 2016, 01:53:05 AM
Hi,

I am still having trouble upgrading my NetXMS installation on Ubuntu 14.04.4 LTS.
When I run apt-get update followed by upgrade, it tells me there is nothing to update and leaves NetXMS at version 2.0.3.

Content of /etc/apt/sources.list.d/netxms.list:
deb http://packages.netxms.org/ubuntu trusty main

What am I missing?

Thanks
#51
General Support / Inverse Graphs in 2.0.4
June 10, 2016, 04:34:15 AM
Hi,

Since installing the NetXMS 2.0.4 Management Console I am having issues with graphs with "Invert values".
If the values are of type "Area", they work fine. If they are of type "Line", the scale will change, but the line will not be inverted on the graph.
So if the y-axis was going from 0 to 20M, it will now show -20M to 20M, but the line itself will not be inverted. As soon as I switch to "Area" it works fine.
Not sure if this is related to the actual values in the legend now not being inverted either. I recall in 2.0.3 when I inverted a value, it would show as negative in the extended legend, too. It no longer does that in 2.0.4 (which is good :) ).

Note: I have not upgraded the server to 2.0.4 yet (waiting for repository update for Ubuntu), but this looks like a Console, not a Server issue.

Has anyone else noticed this behaviour?
#52
General Support / Custom Network Drivers?
May 30, 2016, 03:54:44 AM
Hi,

I'm wondering if it's possible to build my own network driver or if there are any plans to put in some kind of customisable driver which allows adding OIDs for VLANs, hardware, port configuration, etc. per node? The latter might be more scalable than building custom drivers per device type?

Alternatively, what information would be required to build a driver for a (Brocade) switch that doesn't pick up VLANs, LAGs, components or ports with the current ones?

Cheers
#53
General Support / netxmsd segmentation faults
May 13, 2016, 07:23:58 AM
Hi,

For the last couple of days the netxmsd process on our server has been segfaulting randomly. Randomly in that debug logs ( at level 8 ) do not hint at the same thing being done prior to the crash. However it does usually take a couple of hours before the process crashes.

Not sure how to troubleshoot this?
I've been reading up on gdb and am following some suggestions from this post: http://stackoverflow.com/questions/16169022/debugging-a-running-daemon-using-gdb
Not sure if that is going to produce any useful output at all.

Anything else I can do to troubleshoot this?

Thanks
#54
I keep running into a problem when using templates that hold instance discovery DCIs which are to be attached to another DCI for display on the Performance tab.
In the template itself everything looks fine.
But as soon as the template is applied, the Performance tab will only show the "primary" DCI, not the attached one(s).
When I check the actual DCIs that are not showing, I receive a pop-up as I open them, stating
'Resolve DCI name' has encountered a problem.
Cannot resolve DCI name: Invalid DCI ID

The actual instance DCI itself already produces the above error message, hence all instances do, too.

To make matters more confusing, this doesn't always happen.
I have some instance discovery DCIs using the NetXMS agent which work just fine.
Now I just added some for network interfaces via SNMP and none of them work properly.


And just as I typed all of that, I think I found the issue (still leaving all the above text in case it helps someone in the future).
It's the order of the DCIs for instance discovery.
The first DCI in the list (by DCI ID, not name) must be the "primary", otherwise the "discovered" primary DCI doesn't exist by the time the second/third/etc. are trying to attach to them, hence the process fails. Makes sense, but feels counter-intuitive having to watch the order in which DCIs are entered when setting them up.

No idea if that's something that could be worked around in the code to make it less reliant on the order in which DCIs are entered? For example don't "attach" the DCIs until Instance Discovery has been completed?
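To show what I mean by deferring the "attach" step, here is a small Python sketch. It is purely illustrative and says nothing about how NetXMS actually does it; the dictionary layout and the DCI names are invented. The first pass creates every DCI and assigns IDs, and only the second pass resolves the attachments, so the order in the list no longer matters.

```python
# Hypothetical sketch: data layout and names are invented for illustration.
def resolve_attachments(dcis):
    """Map each DCI name to the ID of the DCI it attaches to (or None).

    dcis is a list of dicts with a 'name' and an optional 'attach_to'
    holding the name of the primary DCI.
    """
    # Pass 1: "create" every DCI and record its ID, in list order.
    ids = {}
    for next_id, dci in enumerate(dcis, start=1):
        ids[dci["name"]] = next_id

    # Pass 2: every target exists by now, regardless of creation order.
    resolved = {}
    for dci in dcis:
        target = dci.get("attach_to")
        resolved[dci["name"]] = ids[target] if target else None
    return resolved


# The attached DCI is listed (and created) before its primary.
dcis = [
    {"name": "traffic_out", "attach_to": "traffic_in"},
    {"name": "traffic_in"},
]
resolved = resolve_attachments(dcis)
print(resolved)
```

In this example the attached DCI comes first in the list and still ends up pointing at a valid primary.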
#55
General Support / Raspberry Pi 3 - Compile Agent
May 04, 2016, 03:29:39 AM
Hi,

I'm trying to compile the NetXMS agent from source on a Raspberry Pi 3 (wouldn't mind just using a package instead :) ).
The following command line runs through ok:
sudo ./configure --with-agent --with-snmp --with-client --with-client-proxy
After that I run
make
make install

So far so good.

But when I want to actually run the agent (be it via nxagentd or sudo nxagentd, makes no difference), I receive this:
nxagentd: error while loading shared libraries: libappagent.so.2: cannot open shared object file: No such file or directory

So I ran ls -l /usr/local/lib/libappagent.* and got this:
-rwxr-xr-x 1 root staff  1144 May  3 23:55 libappagent.la
lrwxrwxrwx 1 root staff    20 May  3 23:55 libappagent.so -> libappagent.so.2.0.0
lrwxrwxrwx 1 root staff    20 May  3 23:55 libappagent.so.2 -> libappagent.so.2.0.0
-rwxr-xr-x 1 root staff 45432 May  3 23:55 libappagent.so.2.0.0


To me that looks ok. Not sure why nxagentd has a problem with this?
Any suggestions?
#56
General Support / SNMP Table Thresholds
April 22, 2016, 01:34:50 AM
Hi,

I noticed that the threshold settings in SNMP tables are somewhat limited compared to other DCIs, i.e. I can't do things like "trigger threshold if value is above X for Y checks".

Last night I received a whole range of alerts as a status variable switched from OK to something else for a single check (without looking at the database, I can't see what that something else was, as Tables don't have a "History" option for showing previous data). It would be preferable if I could configure the threshold to only trigger if the status is not "OK" for say three checks. Any chance to achieve this within NetXMS?
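The behaviour I am after could be sketched like this in Python. This is only an illustration of the logic, not NetXMS code or NXSL; the function name, the parameter names and the "Degraded" status value are made up. The threshold fires only if the last three samples are all something other than "OK".

```python
# Hypothetical sketch of "not OK for N consecutive checks".
def should_trigger(statuses, required_consecutive=3, ok_value="OK"):
    """Return True when the most recent samples are all non-OK.

    statuses is the collected history for one table cell, oldest first.
    """
    if len(statuses) < required_consecutive:
        # Not enough samples yet to make a decision.
        return False
    recent = statuses[-required_consecutive:]
    return all(status != ok_value for status in recent)


# A single blip does not trigger, three bad checks in a row do.
blip = should_trigger(["OK", "Degraded", "OK"])
sustained = should_trigger(["OK", "Degraded", "Degraded", "Degraded"])
print(blip, sustained)
```

A single check flipping to something else, as happened last night, would then be ignored.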

Thanks
#57
Hi,

When a DCI that has the option "Show last value in object overview" ticked is disabled, the last value collected remains visible in the object overview.
Not sure if that is on purpose, but it seems a bit counter-intuitive to show something that's been disabled in the overview, especially as it's not indicated in that overview that this DCI has been disabled. I did not double-check the "Show last value in object tooltips" option, but I would guess it will have the same behaviour.
#58
General Support / Unsigned Integer DCI and diff()
April 19, 2016, 02:24:35 AM
I've been monitoring UPS runtime, and as that really should never go into a negative value, I thought I'd use an unsigned integer for the DCI.
I also used a threshold to alert if the runtime dropped by 20 minutes within 1 minute, kind of as an indicator that the battery is on its way out.

The result of the above setup is that the threshold gets triggered a lot. Every time the runtime goes down, it triggers.
As far as I can tell this is due to the unsigned integer. If the difference to the previous value is negative (e.g. from 120 to 116, a difference of -4), it compares the threshold of 20 with 4294967292 (i.e. 2^32 - 4). I don't think the unsigned integer class for the actual value should apply to the result of the diff?
Or is this some other odd issue I ran into? Since switching from Unsigned Int to Integer, I'm not getting these alerts anymore.
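The arithmetic can be reproduced in a few lines of Python. This only models what a 32-bit unsigned subtraction does in general; it is my assumption that something like this happens inside diff(), not something I have confirmed in the NetXMS source. The function names are made up.

```python
# Illustrative only: models 32-bit unsigned wraparound, not NetXMS's actual code.
UINT32_MODULUS = 2 ** 32


def diff_unsigned32(current, previous):
    """Difference as it would look when stored in a 32-bit unsigned integer."""
    return (current - previous) % UINT32_MODULUS


def diff_signed(current, previous):
    """Plain signed difference."""
    return current - previous


wrapped = diff_unsigned32(116, 120)
signed = diff_signed(116, 120)
print(wrapped)  # 4294967292
print(signed)   # -4
```

Compared against a threshold of 20, the wrapped value is always "above", which would explain why every small drop in runtime triggers the alert.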
#59
Hi,

I am trying to achieve the following:
- Create a single template with a DCI
- The DCI needs to be passed multiple parameters. These will differ per node.
- Assign the template to a system based on the existence of a number of custom attributes on a node. That part is not a problem.
- Use the values of these custom attributes as parameters for the DCI. That's the part that I can't get to work.

I've been searching and ran into this old post from 2013: https://www.netxms.org/forum/general-support/custom-attributes-2446/
But it doesn't really seem to help me. Not sure if I misunderstood the %{script:<scriptname>} part mentioned. Where would I put that in the above scenario, if that would even work in this case?

Any other ways to achieve what I'm trying to do?

Thanks
#60
Hi,

When trying to look at previous values in DCI tables, I only see the options "Line chart", "Bar chart" and "Pie chart".
For a standard DCI, there is also a "History" option which doesn't seem to exist for tables. Considering the data is already stored in the database, it would be nice to be able to access it using that "History" as well. For example certain values may be strings, which won't really work with any of the charts. So without "History", there is really no way of looking at changes over time.

Cheers