Messages - paul

#76
General Support / Alarm Key bug
June 21, 2019, 07:52:16 PM
I have set the key on a bunch of alarms to be %n_%4 which in theory means a trap==>event==>Alarm should increase the count when the same trap from the same node comes in. For one particular message I get, it is not working - I get a new alarm every time.  :(

The value of %4 is 247 chars as per below.
Warning [21/06/2019 BT2 3, 21/06/2019 BT6 3, 21/06/2019 BT6 12, 21/06/2019 BT7 5] - Failed to retrieve image: 'RISA Harness image /8557', exception: The remote server returned an error: (550) File unavailable (e.g., file not found, no access).

I have tried different variations at the start of the key, and every time I can see that the value in the KEY field gets truncated to 256 chars - and the count does not go up.

It would not be so bad if the truncated field key was compared as that would be enough to match - but for some reason, it is not.

I exported the keys from Alarm Log view into a csv - pasted them into notepad ++ - ran compare - both keys were the same.
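
A minimal sketch of the reasoning above, in Python rather than anything NetXMS runs itself. It assumes the key is simply the expanded `%n_%4` string cut at 256 characters; `build_key`, `first_difference` and the node name are hypothetical names for illustration. If truncation were the only thing going on, two identical traps from the same node should still produce identical keys.

```python
KEY_LIMIT = 256  # truncation length observed in the KEY field


def build_key(node_name: str, param4: str, limit: int = KEY_LIMIT) -> str:
    """Expand a '%n_%4' style key and cut it at `limit` characters (assumed behaviour)."""
    return f"{node_name}_{param4}"[:limit]


def first_difference(a: str, b: str):
    """Return the index of the first differing character, or None if the strings are equal.

    Handy for spotting invisible differences (trailing spaces, line endings)
    that a visual compare of exported keys might miss.
    """
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


message = "W" * 247  # stand-in for the 247-character %4 value
key_a = build_key("node-with-a-long-name", message)
key_b = build_key("node-with-a-long-name", message)
print(len(key_a), first_difference(key_a, key_b))
```

Running `first_difference` over two keys exported from the Alarm Log view would show whether they really are character-for-character identical.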



#77
When looking at status codes, I am forced to rely on a status of critical via a Node_Down alarm. A node being down is both a status and a state, and I would like to know, separately, whether a node is down. We have plenty of nodes with critical alarms, but I would like to show down as a unique code / colour / status.

https://wiki.netxms.org/wiki/NXSL:NetObj - add it as Status ID = 9.

I hit this problem as I have nodes that are not down, their only fault is that they have an outstanding Node_Down alarm - and the node is not down.

By having this option, I could also set to auto-clear any Node_Down alarms where status NE down or "Status ID == 9".

I can also have a specific container for Status ID = 9.

#78
Nope - it is not there.  :(

https://wiki.netxms.org/wiki/NXSL:Alarm   -    nope

https://wiki.netxms.org/wiki/NXSL:Event   -    nope

As Event Description is not available for an alarm, or even an event, I will have to bypass this potentially fantastic option.

Given that for Trap based Alarms, the DCI section is not relevant - its part on the screen would be the perfect place to display Event Description.
#79
For anyone coming from another product with Alarm viewing capabilities that are selectable, this functionality is pretty important if not a show stopper.

For those that have not - you do not know what you are missing.

I currently have 4 Alarm viewers (including NetXMS).

Two have no customising - this includes NetXMS

One has basic column selection but no column ordering - frustrating.

One has everything - absolutely everything. Column selection, column ordering, preset filtering, severity filtering, separate settings for Alarms, Events, Explorer (Tree), Interfaces.
The preference setting screen is attached - see for yourself!!

So why go NetXMS if I have the above? - the other product's DCI equivalent monitoring is appalling and as I can display NetXMS alarms per container and filter on "Outstanding" - I have a minimum workable solution with NetXMS.
#80
I am setting up traps ==> events ==> alarms and Event Description would be the logical place to put the OID text and parameters as returned from the OID lookup when adding the trap and pass it through to Alarm Browser as %D for event description similar to <DCIDescription> is passed for threshold alarms.

Apart from the fact you cannot tick a box that says "create event from trap OID" that would do it for you, and the fact that the mib itself tells you the OID names it will pass yet you need to look up the OID number in order to search; the text of the trap from the OID description that I add to the event is not visible anywhere!!

Event Log does not show it. Alarm Browser does not show it. Event Configuration does not have it as a variable to be able to display or insert. I have to open up event configuration in order to see a description of the event.

This to me seems like I am missing something obvious!! Hence my thinking this has to be a really dumb question.

OK - so you think that was dumb - what about this....

I add the trap to the trap processor. I do not need to add anything to the Event Configuration other than it being a unique match, as all variables get passed through to Event Processing Policy. Description is pointless as it is not visible - so why do I spend any time with event processing when dealing with Traps, other than to pick out the description of the problem from the trap configuration and include that in the Message - the same way all other events have been set up - except there is no point in adding Description as it is not available for display.

Except - and here is the really dumb bit - the fact that the Description is kept and stored against the event - it probably is available - I just cannot find it.

I have looked at the macro variables and found nothing there - so why go to all the effort to add the Description to the event - which explains what parameters are provided and why, but then make that text / field not available anywhere else?

It really just seems that when adding trap Processing, an event could be autogenerated based on the MIB severity, using the MIB name for that alert - and Message -  and we could skip straight through to Event Processing Policy to setup how the alarm looks. Am I missing something - again?
#81
In a strange twist - visibility via the tool-tip when hovering over alert in Alarm Browser, does not have the limitation. This gets us around this being a major impact and instead, is an annoyance that we hope gets adjusted at some point.

It just really stands out by having 2 or three events and the rest of that empty, a comments section - also empty, and the Alarm Overview that has to scroll to see the whole alarm.
#82
Went with export / import. 1.4GB export file took about an hour to import - VM built as 4 core 12GB E2680 V4 CPU with all Flash storage - everything was local (iscsi to SAN). Included collected data.

Panicked when I started the console locally and pointed it at the new server - error.:(  Helps to start the NetXMS service first :)

Everything was perfect - the only thing I had to do in addition to the import was to copy the MIBS across.

#83
Used the sizing spreadsheet - average row size - with 700 nodes, 20 DCI's per node = 14,000 DCI's, 300 sec polling (CPU per second but file Systems 1 per 15 minute - so averaged) - with a 365 day retention = 124GB. Not everything will have 365 day retention - but - taking my worst possible case.
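
The arithmetic behind that figure can be sketched as below. The row count follows directly from the numbers quoted; the bytes-per-row value is only back-calculated from the 124GB total (taking 1GB as 10^9 bytes, which is an assumption) and is not the spreadsheet's own average row size.

```python
nodes = 700
dcis_per_node = 20
poll_interval_s = 300   # averaged polling interval
retention_days = 365
quoted_size_gb = 124    # figure from the sizing spreadsheet

dcis = nodes * dcis_per_node
samples_per_dci_per_day = 86_400 // poll_interval_s
rows = dcis * samples_per_dci_per_day * retention_days

# Implied average storage per collected value, assuming 1 GB = 10**9 bytes.
implied_bytes_per_row = quoted_size_gb * 10**9 / rows

print(f"{dcis} DCIs, {rows:,} rows, ~{implied_bytes_per_row:.0f} bytes per row")
```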

Based on the following post - I do not appear to be pushing NetXMS so
https://www.netxms.org/forum/general-support/netxms-cpu-usage-high/msg24645/#msg24645

I started with PostgreSQL and unless anyone suggests otherwise, I will go with PostgreSQL - on the same server as NetXMS - with 150'ish GB of SAN based SSD storage, a couple of cores and a chunk of mem (plenty of both on a B200 M3).

Suggestions / Feedback welcome.


#84
Fantastic - thanks for the assist - again.

When I was wandering through snmptrap.cpp, I did see the following which might also be of use in the script.

if ((node->getStatus() != STATUS_UNMANAGED)
#85
After completing my first two traps in just over an hour, doing the same for the 34,570 traps in my existing NMS was going to take forever - literally. Even limiting it to the 3,000 critical ones was going to be too many - and the 900 odd specifically customized traps, as an absolute minimum, was still way too many.

And then it hit me.....

NetXMS treats all PDU's / Varbinds the same - exactly the same!!

All I needed to do was to take the "Unmatched trap" event and turn it into an Alarm. I simply added an Alarm definition based on that event - passed the varbinds across - and hey presto, all the traps I could ever need to configure into readable alarms now generate readable alarms (99% readable - transformations such as time ticks to Time format do not work - obviously).

For those traps I want to format specifically, I can simply define them individually, but for the most part, I can simply use the default.

The only real problem with doing this is that NetXMS passes the PDU's / Varbinds as %2 rather than as individual parameters.  I had a look at snmptrap.cpp to see if there were other variables with the individual values, but, for unmatched traps, it does not populate the PDU's into variables.

If this was changed so that unmatched trap returned each of the PDU's / Varbinds / OID's from the trap individually rather than as %2 - this solution would mean that  I would probably never have to define another snmp trap again. I could simply display %1 \n %2 \n %3 \n etc. and see the trap content one line at a time - automatically.
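
To illustrate the one-line-per-varbind display described above, here is a minimal sketch in Python. It assumes the varbinds arrive in `%2` as a single string of `oid == 'value'` pairs separated by `; ` - that format is an assumption and should be checked against a real unmatched trap event. The function names and the sample OIDs are hypothetical.

```python
def split_varbinds(varbinds: str, separator: str = "; ") -> list[str]:
    """Split the combined varbind string into one entry per varbind.

    Assumes `separator` never occurs inside a value; a value containing
    '; ' would be split in the wrong place.
    """
    return [part.strip() for part in varbinds.split(separator) if part.strip()]


def format_alarm_message(trap_oid: str, varbinds: str) -> str:
    """Put the trap OID on the first line and each varbind on its own line."""
    return "\n".join([trap_oid] + split_varbinds(varbinds))


# Made-up sample data for illustration only.
sample_oid = ".1.3.6.1.4.1.9999.0.1"
sample_varbinds = ".1.3.6.1.4.1.9999.1 == 'disk failed'; .1.3.6.1.4.1.9999.2 == '42'"
print(format_alarm_message(sample_oid, sample_varbinds))
```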

If I was able to have the OID of the trap converted into the trap description in the MIB - I would never define another trap again. Once I know what trap it is - I can work out pretty much what each PDU means. If this was too hard - I can always define an alert.

snmptrap.cpp takes the OID of the trap and looks up the trap table to see if it is defined. All it would have to do is look up the MIB table to find the trap description for that OID - exactly as it does when adding a trap processor. Not a complete solution, but a solution that would work for at least 28,000 of my traps.

I do not consider these as features - they are simply options that would make trap processing much easier. Have them as two Server options:
Unmatched_TRAP_as_one_var True / False - with False splitting up the PDU's into individual variables
Unmatched_TRAP_lookup True / False - with True looking up the trap from the compiled MIBs (or a master list of all traps extracted from the compiled MIBs)

Now here is the weird part - looking at the code in snmptrap.cpp - around lines 63 - 136 it looks like there is the capability for auto trap creation already existing.

Given that my cooking is better than my coding and I would not recommend anyone eating my cooking, I cannot say for certain what NetXMS can already do - but it certainly looks like everything above is already catered for - except for splitting unmatched traps into individual variables rather than passing them as %2. This just happens to be the one thing that would help the most.

Anyway - just thought I would put it out there that trap processing can go from complete nightmare to absolute wonder with just a little lateral thinking - and one simple alarm definition :)




#86
General Support / Re: DCI Template manual override
June 13, 2019, 04:15:57 AM
As per Tursiops and Victor, the Windows template example that I uploaded deals with both customized DCI thresholds with three variations, as well as dropping DCI's for instances not wanted.

For instance exclusion, there is probably a better way, but basically, you can do the DCI exclusion in the automatic bind coding - I do it there because it applies globally. Alternatively, it would be cleaner to have different templates for devices based on different firmware / software levels that do / do not support all the DCI's. You might have two templates - MikroTik with voltage and MikroTik no voltage - selected based on whatever determines that voltage is expected to be returned (poll the voltage or set a system attribute somewhere - volt_sensor_installed).

Having said that, I use the same global thresholds for Windows and Linux - all variants. I can override per node, and have set global thresholds for things like /var and C:\ - mixing and matching as needed. The only real difference is that Windows devices node specific override is first three chars whereas Linux is full instance name match.
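
The override order described above can be sketched roughly as follows. This is plain Python for illustration, not the template or NXSL code actually used; the function names, the dictionaries and the threshold values are all hypothetical. The only behaviour taken from the post is that a node specific override matches on the first three characters of the instance for Windows and on the full instance name for Linux, and that it wins over the global threshold.

```python
def find_override(instance: str, os_family: str, node_overrides: dict):
    """Return the node specific threshold for `instance`, or None if there is none."""
    if os_family == "windows":
        wanted = instance[:3]              # e.g. 'C:\' out of a longer instance name
        for name, value in node_overrides.items():
            if name[:3] == wanted:
                return value
        return None
    return node_overrides.get(instance)    # Linux: full instance name match


def effective_threshold(instance, os_family, node_overrides, global_thresholds, default):
    """Node override first, then the global per-instance value, then the default."""
    override = find_override(instance, os_family, node_overrides)
    if override is not None:
        return override
    return global_thresholds.get(instance, default)


global_thresholds = {"/var": 90, "C:\\": 90}   # made-up values
print(effective_threshold("C:\\", "windows", {"C:\\": 95}, global_thresholds, 85))
print(effective_threshold("/var/log", "linux", {"/var": 80}, global_thresholds, 85))
```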

It took me some time to get my head around this - plus the invaluable assistance from Tursiops and Victor - but it is all there and is all working. When I get time (hahaha), I will write up a "how to" for this as there are a few different components that need to be aligned in order for this to work.
#87
I tried a Condition container but as there was no Manage event, I could not create an enter / exit combination of events.

In Node Attributes, there is no status and no boolean isManaged option. None of the flags or runtime flags have it either.

https://wiki.netxms.org/wiki/NXSL:Node

How do I create a container to hold Unmanaged nodes only?

Even harder -  Container with a list of nodes where there is a DCI that has been set to disabled.
#88
We have a vSphere HA cluster and Commvault snapshot based backups.

https://www.veeam.com/blog/why-snapshots-alone-are-not-backups.html

Unfortunately, we do not have the Commvault PostgreSQL agent which means a DIY option.

I have pg_dump working producing compressed files in Directory format so happy enough with that. *** my daily safety net *** 

My uncompressed version produced the same size backup as nxdbmgr export - so happy enough with that as well.
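
For anyone wanting a similar daily safety net, a minimal sketch of building a dated `pg_dump` command in Directory format is below. The database name `netxms`, the backup path and the helper name are assumptions for illustration, not the actual setup; the `--format=directory`, `--jobs` and `--file` options are standard pg_dump options, and Directory format output is compressed by default.

```python
from datetime import date


def pg_dump_command(database: str, backup_root: str, jobs: int = 2, day=None) -> list[str]:
    """Build the argument list for a directory-format dump into a dated folder."""
    day = day or date.today()
    target = f"{backup_root}/{database}-{day.isoformat()}"
    return [
        "pg_dump",
        "--format=directory",   # one file per table, compressed by default
        f"--jobs={jobs}",       # parallel dump, only valid with directory format
        f"--file={target}",
        database,
    ]


# Hypothetical database name and path.
cmd = pg_dump_command("netxms", "/backup/pg", day=date(2019, 6, 13))
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` from a scheduled job once the path and credentials suit the server.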

Having a just-in-case seems like a good idea and since I will be smash testing my VM (and host) to see how "recoverable" it is - I want to make sure I really can recover it - no matter what!!

I will start with a snapshot though after first stopping NetXMS and PG :) - my ultimate fall back.
#89

I assume that, as an Option 2, I would do an export on the original server and an import on the new server?

An extra step that might be seen as avoidable - but sometimes, getting a file from one server to another can be much simpler and quicker this way.

I have export working, so rather than opening up external connections - copying the SQLite export and importing is looking much easier.
#90
In Alarm Details screen, the Overview section is dynamic, starting at 5 and expanding to 13 before turning into a scrolling section for the remainder.

Related events will expand down if Comments and DCI Data are minimised, but not Overview - which is where the important information is.

Can Overview please be set such that it will either display up to 50 lines before turning into a scroller - or have that number be a system variable able to be set by a user.

As per below - a 39 line trap that I need to see all the lines of - especially the last ones. When a SAN controller panics, details, and speed, are critical.

You can also set the default section size of Comments and Related Events smaller so that all are still visible - with the Overview being completely visible.