Topics - paul

#1
I have the following hook in my NodeUp and an equivalent in my node down. Apart from it being the longest way of doing this (first one I found when searching), I get a strange result.

Everything is correct except the month, which is out by 1: it reports last month, not this month. I know the local time is correct (or being extracted correctly) because I am comparing the Node_Up and Node_Down event times to the custom attributes, and everything matches exactly except the month.

SetCustomAttribute($node, "timelastcameup",localtime(time())->mday.".".localtime(time())->mon.".".localtime(time())->year.", ".localtime(time())->hour.":".localtime(time())->min.":".localtime(time())->sec);

Any ideas?
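
One likely explanation, offered as an assumption rather than a confirmed diagnosis: if NXSL's localtime() follows the C struct tm convention, mon counts months since January (0 to 11), so it needs +1 before display. The Python sketch below imitates that convention with a hypothetical helper, c_style_month, to show the off-by-one and the fix. It is not NXSL.

```python
from datetime import datetime

def c_style_month(dt):
    """Hypothetical helper: month as C's struct tm reports it (0 = January)."""
    return dt.month - 1

def format_timestamp(dt, fix_month=True):
    """Build 'day.month.year, hour:min:sec' from zero-based month data."""
    mon = c_style_month(dt) + (1 if fix_month else 0)
    return f"{dt.day}.{mon}.{dt.year}, {dt.hour}:{dt.minute}:{dt.second}"

stamp = datetime(2019, 7, 27, 17, 41, 41)
print(format_timestamp(stamp, fix_month=False))  # month shown one too low
print(format_timestamp(stamp))                   # month corrected with +1
```

In the hook itself the equivalent change would be (localtime(time())->mon + 1), again assuming mon is zero-based. Calling localtime(time()) once and reusing the result would also keep all six fields from the same instant.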
#2
The Admin guide is helpful for NetXMS settings, but I think my constraint is likely Postgres given I have a stock standard install - default settings.

We have 2000 nodes, 50k Objects, 70k DCIs and 150 EPPs. Once we get above 300 events per minute we start backing up.
Maximum backlog = 113K events - taking 4.5 hours from Event creation to Alarm creation.

I tried https://pgtune.leopard.in.ua/#/ and got the following:

# DB Version: 9.6
# OS Type: windows
# DB Type: oltp
# Total Memory (RAM): 12 GB
# CPUs num: 6
# Data Storage: ssd

max_connections = 300                  (currently 100)
shared_buffers = 512MB                 (currently 128MB)
effective_cache_size = 9GB             (default - unknown)
maintenance_work_mem = 768MB           (default - 712MB)
checkpoint_completion_target = 0.9
wal_buffers = 16MB                     (default 16MB)
default_statistics_target = 100
random_page_cost = 1.1
work_mem = 4466kB
min_wal_size = 2GB
max_wal_size = 4GB
max_worker_processes = 6
max_parallel_workers_per_gather = 3

Other than going with the above, anybody have any other suggestions as to which Postgres settings impact or assist NetXMS the most?
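
A back-of-envelope check on the backlog figures above may help when judging whether a setting change has made a difference. It assumes the queue is processed first-in first-out and that the 113K backlog and the 4.5 hour delay were observed at the same time (both are assumptions). Under those assumptions the average drain rate is roughly the backlog divided by the delay. The function name is made up for illustration.

```python
def drain_rate_per_minute(backlog_events, delay_hours):
    """Average events processed per minute, from queue length and age of the oldest event."""
    return backlog_events / (delay_hours * 60)

rate = drain_rate_per_minute(113_000, 4.5)
print(f"{rate:.1f} events per minute")
```

That works out to a few hundred events per minute, the same order of magnitude as the 300 per minute at which the backlog starts to build, so it is a useful baseline to re-measure after each tuning change.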

#3
Feature Requests / Just a comment on NX-1642
July 27, 2019, 05:41:41 PM
Both BIG-IP F5s and CITRIX Netscalers embed the ASCII equivalent of the user-defined field name as the trailing OIDs.

You can see this in the walk data vs the text version
ASCII
47.67.111.109.109.111.110.47   
CHAR
/Common/

Once I had worked that out, monitoring F5's and Netscaler's became much, much easier. Having it indexed would be even better.

Lots of other things are like this - CISCO blade centre blades and things like that.

From the archives: - ascii to char conversion
https://ee.hawaii.edu/~tep/EE160/Book/chap4/subsection2.1.1.1.html

I use CITRIX now, but for F5 SNMP monitoring it is reasonably well documented here ==> https://techdocs.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/bigip-external-monitoring-implementations-13-1-0/13.html

The only problem is that the indexed OIDs are in the F5-BIGIP-LOCAL-MIB.txt MIB, and they do not explain how to do the conversion for those - but they do explain everything else.

The challenge here is that we selectively monitored the server / vserver / node tables for specifics such as UP or DOWN and if membership was 100%.

Perhaps include a SNMP wizard that walks the mib, builds the various tables (ltmPoolMember is one example of a table we used), allowing the user to select which items in the table to monitor.

Our way? Walk the whole MIB, copy our virtual server name, paste it into an ASCII converter, copy the resulting ASCII string, and paste it onto the end of the OID for the table that we wanted to monitor.
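
The manual copy and paste conversion described above can be sketched in a few lines of Python. The function names are made up for illustration, and the sketch only produces the dotted ASCII codes themselves.

```python
def name_to_oid_suffix(name):
    """Turn a name such as '/Common/' into dotted decimal ASCII codes."""
    return ".".join(str(code) for code in name.encode("ascii"))

def oid_suffix_to_name(suffix):
    """Reverse conversion: dotted decimal ASCII codes back to text."""
    return bytes(int(part) for part in suffix.split(".")).decode("ascii")

print(name_to_oid_suffix("/Common/"))
print(oid_suffix_to_name("47.67.111.109.109.111.110.47"))
```

Some string-indexed tables may also place the length of the string in front of these numbers, so it is worth comparing the result against the walk output before appending it to a table OID.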

#4
I have set ResolveDNStoIPOnStatusPoll to yes to try and get NetXMS to pick up when the IP has changed however I have also got UseSNMPTrapsForDiscovery turned on.

Before Status polling can update the IP address at its next status poll, the device with the new IP has already sent a trap - causing NetXMS to add a new duplicate node via Discovery.

I could set NetworkDiscoverMergeDuplicateNodes to yes - except that deletes the old Node with all the old node DCI history - which I want to keep.

Is there a way to delay the Discovery processing of traps for 5 minutes till status polling has had a chance to update the IP address?

Alternatively - and as a better solution - a simple option - RetryProcessingTrapsFromUnknownIPafterxxxMinutes - which allows Status Polling to update the IP of the existing Node and when xxx has passed (5 minutes as the default) - the trap is then properly associated to the correct device, no discovery is triggered, and no duplicate is created.

I cannot turn off UseSNMPTrapsForDiscovery as that is how I find the thousands of devices that are turned on only a few times each year - all with dynamic IPs.
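
To make the requested behaviour concrete, here is a minimal Python sketch of the idea. It is purely hypothetical: it is not how NetXMS works, and the class and function names are invented. Traps from unknown IPs are parked for a grace period (300 seconds by default), then checked again against the set of known node IPs before any discovery would be triggered.

```python
import heapq

class DeferredTrapQueue:
    """Holds traps from unknown source IPs until a grace period has passed."""

    def __init__(self, delay_seconds=300):
        self.delay_seconds = delay_seconds
        self._heap = []      # entries: (release_time, sequence, source_ip)
        self._sequence = 0   # tie-breaker so equal release times keep arrival order

    def defer(self, source_ip, now):
        """Park a trap until now + delay_seconds."""
        heapq.heappush(self._heap, (now + self.delay_seconds, self._sequence, source_ip))
        self._sequence += 1

    def release_due(self, now):
        """Return source IPs whose grace period has expired, oldest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due

def classify(released_ips, known_ips):
    """Split released traps into those that now match a node and those still unknown."""
    matched = [ip for ip in released_ips if ip in known_ips]
    unknown = [ip for ip in released_ips if ip not in known_ips]
    return matched, unknown
```

Only the traps left in the unknown list after the delay would go on to discovery. The ones in matched belong to a node whose IP was updated by status polling in the meantime.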
#5
I have my nodeUpDown custom variable auto binding and unbinding working - except for the last piece - updating this custom variable when status polling.

I do NOT want to just use the event and do it via event processing - Event Policy processing can get behind so this way avoids waiting - it updates what we look at immediately.

I want to add to the Status hook the setting of the nodeUpDown based on the ICMP ping response.
If I can ping the device - set it to up. If I cannot ping it, set it to down.

Something like this below added to the status polling hook script - once I know the variable that holds the ICMP response.

// $node->icmpresponse is a placeholder until I know the real variable that holds the ICMP result
if ($node->icmpresponse == "no")
{
   newstate = "Down";
   oldstate = GetCustomAttribute($node, "nodeUpDown");

   if (newstate != null)
   {
      if (newstate imatch "Down")
      {
         // Node did not answer ping: add it to the AllDown container and record the state
         BindObject(FindObject("AllDown"), $node);
         SetCustomAttribute($node, "nodeUpDown", newstate);
      }
   }
}
if ($node->icmpresponse == "yes")
{
   newstate = "Up";
   oldstate = GetCustomAttribute($node, "nodeUpDown");

   if (newstate != null)
   {
      if (newstate imatch "Up")
      {
         // Node answered ping: record the state and remove it from the AllDown container
         SetCustomAttribute($node, "nodeUpDown", newstate);
         UnbindObject(FindObject("AllDown"), $node);
      }
   }
}


#6
General Support / Bulk import - device names only?
July 21, 2019, 09:12:09 AM
Does anybody know if I can run the bulk import with device names only?
#7
I think I know what is happening and perhaps even what is contributing - but do not know why.

Each night at the moment we get to a point where Events themselves come in and are registered as events but Event Processor starts to grind and Alarms stop being generated - slowly coming through hours later.

Running sh q gives me 330042 items in the Event Processor queue - it is clearly backing up.

Restarting NetXMS core works - it picks up those 300k events and processes them - clearing that queue - but not the ideal way to fix.

There seem to be no settings or options that would give the Event Processor extra capacity to work through the Event Processing Policy.

My mitigation thoughts - reduce number of alarms - but what else can I do?

As a bare minimum - can I add Event Processor as a DCI so I can set a threshold on the Event Processor queue? At least that would let me know when I am in trouble!!
#8
Now that my create help desk ticket object tool is working (right click on alarm ==> Tool ==> Create Service Desk Ticket), it could be improved by offering a list of choices for a field. The user would choose from the list, and the choice would then be passed through from Display Name into Name for that field.

I cannot depend on the user to type text where the case and wording have to be exact - I just want them to choose from a list.

CA / Broadcom Service Desk Manager is pretty pedantic with its numerous mandatory fields and any spelling or case mismatch means no ticket created.

Depending on which support group I would assign the ticket to, I would like to present a list of options on who to assign the ticket to.





#9
Wondering if anybody else has hit this problem - I am pretty sure I am not the first - and was wondering what I am missing.

Trying to add to my expanding collection of Object tools / Alarm tools - but having some bad luck with Alarms. I want an Alarm Tool to create a Help Desk ticket via PowerShell but NetXMS is not acting as expected.

I do not think it is the extremely limited documentation - I think it is an inconsistency in parameter availability.

The command below is designed to get the Alarm message either in its entirety, a specific variable, or even the Event Message, the severity and the time the alarm occurred.

The only documentation that half helped is this - https://netxms.readthedocs.io/projects/admin/en/latest/event-processing.html?highlight=message#event-processing-macros
as well as section 23.2.1 of the 2.2.11 admin guide (which lists in a table the fields available - but are for Event Processing Policy only??)

The problem is as follows:
1. Time of the alarm is not passed. %t is empty
2. %m passes the first word only.
3. NetXMS passes the rest of the message as variables (no idea what name) using a blank as the separator. I have added additional variables just to pick them up.
4. Fortunately, because PowerShell uses named parameters, the input variable is passed successfully, even if PowerShell has it last in its list.
5. I can export the Alarm as CSV and the Message field and Created Time are correct and complete - exactly what I want - but seemingly impossible to achieve.
6. An alarm that has special characters such as / % , etc. causes NetXMS to hang the execution - it either never finishes or terminates with blank output.
7. It would appear %A would do it - but that is empty - tried that as well.
 
This can be reproduced by anyone as per below. I changed %t to %c as %t caused the script to hang - using %c allows me to see that the -c Code  variables is working.

My Object tool command (local script): It has one input - variable name is pass. The command does produce output (tick the box).
C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe c:\scripts\create-inc-from-alarm.ps1 -device  %n  -sev %s -tyme %c -pd %(pass) -msg %m

My PowerShell is as follows:

Param(
[string]$device,
[string]$msg,
[string]$tyme,
[string]$sev,
[string]$m1,
[string]$m2,
[string]$m3,
[string]$m4,
[string]$m5,
[string]$m6,
[string]$m7,
[string]$m8,
[string]$m9,
[string]$m10,
[string]$m11,
[string]$pd)


write-host  " Alarm on $device with msg $msg $m1 $m2 $m3 $m4 $m5 $m6 $m7 $m8 $m9 $m10 $m11 that occurred at $tyme with severity $sev using pass $pd "
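
The "first word only" symptom in point 2 is what normally happens when a value containing spaces is placed on a command line without quotes: each word becomes a separate argument. The Python sketch below illustrates that general behaviour with shlex, which follows POSIX shell rules rather than Windows rules, so it is an analogy and not a reproduction of what NetXMS or PowerShell do. The script name and message text are made up.

```python
import shlex

def split_args(command_line):
    """Split a command line into arguments the way a POSIX shell would."""
    return shlex.split(command_line)

unquoted = 'create-inc.ps1 -msg Node down for 5 minutes'
quoted = 'create-inc.ps1 -msg "Node down for 5 minutes"'

print(split_args(unquoted))  # the message is spread over several arguments
print(split_args(quoted))    # the message stays in one argument
```

If the same applies here, wrapping the macro in double quotes in the tool command (for example -msg "%m") might keep the message together. Whether NetXMS expands macros inside quotes, and how embedded quotes or special characters are handled, is an assumption that would need testing.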



#10
I added an Object Command that issues command ping %n . I have two versions, a local command and a server command. Both have the same fault.

On a normal node it works
On an unmanaged node it works
On a node in maintenance mode it works.

On a node that I have added but does not respond to anything, and has yet to respond to anything, I get the following response "Bad parameter SNMP".

Why does it not simply run the command instead of checking SNMP, which at this point in the discovery process has yet to succeed?

Output:

Bad parameter SNMP.


*** TERMINATED ***
#11
I have created a snmp get script - line for line out of the admin guide - tweaked with clearer answers.

The problem is - if a node is unmanaged, create transport fails and I am unable to run the script.
If it is in maintenance - it is fine.

Therefore the question is - if unmanaging a node prevents me from creating a transport, is there an alternate option to check snmp response whilst node is unmanaged?



transport = CreateSNMPTransport($node); // Create SNMP transport for node
println $node->name;
if (transport == null)
{
   println "Failed to create SNMP transport, NetXMS is very unhappy, DONT poll unmanaged nodes exit";
   return 1; // literal exit code for "no transport"
}
value = SNMPGetValue(transport, ".1.3.6.1.2.1.1.1.0"); // sysDescr.0
if (value == null)
{
   println "Failed to issue SNMP GET request as no SNMP response - might be community, might be connectivity - try pinging the node first!!";
   return 2;
}
else
{
   println "SNMP request responded to our request for sysDescr - and here it is ::" . value;
   return 0;
}
#12
When I status poll an unmanaged node, I get a "Node is Connected" but node status remains unmanaged - however - what I am looking for is if the node is now in a state that I can manage it again. The Status poll does not tell me that - at all.

I went low tech and simply added a tool called ping - local command - ping %n - and I use that as my status poller instead. I have a second one which pings from NetXMS server which I use alternatively, depending on where I think the issue might be.

The next thing to add is an snmpget of sysDescr to the tools so I can see if I get SNMP responses as, again, Status polling does not trigger polling if it thinks the node is unreachable.

#13
When using the filtering option in alarm log (right click on node does not have this option - it has alarms which is already available in Object Details), it starts to search from the first character typed.

Unfortunately for me, with 3500 nodes, this then hangs my NetXMS console till it can populate the list based on that first character - a pointless exercise considering the time it wastes.

Of all the things I like and dislike - this is the one thing that all users get really annoyed about, and they vent at me even though there is nothing I can do about it.
Either let me enter the source node completely freehand, or let me enter it partially and then invoke search. Using the last search is great when adding traps and events, but for source, having over 3000 entries in the tree to traverse and select from - when the match is not near enough anyway - is a nightmare.

Is there a setting somewhere where I can turn off "search once first letter is typed" or can I set numberofcharstypedtobeginsearch to 3 instead of 1?
#14
General Support / Alarm Key bug
June 21, 2019, 07:52:16 PM
I have set the key on a bunch of alarms to be %n_%4 which in theory means a trap==>event==>Alarm should increase the count when the same trap from the same node comes in. For one particular message I get, it is not working - I get a new alarm every time.  :(

The value of %4 is 247 chars as per below.
Warning [21/06/2019 BT2 3, 21/06/2019 BT6 3, 21/06/2019 BT6 12, 21/06/2019 BT7 5] - Failed to retrieve image: 'RISA Harness image /8557', exception: The remote server returned an error: (550) File unavailable (e.g., file not found, no access).

I have tried different variations at the start of the key, and every time I can see that the value in the KEY field gets truncated to 256 chars - and the count does not go up.

It would not be so bad if the truncated field key was compared as that would be enough to match - but for some reason, it is not.

I exported the keys from Alarm Log view into a csv - pasted them into notepad ++ - ran compare - both keys were the same.
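
One possible explanation, stated purely as a guess and not confirmed against the NetXMS source: the stored key is truncated to the column limit, but the key of each incoming event is compared in full, so the two never match even though the exported (already truncated) keys look identical. The Python sketch below shows that effect using the 256 character limit observed above. All names are made up.

```python
KEY_LIMIT = 256  # truncation length observed in the KEY field

def stored_key(key):
    """Key as it would end up in storage after truncation."""
    return key[:KEY_LIMIT]

def matches_full(existing, incoming):
    """Compare the stored key with the full, untruncated incoming key."""
    return existing == incoming

def matches_truncated(existing, incoming):
    """Compare after truncating the incoming key to the same limit."""
    return existing == incoming[:KEY_LIMIT]

key = "node01_" + "x" * 300      # stand-in for %n_%4 with a long %4
existing = stored_key(key)
print(matches_full(existing, key))       # False: a new alarm would be created each time
print(matches_truncated(existing, key))  # True: the count would go up instead
```

If this is what is happening, shortening the key (for example by using a shorter parameter) so that it stays under the limit should make the count increase again, which would be a quick way to test the theory.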



#15
When looking at status codes, I am forced to rely on a status of critical via a Node_Down alarm. A node being down is both a status and a state, and I would like to know, separately, if a node is down. We have plenty of nodes with critical alarms, but I would like to show down as a unique code / colour / status.

https://wiki.netxms.org/wiki/NXSL:NetObj - add it as Status ID = 9.

I hit this problem as I have nodes that are not down, their only fault is that they have an outstanding Node_Down alarm - and the node is not down.

By having this option, I could also set to auto-clear any Node_Down alarms where status NE down or "Status ID == 9".

I can also have a specific container for Status ID = 9.

#16
I am setting up traps ==> events ==> alarms. Event Description would be the logical place to put the OID text and parameters returned from the OID lookup when adding the trap, and then pass it through to Alarm Browser as %D for event description, similar to how <DCIDescription> is passed for threshold alarms.

Apart from the fact you cannot tick a box that says "create event from trap OID" that would do it for you, and the fact that the mib itself tells you the OID names it will pass yet you need to look up the OID number in order to search; the text of the trap from the OID description that I add to the event is not visible anywhere!!

Event Log does not show it. Alarm Browser does not show it. Event Configuration does not have it as a variable to be able to display or insert. I have to open up event configuration in order to see a description of the event.

This to me seems like I am missing something obvious!! Hence my thinking this has to be a really dumb question.

OK - so you think that was dumb - what about this....

I add the trap to the trap processor. I do not need to add anything to the Event Configuration other than it being a unique match, as all variables get passed through to Event Processing Policy. Description is pointless as it is not visible. So why do I spend any time with event processing when dealing with Traps, other than to pick out the description of the problem from the trap configuration and include that in the Message - the same way all other events have been set up - when there is no point in adding Description as it is not available for display?

Except - and here is the really dumb bit - the fact that the Description is kept and stored against the event - it probably is available - I just cannot find it.

I have looked at the macro variables and found nothing there - so why go to all the effort to add the Description to the event - which explains what parameters are provided and why, but then make that text / field not available anywhere else?

It really just seems that when adding trap Processing, an event could be autogenerated based on the MIB severity, using the MIB name for that alert - and Message -  and we could skip straight through to Event Processing Policy to setup how the alarm looks. Am I missing something - again?
#17
Used the sizing spreadsheet - average row size - with 700 nodes, 20 DCI's per node = 14,000 DCI's, 300 sec polling (CPU per second but file Systems 1 per 15 minute - so averaged) - with a 365 day retention = 124GB. Not everything will have 365 day retention - but - taking my worst possible case.

Based on the following post, I do not appear to be pushing NetXMS:
https://www.netxms.org/forum/general-support/netxms-cpu-usage-high/msg24645/#msg24645

I started with PostgreSQL and unless anyone suggests otherwise, I will go with PostgreSQL - on the same server as NetXMS - with 150'ish GB of SAN based SSD storage, a couple of cores and a chunk of mem (plenty of both on a B200 M3).

Suggestions / Feedback welcome.
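
For anyone wanting to sanity-check the sizing, the row count behind those figures can be reproduced as below. The only inputs are the numbers from this post. The bytes-per-row figure is simply 124 GB divided by the row count, assuming decimal gigabytes, and is not taken from the spreadsheet itself.

```python
def rows_for_retention(dci_count, poll_interval_s, retention_days):
    """Number of collected values kept for the whole retention period."""
    samples_per_day = 86_400 // poll_interval_s
    return dci_count * samples_per_day * retention_days

rows = rows_for_retention(700 * 20, 300, 365)
implied_bytes_per_row = 124e9 / rows   # assumes 124 GB means 124 * 10^9 bytes

print(f"{rows:,} rows")
print(f"about {implied_bytes_per_row:.0f} bytes per row")
```

Shorter retention on most DCIs reduces the total in direct proportion, so the 124 GB figure is an upper bound for this worst case.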


#18
After completing my first two traps in just over an hour, I realised that doing the same for the 34,570 traps in my existing NMS was going to take forever - literally. Even limiting it to the 3,000 critical ones was going to be too many, and the 900-odd specifically customized traps, as an absolute minimum, were still way too many.

And then it hit me.....

NetXMS treats all PDU's / Varbinds the same - exactly the same!!

All I needed to do was to take the "Unmatched trap" event and turn it into an Alarm. I simply added an Alarm definition based on that event - passed the varbinds across - and hey presto, all the traps I could ever need to configure into readable alarms now generate readable alarms (99% readable - transformations such as time ticks to Time format do not work - obviously).

For those traps I want to format specifically, I can simply define them individually, but for the most part, I can simply use the default.

The only real problem with doing this is that NetXMS passes the PDU's / Varbinds as %2 rather than as individual parameters.  I had a look at snmptrap.cpp to see if there were other variables with the individual values, but, for unmatched traps, it does not populate the PDU's into variables.

If this was changed so that unmatched trap returned each of the PDU's / Varbinds / OID's from the trap individually rather than as %2 - this solution would mean that  I would probably never have to define another snmp trap again. I could simply display %1 \n %2 \n %3 \n etc. and see the trap content one line at a time - automatically.

If I was able to have the OID of the trap converted into the trap description in the MIB - I would never define another trap again. Once I know what trap it is - I can work out pretty much what each PDU means. If this was too hard - I can always define an alert.

The snmptrap.cpp code takes the OID of the trap and looks up the trap table to see if it is defined. All it would have to do is look up the MIB table to find the trap description for that OID - exactly as it does when adding a trap processor. Not a complete solution, but a solution that would work for at least 28,000 of my traps.

I do not consider these as features - they are simply simple options that would make trap processing much easier. Have them as two Server options:
Unmatched_TRAP_as_one_var True False with False splitting up the PDU's into individual
Unmatched_TRAP_lookup True False with True looking up the trap from the compiled MIBs (or a master list of all traps extracted from the compiled MIBs)

Now here is the weird part - looking at the code in snmptrap.cpp - around lines 63 - 136 it looks like there is the capability for auto trap creation already existing.

Given that my cooking is better than my coding and I would not recommend anyone eating my cooking, I cannot say for certain what NetXMS can already do - but it certainly looks like everything above is already catered for - except for splitting unmatched traps into individual variables rather than passing them as %2. This just happens to be the one thing that would help the most.

Anyway - just thought I would put it out there that trap processing can go from complete nightmare to absolute wonder with just a little lateral thinking - and one simple alarm definition :)
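
Until something like that exists, the one-varbind-per-line display could be approximated outside NetXMS by splitting the combined %2 string. The Python sketch below assumes the varbinds arrive as "OID == 'value'" entries separated by "; ". That format is an assumption, so check it against a real unmatched trap event. The sample OIDs and values are made up.

```python
def split_varbinds(combined, separator="; "):
    """Split a combined varbind string into one entry per varbind."""
    return [part.strip() for part in combined.split(separator) if part.strip()]

def as_lines(combined):
    """Render the varbinds one per line, as they might appear in an alarm message."""
    return "\n".join(split_varbinds(combined))

sample = ".1.3.6.1.2.1.1.3.0 == '12345'; .1.3.6.1.4.1.9999.1.1 == 'fan failed'"
print(as_lines(sample))
```

A value that itself contains the separator would be split in the wrong place, which is one reason having the server pass each varbind as its own parameter would be the cleaner solution.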




#19
I tried a Condition container, but as there was no Manage event I could not create an enter / exit combination of events.

The Node attributes do not include status and do not have a boolean isManaged option. None of the flags or runtime flags have it either.

https://wiki.netxms.org/wiki/NXSL:Node

How do I create a container to hold Unmanaged nodes only?

Even harder -  Container with a list of nodes where there is a DCI that has been set to disabled.
#20
In Alarm Details screen, the Overview section is dynamic, starting at 5 and expanding to 13 before turning into a scrolling section for the remainder.

Related events will expand down if Comments and DCI Data are minimised, but not Overview - which is where the important information is.

Can Overview please be set so that it will display up to 50 lines before turning into a scroller - or have that number be a system variable that a user can set?

As per below - a 39 line trap that I need to see all the lines of - especially the last ones. When a SAN controller panics, details, and speed, are critical.

You can also set the default section size of Comments and Related Events smaller so that all are still visible - with the Overview being completely visible.