Messages - paul

#31
Agreed - status polling hook is the better place. My config polling is once per hour - much too slow to be useful for this.

There is the PING subagent (it returns 10k when there is no response), but as I am only using SNMP from the central NetXMS server, I don't know whether the subagent is available.
https://www.netxms.org/forum/configuration/subagent-icmp-ping/

But this has already been suggested and advised against anyway.
https://www.netxms.org/forum/feature-requests/icmp-ping-as-internal-parameter/

BUT....the idea of putting them on the containers is good as that gives me a "once-per-hour" refresh for any node that was not picked up otherwise.

It nearly got a look-in here:
https://www.netxms.org/forum/configuration/simple-ping-monitoring/
But - it is not explained how NetXMS remembers that a node is down so that it does not generate a new SYS_NODE_DOWN, or how it decides when to generate a SYS_NODE_UP event.

It was even closer here:
https://www.netxms.org/forum/general-support/trying-to-under-stand-polling/
But it went off track by advising to use $node->status rather than advising which internal variable is used. $node->status has 8 possible values - none of which is down - which is the whole reason users keep asking. This error was repeated here:
https://wiki.netxms.org/wiki/Step_by_step_service_monitoring - again, just because a node is critical, does not mean it is down.

As suggested - I will use the Status Polling hook - runs every minute, and I will just have to persevere trying to identify the internal variable NetXMS uses when ICMP response = none and where NetXMS stores the trigger for SYS_NODE_DOWN and SYS_NODE_UP.

My biggest problem - devices with dynamic IP addresses - I am turning off DiscoveryViaTrap (currently yes) and letting DNStoIP on status polling update the IP - will see if the node comes back up once the IP has changed.

**** Update ****
I exported all my duplicates and just deleted them - will come back to them later.

The status poll hook is working - working fine - but.......
I have put my Nodes that have issues into Maintenance but my Dashboard severity filter for Status Map does not allow me to exclude Maintenance.
I will have to use unmanage which prevents polling - and prevents the Node clearing the nodeUpDown as there is no polling.

For the purpose of pure up/down monitoring - Dashboard view - either the node is unmanaged (no polling) or it is polled and the up/down status shows.

Will see how this goes over the next week - I think this might be close enough.

**** Update 2 ****
Dashboard view was showing the unmanaged devices even though Unmanaged was unticked :(
Thinking back - NetXMS likes specifics - the unmanaged nodes were also in maintenance.
Updated each node - leave maintenance - disappeared from the display (as hoped but not as expected) :)
Finally - working as desired.
************

#32
Thanks for the feedback.

As you can see in the attached CPU usage chart, apart from the first one that happened at around 16:00, they all seem to start around 10:00 each day.

The last one that required a service bounce was on the 23rd. On that day, I changed some of the Events to have the "stop processing" ticked.

My daily backup starts at 14:10 so it is not that.

I dug out more details from Events Processed and can see that there is a spike when the problem seems to start. Looks like "something" triggers a dump of events at around 10:00 (the high Event Processor queue), which NetXMS tries to process - getting through about 600 per minute - but due to the sheer volume of events, it simply cannot get through them fast enough.

I will progress through the rest of the Event Processor policies and change them to "Stop event processing" as even though I do have some events that have multiple policies - I do not have any event that has two policies that actually do anything.

I would have thought that, from a speed perspective, NetXMS knows which policies are triggered for each event and is not doing a full scan of every Event Processing Policy for each event - if it did, no wonder it grinds :( - but that does not explain my issue - my issue seems to involve a bulk dumping of events to process.

Will put an exception trigger of 500 for events processed in last minute - must exceed this for 5 minutes - and see if I can pick it up this way.
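
To make the rule concrete, here is a rough sketch in Python (not NXSL, and not how NetXMS evaluates thresholds internally) of "above 500 for 5 minutes in a row". The function name and the sample readings are made up for illustration; only the 500 and the 5 come from the plan above.

```python
from collections import deque


def sustained_breach(samples, threshold=500, window=5):
    """Return True if the last `window` per-minute samples all exceed `threshold`."""
    # Keep only the most recent `window` readings.
    recent = deque(samples, maxlen=window)
    # Not enough history yet means no alert.
    return len(recent) == window and all(value > threshold for value in recent)


# Hypothetical "events processed in last minute" readings, oldest first.
quiet = [120, 650, 90, 700, 110, 80]      # isolated spikes only
storm = [120, 610, 640, 600, 590, 620]    # five busy minutes in a row

print(sustained_breach(quiet))
print(sustained_breach(storm))
```

The point of requiring consecutive minutes is that a single busy minute does not raise anything; only a sustained dump of events should.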
#33
I have set ResolveDNStoIPOnStatusPoll to yes to try and get NetXMS to pick up when the IP has changed however I have also got UseSNMPTrapsForDiscovery turned on.

Before status polling can update the IP address on its next status poll, the device with the new IP has already sent a trap - causing NetXMS to add a new duplicate node via Discovery.

I could set NetworkDiscoverMergeDuplicateNodes to yes - except that deletes the old Node with all the old node DCI history - which I want to keep.

Is there a way to delay the Discovery processing of traps for 5 minutes till status polling has had a chance to update the IP address?

Alternatively - and as a better solution - a simple option - RetryProcessingTrapsFromUnknownIPafterxxxMinutes - which allows Status Polling to update the IP of the existing Node and when xxx has passed (5 minutes as the default) - the trap is then properly associated to the correct device, no discovery is triggered, and no duplicate is created.
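
A rough Python sketch of the behaviour I am asking for, purely to illustrate the idea. Nothing here exists in NetXMS: the class, its method names, the return values and the sample IPs are all hypothetical, and the 5 minute delay is just the default suggested above.

```python
from dataclasses import dataclass, field

RETRY_DELAY_SECONDS = 5 * 60  # assumed default: 5 minutes


@dataclass
class TrapDeferrer:
    """Hypothetical model: hold traps from unknown IPs and retry them later."""
    known_ips: set = field(default_factory=set)
    pending: list = field(default_factory=list)  # (due_time, source_ip)

    def receive(self, source_ip, now):
        # Traps from known nodes are matched straight away.
        if source_ip in self.known_ips:
            return "matched"
        # Unknown source: park the trap instead of starting discovery.
        self.pending.append((now + RETRY_DELAY_SECONDS, source_ip))
        return "deferred"

    def retry_due(self, now):
        # Re-check parked traps whose delay has expired.
        results, still_waiting = [], []
        for due, ip in self.pending:
            if due > now:
                still_waiting.append((due, ip))
            elif ip in self.known_ips:
                results.append((ip, "matched"))   # status poll fixed the IP
            else:
                results.append((ip, "discover"))  # genuinely new device
        self.pending = still_waiting
        return results


deferrer = TrapDeferrer(known_ips={"10.0.0.5"})
print(deferrer.receive("10.0.0.9", now=0))   # trap from a not-yet-known IP
deferrer.known_ips.add("10.0.0.9")           # status poll updates the node IP
print(deferrer.retry_due(now=300))
```

Devices that are genuinely new would still be discovered, just 5 minutes later, so discovery via traps keeps working.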

I cannot turn off UseSNMPTrapsForDiscovery as that is how I find the thousands of devices that are turned on only a few times each year - all with dynamic IPs.
#34
Happened again today - 250,000 queued - another NetXMS restart needed.

Appears that the queue is slowly dropping - about 1,000 per minute - but I do not have 4 hours to wait for NetXMS to catch up.
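
For reference, the 4 hour figure is just the backlog divided by the observed drop rate. A small Python sketch of the arithmetic, assuming the queue keeps dropping at a steady net 1,000 per minute (the function name is only for illustration):

```python
def drain_minutes(backlog, net_drain_per_minute):
    """Minutes needed to empty a queue at a steady net drain rate."""
    if net_drain_per_minute <= 0:
        raise ValueError("queue is not draining")
    return backlog / net_drain_per_minute


minutes = drain_minutes(250_000, 1_000)
print(f"{minutes:.0f} min = {minutes / 60:.1f} h")  # 250 min = 4.2 h
```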

An extra 2 cores and 4 GB memory did not help. 

Can see, consistently, CPU usage climbs 15-20% when "grind-to-a-halt" starts.

threads before restart = 491 CPU average = 23% - Events queued 250,000
threads after restart = 482 CPU average = 5% - Events queued zero.

Screen prints attached.
#35
I have my nodeUpDown custom variable auto binding and unbinding working - except for the last piece - updating this custom variable when status polling.

I do NOT want to just use the event and do it via event processing - Event Policy processing can get behind so this way avoids waiting - it updates what we look at immediately.

I want to add to the Status hook the setting of the nodeUpDown based on the ICMP ping response.
If I can ping the device - set it to up. If I cannot ping it, set it to down.

Something like this below added to the status polling hook script - once I know the variable that holds the ICMP response.

// NOTE: $node->icmpresponse is a placeholder - I do not yet know the real
// attribute that holds the ICMP response.

// No ping response: mark the node Down and bind it to the AllDown container
if ($node->icmpresponse == "no")
{
   newstate = "Down";
   SetCustomAttribute($node, "nodeUpDown", newstate);
   BindObject(FindObject("AllDown"), $node);
}

// Ping response received: mark the node Up and remove it from AllDown
if ($node->icmpresponse == "yes")
{
   newstate = "Up";
   SetCustomAttribute($node, "nodeUpDown", newstate);
   UnbindObject(FindObject("AllDown"), $node);
}
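
The status poll runs every minute, so it is probably worth only touching the attribute and the container when the state actually changes. Below is a rough Python model of that idea (not NXSL, and not a description of how NetXMS tracks node state); the dict standing in for a node and the set standing in for the AllDown container are assumptions for illustration.

```python
def apply_ping_result(node, ping_ok, all_down):
    """Update nodeUpDown and AllDown membership only on a state change.

    Returns True if something changed, False if the poll was a no-op.
    """
    new_state = "Up" if ping_ok else "Down"
    if node.get("nodeUpDown") == new_state:
        return False  # same state as the last poll: nothing to do
    node["nodeUpDown"] = new_state
    if new_state == "Down":
        all_down.add(node["name"])      # bind to AllDown
    else:
        all_down.discard(node["name"])  # unbind from AllDown
    return True


all_down = set()
node = {"name": "switch-01"}  # hypothetical node
print(apply_ping_result(node, False, all_down), all_down)  # first failed ping
print(apply_ping_result(node, False, all_down), all_down)  # still down: no-op
print(apply_ping_result(node, True, all_down), all_down)   # ping is back
```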


#36
General Support / Bulk import - device names only?
July 21, 2019, 09:12:09 AM
Does anybody know if I can run the bulk import with device names only?
#37
OK - the idea was well intended - but failed. I could not get the getcustomattribute or the $node->customattribute working.

Working version is as follows:

Create Container called AllDown.

Create the following - do this in the order listed as each depends on the previous being done.

Script onNodeDown

SetCustomAttribute($node, "nodeUpDown", "Down");
BindObject(FindObject("AllDown"), $node);


Script: onNodeUp
SetCustomAttribute($node, "nodeUpDown", "Up");
UnbindObject(FindObject("AllDown"), $node);


Actions
NXSL Script
Name: NodeDown
Script name: onNodeDown

NXSL Script
Name: NodeUp
Script name: onNodeUp

Event Processing Policy
Show alarm when node is down
Add under Action ==> Server actions ==> NodeDown

Terminate node down alarms when node is up
Add under Action ==> Server actions ==> NodeUp

Click on the save icon.

All done :)

For auto binding and auto unbind on DCI polling - the following needs to be added to the Hook:ConfigurationPoll

Note: It is based on the nodename being of a certain name - in this case it contains the letter a. The container name is called "alldown" (change to AllDown if using the above as well)

if ($node->name imatch "a")
{
   state = GetCustomAttribute($node, "nodeUpDown");
   if (state != null)
   {
      if (state imatch "Down")
      {
         BindObject(FindObject("alldown"), $node);
      }
      if (state imatch "Up")
      {
         UnbindObject(FindObject("alldown"), $node);
      }
   }
}


The final part is to add a hook to Status Polling so that a Node that does not respond to ICMP and SNMP (or whichever protocol flag is set to yes, e.g. isSNMP=yes) has its custom attribute nodeUpDown set to Down.

Doing it this way gets me round the need to rely on Event Policy processing - something which, for us, has its own problems.





#38
Interim solution - from 2012  https://www.netxms.org/forum/configuration/event-based-on-resolving-an-alarm-timeout/ - thanks again Victor  :) :) :)

Unless there is a newer better solution, this will likely do it

Create script "OnNodeDown"
SetCustomAttribute($node, "nodeUpDown", "Down");

Create script "OnNodeUp"
SetCustomAttribute($node, "nodeUpDown", "Up");

then create action "execute script" to execute "OnNodeDown" script, and add it in event processing policy to the rule processing SYS_NODE_DOWN event.
then create action "execute script" to execute "OnNodeUp" script, and add it in event processing policy to the rule processing SYS_NODE_UP event.

And then, create a container based on the Custom Attribute "nodeUpDown" - auto bind and auto unbind - based on the following
updown = GetCustomAttribute($node, "nodeUpDown");
return (updown == "Down");


Which should leave me with a container with nodes whose only attribute that made them a member is that they are Down.

If I could - I would add this custom value to the General Object view so I could see - easily - when a node I am looking at is down or up.

My "unknown" nodes that respond to both icmp and snmp - but are not recognized - would benefit from this immensely.



#39
I think I know what is happening and perhaps even what is contributing - but do not know why.

Each night at the moment we get to a point where Events themselves come in and are registered as events but Event Processor starts to grind and Alarms stop being generated - slowly coming through hours later.

A sh q gives me 330042 items in the Event Processor queue - it is clearly backing up.

Restarting NetXMS core works - it picks up those 300k events and processes them - clearing that queue - but not the ideal way to fix.

There seem to be no settings or options that would give the Event Processor extra capacity to work through the Event Processing Policy.

My mitigation thoughts - reduce number of alarms - but what else can I do?

As a bare minimum - can I add Event Processor as a DCI so I can set a threshold on the Event Processor queue - at least that would let me know when I am in trouble!!
#40
General Support / Re: Moving NetXms to new server
July 20, 2019, 09:38:58 AM
Well done - good outcome. :)
#41
General Support / Re: Moving NetXms to new server
July 19, 2019, 07:32:26 AM
I assume you meant that you now have a new server with 2.2.16 installed and want to export from old (WIN &) to the new server?

If it was me, I would install same version NetXMS on new server, export from old server and import to new server. No schema issue as same schema.

Once imported to new server - update new server from old version to 2.2.16. From memory, if there is a schema change - the command is "nxdbmgr upgrade"

https://www.netxms.org/documentation/adminguide/upgrade.html

I don't think you can go straight from old schema on one box to new schema on a new box which is what you appear to be doing.

As for backup - export is different than standard database backup using MySQL backup commands.

I successfully moved server - twice - on Windows - using same version of NetXMS each time (export import). My NetXMS updates have all been done "in place" before any moving.

Hope this helps.
#42
Need to remember - wait long enough till polling occurs again.

My unknown container works fine based on the above.

What was my "unmanaged" solution? - I just select the container with all relevant nodes, select nodes tab - sort by status - all my unmanaged nodes found - easy as pie :)

Now, a container for those nodes whose communication is down - which is not one of the available status codes - that is going to be tricky.
#43
Finally back to this and stuck at my first container - unknown only.

My autobind / autoremove is as follows - but is not adding nodes that are unknown only.

How did I manage to get this wrong :(

return $node->status == "5";
#44
General Support / Re: TCP proxy functionality
July 16, 2019, 01:30:39 PM
And here I was thinking my excessively complicated PowerShell to create a service desk ticket was just me (which it still largely probably is).

Switch management in remote networks is potentially on my horizon, so good to hear that a) somebody else got it working and b) they used PowerShell.

My only hard requirements were SSH (SecureCRT or PuTTY) and Utilization / Packet loss at a Node level and at an interface level. This functionality should deliver the first and I assume that Utilization and Packet Loss could / should probably be added to the Network template to give me the second. (add .13 .14 .19 and .20 from ifTable)

Once v3 is out, will check out what options are now available.
#45
Now that my create help desk ticket object tool is working (right click on alarm ==> Tool ==> Create Service Desk Ticket), it could be improved by being able to provide a list of choices for the user to choose from, with the selection then passed through from Display Name into Name for that field.

I cannot depend on the user to type text that is case and wording specific - I just want them to choose from a list.

CA / Broadcom Service Desk Manager is pretty pedantic with its numerous mandatory fields and any spelling or case mismatch means no ticket created.

Depending on which support group I would assign the ticket to, I would like to present a list of options on who to assign the ticket to.