Topics - BillLortz

#1
We rely on the wonderful ExternalParametersProvider feature of NetXMS extensively.   Most of the time the values we are after are relatively static so we often only run the script every 10 minutes.

Unfortunately, when a node reboots we see a race condition where NetXMS declares many of the variables that came from ExternalParametersProvider as unsupported. It then disables those DCI entries.

For example, if we have an ExternalParametersProvider that provides license info from a USB Licensing Hasp plugged into the node, it might return a variable "HaspID" and another variable "HaspType".    In the DCI tables for a node, we would create an entry that pulls the Agent Variable "HaspID".     That entry works just fine.   But, after a reboot, many of the nodes that reference that type log alarms and declare that DCI Entry as unsupported.    We then have to individually go to each node and manually re-enable them.    Sometimes, it will generate a new alarm and re-disable that entry a few minutes later.   If we do a configuration poll and re-enable, it often solves the problem until the node reboots.     We usually try to match the polling interval of the DCI entry to the frequency specified in the ExternalParametersProvider function.   For example, if we use 600 seconds for the ExternalParametersProvider definition, we'll also set the polling interval of the DCI entry to 600.

It can be fairly painful after Microsoft Patch Tuesday when 50 machines have rebooted. We have to go in, re-enable the DCI entry on each node, and re-poll.

Is there something we need to change in our configuration to solve this issue?

We see this issue on different versions of NetXMS. For example, it occurs on 1.2.17 and on 2.0.3.
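
For readers unfamiliar with the feature, here is a minimal sketch of what a provider script could look like. It assumes the agent expects one Name=Value pair per line on standard output; the helper read_hasp_info() and the values it returns are made up for illustration and are not the script we actually run.

```python
#!/usr/bin/env python3
"""Hypothetical ExternalParametersProvider script: prints Name=Value lines."""


def read_hasp_info():
    # Placeholder: a real script would query the licensing dongle here.
    # Both values below are invented for illustration.
    return {"HaspID": "123456789", "HaspType": "ExampleType"}


def format_parameters(values):
    # One "Name=Value" pair per line, sorted so the output is stable.
    return "\n".join(f"{name}={value}" for name, value in sorted(values.items()))


if __name__ == "__main__":
    print(format_parameters(read_hasp_info()))
```

A DCI on the node would then reference the agent parameter by name, for example "HaspID".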
#2
We often use the DCI Summary tables feature to summarize configuration information.   For numeric data, the summary tables often try to simplify the way large numbers are displayed.   A number like 192123456 will be shown as 192 M.   Unfortunately for us, it treats string fields that look like numbers as numbers.   In our case, we are collecting the serial number of a licensing hasp.    That serial number is 9 digits long and is entirely numeric.    We would like to see the entire serial number.

Is there a way of treating a string object as a string and not trying to simplify the number?    Or is there a way of temporarily turning off the simplification feature for all numbers?

We are on version 2.0.3 of NetXMS.
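
To make the request concrete, the behavior we see and the behavior we would like can be sketched roughly as follows. This is not NetXMS code; the function names and the is_string_dci flag are hypothetical and only illustrate the idea of skipping the abbreviation for string values.

```python
def abbreviate(value):
    """Shorten large numbers with K/M/G suffixes (illustrative only)."""
    number = float(value)
    for suffix, scale in (("G", 10**9), ("M", 10**6), ("K", 10**3)):
        if abs(number) >= scale:
            # int() truncates toward zero, so 192123456 becomes 192.
            return f"{int(number / scale)} {suffix}"
    return str(value)


def render_cell(value, is_string_dci):
    # Hypothetical flag: leave the value alone when the DCI is a string.
    if is_string_dci or not str(value).isdigit():
        return str(value)
    return abbreviate(value)


if __name__ == "__main__":
    print(render_cell("192123456", is_string_dci=False))  # abbreviated
    print(render_cell("192123456", is_string_dci=True))   # shown in full
```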
#3
General Support / WMI Query to get disk error count
January 22, 2016, 10:34:45 PM
Just in case it would be helpful to someone else, I'm posting the WMI Query I use to get the total number of disk errors for my hard drive.  I have a system that I stumbled on that had a disk that was failing.   I found the SMART Data was fairly useless, but Windows has a special class that gave me the data I was interested in - MSFT_StorageReliabilityCounter.   If you google it, you'll see that it has a bunch of useful info.   I decided that ReadErrorsTotal would be a good indicator of a problem drive.

Unfortunately, the query tended to return two entries for a given DeviceID that were almost identical, so I arbitrarily chose the one that ended with the text "reliabilitycounter".   The other one on my system had some sort of SCSI Id that I wasn't sure would be consistent.   On my system without any "where" clause, the query returns 4 entries - two for DeviceId 0 and two for DeviceId 1.   I'm assuming those correspond to physical drives 0 and 1.   If you want to monitor another drive instead of 0, just change the "0" in the "Where" statement to the other drive number.   You could have multiple queries, one for each drive on your system.

WMI.Query(root\microsoft\windows\storage,SELECT * FROM MSFT_StorageReliabilityCounter where '(DeviceId=0) AND (ObjectID like "%reliabilitycounter")',ReadErrorsTotal)


This query returns an unsigned 64-bit integer. I have the frequency set to 3600 seconds and 90 days of history in NetXMS.
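
If you monitor several drives, a small helper can generate one parameter string per drive so the quoting stays consistent. This is only a convenience sketch; the function name is made up, and the query text is copied from the one above with the drive number substituted.

```python
def disk_error_query(device_id, counter="ReadErrorsTotal"):
    """Build the agent parameter string shown above for one physical drive."""
    where = f'(DeviceId={device_id}) AND (ObjectID like "%reliabilitycounter")'
    return (
        "WMI.Query(root\\microsoft\\windows\\storage,"
        f"SELECT * FROM MSFT_StorageReliabilityCounter where '{where}',"
        f"{counter})"
    )


if __name__ == "__main__":
    # Drives 0 and 1 are the two physical drives mentioned in the post.
    for drive in (0, 1):
        print(disk_error_query(drive))
```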

I hope this helps someone so that they don't have to go through hours of research to get this data, and they don't waste too much time trying to use the SMART data which declared my drive that has 4248 total read errors as perfectly fine - even though it is barely functioning.
#4
I am very interested in using the centralized upgrade feature of NetXMS.

But, we have a non-standard directory structure where we like to deploy items:

   For the binaries and highly static info, we would like to use c:\programs\NetXms\NetXmsA\ as the path.
   For configuration files, and less static info we like to use c:\programdata\NetXMS\NetXMA\...   (This is where we would put the agent config files)

This structure allows us to keep the root directory clean.   

I tried the upgrade feature and found that it ignored the existing structure and installed everything in default locations and even duplicated the service creation.

Is there some way of specifying alternative locations? For example, could I edit the npi file and add a /DIR=c:\programs\netxms\netxmsa to specify a different install location? Since it doesn't appear that you have an install option to specify the config file location, if I were to put a symbolic link in the install location that points to the agent config file in its actual location, would that work?

If there is no current method, could this be considered as an enhancement?

Bill

#5
I am running NetXMS 2.0.M5 and have a network of Cisco switches.

The switches all have the Catalyst-generic driver associated with them.

If I view the switch forwarding database of a switch that has a node connected, I see in the node column that the database is showing that node.

But, if I go to the node and select "Find Switch Port", it reports an indirect connection to the wrong switch.   Often the switch it reports has a 2nd node that is communicating with the original node, so the original node appears on the trunk between the switches.   

If I try at a different time with the same "Find Switch Port" on the same node, I may get the error "Connection Port information cannot be found".   I can sometimes resolve this 2nd error by forcing topology polls on all the switches (not just the ones involved), but it doesn't last - eventually it returns to the state that displays the error message "Connection Port information cannot be found"

The switches involved don't connect directly to each other; they have a 10 Gbit connection to a pair of core switches.

I've attached some screen shots and the topology network map that NetXMS generated (which is accurate).   The example node is highlighted on the topology map.  I've also exported the topology map of the switch that the node is connected to because the screen shots only show a part of the info.    In the file Indirect-Switch.png I've highlighted the indirect entry.  The port associated with that entry is a 10Gbit trunk back to one of the core switches.

The node in this example is polled only by ping, but I have the same problem with nodes that use the full NetXMS agent.

Let me know if you need additional info.

Thanks in advance for your assistance.

Bill

#6
I have a network consisting of Cisco 2960S switches with 4900M switches at the core. The 4900M switches are using the GENERIC driver. I believe this is preventing me from effectively using the "Find Switch Port" command because NetXMS can't figure out the entire network.

I had the same problem on NetXMS version 1.2.17, but upgraded to 2.0.M5 to see if the switch was supported on the newer version.

What would be involved with supporting this switch?

I'm attaching an Object Details screen print of the switch and an SNMP walk. The SNMP walk file contains two different walks: the first used just the community name, and the second was a VLAN SNMP walk of VLAN 101 produced by appending @101 to the community name.

It appears this switch's SNMP info is similar to that of the 2960S switches, which do work. Is there a way I can trick NetXMS into thinking it is a supported switch while I wait for this switch to be supported?

Thanks in Advance

Bill

#7
I have a site that has about 20 Cisco 2960S switches and a couple Cisco 4900M switches that I have NetXMS monitoring.

NetXMS is using the CATALYST-GENERIC Driver so it appears that the switches are being recognized.

I am on NetXMS version 1.2.17 at this site.

The problem I'm running into is that NetXMS seems to only be aware of some of the MAC addresses on the switch.  If I try to find a switch port, it never reports a direct connection.   It always reports that a given node is indirectly connected to a remote switch.

In researching the problem, I've determined that NetXMS is only storing "dynamic" MAC addresses. It ignores "static" MAC addresses. We use Cisco's security features where the switch learns which MAC addresses are directly connected to a port and only lets those addresses on that port. This prevents someone from plugging in a foreign device that we don't know about. Apparently, even though those addresses are "learned", Cisco classifies them as static.

Strangely, when I snmpwalk the switches, all the MAC addresses appear and I don't see anything obvious that would distinguish between static addresses and dynamic addresses. When I researched Cisco literature, I couldn't even find a reference for using SNMP to look at whether an address is static or dynamic. So, I'm puzzled as to why the static addresses are missing.

Is there a way of configuring NetXMS to store all the addresses? I would really like to be able to use the "Find switch port" feature on an object to see the local port it is connected to. Seeing that it indirectly connects to a root switch in the core isn't helpful since all my devices connect to one of the two root switches.

I've attached the following files to demonstrate the problem. I picked a simple 24-port gigabit switch that has an additional 2 ten-gigabit ports. We always use one of the ten-gigabit ports as a trunk to the other switches. Typically the gigabit ports connect to servers and other nodes in the same rack.


  • An SNMP walk of the 3 VLANs and typical OIDs used for mapping the MAC table to switch ports.   I used the following reference to get those OIDs:
  • A screen capture from Object Details of the switch
  • An Excel spreadsheet in which I compared the output of NetXMS's exported Switch Forwarding Database to the list produced by telnetting into the switch and running a "show mac address-table". I sorted both of those items by MAC address and inserted blank lines wherever appropriate to keep the MAC addresses matching. When looking at that comparison, you can see that all the "static" entries are missing from NetXMS's data. There are other items missing from NetXMS and from the Cisco output, but I consider that related to timing and just noise because there is no consistent pattern.
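
The spreadsheet comparison can also be sketched in a few lines of script, for anyone who wants to repeat it on their own switches. This is a rough illustration, not the method used to produce the attachment: it assumes the NetXMS export gives a list of MAC addresses and the switch output has been reduced to (MAC, type) pairs, and the function names are made up.

```python
import re


def normalize_mac(text):
    """Reduce any common MAC notation to 12 lowercase hex digits."""
    digits = re.sub(r"[^0-9a-fA-F]", "", text).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {text!r}")
    return digits


def compare_mac_tables(netxms_macs, switch_entries):
    """Return switch entries that NetXMS does not list, keyed by MAC.

    switch_entries is an iterable of (mac, entry_type) pairs taken from
    the switch CLI output, e.g. ("0011.2233.4455", "STATIC").
    """
    seen = {normalize_mac(mac) for mac in netxms_macs}
    missing = {}
    for mac, entry_type in switch_entries:
        key = normalize_mac(mac)
        if key not in seen:
            missing[key] = entry_type.upper()
    return missing
```

If every value in the returned dictionary is STATIC, that matches the pattern described above.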

Please let me know if you need additional info.
#8
I've been using NetXMS for a while, but recently installed version 1.2.17 on a Windows Server 2012 machine at a site that uses Cisco 2960S switches and 4500 switches.   

I primarily use 3 VLANs for most of the traffic and don't use VLAN 1. NetXMS doesn't seem to show any information except for VLAN 1. When I look at the forwarding database, I only see VLAN 1 related info.

I know that Cisco uses a modified form of the community string for VLANs and have tried that without any better results.
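
For reference, the modified form is the base community with "@" and the VLAN number appended. The sketch below builds the corresponding net-snmp snmpwalk command lines without running them. The host address, community name, and VLAN number in the demo are placeholders, and the default OID is my assumption of a useful starting point (BRIDGE-MIB dot1dTpFdbPort, which maps MAC addresses to bridge ports).

```python
def vlan_community(community, vlan_id):
    """Cisco community string indexing: append @<vlan> to the base community."""
    return f"{community}@{vlan_id}"


def fdb_walk_command(host, community, vlan_id, oid="1.3.6.1.2.1.17.4.3.1.2"):
    # Default OID is BRIDGE-MIB dot1dTpFdbPort (MAC address -> bridge port).
    return ["snmpwalk", "-v2c", "-c", vlan_community(community, vlan_id), host, oid]


if __name__ == "__main__":
    # Placeholder host, community, and VLAN; substitute your own.
    print(" ".join(fdb_walk_command("192.0.2.10", "public", 101)))
```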

I'm skeptical that the Catalyst driver is loading or working.   I just restarted the NetXMS server and don't see any mention about loading the network drivers.

[07-Jan-2015 01:35:32.010] Log file opened
[07-Jan-2015 01:35:32.026] [INFO ] Database driver "mysql.ddr" loaded and initialized successfully
[07-Jan-2015 01:35:38.711] [INFO ] Listening for SNMP traps on UDP socket 0.0.0.0:162
[07-Jan-2015 01:35:38.727] [INFO ] Listening for client connections on TCP socket 0.0.0.0:4701
[07-Jan-2015 01:35:38.727] [INFO ] Listening for client connections on TCP socket :::4701
[07-Jan-2015 01:35:38.727] [INFO ] NetXMS Server started
[07-Jan-2015 01:35:38.727] [INFO ] Listening for mobile device connections on TCP socket :::4747
[07-Jan-2015 01:35:38.727] [INFO ] Listening for mobile device connections on TCP socket 0.0.0.0:4747
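
A quick way to check a longer log for any mention of drivers is to filter it with a short script. The sketch below assumes only the line layout visible in the excerpt above (timestamp in brackets, an optional level in brackets, then the message); the function names are made up, and I don't know what a network driver load message would look like, so it simply searches for a keyword.

```python
import re

# Timestamp in brackets, optional level such as "[INFO ]", then the message.
LOG_LINE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+(?:\[(?P<level>[A-Z]+)\s*\]\s+)?(?P<message>.*)$"
)


def parse_line(line):
    """Return (timestamp, level or None, message), or None if unparseable."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("stamp"), match.group("level"), match.group("message")


def lines_mentioning(lines, keyword):
    """Return parsed entries whose message contains keyword (case-insensitive)."""
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and keyword.lower() in parsed[2].lower():
            hits.append(parsed)
    return hits
```

Run against the excerpt above with the keyword "driver", this finds only the database driver line.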

I'm curious if this is a bug in the current version, or something odd about my configuration.

I did search to see how to determine which driver it is trying to use, but the articles are very old and refer to things I can't find on my system.

We don't install NetXMS in the standard "Program files" location for Windows.  We try to keep Windows Operating System stuff separated from the applications we install.   So, we install on C:\programs\NetXMS\NetXMS\  (Yes NetXMS is listed twice).  Within that directory, we have the standard installation (bin, database, doc, etc, lib directories).   Within the lib directory is a ndd directory which contains 19 files (including the catalyst.ndd and some other cisco ndds).

We are using SNMP polling for the switches and seem to be able to get SNMP data in DCI and SNMP Trap configuration.

Could you provide guidance on how to determine which NetXMS driver is in use for a switch?   Also, any other tips on how to debug this issue would be appreciated.

Bill
#9
General Support / Some system DCI parameters unavailable
November 19, 2014, 08:12:00 PM
I have a moderate sized NetXMS installation with NetXMS monitoring 80 individual remote sites plus the local master servers.   Each remote site is basically identical to the other sites.

At only one site, NetXMS is returning "Unsupported" for the "system.uptime" and "system.cpu.usage" DCI variables when it polls.   But, if I right click on the node, and use tools->info->agent->Supported Parameters, both of those parameters appear.   This behavior of declaring them unsupported is recent.

Other system parameters such as system.servicestate and process.count are working fine on that same node.

I've seen Unsupported occasionally appear in what I assume are race conditions on system startup.   In those cases, if I re-enable, restart the agent, and re-poll, it won't reflag them as unsupported.  I tried this for this node and it didn't solve the issue.   I restarted the server and it didn't help.   I used Windows Update to bring the server current and restarted and it didn't solve the issue.

The agent is version 1.2.9.        The server was just upgraded to 1.2.17 a couple days ago, but this problem was happening when the server was 1.2.9.   We haven't had time to upgrade the 80 agents yet.   79 of the 80 are working just fine.   The DCI configuration is managed by templates and not individually for each site, so I don't think there is a typo in the DCI configuration.    I've verified the agent configuration is identical to other sites - we have an internal update management system that pushes out the configuration files which are common to all locations.

I believe that this issue is likely to be a Windows configuration corruption issue. What components of Windows does the agent use to return uptime and CPU usage? Is it WMI or something else? At a former job, I occasionally saw WMI corruption on earlier versions of Windows, but don't want to rebuild WMI's database if it isn't involved. The remote nodes are all running Windows 8.

Any suggestions would be appreciated.
#10
I've been experimenting with nxshell to create an alarm monitoring process.   

After using it for a while, somehow things get into a strange mode where I can not invoke nxshell successfully anymore.   It doesn't matter whether I specify a script or try interactive mode.   

During this time, I don't see any other issues with NetXMS.   For example, I can connect to it with the java management console app successfully.   Even if I restart the NetXMS server, I seem to have the same problem.

To debug, I went into the server console in the management console and typed "show connections" while nxshell was trying to connect. At the start of the process I see a "CMD_REQUEST_ENCRYPTION" state, then it switches to an "init" state, and then the session goes away. Eventually nxshell/java returns the error "ncxexception: Request timed out". The "CLTYPE" shown during the failed attempt is "DESKTOP <not logged in> [n/a]".

Once in a while it will work again, but then subsequent attempts fail.   This happened to me yesterday evening and I finally gave up.  This morning I was able to use nxshell for several hours and then this failure started again.   I'm guessing that if I let it sit for a while, things will start working again.

Both nxshell and the server are version 1.2.9. I can't upgrade to the latest version yet because I'll have to update 80 different locations and need to schedule that. The server is on a different system than nxshell, so I'm specifying its IP address to connect to with the "-D" option.

Is there some sort of parameter I should be playing with on the server or nxshell to get this to work consistently?  Any ideas would be appreciated...

Thanks

Bill
#11
If we leave the management console running on a workstation displaying something like the alarm browser, things work just fine for several hours.

But, after a long time (maybe a half day to a day), the screen stops refreshing.   Trying to do anything in the console tends to then give errors (typically indicating timeout or non response).

Is there a way of keeping the session Live and functional for very long periods without having someone click on the screen every hour?   This is very helpful when setting up a monitor wall.

Bill
#12
We are trying to create a display wall for monitoring in our office. The purpose is to highlight the issues. If a user needs details, they can open up NetXMS on their own workstation. We find the Alarm Browser and Alarm dashboard contain more info than we would like. What we would like is to be able to limit which columns are displayed and make the fonts big enough to read across the room. For example, we would probably limit the display to Source, Message, Ack, and Modify Date. We would sort on Modify Date in reverse order.

If we also had the ability to display the modify date as elapsed time (e.g., 1 hour, 2 days) it would be very nice.
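
As an illustration of the elapsed-time format we have in mind, here is a rough sketch. It is not tied to any NetXMS API; the function name and the rounding rule (largest whole unit only) are just one possible choice.

```python
from datetime import datetime, timedelta


def elapsed_text(modified, now):
    """Render now - modified as a coarse phrase such as '2 days'."""
    seconds = max(int((now - modified).total_seconds()), 0)
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        count = seconds // size
        if count >= 1:
            return f"{count} {unit}" + ("s" if count != 1 else "")
    return "just now"


if __name__ == "__main__":
    now = datetime.now()
    print(elapsed_text(now - timedelta(hours=1), now))
    print(elapsed_text(now - timedelta(days=2, hours=3), now))
```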

#13
I have a need to monitor the replication status of MySQL on several hundred remote servers. Initially, when a server is first installed, I would like to monitor how far replication has progressed, since the process can take more than a day. Once a server is fully in production, I need to monitor some status fields to ensure everything is OK with replication.

I'm on MySQL version 5.5.32 and version 1.2.8 of NetXMS.

In MySQL, the way of getting the fields I'm interested in is to run the query "Show Slave Status;", which returns a wide table with lots of variables. Currently, there is no "select" statement that will return the info I need.
 
I've written a powershell script that can launch mysql, perform the query, and grab the output from the table.  I can configure external commands to call the script and get the value into NetXMS.   Unfortunately, external commands and powershell specifically have a fair amount of overhead.    I'd prefer that there is a more direct way within NetXMS to get this info.  The servers that I'm monitoring are very CPU bound decoding and monitoring multiple live camera streams and every bit of overhead added for monitoring the database has the potential of disrupting that process.  That is why I'm trying to lighten the footprint.

I tried using the NetXMS ODBC subagent.   I can get it to execute the query, but the subagent is hardcoded to select only column 1.  I'd like to track columns 6, 11, 12, and 19.

Is there a more native way of using NetXMS to either:

  • return a table using an odbc inquiry
  • or select a column from an odbc subagent inquiry
  • or perhaps execute SQL statements from within a NetXMS script that would do the query and filter the resulting column
If not, are any of those options being considered in the future?  For example, the ODBC subagent seems to have column 1 hardcoded in it.   If there was an optional way of specifying the column number in the agent config file, it would be helpful.   The new table functionality looks very intriguing - it would seem to me that ODBC queries would be a natural extension because they can return tables.
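
To show what selecting columns by number would mean in practice, here is a small sketch that works on tab-separated output with a header row, which is what the mysql client produces in batch mode. It does not remove the overhead of launching an external process and is not a NetXMS feature; the function name is made up and the sample data in the demo uses placeholder column names rather than real replication fields.

```python
def pick_columns(header_line, row_line, wanted):
    """Select 1-based column numbers from tab-separated header and row lines."""
    names = header_line.rstrip("\n").split("\t")
    values = row_line.rstrip("\n").split("\t")
    return {names[i - 1]: values[i - 1] for i in wanted}


if __name__ == "__main__":
    # Placeholder data: 20 columns named col1..col20 with values v1..v20.
    header = "\t".join(f"col{i}" for i in range(1, 21))
    row = "\t".join(f"v{i}" for i in range(1, 21))
    for name, value in pick_columns(header, row, (6, 11, 12, 19)).items():
        print(f"{name}={value}")
```

An optional column setting in the ODBC subagent configuration would amount to the same selection, done inside the agent.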

Thanks in advance.

Bill