Topics - csharve

#1
General Support / Broadcast Storm
October 06, 2021, 03:42:37 PM
Hello all! I have some newbie questions regarding upgrades. I am new to my role and this is my first experience with NetXMS. Some background: we have been using NetXMS v2.1, and word came down from upper management that we need to bring our software up to date, which can be difficult in a production facility! I was tasked with upgrading v2.1 to the most recent version.

After doing some research, it looks as if there is no need to upgrade to an intermediate version first. I found info on these forums suggesting that I can upgrade straight to 3.9. Is that correct?

Following the Admin Guide, I stopped the NetXMS server and ran "nxdbmgr check". The check returned no errors; everything showed "Passed". Next I ran the "netxms-server-3.9.298-x64.exe" server upgrade. After the upgrade completed, the service did not start, so I ran "nxdbmgr upgrade" against the database. When the database upgrade completed, the service started back up on its own and I started to receive NetXMS emails saying that many of my nodes had changed state to UP. All seems good, right? This is where things went downhill.

I performed the management console upgrade next. I went with the default settings and all looked good. I was then contacted by a few people in operations who were seeing odd system alarms. My facility has 5 different production plants, all running their own distributed control system (DCS). Every unit that runs the control system (26 total) started giving the following alarm: "Broadcast Storm ended duration 100 seconds". As you can imagine, this raised major concerns! Our DCS vendor told us something was flooding the network, and out of fear of the unknown I stopped the NetXMS service, which in turn stopped all the broadcast storm alarms. Obviously, I cannot take the chance of shutting down 5 production plants!

So the question is, what is happening??? I use NetXMS to monitor servers, switches, firewalls, and operator consoles. There are many, many other devices that could be added to that list, but they are monitored through the DCS rather than through NetXMS. Is v3.9 scanning everything across all plants and causing this? Is it a Network Discovery issue? Can I turn that off through a CLI? I don't even know if the management console is up and running because I was forced to kill the server before I could even start the console!

Obviously, the server upgrade was successful and it connected to the database successfully, hence all the emails from NetXMS telling me my nodes were all changing state to UP. I am afraid to even start the server back up to check the console because the alarms will start to come in again and I don't know what issues it could cause (and apparently the vendor doesn't either). We did run a quick test just to verify it was NetXMS: we started the server up, the alarms came in, and we shut it down.

Of course I do have the option of rolling back to v2.1: I can use a recovery point on the SNMP server, and I backed up the database prior to upgrading. But I'd like to be able to get up to v3.9. Eventually I will have no choice! Database is SQL.

Any suggestions on a course of action? Is my inexperience causing me to miss something here? Unfortunately, my mentor retired 3 years early and it has left me as the only plant resource for OT management. Trial by fire! Thank you for any suggestions!!!