Deleted objects return after NetXMS core restart

Started by TSimmonsHJ, February 19, 2020, 06:16:36 PM

TSimmonsHJ

Hello,
We are seeing an issue where nodes, subnets, and/or zones that we delete from NetXMS return after the NetXMS core service is restarted. We are running version 3.1p3, though I can't say exactly when the issue started. I've run the nxdbmgr check -f command and it resolved some issues, but the problem still occurs. What can I check? Thanks!

Filipp Sudanov

Try checking server log either for SQL query error messages or messages with tag [obj.sync           ] saying something like "Unable to delete...". If you see nothing, try increasing debug level to 6.

TSimmonsHJ

We keep logging set at 0 generally, so I turned on debug 6 and deleted a set of objects that are known to return. I let the debug run for a few seconds after that as well, and don't find any references to what you mentioned in the log. We generate a lot of logging very quickly, so that few seconds of debug 6 made a 4MB log, making it challenging to leave it running for a while. Would that be occurring randomly, or right at the time of delete?

Filipp Sudanov

You can change logging level for particular debug tags, e.g.:
nxadm -c "debug obj.sync 6"
Then you can check current status of logging
nxadm -c "debug"
And reset logging to default level for particular tag:
nxadm -c "debug obj.sync -1"

"SQL query failed" messages are errors, so they should be visible at debug level 0; we can assume there are no such errors on your system.

obj.sync messages are delayed; they should appear within about a minute after deletion in the front-end. Try setting debug level 6 for obj.sync and see what shows up in the log.

TSimmonsHJ

That worked quite well, thank you. This is what we're seeing:
D:\NetXMS\log>find "delet" netxmsd.log
.... (snipped)
2020.02.20 20:00:27.827 *D* [obj.sync           ] Unable to delete object with id 81066 because it is being referenced 2 time(s)
2020.02.20 20:00:27.827 *D* [obj.sync           ] Object 23 [81067] marked for deletion
2020.02.20 20:00:27.827 *D* [obj.sync           ] Unable to delete object with id 81067 because it is being referenced 2 time(s)
2020.02.20 20:00:27.827 *D* [obj.sync           ] Object 24 [81068] marked for deletion
2020.02.20 20:00:27.827 *D* [obj.sync           ] Unable to delete object with id 81068 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object 25 [81069] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81069 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object 26 [81070] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81070 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object 27 [81071] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81071 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object 28 [81072] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81072 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object DEFAULT_VLAN [81073] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81073 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object VLAN4 [81074] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81074 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object VLAN7 [81075] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81075 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object VLAN8 [81076] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81076 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object VLAN9 [81077] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81077 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object VLAN10 [81078] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81078 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object VLAN41 [81079] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81079 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object VLAN55 [81080] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81080 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object VLAN85 [81081] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81081 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object VLAN92 [81082] marked for deletion
2020.02.20 20:00:27.828 *D* [obj.sync           ] Unable to delete object with id 81082 because it is being referenced 2 time(s)
2020.02.20 20:00:27.828 *D* [obj.sync           ] Object VLAN93 [81083] marked for deletion
.... (snipped)

There are 1566 matching lines generated over the course of maybe 30 seconds, so I think it's safe to say you're onto something here. I'm not sure what to do about it, though. :)
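
For a log this noisy, a short script can condense the matching lines into one entry per object. The following is a minimal sketch, assuming the log lines have exactly the format quoted above; the function name and the SAMPLE list are illustrative only and not part of NetXMS.

```python
import re
from collections import Counter

# Pattern for the "Unable to delete" lines in the format quoted above.
BLOCKED = re.compile(
    r"\[obj\.sync\s*\] Unable to delete object with id (\d+) "
    r"because it is being referenced (\d+) time\(s\)"
)


def summarize_blocked_deletes(lines):
    """Map each blocked object id to the last reference count reported for it."""
    blocked = {}
    for line in lines:
        match = BLOCKED.search(line)
        if match:
            blocked[int(match.group(1))] = int(match.group(2))
    return blocked


# Three lines copied from the log excerpt above, used as sample input.
SAMPLE = [
    "2020.02.20 20:00:27.827 *D* [obj.sync           ] Unable to delete object with id 81066 because it is being referenced 2 time(s)",
    "2020.02.20 20:00:27.827 *D* [obj.sync           ] Object 23 [81067] marked for deletion",
    "2020.02.20 20:00:27.827 *D* [obj.sync           ] Unable to delete object with id 81067 because it is being referenced 2 time(s)",
]

blocked = summarize_blocked_deletes(SAMPLE)
print(len(blocked), "objects blocked from deletion")
print("objects per reference count:", dict(Counter(blocked.values())))
```

To run it against the real log, pass an open file instead of SAMPLE, for example summarize_blocked_deletes(open("netxmsd.log", errors="replace")). An open file yields lines one at a time, so a large log is never loaded into memory in full.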

Filipp Sudanov

It's a rare bug that is hard to replicate. Would it be possible for you to compile the NetXMS server from source? If yes, we could prepare a patch that does deeper debugging in this situation.

TSimmonsHJ

I can certainly try! What dev studio do I need for that? Eclipse? I've only worked in Visual Studio and some Borland stuff ages ago.

Filipp Sudanov

Ah, right, you have the server on Windows. On Windows it is too complicated to set up the build environment. Let's get back to this next week; we will try to make a test build that should be able to collect some more debug information.

TSimmonsHJ

Alright, that sounds good to me! Thanks for all so far.

Filipp Sudanov

The developers are planning to rewrite the part of the code that is responsible for this bug, so it makes no sense to debug the current code as it is. Let's wait for a new version.

TSimmonsHJ

Hey that's great news. Do you happen to have a time-frame for when we can expect to see that released?

Victor Kirhenshtein

We just published 3.2, which is now the current stable release. I think we will make those internal changes in 3.3, so April seems reasonable.

Best regards,
Victor

TSimmonsHJ

Ok, that's good to know. We're in something of a rough state at the moment with our install due to this issue (things don't move around properly, zone GUIDs are duplicated because we deleted one and added one and the original didn't delete properly, can't remove old cruft) and think we might be better served starting over fresh. Is there an easy way to re-initialize the database and start over from scratch that doesn't involve deleting the DB? Another team manages our databases and if we can do this ourselves it'd be easier.

Victor Kirhenshtein

You can just delete all tables in the database and re-run nxdbmgr init. You can delete all regular tables with the attached SQL. The tricky part is deleting all the idata_ and tdata_ tables. One option is to use a command like this to generate an SQL file (Postgres version):

psql -c "SELECT id FROM nodes" | grep -ve '[(i-]' | sed -e 's/\s*\([0-9]*\)/DROP TABLE idata_\1; DROP TABLE tdata_\1;/' > drop_data_tables.sql

and then run drop_data_tables.sql

Best regards,
Victor
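
On a machine without grep and sed (a Windows server, for example), the same SQL file can be produced with a few lines of Python. This is a minimal sketch, assuming the output of the SELECT id FROM nodes query has been captured as text; the function name and the node IDs in SAMPLE are made up for illustration.

```python
import re


def drop_statements(psql_output):
    """Build DROP TABLE statements from the text output of `SELECT id FROM nodes`."""
    statements = []
    for line in psql_output.splitlines():
        token = line.strip()
        # Keep only bare integers; this skips the column header, the
        # separator row, the "(N rows)" footer, and blank lines.
        if re.fullmatch(r"[0-9]+", token):
            statements.append(
                f"DROP TABLE idata_{token}; DROP TABLE tdata_{token};"
            )
    return statements


# Hypothetical psql output; the IDs 101 and 2045 are illustrative only.
SAMPLE = """ id
------
  101
 2045
(2 rows)

"""

# Write one pair of DROP statements per node, as in the pipeline above.
with open("drop_data_tables.sql", "w") as sql_file:
    sql_file.write("\n".join(drop_statements(SAMPLE)) + "\n")
```

The generated statements match what the sed expression emits for each numeric ID. It is worth reviewing drop_data_tables.sql before running it, since the drops cannot be undone.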