Bug in DCI Data Recalculation when using relative periods and NXSL time() func.

Started by fbu, April 24, 2020, 01:27:13 PM

fbu

Dear NetXMS team,

I seem to have found a scenario related to DCI Data Recalculation, during which the server (Version 3.2-400) does not behave as expected. The scenario is as follows:

For some of my DCIs I use secondary DCIs, which feature a longer retention period and a bigger polling interval, to avoid storing data with fine granularity over a long period of time.

These secondary DCIs use different aggregation scripts, based on what kind of data the primary DCI returns. The most common script is a simple average calculation, like this:

return GetAvgDCIValue($node, FindDCIByDescription($node, "gateway.RTT"), time()-3600, time());

You can see that (with a polling interval of 1 hour) the secondary DCI simply returns the average value of the primary DCI for the last hour.

This method works perfectly fine, as long as the server is running.

And this is where the problem comes in:

If the server was down for some reason and the secondary DCIs could not poll any data, I want to use the DCI Data Recalculation feature to recalculate the secondary DCI data from the data that the primary DCIs gathered during the downtime. The primary DCIs still gather data in that case, because it comes from a NetXMS Agent, which stores it in its local DB while the server is down.

However, it seems like the NXSL function time() always returns the actual current timestamp, instead of the timestamp of the value to recalculate.
Just to make it a little clearer: Imagine the current timestamp is X and the server should recalculate a DCI value at the timestamp Y (with Y < X). The transformation script shown above is supposed to calculate the average of the primary DCI for the period of one hour before timestamp Y and not X.

This behavior leads to the situation that for every recalculated value of the secondary DCI, the time() function returns timestamp X, meaning the recalculated average value will always be the one for the latest one hour period (relative to actual present), which is not correct.
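
To make the difference concrete, here is a minimal toy model in Python (not NXSL and not NetXMS code). The names `avg_in_window` and `recalc` are hypothetical, and the half-open window `(start, end]` is an assumption about how the average is taken. The `clock` argument stands in for what time() returns while a value is being recalculated.

```python
# Toy model of the behaviour described above; not NetXMS code.
# samples: list of (timestamp, value) pairs of the primary DCI.

def avg_in_window(samples, start, end):
    """Average of values with start < timestamp <= end, or None if there are none."""
    values = [v for t, v in samples if start < t <= end]
    return sum(values) / len(values) if values else None

def recalc(samples, points, clock):
    """Recompute the hourly average for each timestamp in points.

    clock(point_ts) stands in for time(): it returns the "now" that the
    transformation script sees while point_ts is being recalculated.
    """
    return [avg_in_window(samples, clock(ts) - 3600, clock(ts)) for ts in points]

# Three hours of primary data, one sample every 10 minutes.
# The value is 1 during the first hour, 2 during the second, 3 during the third.
primary = [(t, (t - 1) // 3600 + 1) for t in range(600, 3 * 3600 + 1, 600)]

points = [3600, 7200, 10800]   # secondary values to recalculate (timestamp Y)
wall_now = 10800               # actual current time (timestamp X)

# time() returns X for every point: each value is the average of the latest hour.
observed = recalc(primary, points, lambda ts: wall_now)

# time() returns Y, the timestamp of the value being recalculated.
expected = recalc(primary, points, lambda ts: ts)

print(observed)  # [3.0, 3.0, 3.0]
print(expected)  # [1.0, 2.0, 3.0]
```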


One way I could see to resolve this problem is to have an option for the DCI Data Recalculation, which makes the server always take the timestamp of the value that is currently being recalculated and (temporarily) set this as the current timestamp, which will then be returned by the time() function, just for the process of data recalculation.

Additionally, I could think of an option that "fills gaps" in the data (e.g. gaps caused by server downtime), perhaps by checking the polling interval and generating the missing values. Even if no RAW values are present for a missing point, the transformation script could still be executed. For a script like the one above, this would still produce values for the final data.
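
The gap-filling idea could look roughly like the following Python sketch. It is only an illustration: `missing_poll_times` and `fill_gaps` are hypothetical names, and it assumes that stored timestamps fall exactly on the polling grid, which real poll times usually do not.

```python
def missing_poll_times(existing, start, end, interval):
    """Timestamps on the polling grid in [start, end] that have no stored value."""
    have = set(existing)
    return [ts for ts in range(start, end + 1, interval) if ts not in have]

def fill_gaps(stored, start, end, interval, transform):
    """Return a copy of stored ({timestamp: value}) with the gaps filled.

    transform(ts) stands in for running the transformation script "as of" ts.
    Points for which it returns None are left missing.
    """
    filled = dict(stored)
    for ts in missing_poll_times(stored.keys(), start, end, interval):
        value = transform(ts)
        if value is not None:
            filled[ts] = value
    return filled

# Hourly secondary DCI with two values lost during a downtime.
stored = {0: 1.5, 3600: 1.7, 14400: 2.1}
gaps = missing_poll_times(stored.keys(), 0, 14400, 3600)
print(gaps)  # [7200, 10800]

# Dummy transform for the demo; a real one would average the primary DCI.
filled = fill_gaps(stored, 0, 14400, 3600, lambda ts: 2.0)
print(sorted(filled))  # [0, 3600, 7200, 10800, 14400]
```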


Hopefully, these thoughts will be helpful to you.

Thank you and kind regards,
fbu

Filipp Sudanov

Not sure if this approach would suit your situation, but here's an alternative.

The DCI object has an attribute lastPollTime. That is the timestamp of the last time the DCI got data. You can read this value in the transformation script of your "primary" DCI. When delayed data arrives at the server, this attribute will show the correct timestamp of each incoming data point.

The idea is that you can configure your "secondary" DCIs as push DCIs. And implement the logic in the transformation script of the "primary" DCI.

E.g. if a new hour has started, you can calculate the average of the "primary" DCI and push it to the "secondary". Some more logic is required to do this only once per hour: you can use a custom attribute and store the last time there.
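
The "only once per hour" logic can be sketched as follows, in Python rather than NXSL. Everything here is a stand-in: the `state` dict plays the role of the custom attribute, `history` plays the role of the stored "primary" data, and the function returns the pair that would be pushed instead of actually pushing anything.

```python
def on_primary_sample(state, history, ts, value):
    """Handle one incoming "primary" sample with timestamp ts.

    Returns (timestamp, average) for the hour that just ended when this
    sample is the first one of a new hour, otherwise None.
    """
    history.append((ts, value))
    hour_start = ts - ts % 3600          # start of the hour this sample belongs to
    last = state.get("last_hour")        # hour of the previous sample, if any
    state["last_hour"] = hour_start      # remember it, like the custom attribute
    if last is None or hour_start <= last:
        return None                      # first sample ever, or still the same hour
    # A new hour has started: average the hour the previous sample belonged to.
    values = [v for t, v in history if last <= t < last + 3600]
    return (last + 3600, sum(values) / len(values))

state, history = {}, []
incoming = [(100, 2.0), (1900, 4.0), (3700, 10.0), (3800, 12.0), (7300, 1.0)]
results = [on_primary_sample(state, history, ts, v) for ts, v in incoming]
print(results)  # [None, None, (3600, 3.0), None, (7200, 11.0)]
```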

fbu

While this approach is certainly an alternative to mine, it does not help with the core problem, which is the DCI Data Recalculation:

Unfortunately, you cannot pass a timestamp parameter to the function PushDCIData that would define where to put the new value in the DCI table. Therefore, it will always store the new value at the actual current timestamp.

Looking at the source code, I see the following location where this problem could be addressed:

In line 114 of src/server/core/dci_recalc.cpp, the recalculateValue(value) function is called, which calls the function transform(value, elapsedTime) in line 2151 of src/server/core/dcitem.cpp.
In this function, in line 999 of the same file (dcitem.cpp), a ScriptVMHandle object is created, using the function CreateServerScriptVM(...). This function CreateServerScriptVM(...) could have an optional timestamp parameter, which would tell the script VM to simulate an execution at the given point in time, defined by the given parameter.

The value for this timestamp parameter could be extracted from the ItemValue &value (which is the current DCI value to be recalculated) of the calling function, using value.getTimeStamp().

This approach, however, would require refactoring the implementation of NXSL functions like time(), GetDCIValue(...) or GetDCIRawValue(...), as their behavior would have to be adapted accordingly.
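
As a rough illustration of the optional timestamp parameter, here is a small Python sketch. It does not reflect the actual NetXMS classes: `ScriptContext`, `run_transform` and `window_script` are hypothetical names, and the only point is that a time()-like function asks the context for "now" instead of reading the system clock directly.

```python
import time

class ScriptContext:
    """Stand-in for a script VM that can simulate a point in time."""

    def __init__(self, simulated_now=None):
        self.simulated_now = simulated_now   # None means: use the real clock

    def now(self):
        """What a time()-like script function would return in this context."""
        if self.simulated_now is not None:
            return self.simulated_now
        return int(time.time())

def run_transform(script, value_timestamp=None):
    """Run script with a context; pass value_timestamp only when recalculating."""
    return script(ScriptContext(simulated_now=value_timestamp))

def window_script(ctx):
    """Stand-in for the transformation script: the one-hour window it would average."""
    now = ctx.now()
    return (now - 3600, now)

# Normal polling: the window ends at the real current time.
live_start, live_end = run_transform(window_script)

# Recalculation: the window ends at the timestamp of the value being recalculated.
recalc_window = run_transform(window_script, value_timestamp=7200)
print(recalc_window)  # (3600, 7200)
```

With this shape, normal polling is unaffected because the override defaults to None, and only the recalculation path needs to pass the value's timestamp.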

(I'm sorry, if I have some weird ideas, which might sound reasonable to me but not for others  ;D)