DCI Retention and polling - database size scaling and planning

Started by kghammond, February 05, 2008, 04:42:12 PM

Previous topic - Next topic

kghammond

We are using MS SQL server for the database server for NetXMS.  Do you know how much space each polling interfaval takes up per DCI?  I assume this varies based on int, bigint, float, string (are you using varchar's?).

It would be nice to have a table of the various DCI polling and how much space that will use up for the specified database server.  We are trying to determine how often to poll and how long to set the retention times on our DCI's but it feels like guesswork right now.

We expect to have somewhere around 100 switch ports with at a minimum of in/out and errors for now, and then 30-40 servers with a minimum of network activity, cpu, memory, disk space, disk activty.

Right now we have about 40 GB of RAID 10 drive spaces set aside for NetXMS.

Any kind of guidance you have would help.

Thank you,
Kevin

Victor Kirhenshtein

All collected data stored in varchars, as a textual representation of appropriate data type. I don't now how much space takes each record in database, but can provide some real life statistics.

My own NetXMS server runs on MS SQL 2005, I have 101 node with 1769 DCIs configured, almost all of them configured for collection every 60 seconds and retention time of 30 days (some configured for 3600 seconds and 365 days). Database size now is around 6.5 GB. For your situation, you will have about 400 DCIs for 100 switch interfaces and about 300 to 500 (depending on number of partitions, required parameters, etc.) for 40 servers - so probably your database size for 60 seconds/30 days configuration will be about 4GB in size.

I will try to collect data for some other MS SQL installations and post them here.

Best regards,
Victor


Victor Kirhenshtein

Also, Microsoft has document for estimating table size:

http://msdn2.microsoft.com/en-us/library/aa933068(SQL.80).aspx

NetXMS stores collected data in individual tables for each node, and each table has three fields: two integer and one varchar(255). However, if you collect mostly numerical data, you can assume that varchar field is used for about 10% of it's full capacity.

Best regards,
Victor

kghammond

Thank you for the feedback.  I have put together a quick spreadsheet showing the estimated row size based on a average row size from your database.  I am sure the average row size estimate is probably a little off but it should be useful as a guide in scaling and capacity planning.

I had to do a few estimates.  I am sure we could get a more accurate number from your SQL database but everyone's average varchar size will change depending on what they are polling.

Also for comparison, I put in a comparison column to the space required if you stored integer values as int instead of varchar.  Assuming all data could be stored as int (which I am sure not all records could be stored as int), the estimated database size for 1769 dci's at 60 second polling for 30 days would be 1.7 GB.

See the attached spreadsheet for more details.  I created the calculations based on the information from the Microsoft MSDN page using a few assumptions stated above.

Also on a side note, are you creating a clustered index or any non-clustered indexes on the tables that store the rows of data?

Thank you,
Kevin

Victor Kirhenshtein

Thank you for spreadshit. I'll test it tomorrow, when I'll have MS Office at hands.

Quote from: kghammond on February 05, 2008, 07:00:24 PM
Also on a side note, are you creating a clustered index or any non-clustered indexes on the tables that store the rows of data?

For each table there are one clustered and one non-clustered index.

Best regards,
Victor

kghammond

The non-clustered index will add some size to the database.  I will try to add that into my spreadsheet to account for the non-clustered index to get a more accurate average usage.

Thank You,
Kevin

kghammond

I updated the spreadsheet to include the sizing of the clusted index and the nonclustered indexes.  I pulled the index definitions from one of the idata_nnn tables.

This should be a lot more realistic comparison and this should be a lot more useful for SQL server space planning.

One estimate this is still ignoring is that I think you create a table per dci.  This spreadsheet still assumes that you have one table that holds all the rows of data.  I think the size will go up slightly if I modify the spreadsheet to include one table per dci, but this spreadsheet should get within 5-10% accuracy with the avg size varchar assumed from your database size.

Kevin

Victor Kirhenshtein

Actually I create one table per node, not per DCI.

Best regards,
Victor

kghammond

I updated teh spreadsheet to include support for one table per node.  The results were almost identical in the size versus one big table.  I attached the updated spreadsheet.  There are two tabs.  You should be able to use either tab to do any space planning.

The center column is the one that mimics your database enviornment.  The space planning still estimates that you could reduce your database close to 3.6 GB if you used int's over varchar's in the SQL tables.

Kevin

Victor Kirhenshtein

Just check your spreadsheet - excellent! Can I publish it on web site in documentation section as well?

Best regards,
Victor

kghammond

Yes you may publish it.  Thank you for asking and I am glad I can help.