Server problems

Started by Nikk, November 01, 2013, 12:17:15 PM


Nikk

Hi,

I'm experiencing problems with two NetXMS servers.

Problem 1.

NetXMS running on VMware Windows server 2012.

Starting the server takes about 20-30 minutes. The first time it occurred was when I raised the number of data collectors by 5. I lowered it back to the default, but nothing changed. Then I thought it might be caused by the SNMP tables or logwatch scripts I had added, but the problem remained after I removed all of them.
I just get "Error 1053: The service did not respond to the start or control request in a timely fashion."
After a long time it does start. There is nothing in the logs, but I got this in the crash dump:

Quote
NETXMSD CRASH DUMP

EXCEPTION: C0000005 (Access violation) at 0000000000000000

NetXMS Version: 1.2.9
OS Version: Windows NT 6.2 Build 9200
Processor architecture: AMD64 (Intel EM64T)

Call stack:
  [libnxdb:00000000005F4A3C]: class String __cdecl DBPrepareString(struct db_handle_t * __ptr64,wchar_t const * __ptr64,int)
  [nxcore:0000000180011E8C]: void __cdecl WriteAuditLog(wchar_t const * __ptr64,int,unsigned int,wchar_t const * __ptr64,unsigned int,wchar_t const * __ptr64,...)
  [nxcore:000000018008D9BD]: private: void __cdecl ClientSession::readThread(void) __ptr64
  [nxcore:000000018008C5F2]: private: static unsigned int __cdecl ClientSession::readThreadStarter(void * __ptr64)
  [libnetxms:000000000020452F]: unsigned int __cdecl SEHThreadStarter(void * __ptr64)
  [MSVCR80:000000005D0937D7]: _endthreadex
  [MSVCR80:000000005D093894]: _endthreadex
  [KERNEL32:000007FA941A1832]: BaseThreadInitThunk
  [ntdll:000007FA9662D609]: RtlUserThreadStart

Problem 2.

NetXMS running on Ubuntu server 12.04 x64.

I'm getting a segmentation fault. Here is the gdb backtrace:
Quote
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb48f5b40 (LWP 19381)]
0xb7d1cf66 in ?? () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt
#0  0xb7d1cf66 in ?? () from /lib/i386-linux-gnu/libc.so.6
#1  0xb7effca0 in DCTable::processNewValue (this=0x80b4b70,
    nTimeStamp=1383298291, value=0xb59199c0) at dctable.cpp:389
#2  0xb7f0292c in DataCollectionTarget::processNewDCValue (this=0x80cd218, dco=
    0x80b4b70, currTime=1383298291, value=0xb59199c0) at dctarget.cpp:366
#3  0xb7eefbb7 in DataCollector (pArg=0x0) at datacoll.cpp:254
#4  0xb7e48d4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#5  0xb7d87bae in clone () from /lib/i386-linux-gnu/libc.so.6


Also, in the management console I very often get "Software caused connection abort: socket write error", and then I have to restart the console before I can do anything again.

Also, I once exported a template from the server, and now, when I try to import it back, the request times out, and after that everything else times out as well. Here is the trace:
Quote
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x75ee2)[0xb742dee2]
/usr/local/lib/libnxcore.so.1(_Z14ValidateConfigP6ConfigjPci+0x628)[0xb76306b8]
/usr/local/lib/libnxcore.so.1(_ZN13ClientSession19importConfigurationEP11CSCPMessage+0x24e)[0xb76879ce]
/usr/local/lib/libnxcore.so.1(_ZN13ClientSession16processingThreadEv+0xd59)[0xb7692fb9]
/usr/local/lib/libnxcore.so.1(_ZN13ClientSession23processingThreadStarterEPv+0x1b)[0xb769387b]
/lib/i386-linux-gnu/libpthread.so.0(+0x6d4c)[0xb7568d4c]
/lib/i386-linux-gnu/libc.so.6(clone+0x5e)[0xb74a7bae]


Program received signal SIGABRT, Aborted.
[Switching to Thread 0xadc89b40 (LWP 23803)]
0xb7fdd424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7fdd424 in __kernel_vsyscall ()
#1  0xb7cc61df in raise () from /lib/i386-linux-gnu/libc.so.6
#2  0xb7cc9825 in abort () from /lib/i386-linux-gnu/libc.so.6
#3  0xb7d0339a in ?? () from /lib/i386-linux-gnu/libc.so.6
#4  0xb7d0dee2 in ?? () from /lib/i386-linux-gnu/libc.so.6
#5  0xb7f106b8 in ~ConfigEntryList (this=<optimized out>,
    __in_chrg=<optimized out>) at ../../../include/nxconfig.h:111
#6  ValidateTemplate (errorTextLen=1024, errorText=0xadc8897c "@",
    root=<optimized out>, config=0xb560a428) at import.cpp:119
#7  ValidateConfig (config=0xb560a428, flags=0, errorText=0xadc8897c "@",
    errorTextLen=1024) at import.cpp:209
#8  0xb7f679ce in ClientSession::importConfiguration (this=0xb6206f28,
    pRequest=0xb530aff0) at session.cpp:8918
#9  0xb7f72fb9 in ClientSession::processingThread (this=0xb6206f28)
    at session.cpp:1133
#10 0xb7f7387b in ClientSession::processingThreadStarter (pArg=0xb6206f28)
    at session.cpp:203
#11 0xb7e48d4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#12 0xb7d87bae in clone () from /lib/i386-linux-gnu/libc.so.6

Here is the template:
Quote
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <formatVersion>3</formatVersion>
   <description>Ethernet Statistics</description>
   <events>
   </events>
   <templates>
      <template id="1824">
         <name>Ethernet Statistics</name>
         <flags>0</flags>
         <dataCollection>
            <dctable id="213">
               <name>.1.3.6.1.2.1.16.1.1.1.1</name>
               <description>Ethernet Statistics</description>
               <origin>2</origin>
               <interval>60</interval>
               <retention>30</retention>
               <systemTag></systemTag>
               <advancedSchedule>0</advancedSchedule>
               <rawValueInOctetString>0</rawValueInOctetString>
               <snmpPort>0</snmpPort>
               <transformation></transformation>
               <columns>
                  <column id="1">
                     <name>Index</name>
                     <displayName>Index</displayName>
                     <snmpOid>.1.3.6.1.2.1.16.1.1.1.1</snmpOid>
                     <flags>256</flags>
                  </column>
                  <column id="2">
                     <name>Broadcast Packets</name>
                     <displayName>Broadcast Packets</displayName>
                     <snmpOid>.1.3.6.1.2.1.16.1.1.1.6</snmpOid>
                     <flags>33</flags>
                  </column>
                  <column id="3">
                     <name>Multicast Packets</name>
                     <displayName>Multicast Packets</displayName>
                     <snmpOid>.1.3.6.1.2.1.16.1.1.1.7</snmpOid>
                     <flags>1</flags>
                  </column>
                  <column id="4">
                     <name>Packets</name>
                     <displayName>Packets</displayName>
                     <snmpOid>.1.3.6.1.2.1.16.1.1.1.5</snmpOid>
                     <flags>1</flags>
                  </column>
                  <column id="5">
                     <name>Collisions</name>
                     <displayName>Collisions</displayName>
                     <snmpOid>.1.3.6.1.2.1.16.1.1.1.13</snmpOid>
                     <flags>0</flags>
                  </column>
                  <column id="6">
                     <name>CRC Align Errors</name>
                     <displayName>CRC Align Errors</displayName>
                     <snmpOid>.1.3.6.1.2.1.16.1.1.1.8</snmpOid>
                     <flags>0</flags>
                  </column>
                  <column id="7">
                     <name>Drop Events</name>
                     <displayName>Drop Events</displayName>
                     <snmpOid>.1.3.6.1.2.1.16.1.1.1.3</snmpOid>
                     <flags>0</flags>
                  </column>
               </columns>
               <thresholds>
               </thresholds>
               <perfTabSettings></perfTabSettings>
            </dctable>
         </dataCollection>
      </template>
   </templates>
   <traps>
   </traps>
</configuration>
Is the template wrong, or is there something else?

Thanks in advance,
Nikk
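
One quick way to rule out a malformed file is to check that the exported template parses as XML and to list what each table column contains. The following is a minimal Python sketch using only the standard library. The function name summarize_template and the shortened sample are illustrative assumptions, not part of NetXMS, and passing this check says nothing about whether the server-side import code handles the file correctly.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the exported template above (two of the seven columns),
# kept inline only so that the sketch is self-contained.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <formatVersion>3</formatVersion>
   <templates>
      <template id="1824">
         <name>Ethernet Statistics</name>
         <dataCollection>
            <dctable id="213">
               <name>.1.3.6.1.2.1.16.1.1.1.1</name>
               <columns>
                  <column id="1">
                     <name>Index</name>
                     <snmpOid>.1.3.6.1.2.1.16.1.1.1.1</snmpOid>
                     <flags>256</flags>
                  </column>
                  <column id="2">
                     <name>Broadcast Packets</name>
                     <snmpOid>.1.3.6.1.2.1.16.1.1.1.6</snmpOid>
                     <flags>33</flags>
                  </column>
               </columns>
            </dctable>
         </dataCollection>
      </template>
   </templates>
</configuration>
"""


def summarize_template(xml_text):
    """Return (template name, column name, SNMP OID, flags) for every table column."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError if the file is malformed.
    root = ET.fromstring(xml_text.encode("utf-8"))
    rows = []
    for template in root.iter("template"):
        template_name = template.findtext("name", default="")
        for column in template.iter("column"):
            rows.append((
                template_name,
                column.findtext("name", default=""),
                column.findtext("snmpOid", default=""),
                # A missing or empty <flags> element is treated as 0.
                int(column.findtext("flags") or 0),
            ))
    return rows


if __name__ == "__main__":
    for row in summarize_template(SAMPLE):
        print(row)
```

If the file parses and every column shows the expected OID, the XML itself is at least well-formed, which points towards the import code rather than the template.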

Victor Kirhenshtein

Hi!

Problem #2 seems to be the same as in https://www.netxms.org/forum/general-support/segfault-2648. You can try applying the suggested patch and recompiling if you built from source. I'll take a look at the other two problems.

I plan to release 1.2.10 sometime next week, so hopefully all those crashes will be fixed.

Best regards,
Victor

Nikk

Okay, I'll try that patch and will let you know how it goes!

Nice to hear that, big thanks!

Nikk

Nikk

Hi,
I tried this:
Quote from: Victor Kirhenshtein on November 01, 2013, 08:01:05 PM
Problem #2 seems to be the same as in https://www.netxms.org/forum/general-support/segfault-2648. You can try applying the suggested patch and recompiling if you built from source.
and it worked, thanks :)

Nikk

Alex Kirhenshtein

Template import is fixed in the current trunk, and the fix will be released in 1.2.10.

Nikk

Hi,

Any news regarding problem #1? It is annoying that every time I want to restart the server, I have to wait 20-30 minutes :/.

Thanks in advance,
Nikk

ericq

I solved problem #1 by changing the startup path of the service.
The NetXMSAgentdW32 service is started with the following command line:
"C:\NetXMS\bin\nxagentd.exe" -d -c "C:\NetXMS\etc\nxagentd.conf" -n "NetXMSAgentdW32" -e "NetXMS Win32 Agent" -D -1 -M "192.168.0.41"
The debug level is set with -D -1; it has to be -D 1.
Edit the service so that it starts this instead:
"C:\NetXMS\bin\nxagentd.exe" -d -c "C:\NetXMS\etc\nxagentd.conf" -n "NetXMSAgentdW32" -e "NetXMS Win32 Agent" -D 1 -M "192.168.0.41"

Run regedt32.exe, then navigate to the key for the service under
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services
and edit the REG_EXPAND_SZ value named 'ImagePath'.
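
The command-line change described above can be sketched in Python. The helper below only rewrites the string; fix_debug_level is a hypothetical name, and the assumption (taken from this post) is that the negative value after -D is the only thing that needs to change. On Windows the result could then be written back to ImagePath by hand in the registry editor, or with the standard winreg module.

```python
import re


def fix_debug_level(image_path, level=1):
    """Replace a negative value after the -D switch with the given debug level."""
    # Match "-D" followed by a negative integer, both as standalone arguments.
    # The match is case-sensitive, so the lowercase "-d" switch is left alone.
    return re.sub(r'(?<!\S)-D\s+-\d+(?!\S)', f'-D {level}', image_path)


# Command line quoted from the post above.
ORIGINAL = (r'"C:\NetXMS\bin\nxagentd.exe" -d -c "C:\NetXMS\etc\nxagentd.conf" '
            r'-n "NetXMSAgentdW32" -e "NetXMS Win32 Agent" -D -1 -M "192.168.0.41"')

if __name__ == "__main__":
    print(fix_debug_level(ORIGINAL))
```

Running the helper on an already corrected command line leaves it unchanged, so it is safe to apply more than once.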

Nikk

Hi ericq,

In my case the agent starts fine; it is the core server that is the problem.
The core service is started with this:
"C:\NetXMS\bin\netxmsd.exe" --config "C:\NetXMS\etc\netxmsd.conf" -d

Thank you anyway :)

Nikk