Character set (locale) woes? Non-ASCII characters.

Started by jeffreyz, March 29, 2014, 12:00:52 PM

jeffreyz

I have bumped into a problem when trying to generate an agent list from a query on an SQL database. I am creating the agent list using the "ExternalList" agent configuration parameter.

This actually works very nicely. The "ExternalList" mechanism is really great and I am not saying that it has a problem, but I find that non-ASCII characters are lost from the output of the bash script that runs to define the list items. This bash script performs the SQL query, although the problem is not related to SQL in any way.

The best way to describe this problem is to show you a test script (nothing to do with SQL) and then show you the results of running this script manually on the agent machine (Debian Linux 7.4) in a bash shell, and then compare this to the list generated by executing exactly the same code by NetXMS via the "ExternalList" mechanism.

Here is the test script:


locale
echo 'asdf æøå ÆØÅ qwerty'


The "locale" command generates several lines of output (each line becomes a list element). The non-ASCII characters in the "echo" command are Norwegian letters that appear in SQL results in my actual case where I define the agent list via an SQL query. When I execute these commands manually from within a bash shell on the agent machine, I get the following output:


LANG=en_US.UTF-8
LANGUAGE=en_US:
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
asdf æøå ÆØÅ qwerty


However, when I let the NetXMS agent execute these same commands on the same machine via the "ExternalList" mechanism, the result is different. Here is the list generated (using nxget -l ...):


LANG=C
LANGUAGE=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
asdf


Notice that the locale settings are all different ("C" instead of "en_US.UTF-8") and that the output of the command "echo 'asdf æøå ÆØÅ qwerty'" was truncated at the first non-ASCII character. The Norwegian characters were not merely stripped from the output: the "qwerty" string at the end does not appear, so everything from the first non-ASCII character onward was dropped.
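
To make the difference between "truncated" and "stripped" concrete, here is a minimal sketch that mimics both behaviours on the test string. The two helper functions are hypothetical names made up for this illustration; they are not part of NetXMS and say nothing about how the agent handles the output internally. The sketch assumes the script file itself is saved as UTF-8.

```shell
#!/bin/bash
# Hypothetical helpers (not part of NetXMS) that mimic the two possible
# failure modes so they can be compared with what nxget -l returns.

# Drop everything from the first byte outside printable ASCII onward.
truncate_at_non_ascii() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^ -~].*//'
}

# Delete only the bytes outside printable ASCII and keep the rest.
strip_non_ascii() {
    printf '%s\n' "$1" | LC_ALL=C tr -cd ' -~\n'
}

sample='asdf æøå ÆØÅ qwerty'
echo "truncated: [$(truncate_at_non_ascii "$sample")]"
echo "stripped:  [$(strip_non_ascii "$sample")]"
```

The truncated form ends before "qwerty", while the stripped form still contains it. The list element returned by the agent matches the truncated form.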

Here are my questions:


  • Is this a sign that the NetXMS agent and/or server were not built with Unicode support? If so, would the problem likely go away if I rebuilt the agent with Unicode support? Or would I have to rebuild the server as well? If not, then ...
  • Is there another fix to this problem that you can suggest? If not, then ...
  • Is this problem due to how the internals of NetXMS are implemented and out of my control? If so, is this something you can fix in a future release?

Thanks for the great support you provide on NetXMS.

jeffreyz

I just performed this same test on a NetXMS v1.2.12 server and agent that were built with Unicode support.

I used the same bash script:

locale
echo 'asdf æøå ÆØÅ qwerty'


Here is the result of executing this script manually in a shell:

LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
asdf æøå ÆØÅ qwerty


This is what I would hope to see. The Norwegian characters appear correctly.

Here is the result of the NetXMS agent executing these same commands on the same machine via the "ExternalList" mechanism (I generated this output with nxget -l ...):


LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
asdf


This is bad. The echo output is still truncated at the first non-ASCII character.

So just compiling NetXMS with Unicode support does not seem to fix this problem, at least according to my test here.

Is there anything else I can try?