A Note on Number Conversions and Locales

Locale settings mostly determine the language a program uses for its messages, but they influence other things as well, in particular the format of numbers. This can be vexing if you are not aware of it. It can also be useful, since it provides a way to generate numbers in the correct format for other countries.

Here is an example. We begin by setting both the LANGUAGE and LC_ALL environment variables for American English:

setenv LANGUAGE en_US
setenv LC_ALL en_US
We will convert an ASCII text file containing textual representations of two floating-point numbers:
cat test.asc
3.1415
2.7839
We convert this file to binary:
ascii2binary -V -t d < test.asc > test.bin
ascii2binary: converted 2 tokens
and convert back to text:
binary2ascii -V -t d -p 4 < test.bin
3.1415
2.7839
binary2ascii: converted 2 tokens
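
To make the round trip concrete, here is a minimal Python sketch of the same idea. It assumes that -t d selects a native double-precision float and that -p 4 asks for four digits after the decimal point. The helper names text_to_binary and binary_to_text are hypothetical, and this is not how ascii2binary or binary2ascii are actually implemented. Python's float() and format strings ignore the locale, so this sketch always reads and writes a period.

```python
import struct

DOUBLE_SIZE = struct.calcsize("d")  # size of a native double, normally 8 bytes


def text_to_binary(text):
    """Pack each whitespace-separated token as a native double (hypothetical helper)."""
    tokens = text.split()
    return b"".join(struct.pack("d", float(tok)) for tok in tokens)


def binary_to_text(data, precision=4):
    """Unpack consecutive doubles and print them with a fixed number of decimals."""
    count = len(data) // DOUBLE_SIZE
    values = struct.unpack(f"{count}d", data[: count * DOUBLE_SIZE])
    return "\n".join(f"{value:.{precision}f}" for value in values)


binary = text_to_binary("3.1415\n2.7839\n")
print(binary_to_text(binary))
```

The two numbers survive the trip unchanged because four decimals are enough to reproduce the original text.
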
Now we set the LANGUAGE variable to continental French:
setenv LANGUAGE fr_FR
and redo the same conversion from text to binary as before:
ascii2binary -V -t d < test.asc > test.bin
ascii2binary: 2 jetons ont étés convertis
This time the message reporting the conversion of two tokens is in French. Now we set the LC_ALL variable to continental French:
setenv LC_ALL fr_FR
and repeat the conversion from text to binary yet again:
ascii2binary -V -t d < test.asc > test.bin
ascii2binary: entrée malformée 3.1415   jeton 1 de l'entrée
This time the conversion is unsuccessful; ascii2binary complains that the first token is ill-formed. Why? Because in the French locale the decimal separator is a comma, not a period, so "3.1415" is not a valid number. Conversely, if we convert the binary data from the earlier successful conversion back to text while still in the French locale, the output has commas in place of the decimal points:
binary2ascii -t d -p 4 < test.bin
3,1415
2,7839
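
Both effects can be mimicked with a small Python sketch that takes the decimal separator as an explicit parameter instead of reading it from the locale. The functions parse_number and format_number are hypothetical and written only for illustration; they are not part of ascii2binary or binary2ascii, and the error text is invented.

```python
def parse_number(token, decimal_point="."):
    """Parse a token, accepting only the given decimal separator."""
    # A period is ill-formed when the separator in force is something else.
    if decimal_point != "." and "." in token:
        raise ValueError(f"ill-formed input: {token!r}")
    # float() only understands a period, so map the separator onto it.
    return float(token.replace(decimal_point, "."))


def format_number(value, precision=4, decimal_point="."):
    """Format a value with a fixed precision and the given decimal separator."""
    return f"{value:.{precision}f}".replace(".", decimal_point)


# Reading "American" text and writing it in the "French" format works:
print(format_number(parse_number("3.1415"), decimal_point=","))  # 3,1415

# Reading the same text with the "French" separator fails, as in the session above:
try:
    parse_number("3.1415", decimal_point=",")
except ValueError as err:
    print(err)
```
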

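The moral is that the language you want for messages may differ from the format of the data you are dealing with. If the variables LC_ALL, LANGUAGE, and LANG are unset, the language of messages is determined by the variable LC_MESSAGES, while the number format is determined by the variable LC_NUMERIC, except for monetary values, whose format is determined by LC_MONETARY. The value of LANG, if set, supplies the default for any category whose own variable is unset; it does not override a category variable that is set. The value of LC_ALL, if set, overrides LANG and all of the individual category variables, including LC_MESSAGES and LC_NUMERIC. With GNU gettext, the value of LANGUAGE, if set, takes precedence over LC_ALL, LC_MESSAGES, and LANG, but only for the choice of message language; it has no effect on number formats. This is why, in the example above, setting LANGUAGE changed only the language of the messages, while setting LC_ALL changed the number format.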
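
The lookup order can be sketched as a small Python function. This sketch assumes the usual POSIX order (LC_ALL, then the category's own variable, then LANG, then the "C" default) and the GNU gettext convention that LANGUAGE is consulted first, for messages only, and is ignored in the "C" locale. The function names are hypothetical, and a real program should rely on setlocale and gettext rather than reimplement this.

```python
def effective_locale(category, env):
    """Return the locale name governing a category such as "LC_NUMERIC"."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # an empty value is treated like an unset variable
            return value
    return "C"


def message_languages(env):
    """Return the languages tried for messages, in priority order."""
    base = effective_locale("LC_MESSAGES", env)
    # LANGUAGE may hold a colon-separated priority list, e.g. "fr_FR:de_DE".
    wanted = [lang for lang in env.get("LANGUAGE", "").split(":") if lang]
    # Assumption: LANGUAGE is ignored when the locale is "C" (or its alias "POSIX").
    if wanted and base not in ("C", "POSIX"):
        return wanted
    return [base]


# The middle of the session above: French messages, American number format.
env = {"LANGUAGE": "fr_FR", "LC_ALL": "en_US"}
print(message_languages(env))               # ['fr_FR']
print(effective_locale("LC_NUMERIC", env))  # en_US
```

Passing the environment as a plain dictionary keeps the sketch easy to experiment with; os.environ could be passed in its place.
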