From: http://www.i18nfaq.com/qa.html

 Are there any C/C++ libraries that support Unicode (on UNIX)?

Yes. The open-source International Components for Unicode (ICU) library provides APIs that support Unicode on many platforms. Commercial libraries are also available.

 Do all UNIX platforms support Unicode?

Many of the well-known platforms have recently started to support Unicode. Some support the UTF-8 encoding form of Unicode.

 Should I go for Unicode on a UNIX platform?

 If your application uses Unicode for its internal data, you have to do a code page conversion every time you display a string to the user (unless the platform's native encoding is also Unicode). This may affect your performance.

 My application uses Unicode as its internal code page. How do I read/write files?

    * Write Unicode characters as 16-bit data.
      Advantage: no conversion, fast.
      Disadvantage: only Unicode-aware editors can read the file. When the file is read as bytes, ASCII text ends up full of null bytes, which is ugly. The file is platform dependent: you need to take care of big/little endian byte order when reading it.

    * Convert to UTF-8 and write that to the file.

      Advantage: all ASCII (English) characters remain unchanged and readable with normal editors. Fast, since UTF-16 to UTF-8 is an algorithmic transformation. No need to worry about big/little endian.


 What is "I18N"?

I18N (internationalization: 18 letters between the I and the N) means making a program independent of any one language or culture. It typically involves:

    * Externalization of strings, icons and graphics containing text.

    * Selecting a code page and defining code page conversions (if needed).

    * Modifying all text manipulation functions to be aware of the code page.

    * Changing the logic of all formatting functions (date, time, currency, numeric, etc.).

    * Changing the logic of collation/sorting functions.


 What is a code page?

Computers understand only numbers, so text is handled by assigning a number to each character. Simply put, the table that maps characters to their numeric values is called a code page. You will also hear terms like charset, charmap, encoding and coded character set in this context. Although there are subtle differences between them, for the purpose of understanding you can think of all of them as a mapping of characters to numeric values. The ASCII code page is a well-known example: the English alphabet and a set of control characters are mapped to specific numbers (for example, 'A' = 0x41).


 What culture-specific information does one need to be aware of?


    * Hard-coded strings. - No string should be hard-wired in the code.
      It should be externalized to a resource file so it can be translated into each language.
    * Character classification. - How you classify each character. E.g.
      in English you have uppercase and lowercase characters. If you are a C programmer you might remember calling isupper() and islower() to check this classification. When you add more languages there are more classifications to consider, and sometimes the uppercase/lowercase classification doesn't make sense in other languages.
    * Numeric and currency formatting. - Currency symbols and how digits are grouped differ from country to country.
    * Date and time formatting. - Does the month, the day or the year come first?
    * Collation (sorting order). - When you compare, say, "A" and "B" in ASCII, you can compare their code values (0x41 and 0x42) to determine their sorting order. This does not work in general for other code pages and languages, so you have to apply special rules to find the sorting order.


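 What is a locale?

A locale is the set of language- and culture-specific conventions (messages, character classification, number, date and currency formats, sorting) a program should follow. Its exact components depend on the technology, but the basic components are language, territory and code page.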

 What is "code page" conversion? Why is it needed?


Since there is more than one code page for a language (e.g. Japanese has EUC-JP, Shift-JIS, UTF-8, etc.), you sometimes have to do code page conversions when you exchange data between systems.
In other cases, if you are working on a distributed application and decide to keep Unicode as the internal code page, you have to convert to the system's native encoding when you send a string to the display, or from it when you receive user input, etc.
Code page conversions are usually table-lookup operations and are relatively expensive. Limit them to well-defined boundaries to obtain optimal performance. Only a few conversions, such as UTF-16 to UTF-8, are algorithmic.

 Is there a performance hit in using Unicode/UTF-8 in my application?


When you convert from native encodings like Shift-JIS or Big5 to Unicode/UTF-8, you may run into performance issues. The best way to deal with this is to standardize on Unicode/UTF-8 throughout the product and convert to a native encoding only when necessary. Windows, Java-based applications and browsers support Unicode/UTF-8, so most of the time you don't have to convert to a native encoding.


Terminology and examples:

Native encodings   : Shift-JIS (Japanese), Big5 (Chinese)
Standard encodings : UTF-8

Code page : a table mapping each character to its numeric code, e.g. ASCII

How does this interact with keyboard configuration?
  Keyboards have a scan code for each key.
  The keyboard driver translates scan codes to keycodes.

xmodmap mykeys.xmap   ==> loads X keymaps.
loadkeys mykeys.map   ==> loads keymaps in non-X console mode.
xmodmap -pk / -pm     ==> prints the keymap table / the modifier map.

loadkeys => loads /usr/share/lib/keytables/type_tt/layout_dd.
dumpkeys => prints the keycode mappings of the keyboard.
Note: there are different layouts for Norwegian, German, etc. keyboards.

/etc/fonts/fonts.conf:       
<dir>/usr/openwin/lib/locale/zh_TW.BIG5/X11/fonts/75dpi/</dir>
<dir>/usr/openwin/lib/locale/iso_8859_8/X11/fonts/TrueType/</dir>
<dir>/usr/openwin/lib/locale/en_US.UTF-8/X11/fonts/misc/</dir>

Note:
$ ls  /usr/share/lib/locale/com/sun/admin/pm/client/ 
pmHelpResources_de.class
pmHelpResources_fr.class 
...
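These look like Java resource bundles: one compiled class per locale,
named <base>_<language>. The strings are compiled into classes instead
of .properties files, but the class for the current locale is still
chosen at run time by ResourceBundle.getBundle().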

Question: With a US keyboard, is it possible to support a Chinese locale?
Answer: If console font maps are available for Chinese, all output
        can be printed appropriately.
        For keyboard input, if X is active, we can probably remap the
        keys to Chinese keys ???

What does en_US.UTF-8 mean?

    language = en; country = US; codeset = UTF-8. Strings are handled
    as sequences of 8-bit bytes: ASCII characters take one byte each
    (e.g. 'a' = 97 = 0x61), other characters take two to four bytes.
    When printf("abc") is called, the output string is passed on as a
    sequence of bytes, which is decoded as UTF-8 and shown with the
    appropriate font.


What is iso_8859_8?
   ISO 8859 is a series of standard character sets defined by ISO,
numbered from ISO-8859-1 (Latin-1, Western European) to ISO-8859-16
(Latin-10, South-Eastern European); part 12 was never published.
ISO-8859-8 is the Latin/Hebrew part. Each part is a single-byte
(8-bit) code with room for 256 characters. The en_US locale can use
the ISO-8859-1 character set. Latin-script languages as a whole need
several ISO-8859 parts, since together they use more characters than
fit in 256. Nowadays, especially for complex languages, Unicode is
preferred.


What is unicode?

    Unicode is a universal character set: it assigns a unique number
(code point) to the characters of almost all languages. Big character
set! It was originally designed as a 16-bit code, but code points now
range from U+0000 to U+10FFFF, so 16 bits per character are no longer
enough. When a Unicode string such as "ABCD" appears in a program, it
does not necessarily take 4*2 bytes: with UTF-8 it is encoded as a
sequence of 8-bit code units, one to four per character, so "ABCD"
takes 4 bytes. Advantage? It occupies less space for text that is
mostly ASCII. When the string is displayed, it is decoded and the glyph
for each character is picked up from the appropriate font and printed.
Unicode is quite useful when a single page contains characters from
many different languages.

Since 1991, the Unicode Consortium has worked with ISO to
develop the Unicode Standard and ISO/IEC 10646, the Universal
Character Set (UCS), in tandem.


What is "native support for unicode"?

    If you call printf(unicode_string) and your display natively
prints all international characters, then you have native support
for Unicode from the OS.

Often Unicode is supported inside browsers, but not in the
terminal.

   Typically you set the locale to some European language and then do:
printf(european_string) -- the font for the corresponding ISO-8859-N
character set is looked up and the 8-bit bytes are printed.

If you print something in Chinese, you need support for Unicode
from the OS as well as a font mapping. When the locale is set to
Chinese, the bytes to display are interpreted as multibyte sequences
and the appropriate Chinese glyphs (a subset of the complete Unicode
character set) are looked up and printed.

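Is there a locale to set as "unicode"?
 No, not as such. The locale specifies the "default interpretation" of
 bytes; to work with Unicode you pick a locale whose codeset is UTF-8
 (e.g. en_US.UTF-8). Whatever the language part of that locale, Unicode
 strings can then display characters of all languages on the same page.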


I need to convert a program written in C (ASCII) to UTF-8 encoding,
running on Solaris 9, so as to accommodate Chinese/Japanese/Korean
characters. What areas should I pay attention to? Should I convert
all variables whose data type is char or char* to wchar_t?
What functions should I use to manipulate strings?

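#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(int argc, char *argv[]){
    /* The locale name is "en_US.UTF-8"; setlocale(LC_ALL, "") would
     * take it from the environment instead. */
    if (setlocale(LC_ALL, "en_US.UTF-8") == NULL || argc < 2)
        return 1;               /* locale not installed, or no argument */

    //like to know which of the two below is correct
    //CASE I: UTF-8 bytes; argv[1] is copied through unchanged
    char buf[256];
    snprintf(buf, sizeof buf, "Here is the first arg value = %s\n", argv[1]);

    //CASE II: wide characters; in a wide format, %s takes a multibyte
    //char * and converts it (%ls would expect a wchar_t *)
    wchar_t wbuf[512];
    swprintf(wbuf, sizeof wbuf / sizeof wbuf[0],
             L"Here is the first arg value = %s\n", argv[1]);
    return 0;
}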

Should I declare a wchar_t variable to hold the value passed in argv[1]?


 UTF-8 is a multibyte encoding, so a single character can be 1 to 4
bytes long (1 to 3 bytes for characters in the Basic Multilingual Plane,
which includes the common CJK characters). This has implications when
you allocate memory, write loops and do pointer arithmetic.

I have an application written in C that reads some strings from a file.
The strings can be ISO-8859-1, UTF-16-encoded Unicode or UTF-8-encoded
Unicode.


Does anyone know of a free C library or other functions, supported on
Windows and the different *NIX platforms, that I can use to accomplish
this?


From:

http://www.cs.uu.nl/wais/html/na-dir/internationalization/programming-faq.html


1. Which encoding should I use for accented characters?
Use the internationally standardized ISO-8859-1 character set to type
accented characters. This character set contains all characters
necessary to type (West) European languages. This encoding is also the
preferred encoding on the Internet.  ISO 8859-X character sets use the
characters 0xa0 through 0xff to represent national characters, while
the characters in the 0x20-0x7f range are those used in the US-ASCII
(ISO 646) character set.  Thus, ASCII text is a proper subset of all
ISO 8859-X character sets.  

The characters 0x80 through 0x9f are earmarked as extended control
characters, and are not used for encoding characters.  These characters
are not currently used to specify anything.  A practical reason for
this is interoperability with 7 bit devices (or when the 8th bit gets
stripped by faulty software).  Devices would then interpret the character
as some control character and put the device in an undefined state.
(When the 8th bit gets stripped from the characters at 0xa0 to 0xff, a
wrong character is represented, but this cannot change the state of a
terminal or other device.)

This character set is also used by AmigaDOS, MS-Windows, VMS (DEC MCS
is practically equivalent to ISO 8859-1) and (practically all) UNIX
implementations.  MS-DOS normally uses a different character set and
is not compatible with this character set. (It can, however, be
translated to this format with various tools. See below.)

Footnote: Supposedly, IBM code page 819 is fully ISO 8859-1 compliant.


ISO 8859-1 supports the following languages:
Afrikaans, Basque, Catalan, Danish, Dutch, English, Faeroese, Finnish,
French, Galician, German, Icelandic, Irish, Italian, Norwegian,
Portuguese, Spanish and Swedish.

----------------------------------------------------------------------
Note: ISO 8859-1 is an 8-bit charset, like all the other ISO 8859-X parts.
      The 7-bit charset is US-ASCII (ISO 646), their common subset.
----------------------------------------------------------------------


Setting your environment for ISO-C (ANSI-C) programs

  
A C program inherits its locale environment variables when it starts up.
This happens automatically.  However, these variables do not
automatically control the locale used by the library functions, because
ISO/ANSI C says that all programs start by default in the standard "C"
locale.  To use the locales specified by the environment, call
setlocale() with an empty string (setlocale() is defined by ISO C;
POSIX specifies that "" means "take the locale from LANG and LC_*"):
-----
setlocale (LC_ALL, "");     // Initialize from env variables.
-----

Note that 'support for Unicode' means different things to different
people.  For example, Unix vendors may claim Unicode support when the
following holds:
* They define wchar_t in stddef.h.
* They have the ANSI functions mblen, mbstowcs, wctomb, mbtowc and wcstombs.
* The remaining infrastructure (file names, program names, shells,
  system programs, etc.) still uses 8-bit characters.

-------------------

Windows NT supports Unicode natively (16-bit "wide" APIs), but it does not
use the LANG/LC_* environment-variable locale mechanism described above.

-------------------
Internationalization with X :

    int main (int argc, char** argv)
    {
        ...
        XtSetLanguageProc (NULL, NULL, NULL);
        top = XtAppInitialize ( ... );
        ...
    }

The LANG and LC_xxx environment variables will then be
used to determine the 'input method' for this X application.  This
input method is responsible for managing COMPOSE character sequences
or any other input mechanism for this particular implementation.

-----------------------

$ xlsfonts     ==> lists the available fonts.
$ xfd  -fn  '*adobe-adobe-courier-bold-o-normal--0-0-0-0-m-0-iso8859-1'
  displays all the characters in that font.

Note: some fontsets allocate 2 bytes per character.
-----------------------
7.3 Toolkits, Widgets, and I18N
The preferred way of inputting national characters when a national
keyboard is not available is one or more input methods.  These input
methods support various kinds of compose sequences to enter national
characters.

The environment variables LANG and/or LC_xxx select the language for
the Input Method (IM), but if several input methods exist, the
environment variable XMODIFIERS can be used to select a specific input
method.
----------------------------

Intro to I18N:
http://www.debian.org/doc/manuals/intro-i18n/

-----------------------------
Multibyte characters for CJK (Chinese, Japanese and Korean) languages;
combining characters for Thai?
-----------------------------

a. L10N (localization) model
    This model supports two languages or character codes:
English (ASCII) and one other specific one.

b. I18N (internationalization) model
    This model supports many languages, but only two of them at the
same time: English (ASCII) and one other. One has to specify the
'other' language, usually with the LANG environment variable.
The above L10N model can be regarded as a part of this I18N model.
gettextization falls into the I18N model.

c. M17N (multilingualization) model
    This model supports many languages at the same time.
------------------------------------------

For a C program, the main part of the I18N model is achieved using the
standardized locale technology and gettext.

The M17N model can be achieved using international encodings such as
ISO 2022 and Unicode. Though you can hard-code these encodings in your
software, I recommend using the standardized locale technology. However,
using international encodings is not sufficient to achieve the M17N
model. You will have to provide a mechanism to switch input methods.
You will also want an encoding-guessing mechanism for input files, such
as the ones jless and Emacs have. Mule is the best software that has
achieved M17N (though it does not use locale technology).

------------------------------------------


http://developers.sun.com/dev/gadc/educationtutorial/creference/locale/locale.html


From man setlocale:
 char *setlocale(int category, const char *locale);

category is :
 LC_CTYPE,    LC_NUMERIC,   LC_TIME,   LC_COLLATE,
 LC_MONETARY,  LC_MESSAGES,  and  LC_ALL. 

     The LC_MESSAGES variable affects the behavior  of  messaging
     functions such as dgettext(3C), gettext(3C), and gettxt(3C).

     A value of "C" for locale  specifies  the  traditional  UNIX
     system behavior. At program startup, the equivalent of
          setlocale(LC_ALL, "C")
     is executed. 

 setlocale(LC_ALL, ""); => means take it from Env.

 setlocale(LC_ALL, NULL); => Just return the current locale.

 setlocale(LC_ALL, locale_string); => set locale and return the
                           resulting locale.

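On success the call returns the resulting locale (NULL if the request
cannot be honored). So you can save the locale and call setlocale()
again later to restore it; copy the returned string first, because a
later setlocale() call may overwrite it.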

-----------------------------------------------------------------


     SEE ALSO: locale(1), ctype(3C), getdate(3C), gettext(3C), gettxt(3C),
     isdigit(3C),    localeconv(3C),   mbtowc(3C),   strcoll(3C),
     strftime(3C),    strptime(3C)    strxfrm(3C)    tolower(3C),
     wctomb(3C),     libc(3LIB),    attributes(5),    environ(5),
     locale(5), standards(5)

-----------------------------------------------------------------
Environment variable priority order
(lowest to highest priority):
LANG  =>  LC_* (specific category)  =>  LC_ALL
-----------------------------------------------------------------

see man environ(5)

 NLSPATH Contains a sequence of templates which  catopen(3C)  and
         gettext(3C)  use when attempting to locate message cata-
         logs.

   NLSPATH=":%N.cat:/nlslib/%L/%N.cat"
        %N = catalog name; %L = locale (LC_MESSAGES); %l = language;
        %t = territory; %c = codeset (e.g. ISO8859-1, UTF-8)

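Note: I always thought UTF-8 was an encoding, not a codeset,
      but then how come `locale -a` lists fr.UTF-8? (In POSIX terms the
      codeset is the name of the locale's character encoding, so UTF-8
      is a valid codeset name.)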

-----------------------------------------------------------------


sample programs:

http://developers.sun.com/dev/gadc/educationtutorial/creference/sampfiles/sampfiles.html


Pseudo code for messaging:

  // By default the locale remains "C" if you don't do the following:
  (void)setlocale(LC_ALL, ""); // Initialize the locale from the environment.

 nl_catd catopen(const char *name, int oflag);

  // A name without '/' is searched for via $NLSPATH; a name containing
  // '/' (like "./today.cat") is opened as a path and NLSPATH is ignored.
  cat_id = catopen("./today.cat", NL_CAT_LOCALE);

 // Checking order for a name without '/':
 //    $NLSPATH
 //    /usr/lib/locale/<locale>/LC_MESSAGES (if NLSPATH is not defined)
 // In the C locale, catopen() just succeeds without looking anywhere.

 //     oflag        meaning
 //      0            %L in NLSPATH means $LANG
 //  NL_CAT_LOCALE    %L in NLSPATH means the LC_MESSAGES category

Example:  LANG=language[_territory[.codeset]] ;  
   LANG=De_A.88591 ; means De=German_Language;A=as spoken in Austria;
                     88591=The terminal supports ISO 8859-1 codeset.

  /*
   * Greet user and print day of the week.
   * catgets() returns message 1 of set 1 from catalog cat_id; if it
   * cannot find it, it returns the default string given as the last
   * parameter (i.e. "Hello\n" here), which printf() then prints.
   */
   printf(catgets(cat_id, 1, 1, "Hello\n"));
   printf(catgets(cat_id, 1, 2, "Today is %s\n"), localday());
   printf(catgets(cat_id, 1, 3, "Goodbye\n"));

   /* close the catalog file before we go. */
   catclose(cat_id);

----------------------------------------------------------

Note: Check how i18n is done in windows for HADB. 

----------------------------------------------------------

sql_srv pseudo code : 
  [1] ustchs(chsno = 0), line 50 in "utlcase.c"
  [2] sesGetEnv(envpar = (nil)), line 169 in "sesenv.c"
  [3] main(argc = 3, argv = 0xffbfebdc), line 688 in "sqlsrv.cc"

 ustchs(chsno = 0): Set current character set.
               Basically sets up the upper/lower-case map for the
               character set. Unused for now.

=>[1] utlInitMessages(), line 136 in "messages.c"

  Initialize 32 msg catalogs: msgcat[1..32] . unused ???

=>[1] utlInitLocale(locale = 0x4491d0), line 617 in "locale.c"

  This initializes our own locale structure for all months, etc!!
  why?!!! This is not internationalizable???

=> [1] utlSetLocale(locale = 0x4491d0, name = (nil), status = 0xffbfe61c), line 711 in "locale.c"

 DecodeLocale(loc, language, territory, modifier, extchsname);
 ...

   struct lconv *localeconv(void);
It returns values such as (shown for the "C" locale):
     char *decimal_point;        /* "." */
     char *thousands_sep;        /* "" (zero length string) */
     char *grouping;             /* "" */
     .....

  populate our locale structs according to return val of localeconv().

SetMessageCatalog(locale, status); locale.c:767

=>[1] ugtfnm(type = fnm_message_file, infile = (nil), outfile = 0xffbfdf00 "lib/locale/ "), line 548 in "fileops.c"  
[2] BuildMsgFileName(charset = 4, language = 0x449201 "no", territory = 0x44920a "no", modifier = 0x449213 "", namesize = 256U, filename = 0xffbfe0cc "..."), line 649 in "messages.c"
[3] OpenMsgcatFile(charset = 4, language = 0x449201 "no", territory = 0x44920a "no", modifier = 0x449213 "", catd = 0xffbfe24c, retcharset = 0xffbfe252), line 588 in "messages.c"
 [4] utlBindMessages(charset = 4, language = 0x449201 "no", territory = 0x44920a "no", modifier = 0x449213 "", retcat = 0xffbfe2c8, status = 0xffbfe614), line 254 in "messages.c"
 [5] SetMessageCatalog(locale = 0x4491d0, status = 0xffbfe614), line 1427 in "locale.c"
 [6] utlSetLocale(locale = 0x4491d0, name = (nil), status = 0xffbfe614), line 767 in "locale.c"
 [7] sesGetEnv(envpar = (nil)), line 175 in "sesenv.c"
 [8] main(argc = 3, argv = 0xffbfebd4), line 688 in "sqlsrv.cc"

  
First the filename is padded with spaces at the end, then it gets
truncated... why?


=>[1] BuildMsgFileName(charset = 4, language = 0x449201 "no", territory = 0x44920a "no", modifier = 0x449213 "", namesize = 256U, filename = 0xffbfe0cc "lib/locale/no_no.iso8859-1/LC_MESSAGES/status.cat"), line 701 in "messages.c"

current thread: t@1
=>[1] sesGetEnv(envpar = (nil)), line 207 in "sesenv.c"
  [2] main(argc = 3, argv = 0xffbfebd4), line 688 in "sqlsrv.cc"

struct dbenv_s *sesGetEnv(struct envpar_s *envpar)
{
  Initializes single env array.
  InitEnvironment(env = 0x449694, locale = 0x4491d0){
     env->system_charset.index = utlLookupCharset(CHS_ASCII);
     env->standard_charset.ident = CHS_LATIN1;
     env->national_charset.ident = CHS_UTF16; // UTF-16: 16-bit Unicode encoding!!!
     attach locale info to env.
  }
  tintxt(){
     init some hash lookup table. nothing to do with i18n.
  }
  sesInitEnvParams(){
      // what is this ????
      envpar->options  = ENV_HAS_MULTIPLE_THREADS;
      envpar->dbsmax   = 2;      /* Allocate max. 2 session objects */
      envpar->heappar  = 0;
      envpar->heapsize = 16384;  /* Allocate a small heap */
      envpar->heapbase = NULL;   /* Dynamically allocated heap */
  }

  dysCreateAlloc(env->allocator, &env->s, envpar->heapbase,
      initial, extend, maxsize, 0, DYS_HEAP,
      heap_options) {
     ....
     segment = (seghead_s *) AllocMemory(status, segsize, commit){
       ....
       base   = mmap(NULL, (size_t)size, PROT_READ|PROT_WRITE,
                          MAP_PRIVATE|MAP_ANON|MAP_NORESERVE, -1, 0);

       When MAP_ANON is set in flags and fildes is -1, mmap()
       provides a direct path to return anonymous pages to the
       caller (from the mmap man page).
         ....
       Advantage over malloc: MAP_NORESERVE means no swap space is
       reserved when the memory is mapped; pages are backed only when
       touched, so a write may fail (with a signal) if memory runs short.
         ......
     }
     ....
  }
}


Note: $ pmap $$ reveals a mapping to the locale library as follows:
/usr/lib/locale/fr_CA.ISO8859-1/fr_CA.ISO8859-1.so.3

So the system locale libraries are installed as
/usr/lib/locale/<lang>_<Terr>.<charset>/<lang>_<Terr>.<charset>.so.*

=>[1] createSqlServer(argc = 3, argv = 0xffbfebd4, stat = CLASS), line 751 in "sqlsrv.cc"
  [2] main(argc = 3, argv = 0xffbfebd4), line 700 in "sqlsrv.cc"

createSqlServer(...){
   scan args;
   utlRegProgramName(argv[0], (Status_s *)(&stat)){
    Initializes the base dir variable to /home/thava/sol/ even if the
      server was invoked as /home/thava/sol/lib/server/clu_sql_srv
   }

}

 init_hadb_srv(node = 0, port = 0, configFile = 0xffbff96b "/export/home/log/thava/var1/dbdef/hadb/0/cfg", stat = CLASS, threaded = false), line 2059 in "pmanager.cc"
  [2] createSqlServer(argc = 3, argv = 0xffbfeb4c, stat = CLASS), line 784 in "sqlsrv.cc"
  [3] main(argc = 3, argv = 0xffbfeb4c), line 700 in "sqlsrv.cc"


init_hadb_srv(){
   ....
   synInit(){ ...; pthread_setconcurrency(50); ... }
}

=>[1] scanConf(mynono = 0, fileName = 0xffbff96b "/export/home/log/thava/var1/dbdef/hadb/0/cfg", reScan = false, stat = CLASS), line 187 in "scanconf.cc"
  [2] scanConf(mynono = 0, fileName = 0xffbff96b "/export/home/log/thava/var1/dbdef/hadb/0/cfg", stat = CLASS), line 427 in "scanconf.cc"
  [3] SystemConfig::reset(myNumber = 0, myName = 0, configFile = 0xffbff96b "/export/home/log/thava/var1/dbdef/hadb/0/cfg", stat = CLASS), line 473 in "config.cc"
  [4] init_hadb_srv(node = 0, port = 0, configFile = 0xffbff96b "/export/home/log/thava/var1/dbdef/hadb/0/cfg", stat = CLASS, threaded = false), line 2067 in "pmanager.cc"
  [5] createSqlServer(argc = 3, argv = 0xffbfeb4c, stat = CLASS), line 784 in "sqlsrv.cc"
  [6] main(argc = 3, argv = 0xffbfeb4c), line 700 in "sqlsrv.cc"

head /export/home/log/thava/var1/dbdef/hadb/0/cfg
#  HADB config file created Wed Mar 28 18:54:04 IST 2007
#  Config attributes:

set clientkey 1302674310
set msgkey 643592199
....


=>[1] LogMsg::newConfig(this = 0x4303b4), line 422 in "logger.cc"
  [2] lmNewConf(), line 166 in "logger.h"
  [3] ScanArg::setVariables(this = 0x40f730, silent = 0, stat = CLASS), line 83 in "scanarg.cc"
  [4] createSqlServer(argc = 3, argv = 0xffbfeb4c, stat = CLASS), line 786 in "sqlsrv.cc"
  [5] main(argc = 3, argv = 0xffbfeb4c), line 700 in "sqlsrv.cc"

=>[1] initLogLevMap(stat = CLASS), line 192 in "syseventlog.cc"
  [2] createSqlServer(argc = 3, argv = 0xffbfeb4c, stat = CLASS), line 788 in "sqlsrv.cc"


initLogLevMap(){
  ....
    // facility = 128 = 16<<3 = LOG_LOCAL0
    openlog("hadb" /* syslog ident */, LOG_NOWAIT | LOG_NDELAY, 128 /* LOG_LOCAL0 */);
  ...
}


#define toInfo(code)           {lmStart(LM_INFO);    code; lmLog();}
#define toWarning(code)        {lmStart(LM_WARNING); code; lmLog();}
#define toWarningInt(txt,val)  toWarning(lmIntX(txt, val, "\n"))


Note: Internationalization is limited to SQL query responses.

--------------------------------------------------------------------------
The English messages are hard-coded in catalog 0:

static dflmsg_s eng_msgs[] =         /* All defined English messages */
 {
#include "errutils/defaultmsgs.h"
    {   -1, NULL}                     /* Terminator row */
 };
--------------------------------------------------------------------------

/src/utl/formatstatus.c
 *     utlFormatStatus   - Format message string(s) in status object
 *     utlSetStatusSrcLine - Set current source line for status reporting
* provides clu code to sqlcode mapping.

    Used only for SQL messages returned to the user ???

typedef struct
    {
         Clustatus_e code;
         char        sqlstate[6];
    } sqlstate_s;


void utlFormatStatus(Status_s *status, locale_s *locale);

errutils/statuss.h
   typedef struct Status_s         /* Status, error trace, etc            */
     {
           Clustatus_e number;          /* Same as StatusNumber in C++   */
           Severity_e  level;           /* success/warning/error/fatal   */
           char        sqlcode
                    [SQL_CODE_LEN];    /* SQL standardized code          */
           char        message
                       [MAX_MSGLEN+1];  /* Formatted status message      */
           ......
      }


utlFormatStatus() is called from here:
./common/errhandler.cc:{  utlFormatStatus(this, NULL); // Format "message" and "iosmsg" buffers
                 Note : called from Status::getText(); Why NULL ??? 
./src/ses/sesdbs.c:    utlFormatStatus(dbs->s, dbs->locale);
./src/ses/sesenv.c:    utlFormatStatus(&env->s, env->locale);
./src/sri/srimisc.c:        utlFormatStatus(status, NULL);  /* Format error message */
./src/srv/systrace.c:        utlFormatStatus(dbs->s, NULL); /* Format status message */
./src/utl/utlpro.h:void  utlFormatStatus(Status_s *status, locale_s *locale);

/errutils/statuse.h   lists all errors...


Questions:  Can the SQL client have a different locale than the server?


Error Ranges Reserved Already :



From cli/adapter/IHADBMOperation.java:
    ERROR_BASE = 22000;  up to +260 are reserved.

client/com/sun/hadb/adminapi/HADBException.java:
   String retstr = "HADB-E-" + errorcode + ": ";
   public final static int BASE = 21500;  up to +60 are reserved.

common/com/sun/hadb/mgt/MgtException.java:
   private final static int BASE = 21000;  up to +100 are reserved.

./com/sun/hadb/comm/CommException.java:
    private final static int BASE = 25000;  up to +25 are reserved.

./com/sun/hadb/jdbc/DbException.java:
     private final static int BASE = 20000;  up to +62 are reserved.

./mgt/client/com/sun/hadb/cli/framework/CommandValidationException.java
   uses error code 22127.

  Much of ./mgt/client/com/sun/hadb/cli/bundle.properties
  entries are directly retrieved by getLocalizedString(stringKey);

 ./mgt/client/com/sun/hadb/cli/errorcodes.properties list some 
 error codes.

./mgt/common/com/sun/hadb/mgt/OperationAction.properties
lists some operation action properties: e.g:
  deleteFileResources=Deleting history and device files.
Currently there is no message Id for operation action (progress) display

./mgt/server/com/sun/hadb/mgt/rema/WaitCondition.properties
   lists the conditions for which the wait has failed.
   There is no specific Error ID associated with these.

E.g. when hadbm tries to create an HADB database that already exists:

-- Lager database modell.                               |  6%  --
hadbm:Error 22021: Det finnes allerede en database med navn hadb.

(In English: "Creating database model." / "A database named hadb
already exists.")

Note: the error ID is visible to the user, and the message is internationalized.

We don't need to centralize unique ID generation for info messages.

HADB Server Error Messages :
Note: Around 1100 messages are internationalized.
      The used range is 0-7000  10000-19999 28000-29000

e.g.
The ./clusql command displays the following message (in the no_NO locale):

SQL: select * from ;
HADB-E-12402: Syntaks-feil at or near `;' @ <stdin>:1:14
(Syntaks-feil = syntax error)

These are mainly displayed from following routines:
./src/prg/dbclnt.c:            srvWriteStatus(dbs->s);
./src/prg/dbcrea.c:            srvWriteStatus(dbs->s);
./src/prg/dbserv.c:                    srvWriteStatus(dbs->s);
./src/prg/sql.c:    srvWriteStatus(dbs->s);
./src/prg/sqline.c:       srvWriteStatus(etx->s);
./src/srv/lineio.c:int  srvWriteStatus(Status_s *status)
....

For Example:
./errutils/statuse.h:   clustat_InvalidAccessMode = 0x2c2c, /* 11308 */
./errutils/defaultmsgs.h:    {11308, "Invalid access mode"},
./odbc/driver/errmapp.h:#define TII_ILLEGAL_ACCESS_MODE     11308


MA Log :

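  // Sketch of MgtLogger.setUpLogging(); in the original the values come
  // from the mgt.cfg properties.
  // needs: import java.io.IOException; import java.util.logging.*;
  static void setUpLogging(Level logLevel, String fileLogName,
                           Level fileLogLevel) throws IOException {

     Logger logger = Logger.getLogger("com.sun.hadb.mgt");

     logger.setUseParentHandlers(false);  // default = true

     /* logLevel is taken from the "logfile.loglevel" entry in mgt.cfg */
     logger.setLevel(logLevel);           /* e.g. Level.INFO, Level.FINEST */

     /* One handler per destination. This way you can attach
      * different outputs (console, file, etc.) to one logger.
      */
     FileHandler fh = new FileHandler(fileLogName, true);  // append
     fh.setLevel(fileLogLevel);
     fh.setFormatter(new MgtFormatter());
     logger.addHandler(fh);
  }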

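public class MgtFormatter extends java.util.logging.Formatter {
    ...
  @Override
  public String format(LogRecord record) {
    StringBuilder sb = new StringBuilder();  /* format msg */
    // Raw text; formatMessage(record) would localize it and fill in params.
    sb.append(record.getMessage());
    return sb.toString();
  }
}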


public static Logger getLogger(String name)

    Find or create a logger for a named subsystem. If a logger has 
already been created with the given name it is returned. 
Otherwise a new logger is created.

    If a new logger is created, its log level will be configured based
on the LogManager configuration and it will be configured to also send
logging output to its parent's handlers. It will be registered in the
LogManager global namespace.

 Each Logger may have a ResourceBundle name associated with it. 
The named bundle will be used for localizing logging messages. 
If a Logger does not have its own ResourceBundle name, then it will 
inherit the ResourceBundle name from its parent, recursively 
up the tree.

 Most of the logger output methods take a "msg" argument. 
This msg argument may be either a raw value or a localization key. 
During formatting, if the logger has (or inherits) a localization 
ResourceBundle and if the ResourceBundle has a mapping for the msg
string, then the msg string is replaced by the localized value.

MgtLogger.java:  Logger.getLogger("com.sun.hadb.mgt");

  The log routines: 
        logger.info(); / warning, severe, fine, finer, finest
        logger.isLoggable(Level level);
        logger.log(Level level, String msg, Object[] params);

---------------------------------------------------------------
In common area, the MgtFormatter.java provides format logic:

      sb.append(classNameOnly(record.getSourceClassName()));
      sb.append('.');
      sb.append(record.getSourceMethodName());
      sb.append(": ");
      sb.append(record.getMessage());

So each record is printed as  ClassName.methodName: message
---------------------------------------------------------------

The java logs :


HADBM Log Messages : 

     120000  : HADBM/client Info Messages.
     140000  : HADBM/client Warning Messages.
     160000  : HADBM/client Severe Messages.
     180000  : HADBM/client Config Messages.
     190000  : HADBM/client FINE Messages.

     220000  : MA Info Messages
     240000  : MA Warning Messages
     260000  : MA Severe Messages

     320000  : HADB Info Messages
     340000  : HADB Warning Messages
     360000  : HADB Severe Messages