Diffstat (limited to 'gcc-4.8.1/libstdc++-v3/doc/html/manual/facets.html')
-rw-r--r-- | gcc-4.8.1/libstdc++-v3/doc/html/manual/facets.html | 736 |
1 files changed, 0 insertions, 736 deletions
diff --git a/gcc-4.8.1/libstdc++-v3/doc/html/manual/facets.html b/gcc-4.8.1/libstdc++-v3/doc/html/manual/facets.html deleted file mode 100644 index 7d98192c7..000000000 --- a/gcc-4.8.1/libstdc++-v3/doc/html/manual/facets.html +++ /dev/null @@ -1,736 +0,0 @@ -<?xml version="1.0" encoding="UTF-8" standalone="no"?> -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Facets</title><meta name="generator" content="DocBook XSL-NS Stylesheets V1.77.1" /><meta name="keywords" content="ISO C++, library" /><meta name="keywords" content="ISO C++, runtime, library" /><link rel="home" href="../index.html" title="The GNU C++ Library" /><link rel="up" href="localization.html" title="Chapter 8. Localization" /><link rel="prev" href="localization.html" title="Chapter 8. Localization" /><link rel="next" href="containers.html" title="Chapter 9. Containers" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Facets</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="localization.html">Prev</a> </td><th width="60%" align="center">Chapter 8. 
- Localization - -</th><td width="20%" align="right"> <a accesskey="n" href="containers.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="std.localization.facet"></a>Facets</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.localization.facet.ctype"></a>ctype</h3></div></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.impl"></a>Implementation</h4></div></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="idp15778336"></a>Specializations</h5></div></div></div><p> -For the required specialization codecvt<wchar_t, char, mbstate_t>, -conversions are made between the internal character set (always UCS4 -on GNU/Linux) and whatever the currently selected locale for the -LC_CTYPE category implements. -</p><p> -The two required specializations are implemented as follows: -</p><p> -<code class="code"> -ctype<char> -</code> -</p><p> -This is a simple specialization. Implementing this was a piece of cake. -</p><p> -<code class="code"> -ctype<wchar_t> -</code> -</p><p> -This specialization, by specifying all the template parameters, pretty -much ties the hands of implementors. As such, the implementation is -straightforward, involving mbsrtowcs for the conversions from char -to wchar_t and wcsrtombs for conversions from wchar_t to char. -</p><p> -Neither of these two required specializations deals with Unicode -characters. -</p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.future"></a>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> - How to deal with the global locale issue? - </p></li><li class="listitem"><p> - How to deal with different types than char, wchar_t? 
</p></li><li class="listitem"><p> - Overlap between codecvt/ctype: narrow/widen - </p></li><li class="listitem"><p> - Mask typedef in codecvt_base, argument types in codecvt. What - is known about this type? - </p></li><li class="listitem"><p> - Why mask* argument in codecvt? - </p></li><li class="listitem"><p> - Can this be made (more) generic? Is there a simple way to - straighten out the configure-time mess that is a by-product of - this class? - </p></li><li class="listitem"><p> - Get the ctype<wchar_t>::mask stuff under control. Need to - make some kind of static table, and not do lookup every time - somebody hits the do_is... functions. Too bad we can't just - redefine mask for ctype<wchar_t> - </p></li><li class="listitem"><p> - Rename abstract base class. See if just smash-overriding is a - better approach. Clarify, add sanity to naming. - </p></li></ul></div></div><div class="bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.ctype.biblio"></a>Bibliography</h4></div></div></div><div class="biblioentry"><a id="idp15793392"></a><p><span class="citetitle"><em class="citetitle"> - The GNU C Library - </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2007 FSF. </span><span class="pagenums">Chapters 6 Character Set Handling and 7 Locales and Internationalization. </span></p></div><div class="biblioentry"><a id="idp15798144"></a><p><span class="citetitle"><em class="citetitle"> - Correspondence - </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2002 . </span></p></div><div class="biblioentry"><a id="idp15801232"></a><p><span class="citetitle"><em class="citetitle"> - ISO/IEC 14882:1998 Programming languages - C++ - </em>. 
</span><span class="copyright">Copyright © 1998 ISO. </span></p></div><div class="biblioentry"><a id="idp15803520"></a><p><span class="citetitle"><em class="citetitle"> - ISO/IEC 9899:1999 Programming languages - C - </em>. </span><span class="copyright">Copyright © 1999 ISO. </span></p></div><div class="biblioentry"><a id="idp15805792"></a><p><span class="title"><em> - <a class="link" href="http://www.unix.org/version3/ieee_std.html" target="_top"> - The Open Group Base Specifications, Issue 6 (IEEE Std. 1003.1-2004) - </a> - </em>. </span><span class="copyright">Copyright © 1999 - The Open Group/The Institute of Electrical and Electronics Engineers, Inc.. </span></p></div><div class="biblioentry"><a id="idp15809040"></a><p><span class="citetitle"><em class="citetitle"> - The C++ Programming Language, Special Edition - </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley, Inc.. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername"> - Addison Wesley - . </span></span></p></div><div class="biblioentry"><a id="idp15813664"></a><p><span class="citetitle"><em class="citetitle"> - Standard C++ IOStreams and Locales - </em>. </span><span class="subtitle"> - Advanced Programmer's Guide and Reference - . </span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. </span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley Longman, Inc.. </span><span class="publisher"><span class="publishername"> - Addison Wesley Longman - . 
</span></span></p></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.localization.facet.codecvt"></a>codecvt</h3></div></div></div><p> -The standard class codecvt attempts to address conversions between -different character encoding schemes. In particular, the standard -attempts to detail conversions between the implementation-defined wide -characters (hereafter referred to as wchar_t) and the standard type -char that is so beloved in classic <span class="quote">“<span class="quote">C</span>”</span> (which can now be -referred to as narrow characters.) This document attempts to describe -how the GNU libstdc++ implementation deals with the conversion between -wide and narrow characters, and also presents a framework for dealing -with the huge number of other encodings that iconv can convert, -including Unicode and UTF8. Design issues and requirements are -addressed, and examples of correct usage for both the required -specializations for wide and narrow characters and the -implementation-provided extended functionality are given. -</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.req"></a>Requirements</h4></div></div></div><p> -Around page 425 of the C++ Standard, this charming heading comes into view: -</p><div class="blockquote"><blockquote class="blockquote"><p> -22.2.1.5 - Template class codecvt -</p></blockquote></div><p> -The text around the codecvt definition gives some clues: -</p><div class="blockquote"><blockquote class="blockquote"><p> -<span class="emphasis"><em> --1- The class codecvt<internT,externT,stateT> is for use when -converting from one codeset to another, such as from wide characters -to multibyte characters, between wide character encodings such as -Unicode and EUC. -</em></span> -</p></blockquote></div><p> -Hmm. So, in some unspecified way, Unicode encodings and -translations between other character sets should be handled by this -class. 
-</p><div class="blockquote"><blockquote class="blockquote"><p> -<span class="emphasis"><em> --2- The stateT argument selects the pair of codesets being mapped between. -</em></span> -</p></blockquote></div><p> -Ah ha! Another clue... -</p><div class="blockquote"><blockquote class="blockquote"><p> -<span class="emphasis"><em> --3- The instantiations required in the Table ?? -(lib.locale.category), namely codecvt<wchar_t,char,mbstate_t> and -codecvt<char,char,mbstate_t>, convert the implementation-defined -native character set. codecvt<char,char,mbstate_t> implements a -degenerate conversion; it does not convert at -all. codecvt<wchar_t,char,mbstate_t> converts between the native -character sets for tiny and wide characters. Instantiations on -mbstate_t perform conversion between encodings known to the library -implementor. Other encodings can be converted by specializing on a -user-defined stateT type. The stateT object can contain any state that -is useful to communicate to or from the specialized do_convert member. -</em></span> -</p></blockquote></div><p> -At this point, a couple of points become clear: -</p><p> -One: The standard clearly implies that attempts to add non-required -(yet useful and widely used) conversions need to do so through the -third template parameter, stateT.</p><p> -Two: The required conversions, by specifying mbstate_t as the third -template parameter, imply an implementation strategy that is mostly -(or wholly) based on the underlying C library, and the functions -mbsrtowcs and wcsrtombs in particular.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.design"></a>Design</h4></div></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.wchar_t_size"></a><span class="type">wchar_t</span> Size</h5></div></div></div><p> - The simple implementation detail of wchar_t's size seems to - repeatedly confound people. 
Many systems use a two byte, - unsigned integral type to represent wide characters, and use an - internal encoding of Unicode or UCS2. (See AIX, Microsoft NT, - Java, others.) Other systems use a four byte, unsigned integral - type to represent wide characters, and use an internal encoding - of UCS4. (GNU/Linux systems using glibc, in particular.) The C - programming language (and thus C++) does not specify a specific - size for the type wchar_t. - </p><p> - Thus, portable C++ code cannot assume a byte size (or endianness) either. - </p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.unicode"></a>Support for Unicode</h5></div></div></div><p> - Probably the most frequently asked question about code conversion - is: "So dudes, what's the deal with Unicode strings?" - The dude part is optional, but apparently the usefulness of - Unicode strings is pretty widely appreciated. Sadly, this specific - encoding (and other useful encodings like UTF8, UCS4, ISO 8859-10, - etc.) is not mentioned in the C++ standard. - </p><p> - A couple of comments: - </p><p> - The thought that all one needs to convert between two arbitrary - codesets is two types and some kind of state argument is - unfortunate. In particular, encodings may be stateless. The naming - of the third parameter as stateT is unfortunate, as what is really - needed is some kind of generalized type that accounts for the - issues that abstract encodings will need. The minimum information - that is required includes: - </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> - Identifiers for each of the codesets involved in the - conversion. 
For example, using the iconv family of functions - from the Single Unix Specification (what used to be called - X/Open) hosted on the GNU/Linux operating system allows - bi-directional mapping between far more than the following - tantalizing possibilities: - </p><p> - (An edited list taken from <code class="code">`iconv --list`</code> on a - Red Hat 6.2/Intel system: - </p><div class="blockquote"><blockquote class="blockquote"><pre class="programlisting"> -8859_1, 8859_9, 10646-1:1993, 10646-1:1993/UCS4, ARABIC, ARABIC7, -ASCII, EUC-CN, EUC-JP, EUC-KR, EUC-TW, GREEK-CCITT, GREEK, GREEK7-OLD, -GREEK7, GREEK8, HEBREW, ISO-8859-1, ISO-8859-2, ISO-8859-3, -ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, -ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14, -ISO-8859-15, ISO-10646, ISO-10646/UCS2, ISO-10646/UCS4, -ISO-10646/UTF-8, ISO-10646/UTF8, SHIFT-JIS, SHIFT_JIS, UCS-2, UCS-4, -UCS2, UCS4, UNICODE, UNICODEBIG, UNICODELITTLE, US-ASCII, US, UTF-8, -UTF-16, UTF8, UTF16). -</pre></blockquote></div><p> -For iconv-based implementations, string literals for each of the -encodings (e.g. "UCS-2" and "UTF-8") are necessary, -although for other, -non-iconv implementations a table of enumerated values or some other -mechanism may be required. -</p></li><li class="listitem"><p> - Maximum length of the identifying string literal. -</p></li><li class="listitem"><p> - Some encodings require explicit endian-ness. As such, some kind - of endian marker or other byte-order marker will be necessary. See - "Footnotes for C/C++ developers" in Haible for more information on - UCS-2/Unicode endian issues. (Summary: big endian seems most likely, - however implementations, most notably Microsoft, vary.) -</p></li><li class="listitem"><p> - Types representing the conversion state, for conversions involving - the machinery in the "C" library, or the conversion descriptor, for - conversions using iconv (such as the type iconv_t.) 
Note that the - conversion descriptor encodes more information than a simple encoding - state type. -</p></li><li class="listitem"><p> - Conversion descriptors for both directions of encoding. (i.e., both - UCS-2 to UTF-8 and UTF-8 to UCS-2.) -</p></li><li class="listitem"><p> - Something to indicate if the conversion requested is valid. -</p></li><li class="listitem"><p> - Something to represent if the conversion descriptors are valid. -</p></li><li class="listitem"><p> - Some way to enforce strict type checking on the internal and - external types. As part of this, the size of the internal and - external types will need to be known. -</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="codecvt.design.issues"></a>Other Issues</h5></div></div></div><p> -In addition, multi-threaded and multi-locale environments also impact -the design and requirements for code conversions. In particular, they -affect the required specialization codecvt<wchar_t, char, mbstate_t> -when implemented using standard "C" functions. -</p><p> -Three problems arise, one big, one of medium importance, and one small. -</p><p> -First, the small: mbsrtowcs and wcsrtombs may not be multithread-safe -on all systems required by the GNU tools. For GNU/Linux and glibc, -this is not an issue. -</p><p> -Of medium concern, in the grand scope of things, is that the functions -used to implement this specialization work on null-terminated -strings. Buffers, especially file buffers, may not be null-terminated, -thus giving conversions that end prematurely or are otherwise -incorrect. Yikes! -</p><p> -The last, and fundamental problem, is the assumption of a global -locale for all the "C" functions referenced above. For something like -C++ iostreams (where codecvt is explicitly used) the notion of -multiple locales is fundamental. In practice, most users may not run -into this limitation. 
However, as a quality of implementation issue, -the GNU C++ library would like to offer a solution that allows -multiple locales and/or simultaneous usage with computationally -correct results. In short, libstdc++ is trying to offer, as an -option, a high-quality implementation, damn the additional complexity! -</p><p> -For the required specialization codecvt<wchar_t, char, mbstate_t>, -conversions are made between the internal character set (always UCS4 -on GNU/Linux) and whatever the currently selected locale for the -LC_CTYPE category implements. -</p></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.impl"></a>Implementation</h4></div></div></div><p> -The two required specializations are implemented as follows: -</p><p> -<code class="code"> -codecvt<char, char, mbstate_t> -</code> -</p><p> -This is a degenerate (i.e., does nothing) specialization. Implementing -this was a piece of cake. -</p><p> -<code class="code"> -codecvt<wchar_t, char, mbstate_t> -</code> -</p><p> -This specialization, by specifying all the template parameters, pretty -much ties the hands of implementors. As such, the implementation is -straightforward, involving mbsrtowcs for the conversions from char -to wchar_t and wcsrtombs for conversions from wchar_t to char. -</p><p> -Neither of these two required specializations deals with Unicode -characters. As such, libstdc++ implements a partial specialization -of the codecvt class with an iconv wrapper class, encoding_state, as the -third template parameter. -</p><p> -This implementation should be standards conformant. First of all, the -standard explicitly points out that instantiations on the third -template parameter, stateT, are the proper way to implement -non-required conversions. Second of all, the standard says (in Chapter -17) that partial specializations of required classes are a-ok. 
Third -of all, the requirements for the stateT type elsewhere in the standard -(see 21.1.2 traits typedefs) only indicate that this type be copy -constructible. -</p><p> -As such, the type encoding_state is defined as a non-templatized, POD -type to be used as the third type of a codecvt instantiation. This -type is just a wrapper class for iconv, and provides an easy interface -to iconv functionality. -</p><p> -There are two constructors for encoding_state: -</p><p> -<code class="code"> -encoding_state() : __in_desc(0), __out_desc(0) -</code> -</p><p> -This default constructor sets the internal encoding to some default -(currently UCS4) and the external encoding to whatever is returned by -nl_langinfo(CODESET). -</p><p> -<code class="code"> -encoding_state(const char* __int, const char* __ext) -</code> -</p><p> -This constructor takes as parameters string literals that indicate the -desired internal and external encoding. There are no defaults for -either argument. -</p><p> -One of the issues with iconv is that the string literals identifying -conversions are not standardized. Because of this, the thought of -mandating and/or enforcing some set of pre-determined valid -identifiers seems iffy: thus, a more practical (and non-migraine -inducing) strategy was implemented: end-users can specify any string -(subject to a pre-determined length qualifier, currently 32 bytes) for -encodings. It is up to the user to make sure that these strings are -valid on the target system. -</p><p> -<code class="code"> -void -_M_init() -</code> -</p><p> -Strangely enough, this member function attempts to open conversion -descriptors for a given encoding_state object. If the conversion -descriptors are not valid, the conversion descriptors returned will -not be valid and the resulting calls to the codecvt conversion -functions will return an error. 
-</p><p> -<code class="code"> -bool -_M_good() -</code> -</p><p> -Provides a way to see if the given encoding_state object has been -properly initialized. If the string literals describing the desired -internal and external encoding are not valid, initialization will -fail, and this will return false. If the internal and external -encodings are valid, but iconv_open could not allocate conversion -descriptors, this will also return false. Otherwise, the object is -ready to convert and will return true. -</p><p> -<code class="code"> -encoding_state(const encoding_state&) -</code> -</p><p> -As iconv allocates memory and sets up conversion descriptors, the copy -constructor can only copy the member data pertaining to the internal -and external code conversions, and not the conversion descriptors -themselves. -</p><p> -Definitions for all the required codecvt member functions are provided -for this specialization, and usage of codecvt<internal character type, -external character type, encoding_state> is consistent with other -codecvt usage. 
-</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.use"></a>Use</h4></div></div></div><p>A conversion involving a string literal.</p><pre class="programlisting"> - typedef codecvt_base::result result; - typedef unsigned short unicode_t; - typedef unicode_t int_type; - typedef char ext_type; - typedef encoding_state state_type; - typedef codecvt<int_type, ext_type, state_type> unicode_codecvt; - - const ext_type* e_lit = "black pearl jasmine tea"; - int size = strlen(e_lit); - int_type i_lit_base[24] = - { 25088, 27648, 24832, 25344, 27392, 8192, 28672, 25856, 24832, 29184, - 27648, 8192, 27136, 24832, 29440, 27904, 26880, 28160, 25856, 8192, 29696, - 25856, 24832, 2560 - }; - const int_type* i_lit = i_lit_base; - const ext_type* efrom_next; - const int_type* ifrom_next; - ext_type* e_arr = new ext_type[size + 1]; - ext_type* eto_next; - int_type* i_arr = new int_type[size + 1]; - int_type* ito_next; - - // construct a locale object with the specialized facet. - locale loc(locale::classic(), new unicode_codecvt); - // sanity check the constructed locale has the specialized facet. - VERIFY( has_facet<unicode_codecvt>(loc) ); - const unicode_codecvt& cvt = use_facet<unicode_codecvt>(loc); - // convert between const char* and unicode strings - unicode_codecvt::state_type state01("UNICODE", "ISO_8859-1"); - initialize_state(state01); - result r1 = cvt.in(state01, e_lit, e_lit + size, efrom_next, - i_arr, i_arr + size, ito_next); - VERIFY( r1 == codecvt_base::ok ); - VERIFY( !int_traits::compare(i_arr, i_lit, size) ); - VERIFY( efrom_next == e_lit + size ); - VERIFY( ito_next == i_arr + size ); -</pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.future"></a>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> - a. 
things that are sketchy, or remain unimplemented: - do_encoding, max_length and length member functions - are only weakly implemented. I have no idea how to do - this correctly, and in a generic manner. Nathan? -</p></li><li class="listitem"><p> - b. conversions involving std::string - </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p> - how should operators != and == work for string of - different/same encoding? - </p></li><li class="listitem"><p> - what is equal? A byte by byte comparison or an - encoding then byte comparison? - </p></li><li class="listitem"><p> - conversions between narrow, wide, and unicode strings - </p></li></ul></div></li><li class="listitem"><p> - c. conversions involving std::filebuf and std::ostream -</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p> - how to initialize the state object in a - standards-conformant manner? - </p></li><li class="listitem"><p> - how to synchronize the "C" and "C++" - conversion information? - </p></li><li class="listitem"><p> - wchar_t/char internal buffers and conversions between - internal/external buffers? - </p></li></ul></div></li></ul></div></div><div class="bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.codecvt.biblio"></a>Bibliography</h4></div></div></div><div class="biblioentry"><a id="idp15891136"></a><p><span class="citetitle"><em class="citetitle"> - The GNU C Library - </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2007 FSF. </span><span class="pagenums"> - Chapters 6 Character Set Handling and 7 Locales and Internationalization - . 
</span></p></div><div class="biblioentry"><a id="idp15895888"></a><p><span class="citetitle"><em class="citetitle"> - Correspondence - </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2002 . </span></p></div><div class="biblioentry"><a id="idp15898976"></a><p><span class="citetitle"><em class="citetitle"> - ISO/IEC 14882:1998 Programming languages - C++ - </em>. </span><span class="copyright">Copyright © 1998 ISO. </span></p></div><div class="biblioentry"><a id="idp15901264"></a><p><span class="citetitle"><em class="citetitle"> - ISO/IEC 9899:1999 Programming languages - C - </em>. </span><span class="copyright">Copyright © 1999 ISO. </span></p></div><div class="biblioentry"><a id="idp15903536"></a><p><span class="title"><em> - <a class="link" href="http://www.opengroup.org/austin/" target="_top"> - System Interface Definitions, Issue 7 (IEEE Std. 1003.1-2008) - </a> - </em>. </span><span class="copyright">Copyright © 2008 - The Open Group/The Institute of Electrical and Electronics - Engineers, Inc. - . </span></p></div><div class="biblioentry"><a id="idp15906768"></a><p><span class="citetitle"><em class="citetitle"> - The C++ Programming Language, Special Edition - </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley, Inc.. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername"> - Addison Wesley - . </span></span></p></div><div class="biblioentry"><a id="idp15911392"></a><p><span class="citetitle"><em class="citetitle"> - Standard C++ IOStreams and Locales - </em>. </span><span class="subtitle"> - Advanced Programmer's Guide and Reference - . </span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. 
</span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley Longman, Inc.. </span><span class="publisher"><span class="publishername"> - Addison Wesley Longman - . </span></span></p></div><div class="biblioentry"><a id="idp15917056"></a><p><span class="title"><em> - <a class="link" href="http://www.lysator.liu.se/c/na1.html" target="_top"> - A brief description of Normative Addendum 1 - </a> - </em>. </span><span class="author"><span class="firstname">Clive</span> <span class="surname">Feather</span>. </span><span class="pagenums">Extended Character Sets. </span></p></div><div class="biblioentry"><a id="idp15920304"></a><p><span class="title"><em> - <a class="link" href="http://tldp.org/HOWTO/Unicode-HOWTO.html" target="_top"> - The Unicode HOWTO - </a> - </em>. </span><span class="author"><span class="firstname">Bruno</span> <span class="surname">Haible</span>. </span></p></div><div class="biblioentry"><a id="idp15923088"></a><p><span class="title"><em> - <a class="link" href="http://www.cl.cam.ac.uk/~mgk25/unicode.html" target="_top"> - UTF-8 and Unicode FAQ for Unix/Linux - </a> - </em>. </span><span class="author"><span class="firstname">Markus</span> <span class="surname">Kuhn</span>. </span></p></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="manual.localization.facet.messages"></a>messages</h3></div></div></div><p> -The std::messages facet implements message retrieval functionality -equivalent to Java's java.text.MessageFormat, using either GNU gettext -or IEEE 1003.1-200 functions. -</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.req"></a>Requirements</h4></div></div></div><p> -The std::messages facet is probably the most vaguely defined facet in -the standard library. 
It's assumed that this facility was built into -the standard library in order to convert string literals from one -locale to the other. For instance, converting the "C" locale's -<code class="code">const char* c = "please"</code> to a German-localized <code class="code">"bitte"</code> -during program execution. -</p><div class="blockquote"><blockquote class="blockquote"><p> -22.2.7.1 - Template class messages [lib.locale.messages] -</p></blockquote></div><p> -This class has three public member functions, which directly -correspond to three protected virtual member functions. -</p><p> -The public member functions are: -</p><p> -<code class="code">catalog open(const string&, const locale&) const</code> -</p><p> -<code class="code">string_type get(catalog, int, int, const string_type&) const</code> -</p><p> -<code class="code">void close(catalog) const</code> -</p><p> -While the virtual functions are: -</p><p> -<code class="code">catalog do_open(const string&, const locale&) const</code> -</p><div class="blockquote"><blockquote class="blockquote"><p> -<span class="emphasis"><em> --1- Returns: A value that may be passed to get() to retrieve a -message, from the message catalog identified by the string name -according to an implementation-defined mapping. The result can be used -until it is passed to close(). Returns a value less than 0 if no such -catalog can be opened. -</em></span> -</p></blockquote></div><p> -<code class="code">string_type do_get(catalog, int, int, const string_type&) const</code> -</p><div class="blockquote"><blockquote class="blockquote"><p> -<span class="emphasis"><em> --3- Requires: A catalog cat obtained from open() and not yet closed. --4- Returns: A message identified by arguments set, msgid, and dfault, -according to an implementation-defined mapping. If no such message can -be found, returns dfault. 
-</em></span> -</p></blockquote></div><p> -<code class="code">void do_close(catalog) const</code> -</p><div class="blockquote"><blockquote class="blockquote"><p> -<span class="emphasis"><em> --5- Requires: A catalog cat obtained from open() and not yet closed. --6- Effects: Releases unspecified resources associated with cat. --7- Notes: The limit on such resources, if any, is implementation-defined. -</em></span> -</p></blockquote></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.design"></a>Design</h4></div></div></div><p> -A couple of notes on the standard. -</p><p> -First, why is <code class="code">messages_base::catalog</code> specified as a typedef -to int? This makes sense for implementations that use -<code class="code">catopen</code> and define <code class="code">nl_catd</code> as int, but not for -others. Fortunately, it's not heavily used and so only a minor irritant. -This has been reported as a possible defect in the standard (LWG 2028). -</p><p> -Second, by making the member functions <code class="code">const</code>, it is -impossible to save state in them. Thus, storing away information used -in the 'open' member function for use in 'get' is impossible. This is -unfortunate. -</p><p> -The 'open' member function in particular seems to be oddly -designed. The signature seems quite peculiar. Why specify a <code class="code">const -string& </code> argument, for instance, instead of just <code class="code">const -char*</code>? Or, why specify a <code class="code">const locale&</code> argument that is -to be used in the 'get' member function? How, exactly, is this locale -argument useful? What was the intent? It might make sense if a locale -argument was associated with a given default message string in the -'open' member function, for instance. Quite murky and unclear, on -reflection. -</p><p> -Lastly, it seems odd that messages, which explicitly require code -conversion, don't use the codecvt facet. 
Because the messages facet -has only one template parameter, it is assumed that ctype, and not -codecvt, is to be used to convert between character sets. -</p><p> -It is implicitly assumed that the default message -string in 'get' is in the "C" locale. Thus, all source code is assumed -to be written in English, so translations are always from "en_US" to -other, explicitly named locales. -</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.impl"></a>Implementation</h4></div></div></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="messages.impl.models"></a>Models</h5></div></div></div><p> - This is a relatively simple class, on the face of it. The standard - specifies very little in concrete terms, so generic - implementations that are conforming yet do very little are the - norm. Adding functionality that would be useful to programmers and - comparable to Java's java.text.MessageFormat takes a bit of work, - and is highly dependent on the capabilities of the underlying - operating system. - </p><p> - Three different mechanisms have been provided, selectable via - configure flags: - </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> - generic - </p><p> - This model does very little, and is what is used by default. - </p></li><li class="listitem"><p> - gnu - </p><p> - The gnu model is complete and fully tested. It's based on the - GNU gettext package, which is part of glibc. It uses the - functions <code class="code">textdomain, bindtextdomain, gettext</code> to - implement full functionality. Creating message catalogs is a - relatively straightforward process that is lightly documented - below, and fully documented in gettext's distributed - documentation. - </p></li><li class="listitem"><p> - ieee_1003.1-200x - </p><p> - This is a complete, though untested, implementation based on - the IEEE standard. 
The functions <code class="code">catopen, catgets, - catclose</code> are used to retrieve locale-specific messages - given the appropriate message catalogs that have been - constructed for their use. Note that the script <code class="code"> - po2msg.sed</code> that is part of the gettext distribution can - convert gettext catalogs into catalogs that - <code class="code">catopen</code> can use. - </p></li></ul></div><p> -A new, standards-conformant non-virtual member function signature was -added for 'open' so that a directory could be specified with a given -message catalog. This simplifies calling conventions for the gnu -model. -</p></div><div class="section"><div class="titlepage"><div><div><h5 class="title"><a id="messages.impl.gnu"></a>The GNU Model</h5></div></div></div><p> - The messages facet, because it is retrieving and converting - between character sets, depends on the ctype and perhaps the - codecvt facet in a given locale. In addition, underlying "C" - library locale support is necessary for more than just the - <code class="code">LC_MESSAGES</code> mask: <code class="code">LC_CTYPE</code> is also - necessary. To avoid any unpleasantness, all bits of the "C" mask - (i.e. <code class="code">LC_ALL</code>) are set before retrieving messages. - </p><p> - Making the message catalogs can be initially tricky, but becomes - quite simple with practice. For complete info, see the gettext - documentation. Here's an idea of what is required: - </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> - Make a source file with the required string literals that need - to be translated. See <code class="code">intl/string_literals.cc</code> for - an example. 
- </p></li><li class="listitem"><p> - Make initial catalog (see "4 Making the PO Template File" from - the gettext docs).</p><p> - <code class="code"> xgettext --c++ --debug string_literals.cc -o libstdc++.pot </code> - </p></li><li class="listitem"><p>Make language and country-specific locale catalogs.</p><p> - <code class="code">cp libstdc++.pot fr_FR.po</code> - </p><p> - <code class="code">cp libstdc++.pot de_DE.po</code> - </p></li><li class="listitem"><p> - Edit localized catalogs in emacs so that strings are - translated. - </p><p> - <code class="code">emacs fr_FR.po</code> - </p></li><li class="listitem"><p>Make the binary mo files.</p><p> - <code class="code">msgfmt fr_FR.po -o fr_FR.mo</code> - </p><p> - <code class="code">msgfmt de_DE.po -o de_DE.mo</code> - </p></li><li class="listitem"><p>Copy the binary files into the correct directory structure.</p><p> - <code class="code">cp fr_FR.mo (dir)/fr_FR/LC_MESSAGES/libstdc++.mo</code> - </p><p> - <code class="code">cp de_DE.mo (dir)/de_DE/LC_MESSAGES/libstdc++.mo</code> - </p></li><li class="listitem"><p>Use the new message catalogs.</p><p> - <code class="code">locale loc_de("de_DE");</code> - </p><p> - <code class="code"> - use_facet<messages<char> >(loc_de).open("libstdc++", locale(), dir); - </code> - </p></li></ul></div></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.use"></a>Use</h4></div></div></div><p> - A simple example using the GNU model of message conversion. 
- </p><pre class="programlisting"> -#include <iostream> -#include <locale> -#include <string> -using namespace std; - -void test01() -{ - typedef messages<char>::catalog catalog; - const char* dir = - "/mnt/egcs/build/i686-pc-linux-gnu/libstdc++/po/share/locale"; - const locale loc_de("de_DE"); - const messages<char>& mssg_de = use_facet<messages<char> >(loc_de); - - catalog cat_de = mssg_de.open("libstdc++", loc_de, dir); - string s01 = mssg_de.get(cat_de, 0, 0, "please"); - string s02 = mssg_de.get(cat_de, 0, 0, "thank you"); - cout << "please in german:" << s01 << '\n'; - cout << "thank you in german:" << s02 << '\n'; - mssg_de.close(cat_de); -} -</pre></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.future"></a>Future</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> - Things that are sketchy, or remain unimplemented: - </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p> - _M_convert_from_char, _M_convert_to_char are in flux, - depending on how the library ends up doing character set - conversions. It might not be possible to do a real character - set based conversion, due to the fact that the template - parameter for messages is not enough to instantiate the - codecvt facet (1 supplied, need at least 2 but would prefer - 3). - </p></li><li class="listitem"><p> - There are issues with gettext needing the global locale set - to extract a message. This dependence on the global locale - makes the current "gnu" model not MT-safe. Future versions - of glibc, i.e. glibc 2.3.x, will fix this, and the C++ library - bits are already in place. - </p></li></ul></div></li><li class="listitem"><p> - Development versions of the GNU "C" library, glibc 2.3, will allow - a more efficient, MT implementation of std::messages, and will - allow the removal of the _M_name_messages data member. 
If this is - done, it will change the library ABI. The C++ parts to support - glibc 2.3 have already been coded, but are not in use: once this - version of the "C" library is released, the marked parts of the - messages implementation can be switched over to the new "C" - library functionality. - </p></li><li class="listitem"><p> - At some point in the near future, std::numpunct will probably use - std::messages facilities to implement truename/falsename - correctly. This is currently not done, but entries in - libstdc++.pot have already been made for "true" and "false" string - literals, so all that remains is the std::numpunct coding and the - configure/make hassles to make the installed library search its - own catalog. Currently the libstdc++.mo catalog is only searched - for the testsuite cases involving messages members. - </p></li><li class="listitem"><p> The following member functions:</p><p> - <code class="code"> - catalog - open(const basic_string<char>& __s, const locale& __loc) const - </code> - </p><p> - <code class="code"> - catalog - open(const basic_string<char>&, const locale&, const char*) const; - </code> - </p><p> - don't actually return a "value less than 0 if no such catalog - can be opened" as required by the standard in the "gnu" - model. As of this writing, it is unknown how to query whether - a specified message catalog exists using the gettext - package. - </p></li></ul></div></div><div class="bibliography"><div class="titlepage"><div><div><h4 class="title"><a id="facet.messages.biblio"></a>Bibliography</h4></div></div></div><div class="biblioentry"><a id="idp16003632"></a><p><span class="citetitle"><em class="citetitle"> - The GNU C Library - </em>. </span><span class="author"><span class="firstname">Roland</span> <span class="surname">McGrath</span>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2007 FSF. 
</span><span class="pagenums">Chapters 6 Character Set Handling, and 7 Locales and Internationalization - . </span></p></div><div class="biblioentry"><a id="idp16008384"></a><p><span class="citetitle"><em class="citetitle"> - Correspondence - </em>. </span><span class="author"><span class="firstname">Ulrich</span> <span class="surname">Drepper</span>. </span><span class="copyright">Copyright © 2002 . </span></p></div><div class="biblioentry"><a id="idp16011472"></a><p><span class="citetitle"><em class="citetitle"> - ISO/IEC 14882:1998 Programming languages - C++ - </em>. </span><span class="copyright">Copyright © 1998 ISO. </span></p></div><div class="biblioentry"><a id="idp16013760"></a><p><span class="citetitle"><em class="citetitle"> - ISO/IEC 9899:1999 Programming languages - C - </em>. </span><span class="copyright">Copyright © 1999 ISO. </span></p></div><div class="biblioentry"><a id="idp16016032"></a><p><span class="title"><em> - <a class="link" href="http://www.opengroup.org/austin/" target="_top"> - System Interface Definitions, Issue 7 (IEEE Std. 1003.1-2008) - </a> - </em>. </span><span class="copyright">Copyright © 2008 - The Open Group/The Institute of Electrical and Electronics - Engineers, Inc. - . </span></p></div><div class="biblioentry"><a id="idp16019264"></a><p><span class="citetitle"><em class="citetitle"> - The C++ Programming Language, Special Edition - </em>. </span><span class="author"><span class="firstname">Bjarne</span> <span class="surname">Stroustrup</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley, Inc.. </span><span class="pagenums">Appendix D. </span><span class="publisher"><span class="publishername"> - Addison Wesley - . </span></span></p></div><div class="biblioentry"><a id="idp16023888"></a><p><span class="citetitle"><em class="citetitle"> - Standard C++ IOStreams and Locales - </em>. </span><span class="subtitle"> - Advanced Programmer's Guide and Reference - . 
</span><span class="author"><span class="firstname">Angelika</span> <span class="surname">Langer</span>. </span><span class="author"><span class="firstname">Klaus</span> <span class="surname">Kreft</span>. </span><span class="copyright">Copyright © 2000 Addison Wesley Longman, Inc.. </span><span class="publisher"><span class="publishername"> - Addison Wesley Longman - . </span></span></p></div><div class="biblioentry"><a id="idp16029552"></a><p><span class="title"><em> - <a class="link" href="http://java.sun.com/reference/api/index.html" target="_top"> - API Specifications, Java Platform - </a> - </em>. </span><span class="pagenums">java.util.Properties, java.text.MessageFormat, -java.util.Locale, java.util.ResourceBundle - . </span></p></div><div class="biblioentry"><a id="idp16031888"></a><p><span class="title"><em> - <a class="link" href="https://www.gnu.org/software/gettext/" target="_top"> - GNU gettext tools, version 0.10.38, Native Language Support - Library and Tools. - </a> - </em>. </span></p></div></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="localization.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="localization.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="containers.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 8. - Localization - - </td><td width="20%" align="center"><a accesskey="h" href="../index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 9. - Containers - -</td></tr></table></div></body></html>
\ No newline at end of file