Diffstat (limited to 'gcc-4.4.3/libstdc++-v3/doc/xml/manual/strings.xml')
-rw-r--r-- | gcc-4.4.3/libstdc++-v3/doc/xml/manual/strings.xml | 498 |
1 files changed, 0 insertions, 498 deletions
diff --git a/gcc-4.4.3/libstdc++-v3/doc/xml/manual/strings.xml b/gcc-4.4.3/libstdc++-v3/doc/xml/manual/strings.xml deleted file mode 100644 index 2ea3da20e..000000000 --- a/gcc-4.4.3/libstdc++-v3/doc/xml/manual/strings.xml +++ /dev/null @@ -1,498 +0,0 @@ -<?xml version='1.0'?> -<!DOCTYPE part PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" - "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" -[ ]> - -<part id="manual.strings" xreflabel="Strings"> -<?dbhtml filename="strings.html"?> - -<partinfo> - <keywordset> - <keyword> - ISO C++ - </keyword> - <keyword> - library - </keyword> - </keywordset> -</partinfo> - -<title> - Strings - <indexterm><primary>Strings</primary></indexterm> -</title> - -<!-- Chapter 01 : Character Traits --> - -<!-- Chapter 02 : String Classes --> -<chapter id="manual.strings.string" xreflabel="string"> - <title>String Classes</title> - - <sect1 id="strings.string.simple" xreflabel="Simple Transformations"> - <title>Simple Transformations</title> - <para> - Here are Standard, simple, and portable ways to perform common - transformations on a <code>string</code> instance, such as - "convert to all upper case." The word transformations - is especially apt, because the standard template function - <code>transform<></code> is used. - </para> - <para> - This code will go through some iterations. 
Here's a simple - version: - </para> - <programlisting> - #include <string> - #include <algorithm> - #include <cctype> // old <ctype.h> - - struct ToLower - { - char operator() (char c) const { return std::tolower(c); } - }; - - struct ToUpper - { - char operator() (char c) const { return std::toupper(c); } - }; - - int main() - { - std::string s ("Some Kind Of Initial Input Goes Here"); - - // Change everything into upper case - std::transform (s.begin(), s.end(), s.begin(), ToUpper()); - - // Change everything into lower case - std::transform (s.begin(), s.end(), s.begin(), ToLower()); - - // Change everything back into upper case, but store the - // result in a different string - std::string capital_s; - capital_s.resize(s.size()); - std::transform (s.begin(), s.end(), capital_s.begin(), ToUpper()); - } - </programlisting> - <para> - <emphasis>Note</emphasis> that these calls all - involve the global C locale through the use of the C functions - <code>toupper/tolower</code>. This is absolutely guaranteed to work -- - but <emphasis>only</emphasis> if the string contains <emphasis>only</emphasis> characters - from the basic source character set, and there are <emphasis>only</emphasis> - 96 of those. Which means that not even all English text can be - represented (certain British spellings, proper names, and so forth). - So, if all your input forevermore consists of only those 96 - characters (hahahahahaha), then you're done. - </para> - <para><emphasis>Note</emphasis> that the - <code>ToUpper</code> and <code>ToLower</code> function objects - are needed because <code>toupper</code> and <code>tolower</code> - are overloaded names (declared in <code><cctype></code> and - <code><locale></code>) so the template-arguments for - <code>transform<></code> cannot be deduced, as explained in - <ulink url="http://gcc.gnu.org/ml/libstdc++/2002-11/msg00180.html">this - message</ulink>. 
- <!-- section 14.8.2.4 clause 16 in ISO 14882:1998 --> - At minimum, you can write short wrappers like - </para> - <programlisting> - char toLower (char c) - { - return std::tolower(c); - } </programlisting> - <para>The correct method is to use a facet for a particular locale - and call its conversion functions. These are discussed more in - Chapter 22; the specific part is - <ulink url="../22_locale/howto.html#7">Correct Transformations</ulink>, - which shows the final version of this code. (Thanks to James Kanze - for assistance and suggestions on all of this.) - </para> - <para>Another common operation is trimming off excess whitespace. Much - like transformations, this task is trivial with the use of string's - <code>find</code> family. These examples are broken into multiple - statements for readability: - </para> - <programlisting> - std::string str (" \t blah blah blah \n "); - - // trim leading whitespace - string::size_type notwhite = str.find_first_not_of(" \t\n"); - str.erase(0,notwhite); - - // trim trailing whitespace - notwhite = str.find_last_not_of(" \t\n"); - str.erase(notwhite+1); </programlisting> - <para>Obviously, the calls to <code>find</code> could be inserted directly - into the calls to <code>erase</code>, in case your compiler does not - optimize named temporaries out of existence. - </para> - - </sect1> - <sect1 id="strings.string.case" xreflabel="Case Sensitivity"> - <title>Case Sensitivity</title> - <para> - </para> - - <para>The well-known-and-if-it-isn't-well-known-it-ought-to-be - <ulink url="http://www.gotw.ca/gotw/">Guru of the Week</ulink> - discussions held on Usenet covered this topic in January of 1998. - Briefly, the challenge was, <quote>write a 'ci_string' class which - is identical to the standard 'string' class, but is - case-insensitive in the same way as the (common but nonstandard) - C function stricmp()</quote>. 
- </para> - <programlisting> - ci_string s( "AbCdE" ); - - // case insensitive - assert( s == "abcde" ); - assert( s == "ABCDE" ); - - // still case-preserving, of course - assert( strcmp( s.c_str(), "AbCdE" ) == 0 ); - assert( strcmp( s.c_str(), "abcde" ) != 0 ); </programlisting> - - <para>The solution is surprisingly easy. The original answer was - posted on Usenet, and a revised version appears in Herb Sutter's - book <emphasis>Exceptional C++</emphasis> and on his website as <ulink url="http://www.gotw.ca/gotw/029.htm">GotW 29</ulink>. - </para> - <para>See? Told you it was easy!</para> - <para> - <emphasis>Added June 2000:</emphasis> The May 2000 issue of C++ - Report contains a fascinating <ulink - url="http://lafstern.org/matt/col2_new.pdf"> article</ulink> by - Matt Austern (yes, <emphasis>the</emphasis> Matt Austern) on why - case-insensitive comparisons are not as easy as they seem, and - why creating a class is the <emphasis>wrong</emphasis> way to go - about it in production code. (The GotW answer mentions one of - the principal difficulties; his article mentions more.) - </para> - <para>Basically, this is "easy" only if you ignore some things, - things which may be too important to your program to ignore. (I chose - to ignore them when originally writing this entry, and am surprised - that nobody ever called me on it...) The GotW question and answer - remain useful instructional tools, however. - </para> - <para><emphasis>Added September 2000:</emphasis> James Kanze provided a link to a - <ulink url="http://www.unicode.org/unicode/reports/tr21/">Unicode - Technical Report discussing case handling</ulink>, which provides some - very good information. 
- </para> - - </sect1> - <sect1 id="strings.string.character_types" xreflabel="Arbitrary Characters"> - <title>Arbitrary Character Types</title> - <para> - </para> - - <para>The <code>std::basic_string</code> is tantalizingly general, in that - it is parameterized on the type of the characters which it holds. - In theory, you could whip up a Unicode character class and instantiate - <code>std::basic_string<my_unicode_char></code>, or assuming - that integers are wider than characters on your platform, maybe just - declare variables of type <code>std::basic_string<int></code>. - </para> - <para>That's the theory. Remember however that basic_string has additional - type parameters, which take default arguments based on the character - type (called <code>CharT</code> here): - </para> - <programlisting> - template <typename CharT, - typename Traits = char_traits<CharT>, - typename Alloc = allocator<CharT> > - class basic_string { .... };</programlisting> - <para>Now, <code>allocator<CharT></code> will probably Do The Right - Thing by default, unless you need to implement your own allocator - for your characters. - </para> - <para>But <code>char_traits</code> takes more work. The char_traits - template is <emphasis>declared</emphasis> but not <emphasis>defined</emphasis>. - That means there is only - </para> - <programlisting> - template <typename CharT> - struct char_traits - { - static void foo (type1 x, type2 y); - ... - };</programlisting> - <para>and functions such as char_traits<CharT>::foo() are not - actually defined anywhere for the general case. The C++ standard - permits this, because writing such a definition to fit all possible - CharT's cannot be done. - </para> - <para>The C++ standard also requires that char_traits be specialized for - instantiations of <code>char</code> and <code>wchar_t</code>, and it - is these template specializations that permit entities like - <code>basic_string<char,char_traits<char>></code> to work. 
- </para> - <para>If you want to use character types other than char and wchar_t, - such as <code>unsigned char</code> and <code>int</code>, you will - need suitable specializations for them. For a time, in earlier - versions of GCC, there was a mostly-correct implementation that - let programmers be lazy but it broke under many situations, so it - was removed. GCC 3.4 introduced a new implementation that mostly - works and can be specialized even for <code>int</code> and other - built-in types. - </para> - <para>If you want to use your own special character class, then you have - <ulink url="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00163.html">a lot - of work to do</ulink>, especially if you wish to use i18n features - (facets require traits information but don't have a traits argument). - </para> - <para>Another example of how to specialize char_traits was given <ulink url="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00260.html">on the - mailing list</ulink> and at a later date was put into the file <code> - include/ext/pod_char_traits.h</code>. We agree - that the way it's used with basic_string (scroll down to main()) - doesn't look nice, but that's because <ulink url="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00236.html">the - nice-looking first attempt</ulink> turned out to <ulink url="http://gcc.gnu.org/ml/libstdc++/2002-08/msg00242.html">not - be conforming C++</ulink>, due to the rule that CharT must be a POD. - (See how tricky this is?) - </para> - - </sect1> - - <sect1 id="strings.string.token" xreflabel="Tokenizing"> - <title>Tokenizing</title> - <para> - </para> - <para>The Standard C (and C++) function <code>strtok()</code> leaves a lot to - be desired in terms of user-friendliness. It's unintuitive, it - destroys the character string on which it operates, and it requires - you to handle all the memory problems. 
But it does let the client - code decide what to use to break the string into pieces; it allows - you to choose the "whitespace," so to speak. - </para> - <para>A C++ implementation lets us keep the good things and fix those - annoyances. The implementation here is more intuitive (you only - call it once, not in a loop with varying argument), it does not - affect the original string at all, and all the memory allocation - is handled for you. - </para> - <para>It's called stringtok, and it's a template function. Sources are - as below, in a less-portable form than it could be, to keep this - example simple (for example, see the comments on what kind of - string it will accept). - </para> - -<programlisting> -#include <string> -template <typename Container> -void -stringtok(Container &container, string const &in, - const char * const delimiters = " \t\n") -{ - const string::size_type len = in.length(); - string::size_type i = 0; - - while (i < len) - { - // Eat leading whitespace - i = in.find_first_not_of(delimiters, i); - if (i == string::npos) - return; // Nothing left but white space - - // Find the end of the token - string::size_type j = in.find_first_of(delimiters, i); - - // Push token - if (j == string::npos) - { - container.push_back(in.substr(i)); - return; - } - else - container.push_back(in.substr(i, j-i)); - - // Set up for next loop - i = j + 1; - } -} -</programlisting> - - - <para> - The author uses a more general (but less readable) form of it for - parsing command strings and the like. If you compiled and ran this - code using it: - </para> - - - <programlisting> - std::list<string> ls; - stringtok (ls, " this \t is\t\n a test "); - for (std::list<string>::const_iterator i = ls.begin(); - i != ls.end(); ++i) - { - std::cerr << ':' << (*i) << ":\n"; - } </programlisting> - <para>You would see this as output: - </para> - <programlisting> - :this: - :is: - :a: - :test: </programlisting> - <para>with all the whitespace removed. 
The original <code>s</code> is still - available for use, <code>ls</code> will clean up after itself, and - <code>ls.size()</code> will return how many tokens there were. - </para> - <para>As always, there is a price paid here, in that stringtok is not - as fast as strtok. The other benefits usually outweigh that, however. - <ulink url="stringtok_std_h.txt">Another version of stringtok is given - here</ulink>, suggested by Chris King and tweaked by Petr Prikryl, - and this one uses the - transformation functions mentioned below. If you are comfortable - with reading the new function names, this version is recommended - as an example. - </para> - <para><emphasis>Added February 2001:</emphasis> Mark Wilden pointed out that the - standard <code>std::getline()</code> function can be used with standard - <ulink url="../27_io/howto.html">istringstreams</ulink> to perform - tokenizing as well. Build an istringstream from the input text, - and then use std::getline with varying delimiters (the three-argument - signature) to extract tokens into a string. - </para> - - - </sect1> - <sect1 id="strings.string.shrink" xreflabel="Shrink to Fit"> - <title>Shrink to Fit</title> - <para> - </para> - <para>From GCC 3.4 calling <code>s.reserve(res)</code> on a - <code>string s</code> with <code>res < s.capacity()</code> will - reduce the string's capacity to <code>std::max(s.size(), res)</code>. - </para> - <para>This behaviour is suggested, but not required by the standard. Prior - to GCC 3.4 the following alternative can be used instead - </para> - <programlisting> - std::string(str.data(), str.size()).swap(str); - </programlisting> - <para>This is similar to the idiom for reducing a <code>vector</code>'s - memory usage (see <ulink url='../faq/index.html#5_9'>FAQ 5.9</ulink>) but - the regular copy constructor cannot be used because libstdc++'s - <code>string</code> is Copy-On-Write. 
- </para> - - - </sect1> - - <sect1 id="strings.string.Cstring" xreflabel="CString (MFC)"> - <title>CString (MFC)</title> - <para> - </para> - - <para>A common lament seen in various newsgroups deals with the Standard - string class as opposed to the Microsoft Foundation Class called - CString. Often programmers realize that a standard portable - answer is better than a proprietary nonportable one, but in porting - their application from a Win32 platform, they discover that they - are relying on special functions offered by the CString class. - </para> - <para>Things are not as bad as they seem. In - <ulink url="http://gcc.gnu.org/ml/gcc/1999-04n/msg00236.html">this - message</ulink>, Joe Buck points out a few very important things: - </para> - <itemizedlist> - <listitem><para>The Standard <code>string</code> supports all the operations - that CString does, with three exceptions. - </para></listitem> - <listitem><para>Two of those exceptions (whitespace trimming and case - conversion) are trivial to implement. In fact, we do so - on this page. - </para></listitem> - <listitem><para>The third is <code>CString::Format</code>, which allows formatting - in the style of <code>sprintf</code>. This deserves some mention: - </para></listitem> - </itemizedlist> - <para> - The old libg++ library had a function called form(), which did much - the same thing. But for a Standard solution, you should use the - stringstream classes. These are the bridge between the iostream - hierarchy and the string class, and they operate with regular - streams seamlessly because they inherit from the iostream - hierarchy. 
A quick example: - </para> - <programlisting> - #include <iostream> - #include <string> - #include <sstream> - - string f (string& incoming) // incoming is "foo N" - { - istringstream incoming_stream(incoming); - string the_word; - int the_number; - - incoming_stream >> the_word // extract "foo" - >> the_number; // extract N - - ostringstream output_stream; - output_stream << "The word was " << the_word - << " and 3*N was " << (3*the_number); - - return output_stream.str(); - } </programlisting> - <para>A serious problem with CString is a design bug in its memory - allocation. Specifically, quoting from that same message: - </para> - <programlisting> - CString suffers from a common programming error that results in - poor performance. Consider the following code: - - CString n_copies_of (const CString& foo, unsigned n) - { - CString tmp; - for (unsigned i = 0; i < n; i++) - tmp += foo; - return tmp; - } - - This function is O(n^2), not O(n). The reason is that each += - causes a reallocation and copy of the existing string. Microsoft - applications are full of this kind of thing (quadratic performance - on tasks that can be done in linear time) -- on the other hand, - we should be thankful, as it's created such a big market for high-end - ix86 hardware. :-) - - If you replace CString with string in the above function, the - performance is O(n). - </programlisting> - <para>Joe Buck also pointed out some other things to keep in mind when - comparing CString and the Standard string class: - </para> - <itemizedlist> - <listitem><para>CString permits access to its internal representation; coders - who exploited that may have problems moving to <code>string</code>. - </para></listitem> - <listitem><para>Microsoft ships the source to CString (in the files - MFC\SRC\Str{core,ex}.cpp), so you could fix the allocation - bug and rebuild your MFC libraries. 
- <emphasis><emphasis>Note:</emphasis> It looks like the CString shipped - with VC++6.0 has fixed this, although it may in fact have been - one of the VC++ SPs that did it.</emphasis> - </para></listitem> - <listitem><para><code>string</code> operations like this have O(n) complexity - <emphasis>if the implementors do it correctly</emphasis>. The libstdc++ - implementors did it correctly. Other vendors might not. - </para></listitem> - <listitem><para>While parts of the SGI STL are used in libstdc++, their - string class is not. The SGI <code>string</code> is essentially - <code>vector<char></code> and does not do any reference - counting like libstdc++'s does. (It is O(n), though.) - So if you're thinking about SGI's string or rope classes, - you're now looking at four possibilities: CString, the - libstdc++ string, the SGI string, and the SGI rope, and this - is all before any allocator or traits customizations! (More - choices than you can shake a stick at -- want fries with that?) - </para></listitem> - </itemizedlist> - - </sect1> -</chapter> - -<!-- Chapter 03 : Interacting with C --> - -</part> |