Chapter 13. String Classes

Table of Contents

Simple Transformations
Case Sensitivity
Arbitrary Character Types
Tokenizing
Shrink to Fit
CString (MFC)

Simple Transformations

Here are Standard, simple, and portable ways to perform common transformations on a string instance, such as "convert to all upper case." The word transformations is especially apt, because the standard template function transform<> is used.

This code will go through some iterations. Here's a simple version:

   #include <string>
   #include <algorithm>
   #include <cctype>      // old <ctype.h>

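   // The C functions take an int that must be representable as
   // unsigned char (or be EOF), so convert before calling them.
   struct ToLower
   {
     char operator() (char c) const
     { return std::tolower(static_cast<unsigned char>(c)); }
   };

   struct ToUpper
   {
     char operator() (char c) const
     { return std::toupper(static_cast<unsigned char>(c)); }
   };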

   int main()
   {
     std::string  s ("Some Kind Of Initial Input Goes Here");

     // Change everything into upper case
     std::transform (s.begin(), s.end(), s.begin(), ToUpper());

     // Change everything into lower case
     std::transform (s.begin(), s.end(), s.begin(), ToLower());

     // Change everything back into upper case, but store the
     // result in a different string
     std::string  capital_s;
     capital_s.resize(s.size());
     std::transform (s.begin(), s.end(), capital_s.begin(), ToUpper());
   } 
   

Note that these calls all involve the global C locale through the use of the C functions toupper/tolower. This is absolutely guaranteed to work, but only if the string contains only characters from the basic source character set, and there are only 96 of those. That means not even all English text can be represented (certain British spellings, proper names, and so forth). So, if all your input forevermore consists of only those 96 characters (hahahahahaha), then you're done.
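To make the limitation concrete, here is a small sketch. It assumes a single-byte encoding such as ISO-8859-1, in which the byte 0xE9 stands for an accented lower-case e, and it assumes the program has not changed the global locale from the default "C". The helper name upper_in_c_locale and the functor CUpper are invented for this illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical functor wrapping the C function. The cast to unsigned char
// keeps bytes above 0x7F from turning into negative ints.
struct CUpper
{
  char operator() (char c) const
  { return static_cast<char>(std::toupper(static_cast<unsigned char>(c))); }
};

// Hypothetical helper: return an upper-cased copy using the global C locale.
std::string upper_in_c_locale (std::string s)
{
  std::transform (s.begin(), s.end(), s.begin(), CUpper());
  return s;
}
```

In the "C" locale only the letters a through z are mapped, so upper_in_c_locale("caf\xE9") gives back "CAF" followed by the same, unconverted 0xE9 byte.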

Note that the ToUpper and ToLower function objects are needed because toupper and tolower are overloaded names (declared in <cctype> and <locale>), so the template arguments for transform<> cannot be deduced when the bare name is passed, as explained in this message. At minimum, you can write short wrappers like

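   char toLower (char c)
   {
      // convert first: tolower expects a value representable as unsigned char
      return std::tolower(static_cast<unsigned char>(c));
   }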
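Because such a wrapper is an ordinary, non-overloaded function, its name can be handed to transform<> directly. The following is only a sketch: lower_copy is a hypothetical helper, and the cast to unsigned char inside the wrapper is a precaution against negative char values.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Non-overloaded wrapper around the C function.
char toLower (char c)
{
  // std::tolower expects a value representable as unsigned char (or EOF).
  return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical helper: return a lower-cased copy of the argument.
std::string lower_copy (const std::string& in)
{
  // Size the destination first; transform does not grow it.
  std::string out (in.size(), '\0');
  // The plain function name deduces cleanly as the unary operation.
  std::transform (in.begin(), in.end(), out.begin(), toLower);
  return out;
}
```

A call such as lower_copy("Some Kind Of Input") then yields "some kind of input".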

The correct method is to use a facet for a particular locale and call its conversion functions. These are discussed more in Chapter 22; the specific part is Correct Transformations, which shows the final version of this code. (Thanks to James Kanze for assistance and suggestions on all of this.)
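As a rough preview of the facet approach (not the final version from Chapter 22), the ctype facet of a locale provides range forms of toupper and tolower. In this sketch the function name upper_with_facet, and the decision to pass the locale as a parameter defaulting to the classic "C" locale, are assumptions made for illustration.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: upper-case a copy of s using the ctype facet of loc.
std::string upper_with_facet (std::string s,
                              const std::locale& loc = std::locale::classic())
{
  if (!s.empty())
  {
    // Fetch the character-classification facet for plain char.
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    // The range overload converts [low, high) in place.
    ct.toupper (&s[0], &s[0] + s.size());
  }
  return s;
}
```

Passing a different std::locale object changes which characters are converted, without touching the global C locale.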

Another common operation is trimming off excess whitespace. Much like transformations, this task is trivial with the use of string's find family. These examples are broken into multiple statements for readability:

   std::string  str (" \t blah blah blah    \n ");

   // trim leading whitespace
   std::string::size_type  notwhite = str.find_first_not_of(" \t\n");
   str.erase(0,notwhite);

   // trim trailing whitespace
   notwhite = str.find_last_not_of(" \t\n"); 
   str.erase(notwhite+1); 

Obviously, the calls to find could be inserted directly into the calls to erase, in case your compiler does not optimize named temporaries out of existence.
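With the calls to find folded into the calls to erase, the trimming fits into a small function. This is a sketch only; the name trim and the defaulted whitespace argument are hypothetical.

```cpp
#include <string>

// Hypothetical helper: strip leading and trailing characters found in ws.
std::string trim (std::string str, const char* ws = " \t\n")
{
  // If the string holds nothing but whitespace, find_first_not_of returns
  // npos and erase(0, npos) clears the whole string.
  str.erase (0, str.find_first_not_of(ws));
  // On an empty string find_last_not_of returns npos; npos + 1 wraps
  // around to 0, so erase(0) has nothing to remove.
  str.erase (str.find_last_not_of(ws) + 1);
  return str;
}
```

For the input shown earlier, trim(" \t blah blah blah    \n ") returns "blah blah blah".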