Switch to a working UTF-8 mb/wc implementation.

Although glibc gets by with an 8-byte mbstate_t, OpenBSD uses 12 bytes (of
the 128 bytes it reserves!).

We can actually implement UTF-8 encoding/decoding with a 0-byte mbstate_t
which means we can make things work on LP32 too, as long as we accept the
limitation that the caller needs to present us with a complete sequence
before we'll process it.

Our behavior is fine when going from characters to bytes; we just
update the source wchar_t** to say how far through the input we got.

I'll come back and use the 4 bytes we do have to cope with byte sequences
split across multiple input buffers. The fact that we don't support
UTF-8 sequences longer than 4 bytes plus the fact that the first byte of
a UTF-8 sequence encodes the length means we shouldn't need the other
fields OpenBSD used (at the cost of some recomputation in cases where a
sequence is split across buffers).

This patch also makes the minimal changes necessary to setlocale(3) to
make us behave like glibc when an app requests UTF-8. (The difference
being that our "C" locale is the same as our "C.UTF-8" locale.)

Change-Id: Ied327a8c4643744b3611bf6bb005a9b389ba4c2f
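
The message's point that the first byte of a UTF-8 sequence encodes its length is what allows decoding without any saved state, provided the caller supplies a complete sequence. The following is a minimal sketch of that idea, not the code from this commit: the names utf8_sequence_length and utf8_decode_one are hypothetical, and the sketch assumes it is acceptable to skip overlong-encoding and surrogate checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of bionic): returns the total length in bytes
// of the UTF-8 sequence introduced by `lead`, or 0 if `lead` cannot start a
// sequence of at most 4 bytes.
static size_t utf8_sequence_length(uint8_t lead) {
  if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx: ASCII.
  if ((lead & 0xe0) == 0xc0) return 2;  // 110xxxxx
  if ((lead & 0xf0) == 0xe0) return 3;  // 1110xxxx
  if ((lead & 0xf8) == 0xf0) return 4;  // 11110xxx
  return 0;  // Continuation byte (10xxxxxx) or a 5-/6-byte lead byte.
}

// Hypothetical stateless decoder: decodes one code point from s[0..n).
// Returns the number of bytes consumed, or 0 if the sequence is incomplete
// or malformed. Assumption: overlong forms and surrogates are NOT rejected.
static size_t utf8_decode_one(const uint8_t* s, size_t n, char32_t* out) {
  assert(out != nullptr);
  if (n == 0) return 0;
  size_t len = utf8_sequence_length(s[0]);
  if (len == 0 || len > n) return 0;  // Bad lead byte, or caller gave too few bytes.
  // Masks that strip the length-marker bits from the lead byte, indexed by length.
  static const uint8_t kLeadMask[] = {0x00, 0x7f, 0x1f, 0x0f, 0x07};
  char32_t cp = s[0] & kLeadMask[len];
  for (size_t i = 1; i < len; ++i) {
    if ((s[i] & 0xc0) != 0x80) return 0;  // Not a continuation byte.
    cp = (cp << 6) | (s[i] & 0x3f);       // Append the 6 payload bits.
  }
  *out = cp;
  return len;
}
```

Because the length is recomputed from the lead byte on every call, nothing needs to persist between calls. The cost is the limitation the message describes: a sequence split across two buffers is reported as incomplete rather than resumed.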
author: Elliott Hughes <enh@google.com> 2014-04-30 22:03:12 -0700
committer: Elliott Hughes <enh@google.com> 2014-05-01 14:46:54 -0700
commit: 5a0aa3dee247a313f04252cf45608097695d5953 (patch)
tree: 1bbc0d1e4e60717285b17b40ab155bdfbace5e37 /libc/bionic/locale.cpp
parent: 9fb53dd4dbaa7633c234d9da8417827fa3d3c32f (diff)
download: android_bionic-5a0aa3dee247a313f04252cf45608097695d5953.tar.gz
android_bionic-5a0aa3dee247a313f04252cf45608097695d5953.tar.bz2
android_bionic-5a0aa3dee247a313f04252cf45608097695d5953.zip
1 files changed, 15 insertions, 13 deletions
diff --git a/libc/bionic/locale.cpp b/libc/bionic/locale.cpp
index 5ab834dcc..3752fa44f 100644
--- a/libc/bionic/locale.cpp
+++ b/libc/bionic/locale.cpp
@@ -75,8 +75,12 @@ static void __locale_init() {
   gLocale.int_n_sign_posn = CHAR_MAX;
 }
 
+static bool __bionic_current_locale_is_utf8 = false;
+
 static bool __is_supported_locale(const char* locale) {
-  return (strcmp(locale, "") == 0 || strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0);
+  return (strcmp(locale, "") == 0 ||
+          strcmp(locale, "C") == 0 || strcmp(locale, "C.UTF-8") == 0 ||
+          strcmp(locale, "POSIX") == 0);
 }
 
 static locale_t __new_locale() {
@@ -115,26 +119,24 @@ locale_t newlocale(int category_mask, const char* locale_name, locale_t /*base*/
   return __new_locale();
 }
 
-char* setlocale(int category, char const* locale_name) {
+char* setlocale(int category, const char* locale_name) {
   // Is 'category' valid?
   if (category < LC_CTYPE || category > LC_IDENTIFICATION) {
     errno = EINVAL;
     return NULL;
   }
 
-  // Caller just wants to query the current locale?
-  if (locale_name == NULL) {
-    return const_cast<char*>("C");
-  }
-
-  // Caller wants one of the mandatory POSIX locales?
-  if (__is_supported_locale(locale_name)) {
-    return const_cast<char*>("C");
+  // Caller wants to set the locale rather than just query?
+  if (locale_name != NULL) {
+    if (!__is_supported_locale(locale_name)) {
+      // We don't support this locale.
+      errno = ENOENT;
+      return NULL;
+    }
+    __bionic_current_locale_is_utf8 = (strstr(locale_name, "UTF-8") != NULL);
   }
 
-  // We don't support any other locales.
-  errno = ENOENT;
-  return NULL;
+  return const_cast<char*>(__bionic_current_locale_is_utf8 ? "C.UTF-8" : "C");
 }
 
 locale_t uselocale(locale_t new_locale) {
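
To make the new setlocale(3) behavior easier to follow, here is a small standalone model of the patched logic. It is only a sketch: model_setlocale, model_is_supported_locale and model_is_utf8 are hypothetical names that mirror the hunk above, the category check is left out, and nothing here calls the real libc function.

```cpp
#include <cerrno>
#include <cstring>

// Hypothetical stand-in for __bionic_current_locale_is_utf8.
static bool model_is_utf8 = false;

// Mirrors the patched __is_supported_locale(): four accepted names.
static bool model_is_supported_locale(const char* locale) {
  return strcmp(locale, "") == 0 ||
         strcmp(locale, "C") == 0 || strcmp(locale, "C.UTF-8") == 0 ||
         strcmp(locale, "POSIX") == 0;
}

// Mirrors the patched setlocale() body, minus the 'category' validation.
static const char* model_setlocale(const char* locale_name) {
  // A non-NULL name is a request to set the locale; NULL is a pure query.
  if (locale_name != nullptr) {
    if (!model_is_supported_locale(locale_name)) {
      errno = ENOENT;  // Unsupported locale: fail and leave the state untouched.
      return nullptr;
    }
    model_is_utf8 = (strstr(locale_name, "UTF-8") != nullptr);
  }
  return model_is_utf8 ? "C.UTF-8" : "C";
}
```

In this model a query returns "C" until "C.UTF-8" is requested, after which queries return "C.UTF-8". An unsupported name such as "en_US.UTF-8" fails with ENOENT without changing the remembered state, while "" is accepted and, since it does not contain "UTF-8", switches the reported locale back to "C".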