author John Fleck <jfleck@inkstain.net> 2002-11-11 03:49:33 +0000
committer John Fleck <jfleck@src.gnome.org> 2002-11-11 03:49:33 +0000
commit 52717f344bb221adbb97e5057266310e95a3b50f (patch)
tree 92b0441fa2b5c248950c76886958a70014446377 /doc/tutorial
parent bd3b4fd15bc7fa17816a73cc8325085dcc378e8f (diff)
doc/tutorial/ar01s08.html adding file what I forgot for tutorial
Sun Nov 10 20:48:57 MST 2002 John Fleck <jfleck@inkstain.net> * doc/tutorial/ar01s08.html adding file what I forgot for tutorial
Diffstat (limited to 'doc/tutorial')
-rw-r--r-- doc/tutorial/ar01s08.html 63
1 files changed, 63 insertions, 0 deletions
diff --git a/doc/tutorial/ar01s08.html b/doc/tutorial/ar01s08.html
new file mode 100644
index 00000000..509c8d56
--- /dev/null
+++ b/doc/tutorial/ar01s08.html
@@ -0,0 +1,63 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Encoding Conversion</title><meta name="generator" content="DocBook XSL Stylesheets V1.57.0"><link rel="home" href="index.html" title="Libxml Tutorial"><link rel="up" href="index.html" title="Libxml Tutorial"><link rel="previous" href="ar01s07.html" title="Retrieving Attributes"><link rel="next" href="apa.html" title="A. Sample Document"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Encoding Conversion</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="ar01s07.html">Prev</a> </td><th width="60%" align="center"> </th><td width="20%" align="right"> <a accesskey="n" href="apa.html">Next</a></td></tr></table><hr></div><div class="sect1" lang="en"><div class="titlepage"><div><h2 class="title" style="clear: both"><a name="xmltutorialconvert"></a>Encoding Conversion</h2></div></div><p>Data encoding compatibility problems are one of the most common
+ difficulties encountered by programmers new to XML in
+ general and libxml in particular. Thinking
+ through the design of your application in light of this issue will help
+ avoid difficulties later. Internally, libxml
+ stores and manipulates data in the UTF-8 format. Data used by your program
+ in other formats, such as the commonly used ISO-8859-1 encoding, must be
+ converted to UTF-8 before it is passed to libxml
+ functions. If you want your program's output in an encoding other than
+ UTF-8, you must convert that as well.</p><p>Libxml uses
+ iconv if it is available to convert
+ data. Without iconv, only UTF-8, UTF-16 and
+ ISO-8859-1 can be used as external formats. With
+ iconv, any format can be used provided
+ iconv is able to convert it to and from
+ UTF-8. Currently iconv supports about 150
+ different character encodings and can convert between any pair of them. While
+ the actual number of supported encodings varies between implementations,
+ nearly every iconv implementation supports
+ the encodings in common use.</p><div class="warning" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Warning"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="images/warning.png"></td><th>Warning</th></tr><tr><td colspan="2" align="left" valign="top"><p>A common mistake is to use different formats for the internal data
+ in different parts of one's code. The most common case is an application
+ that assumes ISO-8859-1 to be the internal data format, combined with
+ libxml, which assumes UTF-8 to be the
+ internal data format. The result is an application that treats internal
+ data differently, depending on which code section is executing. One part
+ of the code or the other will then inevitably misinterpret the data.
+ </p></td></tr></table></div><p>This example constructs a simple document, then adds content provided
+ at the command line to the document's root element and outputs the results
+ to <tt>stdout</tt> in the proper encoding. For this example, we
+ use ISO-8859-1 encoding. The encoding of the string input at the command
+ line is converted from ISO-8859-1 to UTF-8. Full code: <a href="apf.html" title="F. Code for Encoding Conversion Example">Appendix F</a></p><p>The conversion, encapsulated in the example code in the
+ <tt>convert</tt> function, uses
+ libxml's
+ <tt>xmlFindCharEncodingHandler</tt> function:
+ </p><pre class="programlisting">
+ <a name="handlerdatatype"></a><img src="images/callouts/1.png" alt="1" border="0">xmlCharEncodingHandlerPtr handler;
+ <a name="calcsize"></a><img src="images/callouts/2.png" alt="2" border="0">size = (int)strlen(in)+1;
+ out_size = size*2-1;
+ out = malloc((size_t)out_size);
+
+&#8230;
+ <a name="findhandlerfunction"></a><img src="images/callouts/3.png" alt="3" border="0">handler = xmlFindCharEncodingHandler(encoding);
+&#8230;
+ <a name="callconversionfunction"></a><img src="images/callouts/4.png" alt="4" border="0">handler-&gt;input(out, &amp;out_size, in, &amp;temp);
+&#8230;
+ <a name="outputencoding"></a><img src="images/callouts/5.png" alt="5" border="0">xmlSaveFormatFileEnc(&quot;-&quot;, doc, encoding, 1);
+ </pre><p>
+ </p><div class="calloutlist"><table border="0" summary="Callout list"><tr><td width="5%" valign="top" align="left"><a href="#handlerdatatype"><img src="images/callouts/1.png" alt="1" border="0"></a> </td><td valign="top" align="left"><p><tt>handler</tt> is declared as a pointer to an
+ <tt>xmlCharEncodingHandler</tt> structure.</p></td></tr><tr><td width="5%" valign="top" align="left"><a href="#calcsize"><img src="images/callouts/2.png" alt="2" border="0"></a> </td><td valign="top" align="left"><p>The conversion function in the <tt>xmlCharEncodingHandler</tt> needs
+ to be given the sizes of the input and output buffers, which are
+ calculated here for strings <tt>in</tt> and
+ <tt>out</tt>. Each ISO-8859-1 byte expands to at most two bytes in UTF-8, so the output buffer is sized at twice the input.</p></td></tr><tr><td width="5%" valign="top" align="left"><a href="#findhandlerfunction"><img src="images/callouts/3.png" alt="3" border="0"></a> </td><td valign="top" align="left"><p><tt>xmlFindCharEncodingHandler</tt> takes as its
+ argument the name of the data's initial encoding and searches
+ libxml's built-in set of conversion
+ handlers, returning a pointer to the matching handler or NULL if none is
+ found.</p></td></tr><tr><td width="5%" valign="top" align="left"><a href="#callconversionfunction"><img src="images/callouts/4.png" alt="4" border="0"></a> </td><td valign="top" align="left"><p>The conversion function identified by <tt>handler</tt>
+ requires as its arguments pointers to the input and output strings,
+ along with pointers to the length of each. The lengths must be determined
+ separately by the application.</p></td></tr><tr><td width="5%" valign="top" align="left"><a href="#outputencoding"><img src="images/callouts/5.png" alt="5" border="0"></a> </td><td valign="top" align="left"><p>To output in a specified encoding rather than UTF-8, we use
+ <tt>xmlSaveFormatFileEnc</tt>, specifying the
+ encoding.</p></td></tr></table></div><p>
+ </p></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="ar01s07.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="index.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="apa.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Retrieving Attributes </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> A. Sample Document</td></tr></table></div></body></html>
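
The buffer sizing in callout 2 (out_size = size*2-1) can be illustrated without libxml. The sketch below is a hand-rolled ISO-8859-1 to UTF-8 converter, not libxml's implementation; the function name latin1_to_utf8 is hypothetical and exists only for this illustration. It assumes the input is a NUL-terminated ISO-8859-1 string and shows why twice the input size, less one byte for the shared terminator, is always enough.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a libxml function): converts a NUL-terminated
 * ISO-8859-1 string into a newly allocated UTF-8 string.
 * Returns NULL if allocation fails; the caller frees the result.
 */
unsigned char *latin1_to_utf8(const unsigned char *in)
{
    assert(in != NULL);

    /* Same arithmetic as the tutorial snippet: size counts the NUL. */
    size_t size = strlen((const char *)in) + 1;
    /* Worst case: every character needs two bytes, plus one NUL. */
    size_t out_size = size * 2 - 1;

    unsigned char *out = malloc(out_size);
    if (out == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* ASCII range is identical in both encodings: one byte. */
            out[j++] = c;
        } else {
            /* 0x80..0xFF maps to U+0080..U+00FF: two UTF-8 bytes. */
            out[j++] = (unsigned char)(0xC0 | (c >> 6));
            out[j++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    return out;
}
```

In the tutorial's own code this work is delegated to the handler returned by xmlFindCharEncodingHandler, which covers encodings other than ISO-8859-1; the sketch only motivates the size calculation.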