Diffstat (limited to 'docs/docbook/projdoc/unicode.xml')
-rw-r--r-- docs/docbook/projdoc/unicode.xml 97
1 files changed, 50 insertions, 47 deletions
diff --git a/docs/docbook/projdoc/unicode.xml b/docs/docbook/projdoc/unicode.xml
index 28d6f76cdf..699f29f1ba 100644
--- a/docs/docbook/projdoc/unicode.xml
+++ b/docs/docbook/projdoc/unicode.xml
@@ -1,6 +1,7 @@
<chapter id="unicode">
<chapterinfo>
&author.jelmer;
+ &author.jht;
<author>
<firstname>TAKAHASHI</firstname><surname>Motonobu</surname>
<affiliation>
@@ -25,62 +26,65 @@ origin.
<para>
Of all the effort that has been brought to bear on providing native language support
-for all computer users, the efforts of the <ulink url="http://www.openi18n.org/">Openi18n organisation</ulink> is deserving of
+for all computer users, the efforts of the <ulink url="http://www.openi18n.org/">Openi18n organization</ulink> are deserving of
special mention.
</para>
<para>
Samba-2.x supported a single locale through a mechanism called
<emphasis>codepages</emphasis>. Samba-3 is destined to become a truly trans-global
-file and printer sharing platform.
+file and printer-sharing platform.
</para>
</sect1>
<sect1>
-<title>What are charsets and unicode?</title>
+<title>What Are Charsets and Unicode?</title>
<para>
Computers communicate in numbers. In texts, each number will be
translated to a corresponding letter. The meaning that will be assigned
-to a certain number depends on the <emphasis>character set(charset)
+to a certain number depends on the <emphasis>character set (charset)
</emphasis> that is used.
+</para>
+
+<para>
A charset can be seen as a table that is used to translate numbers to
letters. Not all computers use the same charset (there are charsets
-with German umlauts, Japanese characters, etc). Usually a charset contains
+with German umlauts, Japanese characters, and so on). Usually a charset contains
256 characters, which means that storing a character with it takes
exactly one byte. </para>
<para>
There are also charsets that support even more characters,
-but those need twice(or even more) as much storage space. These
+but those need twice as much storage space (or more). These
charsets can contain <command>256 * 256 = 65536</command> characters, which
-is more then all possible characters one could think of. They are called
-multibyte charsets (because they use more then one byte to
-store one character).
+is more than all possible characters one could think of. They are called
+multibyte charsets because they use more than one byte to
+store one character.
</para>
<para>
- A standardised multibyte charset is <ulink url="http://www.unicode.org/">unicode</ulink>.
+A standardized multibyte charset is <ulink url="http://www.unicode.org/">unicode</ulink>.
A big advantage of using a multibyte charset is that you only need one; there
is no need to make sure two computers use the same charset when they are
communicating.
</para>
-<para>Old windows clients use single-byte charsets, named
-'codepages' by Microsoft. However, there is no support for
-negotiating the charset to be used in the smb protocol. Thus, you
+<para>Old Windows clients use single-byte charsets, named
+<parameter>codepages</parameter> by Microsoft. However, there is no support for
+negotiating the charset to be used in the SMB/CIFS protocol. Thus, you
have to make sure you are using the same charset when talking to an older client.
-Newer clients (Windows NT, 2K, XP) talk unicode over the wire.
+Newer clients (Windows NT, 200x, XP) talk unicode over the wire.
</para>
</sect1>
<sect1>
-<title>Samba and charsets</title>
+<title>Samba and Charsets</title>
<para>
-As of samba 3.0, samba can (and will) talk unicode over the wire. Internally,
-samba knows of three kinds of character sets:
+As of Samba-3.0, Samba can (and will) talk unicode over the wire. Internally,
+Samba knows of three kinds of character sets:
</para>
<variablelist>
@@ -89,21 +93,21 @@ samba knows of three kinds of character sets:
<listitem><para>
This is the charset used internally by your operating system.
The default is <constant>UTF-8</constant>, which is fine for most
- systems. The default in previous samba releases was <constant>ASCII</constant>.
+ systems and covers all characters in all languages. The default in previous Samba releases was <constant>ASCII</constant>.
</para></listitem>
</varlistentry>
<varlistentry>
<term><smbconfoption><name>display charset</name></smbconfoption></term>
- <listitem><para>This is the charset samba will use to print messages
- on your screen. It should generally be the same as the <command>unix charset</command>.
+ <listitem><para>This is the charset Samba will use to print messages
+ on your screen. It should generally be the same as the <parameter>unix charset</parameter>.
</para></listitem>
</varlistentry>
<varlistentry>
<term><smbconfoption><name>dos charset</name></smbconfoption></term>
- <listitem><para>This is the charset samba uses when communicating with
- DOS and Windows 9x clients. It will talk unicode to all newer clients.
+ <listitem><para>This is the charset Samba uses when communicating with
+ DOS and Windows 9x/Me clients. It will talk unicode to all newer clients.
The default depends on the charsets you have installed on your system.
Run <command>testparm -v | grep "dos charset"</command> to see
what the default is on your system.
@@ -114,42 +118,38 @@ samba knows of three kinds of character sets:
</sect1>
<sect1>
-<title>Conversion from old names</title>
+<title>Conversion from Old Names</title>
-<para>Because previous samba versions did not do any charset conversion,
-characters in filenames are usually not correct in the unix charset but only
+<para>Because previous Samba versions did not do any charset conversion,
+characters in filenames are usually not correct in the UNIX charset but only
for the local charset used by the DOS/Windows clients.</para>
-<para>Bjoern Jacke has written a utility named <ulink url="http://j3e.de/linux/convmv/">convm</ulink> that can convert whole directory
- structures to different charsets with one single command.
-</para>
-
</sect1>
<sect1>
-<title>Japanese charsets</title>
+<title>Japanese Charsets</title>
-<para>Samba doesn't work correctly with Japanese charsets yet. Here are
+<para>Samba does not work correctly with Japanese charsets yet. Here are
points of attention when setting it up:</para>
<itemizedlist>
<listitem><para>You should set <smbconfoption><name>mangling method</name><value>hash</value></smbconfoption></para></listitem>
-<listitem><para>There are various iconv() implementations around and not
-all of them work equally well. glibc2's iconv() has a critical problem
-in CP932. libiconv-1.8 works with CP932 but still has some problems and
-does not work with EUC-JP.</para></listitem>
+ <listitem><para>There are various iconv() implementations around and not
+ all of them work equally well. glibc2's iconv() has a critical problem
+ in CP932. libiconv-1.8 works with CP932 but still has some problems and
+ does not work with EUC-JP.</para></listitem>
-<listitem><para>You should set <smbconfoption><name>dos charset</name><value>CP932</value></smbconfoption>, not
-Shift_JIS, SJIS...</para></listitem>
+ <listitem><para>You should set <smbconfoption><name>dos charset</name><value>CP932</value></smbconfoption>, not
+ Shift_JIS or SJIS.</para></listitem>
-<listitem><para>Currently only <smbconfoption><name>unix charset</name><value>CP932</value></smbconfoption>
-will work (but still has some problems...) because of iconv() issues.
-<smbconfoption><name>unix charset</name><value>EUC-JP</value></smbconfoption> doesn't work well because of
-iconv() issues.</para></listitem>
+ <listitem><para>Currently only <smbconfoption><name>unix charset</name><value>CP932</value></smbconfoption>
+ will work (but still has some problems...) because of iconv() issues.
+ <smbconfoption><name>unix charset</name><value>EUC-JP</value></smbconfoption> does not work well because of
+ iconv() issues.</para></listitem>
-<listitem><para>Currently Samba 3.0 does not support <smbconfoption><name>unix charset</name><value>UTF8-MAC/CAP/HEX/JIS*</value></smbconfoption></para></listitem>
+ <listitem><para>Currently Samba-3.0 does not support <smbconfoption><name>unix charset</name><value>UTF8-MAC/CAP/HEX/JIS*</value></smbconfoption>.</para></listitem>
</itemizedlist>
@@ -158,16 +158,19 @@ iconv() issues.</para></listitem>
</sect1>
<sect1>
- <title>Common errors</title>
+ <title>Common Errors</title>
<sect2>
- <title>CP850.so can't be found</title>
+ <title>CP850.so Can't Be Found</title>
- <para><quote>Samba is complaining about a missing <filename>CP850.so</filename> file</quote>.</para>
+ <para><quote>Samba is complaining about a missing <filename>CP850.so</filename> file.</quote></para>
- <para>CP850 is the default <smbconfoption><name>dos charset</name></smbconfoption>. The <smbconfoption><name>dos charset</name></smbconfoption> is used to convert data to the codepage used by your dos clients. If you don't have any dos clients, you can safely ignore this message. </para>
+ <para><emphasis>Answer:</emphasis> CP850 is the default <smbconfoption><name>dos charset</name></smbconfoption>.
+ The <smbconfoption><name>dos charset</name></smbconfoption> is used to convert data to the codepage used by your DOS clients.
+ If you do not have any DOS clients, you can safely ignore this message. </para>
- <para>CP850 should be supported by your local iconv implementation. Make sure you have all the required packages installed. If you compiled samba from source, make sure configure found iconv.</para>
+ <para>CP850 should be supported by your local iconv implementation. Make sure you have all the required packages installed.
+ If you compiled Samba from source, make sure configure found iconv.</para>
</sect2>
</sect1>
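
Illustration (not part of the patch): the chapter text above describes a charset as a table that translates numbers to letters, notes that single-byte charsets hold 256 characters while two-byte charsets can hold 256 * 256, and says that filenames written in an old client codepage need converting to the unix charset. The following is a minimal Python sketch of those three points. It is not Samba code. The function names are hypothetical, and it assumes Python's built-in `cp850`, `cp932`, `latin-1` and `utf-8` codecs as stand-ins for the charsets an iconv implementation would provide.

```python
# Illustrative sketch only; the helper names below are hypothetical and
# do not exist in Samba. Python's built-in codecs stand in for iconv.


def decode_byte(value: int, charset: str) -> str:
    """Look up one byte value in a single-byte charset 'table'."""
    return bytes([value]).decode(charset)


def encoded_length(text: str, charset: str) -> int:
    """Return how many bytes the text occupies in the given charset."""
    return len(text.encode(charset))


def convert_name(raw: bytes, old_charset: str, new_charset: str = "utf-8") -> bytes:
    """Re-encode a filename from a legacy codepage to another charset."""
    return raw.decode(old_charset).encode(new_charset)


if __name__ == "__main__":
    # The same letter lives at different numbers in different charsets.
    print(hex(0x81), "in cp850   ->", decode_byte(0x81, "cp850"))
    print(hex(0xFC), "in latin-1 ->", decode_byte(0xFC, "latin-1"))

    # One byte gives 256 code points; two bytes give 256 * 256.
    print("two-byte code points:", 256 * 256)

    # Storage cost depends on the charset in use.
    for text, charset in [("ü", "cp850"), ("ü", "utf-8"), ("あ", "cp932"), ("あ", "utf-8")]:
        print(repr(text), "in", charset, "takes", encoded_length(text, charset), "byte(s)")

    # A name stored by a CP850 client, converted for a UTF-8 unix charset.
    legacy = "für".encode("cp850")
    print(legacy, "->", convert_name(legacy, "cp850"))
```

If the old and new charsets disagree about a byte, decoding with the wrong one silently yields a different letter rather than an error, which is why the chapter stresses using the same charset as older clients.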