Diffstat (limited to 'docs/docbook/projdoc/unicode.sgml')
-rw-r--r-- | docs/docbook/projdoc/unicode.sgml | 128 |
1 files changed, 0 insertions, 128 deletions
diff --git a/docs/docbook/projdoc/unicode.sgml b/docs/docbook/projdoc/unicode.sgml
deleted file mode 100644
index eaf9990dcb..0000000000
--- a/docs/docbook/projdoc/unicode.sgml
+++ /dev/null
@@ -1,128 +0,0 @@
-<chapter id="unicode">
-<chapterinfo>
- &author.jelmer;
- <author>
-  <firstname>TAKAHASHI</firstname><surname>Motonobu</surname>
-  <affiliation>
-   <address><email>monyo@home.monyo.com</email></address>
-  </affiliation>
- </author>
- <pubdate>25 March 2003</pubdate>
-</chapterinfo>
-
-<title>Unicode/Charsets</title>
-
-<sect1>
-<title>What are charsets and unicode?</title>
-
-<para>
-Computers communicate in numbers. In texts, each number will be
-translated to a corresponding letter. The meaning that will be assigned
-to a certain number depends on the <emphasis>character set(charset)
-</emphasis> that is used.
-A charset can be seen as a table that is used to translate numbers to
-letters. Not all computers use the same charset (there are charsets
-with German umlauts, Japanese characters, etc). Usually a charset contains
-256 characters, which means that storing a character with it takes
-exactly one byte. </para>
-
-<para>
-There are also charsets that support even more characters,
-but those need twice(or even more) as much storage space. These
-charsets can contain <command>256 * 256 = 65536</command> characters, which
-is more then all possible characters one could think of. They are called
-multibyte charsets (because they use more then one byte to
-store one character).
-</para>
-
-<para>
-A standardised multibyte charset is unicode, info is available at
-<ulink url="http://www.unicode.org/">www.unicode.org</ulink>.
-A big advantage of using a multibyte charset is that you only need one; no
-need to make sure two computers use the same charset when they are
-communicating.
-</para>
-
-<para>Old windows clients used to use single-byte charsets, named
-'codepages' by microsoft. However, there is no support for
-negotiating the charset to be used in the smb protocol. Thus, you
-have to make sure you are using the same charset when talking to an old client.
-Newer clients (Windows NT, 2K, XP) talk unicode over the wire.
-</para>
-</sect1>
-
-<sect1>
-<title>Samba and charsets</title>
-
-<para>
-As of samba 3.0, samba can (and will) talk unicode over the wire. Internally,
-samba knows of three kinds of character sets:
-</para>
-
-<variablelist>
- <varlistentry>
-  <term>unix charset</term>
-  <listitem><para>
-  This is the charset used internally by your operating system.
-  The default is <constant>ASCII</constant>, which is fine for most
-  systems.
-  </para></listitem>
- </varlistentry>
-
- <varlistentry>
-  <term>display charset</term>
-  <listitem><para>This is the charset samba will use to print messages
-  on your screen. It should generally be the same as the <command>unix charset</command>.
-  </para></listitem>
- </varlistentry>
-
- <varlistentry>
-  <term>dos charset</term>
-  <listitem><para>This is the charset samba uses when communicating with
-  DOS and Windows 9x clients. It will talk unicode to all newer clients.
-  The default depends on the charsets you have installed on your system.
-  Run <command>testparm -v | grep "dos charset"</command> to see
-  what the default is on your system.
-  </para></listitem>
- </varlistentry>
-</variablelist>
-
-</sect1>
-
-<sect1>
-<title>Conversion from old names</title>
-
-<para>Because previous samba versions did not do any charset conversion,
-characters in filenames are usually not correct in the unix charset but only
-for the local charset used by the DOS/Windows clients.</para>
-
-<para>The following script from Steve Langasek converts all
-filenames from CP850 to the iso8859-15 charset.</para>
-
-<para>
-<prompt>#</prompt><userinput>find <replaceable>/path/to/share</replaceable> -type f -exec bash -c 'CP="{}"; ISO=`echo -n "$CP" | iconv -f cp850 \
-  -t iso8859-15`; if [ "$CP" != "$ISO" ]; then mv "$CP" "$ISO"; fi' \;
-</userinput>
-</para>
-</sect1>
-
-<sect1>
-<title>Japanese charsets</title>
-
-<para>Samba doesn't work correctly with Japanese charsets yet. Here are points of attention when setting it up:</para>
-
-<simplelist>
-<member>You should set <command>mangling method = hash</command></member>
-<member>There are various iconv() implementations around and not all of
-them work equally well. glibc2's iconv() has a critical problem in CP932.
-libiconv-1.8 works with CP932 but still has some problems and does not
-work with EUC-JP. </member>
-<member>You should set <command>dos charset = CP932</command>, not Shift_JIS, SJIS...</member>
-<member>Currently only <command>unix charset = CP932</command> will work (but still has some problems...) because of iconv() issues. <command>unix charset = EUC-JP</command> doesn't work well because of iconv() issues.</member>
-<member>Currently Samba 3.0 does not support <command>unix charset = UTF8-MAC/CAP/HEX/JIS*</command></member>
-</simplelist>
-
-<para>More information (in Japanese) is available at: <ulink url="http://www.atmarkit.co.jp/flinux/special/samba3/samba3a.html">http://www.atmarkit.co.jp/flinux/special/samba3/samba3a.html</ulink>.</para>
-</sect1>
-
-</chapter>