From b39559c4e52b9f83a9f57510e490d0a75dbbe0df Mon Sep 17 00:00:00 2001
From: Jelmer Vernooij <jelmer@samba.org>
Date: Thu, 1 May 2003 13:58:23 +0000
Subject: Merge over Alexanders' conversion to docbook XML (This used to be
 commit e75624c382d640747b54ba43f134fa043d23b7fe)

---
 docs/docbook/projdoc/unicode.sgml | 128 --------------------------------------
 1 file changed, 128 deletions(-)
 delete mode 100644 docs/docbook/projdoc/unicode.sgml

diff --git a/docs/docbook/projdoc/unicode.sgml b/docs/docbook/projdoc/unicode.sgml
deleted file mode 100644
index eaf9990dcb..0000000000
--- a/docs/docbook/projdoc/unicode.sgml
+++ /dev/null
@@ -1,128 +0,0 @@
-<chapter id="unicode">
-<chapterinfo>
-	&author.jelmer;
-	<author>
-		<firstname>TAKAHASHI</firstname><surname>Motonobu</surname>
-		<affiliation>
-		<address><email>monyo@home.monyo.com</email></address>
-		</affiliation>
-	</author>
-	<pubdate>25 March 2003</pubdate>
-</chapterinfo>
-
-<title>Unicode/Charsets</title>
-
-<sect1>
-<title>What are charsets and Unicode?</title>
-
-<para>
-Computers store and exchange text as numbers; each number stands for
-one character. Which character a given number stands for depends on
-the <emphasis>character set (charset)</emphasis> that is used.
-A charset can be seen as a table that is used to translate numbers to
-letters. Not all computers use the same charset (there are charsets
-with German umlauts, Japanese characters, etc). For example, u-umlaut
-is stored as 129 in CP850 but as 252 in ISO8859-15. Usually a charset
-contains 256 characters, which means that storing a character with it
-takes exactly one byte. </para>
-
-<para>
-There are also charsets that support even more characters,
-but those need twice (or even more) as much storage space. A two-byte
-charset can contain <command>256 * 256 = 65536</command> characters,
-enough for most characters in common use. Such charsets are called
-multibyte charsets (because they use more than one byte to
-store one character).
-</para>
-
-<para>
-A standardised multibyte charset is Unicode; more information is available at
-<ulink url="http://www.unicode.org/">www.unicode.org</ulink>.
-A big advantage of using a multibyte charset is that you only need one: there is no
-need to make sure two computers use the same charset when they are
-communicating.
-</para>
-
-<para>Old Windows clients use single-byte charsets, named
-'codepages' by Microsoft. However, the SMB protocol has no support for
-negotiating the charset to be used. Thus, you have to make sure you
-are using the same charset when talking to an old client.
-Newer clients (Windows NT, 2000, XP) talk Unicode over the wire.
-</para>
-</sect1>
-
-<sect1>
-<title>Samba and charsets</title>
-
-<para>
-As of Samba 3.0, Samba can (and will) talk Unicode over the wire. Internally,
-Samba knows of three kinds of character sets:
-</para>
-
-<variablelist>
-	<varlistentry>
-		<term>unix charset</term>
-		<listitem><para>
-		This is the charset used internally by your operating system. 
-		The default is <constant>ASCII</constant>, which is fine for most 
-		systems.
-		</para></listitem>
-	</varlistentry>
-
-	<varlistentry>
-		<term>display charset</term>
-		<listitem><para>This is the charset Samba will use to print messages
-		on your screen. It should generally be the same as the <command>unix charset</command>.
-		</para></listitem>
-	</varlistentry>
-
-	<varlistentry>
-		<term>dos charset</term>
-		<listitem><para>This is the charset Samba uses when communicating with
-		DOS and Windows 9x clients. It will talk Unicode to all newer clients.
-		The default depends on the charsets you have installed on your system.
-		Run <command>testparm -v | grep "dos charset"</command> to see 
-		what the default is on your system. 
-		</para></listitem>
-	</varlistentry>
-</variablelist>
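-
-<para>
-As a rough illustration, the three parameters might be set together in the
-[global] section of <filename>smb.conf</filename> as sketched below. The charset
-values are assumptions chosen for a Western European setup, not defaults;
-use charsets that your iconv() implementation actually provides.
-</para>
-
-<para><programlisting>
-[global]
-	# Assumed values for illustration only
-	unix charset = ISO8859-15
-	display charset = ISO8859-15
-	dos charset = CP850
-</programlisting></para>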
-
-</sect1>
-
-<sect1>
-<title>Conversion from old names</title>
-
-<para>Because previous Samba versions did not do any charset conversion,
-filenames on disk are usually stored in the charset of the DOS/Windows
-clients that created them, not in the unix charset.</para>
-
-<para>The following command, adapted from a script by Steve Langasek, converts all
-filenames from CP850 to the iso8859-15 charset.</para>
-
-<para>
-<prompt>#</prompt><userinput>find <replaceable>/path/to/share</replaceable> -type f -exec bash -c 'CP="$1"; ISO=`echo -n "$CP" | iconv -f cp850 \
-  -t iso8859-15`; if [ "$CP" != "$ISO" ]; then mv "$CP" "$ISO"; fi' _ {} \;
-</userinput>
-</para>
-</sect1>
-
-<sect1>
-<title>Japanese charsets</title>
-
-<para>Samba doesn't work correctly with Japanese charsets yet. Here are some points to note when setting it up:</para>
-
-<simplelist>
-<member>You should set <command>mangling method = hash</command></member>
-<member>There are various iconv() implementations around and not all of 
-them work equally well. glibc2's iconv() has a critical problem with CP932.
-libiconv-1.8 works with CP932 but still has some problems and does not
-work with EUC-JP. </member>
-<member>You should set <command>dos charset = CP932</command>, not Shift_JIS, SJIS, or similar names.</member>
-<member>Currently only <command>unix charset = CP932</command> works, and even that still has some problems. <command>unix charset = EUC-JP</command> doesn't work well. Both limitations are caused by iconv() issues.</member>
-<member>Currently Samba 3.0 does not support <command>unix charset = UTF8-MAC/CAP/HEX/JIS*</command></member>
-</simplelist>
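-
-<para>
-Taken together, the settings recommended above would look roughly like this in
-<filename>smb.conf</filename>. This is only a sketch that restates the list; given the
-iconv() problems described, it may still not behave correctly.
-</para>
-
-<para><programlisting>
-[global]
-	mangling method = hash
-	dos charset = CP932
-	unix charset = CP932
-</programlisting></para>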
-
-<para>More information (in Japanese) is available at: <ulink url="http://www.atmarkit.co.jp/flinux/special/samba3/samba3a.html">http://www.atmarkit.co.jp/flinux/special/samba3/samba3a.html</ulink>.</para>
-</sect1>
-
-</chapter>
-- 
cgit