Diffstat (limited to 'docs/htmldocs/unicode.html')
-rw-r--r-- | docs/htmldocs/unicode.html | 60 |
1 files changed, 60 insertions, 0 deletions
diff --git a/docs/htmldocs/unicode.html b/docs/htmldocs/unicode.html
new file mode 100644
index 0000000000..0c5bb01d13
--- /dev/null
+++ b/docs/htmldocs/unicode.html
@@ -0,0 +1,60 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter 25. Unicode/Charsets</title><link rel="stylesheet" href="samba.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.59.1"><link rel="home" href="index.html" title="SAMBA Project Documentation"><link rel="up" href="optional.html" title="Part III. Advanced Configuration"><link rel="previous" href="securing-samba.html" title="Chapter 24. Securing Samba"><link rel="next" href="locking.html" title="Chapter 26. File and Record Locking"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 25. Unicode/Charsets</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="securing-samba.html">Prev</a> </td><th width="60%" align="center">Part III. Advanced Configuration</th><td width="20%" align="right"> <a accesskey="n" href="locking.html">Next</a></td></tr></table><hr></div><div class="chapter" lang="en"><div class="titlepage"><div><h2 class="title"><a name="unicode"></a>Chapter 25. Unicode/Charsets</h2></div><div><div class="author"><h3 class="author">Jelmer R. 
Vernooij</h3><div class="affiliation"><span class="orgname">The Samba Team<br></span><div class="address"><p><tt>&lt;<a href="mailto:jelmer@samba.org">jelmer@samba.org</a>&gt;</tt></p></div></div></div></div><div><div class="author"><h3 class="author">TAKAHASHI Motonobu</h3><div class="affiliation"><div class="address"><p><tt>&lt;<a href="mailto:monyo@home.monyo.com">monyo@home.monyo.com</a>&gt;</tt></p></div></div></div></div><div><p class="pubdate">25 March 2003</p></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><a href="unicode.html#id2901255">What are charsets and unicode?</a></dt><dt><a href="unicode.html#id2901324">Samba and charsets</a></dt><dt><a href="unicode.html#id2901414">Conversion from old names</a></dt><dt><a href="unicode.html#id2901459">Japanese charsets</a></dt></dl></div><div class="sect1" lang="en"><div class="titlepage"><div><h2 class="title" style="clear: both"><a name="id2901255"></a>What are charsets and unicode?</h2></div></div><p>
+Computers communicate in numbers. In text, each number is
+translated to a corresponding letter. The meaning assigned
+to a given number depends on the <span class="emphasis"><em>character set (charset)
+</em></span> that is used.
+A charset can be seen as a table that is used to translate numbers to
+letters. Not all computers use the same charset (there are charsets
+with German umlauts, Japanese characters, etc.). Usually a charset contains
+256 characters, which means that storing a character with it takes
+exactly one byte. </p><p>
+There are also charsets that support even more characters,
+but those need twice (or even more) as much storage space. These
+charsets can contain <b>256 * 256 = 65536</b> characters, which
+covers the characters of most writing systems in common use. They are called
+multibyte charsets (because they use more than one byte to
+store one character). 
+</p><p>
+A standardised multibyte charset is Unicode; information is available at
+<a href="http://www.unicode.org/" target="_top">www.unicode.org</a>.
+A big advantage of using a multibyte charset is that you only need one; there is no
+need to make sure two computers use the same charset when they are
+communicating.
+</p><p>Old Windows clients used single-byte charsets, named
+'codepages' by Microsoft. However, the SMB protocol has no support for
+negotiating the charset to be used. Thus, you
+have to make sure you are using the same charset when talking to an old client.
+Newer clients (Windows NT, 2K, XP) talk Unicode over the wire.
+</p></div><div class="sect1" lang="en"><div class="titlepage"><div><h2 class="title" style="clear: both"><a name="id2901324"></a>Samba and charsets</h2></div></div><p>
+As of Samba 3.0, Samba can (and will) talk Unicode over the wire. Internally,
+Samba knows of three kinds of character sets:
+</p><div class="variablelist"><dl><dt><span class="term">unix charset</span></dt><dd><p>
+ This is the charset used internally by your operating system.
+ The default is <tt>ASCII</tt>, which is fine for most
+ systems.
+ </p></dd><dt><span class="term">display charset</span></dt><dd><p>This is the charset Samba will use to print messages
+ on your screen. It should generally be the same as the <b>unix charset</b>.
+ </p></dd><dt><span class="term">dos charset</span></dt><dd><p>This is the charset Samba uses when communicating with
+ DOS and Windows 9x clients. Samba talks Unicode to all newer clients.
+ The default depends on the charsets you have installed on your system.
+ Run <b>testparm -v | grep "dos charset"</b> to see
+ what the default is on your system. 
+ </p></dd></dl></div></div><div class="sect1" lang="en"><div class="titlepage"><div><h2 class="title" style="clear: both"><a name="id2901414"></a>Conversion from old names</h2></div></div><p>Because previous Samba versions did not do any charset conversion,
+characters in filenames are usually correct only in the local charset
+used by the DOS/Windows clients, not in the unix charset.</p><p>The following script from Steve Langasek converts all
+filenames from CP850 to the ISO8859-15 charset.</p><p>
+<tt>#</tt><b><tt>find <i><tt>/path/to/share</tt></i> -type f -exec bash -c 'CP="{}"; ISO=`echo -n "$CP" | iconv -f cp850 \
+ -t iso8859-15`; if [ "$CP" != "$ISO" ]; then mv "$CP" "$ISO"; fi' \;
+</tt></b>
+</p></div><div class="sect1" lang="en"><div class="titlepage"><div><h2 class="title" style="clear: both"><a name="id2901459"></a>Japanese charsets</h2></div></div><p>Samba doesn't work correctly with Japanese charsets yet. Here are
+some points to note when setting it up:</p><div class="itemizedlist"><ul type="disc"><li><p>You should set <b>mangling method =
+hash</b></p></li><li><p>There are various iconv() implementations around and not
+all of them work equally well. glibc2's iconv() has a critical problem
+with CP932. libiconv-1.8 works with CP932 but still has some problems and
+does not work with EUC-JP.</p></li><li><p>You should set <b>dos charset = CP932</b>, not
+Shift_JIS, SJIS...</p></li><li><p>Currently only <b>unix charset = CP932</b>
+will work (but still has some problems...) because of iconv() issues. 
+<b>unix charset = EUC-JP</b> doesn't work well because of
+iconv() issues.</p></li><li><p>Currently Samba 3.0 does not support <b>unix charset
+= UTF8-MAC/CAP/HEX/JIS*</b></p></li></ul></div><p>More information (in Japanese) is available at: <a href="http://www.atmarkit.co.jp/flinux/special/samba3/samba3a.html" target="_top">http://www.atmarkit.co.jp/flinux/special/samba3/samba3a.html</a>.</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="securing-samba.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="optional.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="locking.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 24. Securing Samba </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 26. File and Record Locking</td></tr></table></div></body></html>
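
The chapter added by this diff makes two points that a short example may help with: the same byte means different letters in different charsets, and old filenames stored in CP850 need re-encoding before they are correct in ISO8859-15. The Python sketch below is not part of the Samba change and is only an illustration. It assumes filenames are available as raw bytes, and the helper names `show_byte_in_charsets` and `convert_name` are hypothetical, chosen for this example.

```python
# Illustrative sketch only; these helpers are hypothetical and not part of Samba.


def show_byte_in_charsets(value, charsets=("cp850", "iso8859-15")):
    """Decode one byte value under each charset and return {charset: character}."""
    raw = bytes([value])
    return {name: raw.decode(name) for name in charsets}


def convert_name(raw_name, src="cp850", dst="iso8859-15"):
    """Re-encode a filename (given as bytes) from the src charset to dst.

    Returns the re-encoded bytes, or None when the name cannot be decoded
    from src or contains a character that dst cannot represent.
    """
    try:
        return raw_name.decode(src).encode(dst)
    except UnicodeError:
        return None


if __name__ == "__main__":
    # The same number stands for different letters depending on the charset.
    print(show_byte_in_charsets(0xE4))
    # A name written by a CP850 client, re-encoded for an ISO8859-15 system.
    print(convert_name(b"B\x84r.txt"))
```

Unlike the `find`/`iconv` one-liner in the chapter, this sketch only computes the new name and leaves any renaming to the caller. Names with characters that do not exist in the target charset return `None`, so they can be reviewed by hand.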