From bb7a2a1334a1a96a250c0a9282c9a11458766d16 Mon Sep 17 00:00:00 2001
From: Jelmer Vernooij
Date: Tue, 25 Mar 2003 20:49:13 +0000
Subject: Add documentation on unicode

(This used to be commit cdfb0161adb37f70247b047eb93b92cfcf11783b)
---
 docs/docbook/projdoc/unicode.sgml | 93 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/docbook/projdoc/unicode.sgml

diff --git a/docs/docbook/projdoc/unicode.sgml b/docs/docbook/projdoc/unicode.sgml
new file mode 100644
index 0000000000..a467a0d4e7
--- /dev/null
+++ b/docs/docbook/projdoc/unicode.sgml
@@ -0,0 +1,93 @@
+Jelmer Vernooij
+Samba Team
+jelmer@samba.org
+25 March 2003
+
+Unicode/Charsets
+
+What are charsets and Unicode?
+
+Computers communicate in numbers. In text, each number is
+translated to a corresponding letter. The meaning assigned
+to a given number depends on the character set (charset)
+that is used.
+
+A charset can be seen as a table that is used to translate numbers to
+letters. Not all computers use the same charset (there are charsets
+with German umlauts, Japanese characters, etc.). A charset usually contains
+256 characters, which means that storing a character with it takes
+exactly one byte.
+
+There are also charsets that support more characters,
+but those need twice as much storage space (or even more). A two-byte
+charset can contain 256 * 256 = 65536 characters, which
+is far more than a single-byte charset can hold. These are called
+multibyte charsets because they use more than one byte to
+store one character.
+
+A standardised multibyte charset is Unicode; information is available at
+www.unicode.org.
+The big advantage of using a multibyte charset is that you only need one:
+there is no need to make sure two computers use the same charset when they are
+communicating.
+
+Old Windows clients used single-byte charsets, named
+'codepages' by Microsoft. However, the SMB protocol has no support for
+negotiating the charset to be used. Thus, you
+have to make sure you are using the same charset when talking to an old client.
+Newer clients (Windows NT, 2K, XP) talk Unicode over the wire.
+
+Samba and charsets
+
+As of Samba 3.0, Samba can (and will) talk Unicode over the wire. Internally,
+Samba knows of three kinds of character sets:
+
+unix charset
+    This is the charset used internally by your operating system.
+    The default is ASCII, which is fine for most
+    systems.
+
+display charset
+    This is the charset Samba will use to print messages
+    on your screen. It should generally be the same as the unix charset.
+
+dos charset
+    This is the charset Samba uses when communicating with
+    DOS and Windows 9x clients. It will talk Unicode to all newer clients.
+    The default depends on the charsets you have installed on your system.
+    Run testparm -v | grep "dos charset" to see
+    what the default is on your system.
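
The chapter's description of a charset as a table that translates numbers to letters can be illustrated with a minimal Python sketch. It is not part of the patch and is not Samba code. The byte value 0xE4, the choice of `latin-1` and `cp850` as example single-byte charsets, and the helper names `decode_with` and `encoded_size` are all assumptions made for illustration.

```python
# Illustrative sketch only: the byte value, the charsets and the helper
# names below are assumptions, not anything taken from Samba.

def decode_with(raw: bytes, charset: str) -> str:
    """Translate raw numbers to letters using the named charset's table."""
    return raw.decode(charset)


def encoded_size(text: str, charset: str) -> int:
    """Return how many bytes the text occupies in the named charset."""
    return len(text.encode(charset))


# One byte holding the number 228.
raw = bytes([0xE4])

# The same number maps to different letters in different single-byte charsets,
# which is why both sides must agree on the charset.
latin1_letter = decode_with(raw, "latin-1")
cp850_letter = decode_with(raw, "cp850")
print("latin-1:", latin1_letter)
print("cp850:  ", cp850_letter)

# A single-byte charset stores one character per byte, while a two-byte
# encoding of Unicode needs two bytes for the same character.
print("bytes in latin-1:  ", encoded_size("a", "latin-1"))
print("bytes in utf-16-le:", encoded_size("a", "utf-16-le"))

# Two bytes give 256 * 256 distinct values.
two_byte_values = 256 * 256
print("two-byte values:", two_byte_values)
```

If a client and a server disagreed about which of the two single-byte tables applies, a file name containing this byte would show up with the wrong letter. That is the situation the dos charset setting is meant to avoid for old clients.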