Age | Commit message (Collapse) | Author | Files | Lines |
|
of associated functions.
The motivation for this change was to avoid having to convert to/from
ucs2 strings for so many operations. Doing that was slow, used many
static buffers, and was also incorrect as it didn't cope properly with
unicode codepoints above 65536 (which could not be represented
correctly as smb_ucs2_t chars)
The two core functions that allowed this change are next_codepoint()
and push_codepoint(). These functions allow you to correctly walk a
arbitrary multi-byte string a character at a time without converting
the whole string to ucs2.
While doing this cleanup I also fixed several ucs2 string handling
bugs. See the commit for details.
The following code (which counts the number of occuraces of 'c' in a
string) shows how to use the new interface:
size_t count_chars(const char *s, char c)
{
size_t count = 0;
while (*s) {
size_t size;
codepoint_t c2 = next_codepoint(s, &size);
if (c2 == c) count++;
s += size;
}
return count;
}
(This used to be commit 814881f0e50019196b3aa9fbe4aeadbb98172040)
|
|
I had previously thought this was unnecessary, as windows doesn't use
standards compliant UTF-16, and for filesystem operations treats bytes
as UCS-2, but Bjoern Jacke has pointed out to me that this means we
don't correctly store extended UTF-16 characters as UTF-8 on
disk. This can be seen with (for example) the gothic characters with
codepoints above 64k.
This commit also adds a LOCAL-ICONV torture test that tests the first
1 million codepoints against the system iconv library, and tests 5
million random UTF-16LE buffers for identical error handling to the
system iconv library.
the lib/iconv.c changes need backporting to samba3
(This used to be commit 756f28ac95feaa84b42402723d5f7286865c78db)
|
|
rename CLI_ -> SMBCLI_
metze
(This used to be commit 8441750fd9427dd6fe477f27e603821b4026f038)
|
|
metze
(This used to be commit 2986c5f08c8f0c26a2ea7b6ce20aae025183109f)
|
|
- added a CHARSET set of tests, which determines how the server deals
with some specific charset issues related to UTF-16
support. Interestingly, Samba3 already passes all but one of these
tests, because our incorrect UCS-2 and UTF-8 implementations where we
don't check the validity of characters actually matches what Windows
does! This means that adding UTF-16 support to Samba is going to be
_much_ easier than we expected.
(This used to be commit c8497a42364d186f08102224d5062d176ee81f5b)
|