path: root/source4/lib/charcnv.c
Age | Author | Files | Lines | Commit message
2007-10-10 | Andrew Tridgell | 1 | -48/+156
r2857: this commit gets rid of smb_ucs2_t, wpstring and fpstring, plus lots of associated functions.
The motivation for this change was to avoid having to convert to/from ucs2 strings for so many operations. Doing that was slow, used many static buffers, and was also incorrect, as it didn't cope properly with unicode codepoints above 65535 (which could not be represented correctly as smb_ucs2_t chars).
The two core functions that allowed this change are next_codepoint() and push_codepoint(). These functions allow you to correctly walk an arbitrary multi-byte string a character at a time without converting the whole string to ucs2.
While doing this cleanup I also fixed several ucs2 string handling bugs. See the commit for details.
The following code (which counts the number of occurrences of 'c' in a string) shows how to use the new interface:
    size_t count_chars(const char *s, char c)
    {
        size_t count = 0;
        while (*s) {
            size_t size; /* bytes used by the current character */
            codepoint_t c2 = next_codepoint(s, &size);
            if (c2 == c) count++;
            s += size; /* step over the whole character */
        }
        return count;
    }
(This used to be commit 814881f0e50019196b3aa9fbe4aeadbb98172040)
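The count_chars() example in r2857 depends on next_codepoint(), whose body is not reproduced in this log. The sketch below is a self-contained illustration of the same walking technique, assuming the input is UTF-8. The names utf8_next_codepoint() and count_codepoint() and the codepoint_t typedef are hypothetical stand-ins, not Samba's code, and the U+FFFD error handling is an assumption; the real function works on the configured character set and may report errors differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Samba's codepoint_t. */
typedef uint32_t codepoint_t;

/*
 * Simplified, UTF-8-only stand-in for next_codepoint() (an assumption,
 * not the Samba implementation). Decodes the character starting at s,
 * stores its length in bytes in *size and returns the codepoint.
 * Malformed input is consumed one byte at a time and reported as U+FFFD.
 * Overlong forms and encoded surrogates are not rejected here.
 */
codepoint_t utf8_next_codepoint(const char *s, size_t *size)
{
    const unsigned char *p = (const unsigned char *)s;
    codepoint_t c;
    size_t len, i;

    assert(s != NULL && size != NULL);

    if (p[0] < 0x80) {
        c = p[0];
        len = 1;
    } else if ((p[0] & 0xE0) == 0xC0) {
        c = p[0] & 0x1F;
        len = 2;
    } else if ((p[0] & 0xF0) == 0xE0) {
        c = p[0] & 0x0F;
        len = 3;
    } else if ((p[0] & 0xF8) == 0xF0) {
        c = p[0] & 0x07;
        len = 4;
    } else {
        *size = 1;
        return 0xFFFD;
    }

    /* A NUL or any other non-continuation byte stops the scan early,
     * so the loop never reads past the string terminator. */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) {
            *size = 1;
            return 0xFFFD;
        }
        c = (c << 6) | (p[i] & 0x3F);
    }

    *size = len;
    return c;
}

/* Count occurrences of codepoint c in the NUL-terminated UTF-8 string s. */
size_t count_codepoint(const char *s, codepoint_t c)
{
    size_t count = 0;

    while (*s) {
        size_t size;
        if (utf8_next_codepoint(s, &size) == c) {
            count++;
        }
        s += size;
    }
    return count;
}
```

Because the walker advances by whole characters, a four-byte character such as U+10348 is counted once and its trailing bytes are never mistaken for separate characters.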
2007-10-10 | Jelmer Vernooij | 1 | -1/+1
r2684: Free the right talloc context (don't panic when encountering illegal multibyte sequences)
(This used to be commit b90da2337b83eb261a8072f9d0b13ec28caf3c4d)
2007-10-10 | Andrew Tridgell | 1 | -6/+2
r2671: we're getting too many errors caused by the talloc_realloc() API not taking a context (so when you pass a NULL pointer you end up with memory in a top level context). Fixed it by changing the API to take a context. The context is only used if the pointer you are reallocing is NULL.
(This used to be commit 8dc23821c9f54b2f13049b5e608a0cafb81aa540)
2007-10-10 | Andrew Tridgell | 1 | -1/+1
r2642: smb_iconv_t is a pointer, so checks against -1 errors should use a cast
(This used to be commit 28dcd2202948b003f8d13951395baa4a722593f4)
2007-10-10 | Andrew Tridgell | 1 | -39/+29
r2638: do lazy initialisation of iconv handles, so we don't initialise a handle unless we use it. This saves quite a bit of memory (libc chews a lot loading a handle). Typically smbd now loads 3 handles, instead of 36.
(This used to be commit 60e8d154fda548862cd6f8e8c1dadd64b3c4bd9c)
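The lazy initialisation described in r2638 can be pictured with the following minimal sketch, which uses the POSIX iconv_open() call directly. The charset table, get_conv_handle() and the handles_opened counter are hypothetical names chosen for illustration; the actual Samba code uses its own smb_iconv wrappers and charset list, neither of which is shown in this log.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical charset table; Samba's real list is not shown in this log. */
enum charset { CH_UTF16, CH_UTF8, NUM_CHARSETS };

static const char *charset_name[NUM_CHARSETS] = { "UTF-16LE", "UTF-8" };

/* One slot per (from, to) pair. Nothing is opened until first use. */
static iconv_t conv_handle[NUM_CHARSETS][NUM_CHARSETS];
static int conv_tried[NUM_CHARSETS][NUM_CHARSETS];

/* Counts iconv_open() calls; only here to make the laziness visible. */
int handles_opened;

/*
 * Return the conversion handle for from -> to, opening it on first use.
 * On failure the result is (iconv_t)-1. iconv_t is typically a pointer
 * type, so the -1 has to be cast before comparing. A failed open is
 * cached as well, so it is not retried on every call.
 */
iconv_t get_conv_handle(enum charset from, enum charset to)
{
    assert(from < NUM_CHARSETS && to < NUM_CHARSETS);

    if (!conv_tried[from][to]) {
        /* iconv_open() takes the destination charset first. */
        conv_handle[from][to] =
            iconv_open(charset_name[to], charset_name[from]);
        conv_tried[from][to] = 1;
        handles_opened++;
    }
    return conv_handle[from][to];
}
```

With an eager scheme every pair in the table would be opened at startup; here a process that only ever converts UTF-8 to UTF-16LE opens exactly one handle.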
2007-10-10 | Andrew Bartlett | 1 | -292/+87
r2552: Character set conversion and string handling updates.
The initial motivation for this commit was to merge in some of the bugfixes present in Samba3's chrcnv and string handling code into Samba4. However, along the way I found a lot of unused functions, and decided to do a bit more...
The strlen_m code now does not use a fixed buffer, but more work is needed to finish off other functions in str_util.c. These fixed length buffers have caused very nasty, hard to chase down bugs at some sites.
The strupper_m() function has a strupper_talloc() to replace it (we need to go around and fix more uses, but it's a start). Use of these new functions will avoid bugs where the upper or lowercase version of a string is a different length.
I have removed the push_*_allocate functions, which are replaced by calls to push_*_talloc. Likewise, pstring and other 'fixed length' wrappers are removed, where possible.
I have removed the first ('base pointer') argument, used by push_ucs2, as the Samba4 way of doing things ensures that this is always on an even boundary anyway. (It was used in only one place, in any case).
(This used to be commit dfecb0150627b500cb026b8a4932fe87902ca392)
2007-10-10 | Andrew Tridgell | 1 | -6/+9
r2380: nicer error reporting in convert_string()
(This used to be commit 6807d336c2365e4e7f45605d75667dbf05715b34)
2007-10-10 | Andrew Tridgell | 1 | -17/+17
r2159: converted samba4 over to UTF-16.
I had previously thought this was unnecessary, as Windows doesn't use standards compliant UTF-16, and for filesystem operations treats bytes as UCS-2, but Bjoern Jacke has pointed out to me that this means we don't correctly store extended UTF-16 characters as UTF-8 on disk. This can be seen with (for example) the gothic characters with codepoints above 64k.
This commit also adds a LOCAL-ICONV torture test that tests the first 1 million codepoints against the system iconv library, and tests 5 million random UTF-16LE buffers for identical error handling to the system iconv library.
The lib/iconv.c changes need backporting to samba3.
(This used to be commit 756f28ac95feaa84b42402723d5f7286865c78db)
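The distinction behind r2159 is that UTF-16 represents codepoints above 0xFFFF as a pair of 16-bit surrogate units, whereas a UCS-2 reader sees two unrelated 16-bit values. The sketch below shows the standard surrogate arithmetic for one codepoint; utf16_encode() and utf16_decode_pair() are hypothetical helper names for illustration and are not taken from lib/iconv.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one codepoint as UTF-16 code units (host byte order).
 * Returns the number of units written (1 or 2), or 0 if cp is not a
 * valid Unicode scalar value.
 */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0; /* a lone surrogate is not a character */
        }
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF) {
        return 0; /* beyond the Unicode range */
    }

    /* Split the 20 bits above 0x10000 into two 10-bit halves. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* Combine a high/low surrogate pair back into a single codepoint. */
uint32_t utf16_decode_pair(uint16_t hi, uint16_t lo)
{
    return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) |
                      (uint32_t)(lo - 0xDC00));
}
```

For the Gothic letter U+10348 this produces the pair 0xD800 0xDF48. Code that treats the pair as two UCS-2 characters and converts each to UTF-8 separately will not produce the correct four-byte UTF-8 sequence for the character.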
2007-10-10 | Andrew Tridgell | 1 | -3/+5
r2106: try to cope with a wider range of UTF-16 characters when we are using an external libiconv library.
(This used to be commit 168be7fbd7ae876ded39f73a7835e91b35e67244)
2007-10-10 | Andrew Bartlett | 1 | -45/+0
r1196: Remove unused pstring/fstring functions.
Andrew Bartlett
(This used to be commit 4f06bf4ab8cc61aec730f84766306119eb976c57)
2007-10-10 | Andrew Tridgell | 1 | -1/+1
r934: on ascii strings STR_TERMINATE_ASCII should trigger STR_TERMINATE behaviour
(This used to be commit b7935c96742a3c09ee4bf69f708b19095f497be1)
2007-10-10 | Andrew Bartlett | 1 | -83/+0
r831: These functions duplicate the push/pull charcnv interfaces that we use everywhere else in the Samba code, so remove them for clarity. (ok, so I also just never liked the names ;-)
Andrew Bartlett
(This used to be commit 5f5786ad5ff6cc133a143476e8968b00ed057a62)
2003-12-16 | Andrew Tridgell | 1 | -0/+1
added support for big-endian ucs2 strings (as used by big-endian msrpc). this was easier than I expected!
(This used to be commit a0a51af6b746b1f82faaa49d33c17fea9d708fb0)
2003-08-15 | Andrew Tridgell | 1 | -1/+1
more fixes from the IRIX compiler (thanks herb!)
(This used to be commit 02d068ba7d81d6db25122144981c63f74ad44025)
2003-08-15 | Andrew Tridgell | 1 | -2/+2
fixed some places where we don't brace (flags & STR_UNICODE)
this fixes the samba4 server with ascii clients
(This used to be commit c770603ac6c3331a4ac79a650cbbbeb21c778137)
2003-08-13 | Andrew Tridgell | 1 | -0/+925
first public release of samba4 code
(This used to be commit b0510b5428b3461aeb9bbe3cc95f62fc73e2b97f)