Diffstat (limited to 'docs/devel/internals.xml')
-rw-r--r-- | docs/devel/internals.xml | 440 |
1 files changed, 440 insertions, 0 deletions
diff --git a/docs/devel/internals.xml b/docs/devel/internals.xml new file mode 100644 index 0000000000..982cfd2e10 --- /dev/null +++ b/docs/devel/internals.xml @@ -0,0 +1,440 @@ +<chapter id="internals"> +<chapterinfo> + <author> + <firstname>David</firstname><surname>Chappell</surname> + <affiliation> + <address><email>David.Chappell@mail.trincoll.edu</email></address> + </affiliation> + </author> + <pubdate>8 May 1996</pubdate> +</chapterinfo> + +<title>Samba Internals</title> + +<sect1> +<title>Character Handling</title> +<para> +This section describes character set handling in Samba, as implemented in +Samba 3.0 and above. +</para> + +<para> +In the past Samba had very ad-hoc character set handling. Scattered +throughout the code were numerous calls which converted particular +strings to/from DOS codepages. The problem was that there was no way of +telling if a particular char* was in a DOS codepage or a unix +codepage. This led to a nightmare of code that tried to cope with +particular cases without handling the general case. +</para> +</sect1> + +<sect1> +<title>The new functions</title> + +<para> +The new system works like this: +</para> + +<orderedlist> +<listitem><para> + all char* strings inside Samba are "unix" strings. These are + multi-byte strings that are in the charset defined by the "unix + charset" option in smb.conf. +</para></listitem> + +<listitem><para> + there is no single fixed character set for unix strings, but any + character set that is used does need the following properties: + </para> + <orderedlist> + + <listitem><para> + must not contain NULLs except for termination + </para></listitem> + + <listitem><para> + must be 7-bit compatible with C strings, so that a constant + string or character in C will be byte-for-byte identical to the + equivalent string in the chosen character set.
+ </para></listitem> + + <listitem><para> + when you uppercase or lowercase a string it does not become + longer than the original string + </para></listitem> + + <listitem><para> + must be able to correctly hold all characters that your client + will throw at it + </para></listitem> + </orderedlist> + + <para> + For example, UTF-8 is fine, and most multi-byte Asian character sets + are fine, but UCS2 could not be used for unix strings because UCS2 + strings contain nulls. + </para> +</listitem> + +<listitem><para> + when you need to put a string into a buffer that will be sent on the + wire, or you need a string in a character set format that is + compatible with the client's character set, then you need to use a + pull_ or push_ function. The pull_ functions pull a string from a + wire buffer into a (multi-byte) unix string. The push_ functions + push a string out to a wire buffer. +</para></listitem> + +<listitem><para> + the two main pull_ and push_ functions you need to understand are + pull_string and push_string. These functions take a base pointer + that should point at the start of the SMB packet that the string is + in. The functions will check the flags field in this packet to + automatically determine if the packet is marked as a unicode packet, + and they will choose whether to use unicode for this string based on + that flag. You may also force this decision using the STR_UNICODE or + STR_ASCII flags. For use in smbd/ and libsmb/ there are wrapper + functions clistr_ and srvstr_ that call the pull_/push_ functions + with the appropriate first argument. + </para> + + <para> + You may also call the pull_ascii/pull_ucs2 or push_ascii/push_ucs2 + functions if you know that a particular string is ascii or + unicode.
There are also a number of other convenience functions in + charcnv.c that call the pull_/push_ functions with particularly + common arguments, such as pull_ascii_pstring(). + </para> +</listitem> + +<listitem><para> + The biggest thing to remember is that internal (unix) strings in Samba + may now contain multi-byte characters. This means you cannot assume + that characters are always 1 byte long. Often this means that you will + have to convert strings to ucs2 and back again in order to do some + (seemingly) simple task. For examples of how to do this see functions + like strchr_m(). I know this is very slow, and we will eventually + speed it up, but right now we want this stuff correct, not fast. +</para></listitem> + +<listitem><para> + all lp_ functions now return unix strings. The magic "DOS" flag on + parameters is gone. +</para></listitem> + +<listitem><para> + all vfs functions take unix strings. Don't convert when passing to them. +</para></listitem> + +</orderedlist> + +</sect1> + +<sect1> +<title>Macros in byteorder.h</title> + +<para> +This section describes the macros defined in byteorder.h. These macros +are used extensively in the Samba code. +</para> + +<sect2> +<title>CVAL(buf,pos)</title> + +<para> +returns the byte at offset pos within buffer buf as an unsigned character. +</para> +</sect2> + +<sect2> +<title>PVAL(buf,pos)</title> +<para>returns the value of CVAL(buf,pos) cast to type unsigned integer.</para> +</sect2> + +<sect2> +<title>SCVAL(buf,pos,val)</title> +<para>sets the byte at offset pos within buffer buf to value val.</para> +</sect2> + +<sect2> +<title>SVAL(buf,pos)</title> +<para> + returns the value of the unsigned short (16 bit) little-endian integer at + offset pos within buffer buf. An integer of this type is sometimes + referred to as "USHORT".
+</para> +</sect2> + +<sect2> +<title>IVAL(buf,pos)</title> +<para>returns the value of the unsigned 32 bit little-endian integer at offset +pos within buffer buf.</para> +</sect2> + +<sect2> +<title>SVALS(buf,pos)</title> +<para>returns the value of the signed short (16 bit) little-endian integer at +offset pos within buffer buf.</para> +</sect2> + +<sect2> +<title>IVALS(buf,pos)</title> +<para>returns the value of the signed 32 bit little-endian integer at offset pos +within buffer buf.</para> +</sect2> + +<sect2> +<title>SSVAL(buf,pos,val)</title> +<para>sets the unsigned short (16 bit) little-endian integer at offset pos within +buffer buf to value val.</para> +</sect2> + +<sect2> +<title>SIVAL(buf,pos,val)</title> +<para>sets the unsigned 32 bit little-endian integer at offset pos within buffer +buf to the value val.</para> +</sect2> + +<sect2> +<title>SSVALS(buf,pos,val)</title> +<para>sets the short (16 bit) signed little-endian integer at offset pos within +buffer buf to the value val.</para> +</sect2> + +<sect2> +<title>SIVALS(buf,pos,val)</title> +<para>sets the signed 32 bit little-endian integer at offset pos within buffer +buf to the value val.</para> +</sect2> + +<sect2> +<title>RSVAL(buf,pos)</title> +<para>returns the value of the unsigned short (16 bit) big-endian integer at +offset pos within buffer buf.</para> +</sect2> + +<sect2> +<title>RIVAL(buf,pos)</title> +<para>returns the value of the unsigned 32 bit big-endian integer at offset +pos within buffer buf.</para> +</sect2> + +<sect2> +<title>RSSVAL(buf,pos,val)</title> +<para>sets the value of the unsigned short (16 bit) big-endian integer at +offset pos within buffer buf to value val.
+</para> +</sect2> + +<sect2> +<title>RSIVAL(buf,pos,val)</title> +<para>sets the value of the unsigned 32 bit big-endian integer at offset +pos within buffer buf to value val.</para> +</sect2> + +</sect1> + + +<sect1> +<title>LAN Manager Samba API</title> + +<para> +This section describes the functions needed to make a LAN Manager RPC call. +This information was obtained by examining the Samba code and the LAN +Manager 2.0 API documentation. It should not be considered entirely +reliable. +</para> + +<para> +<programlisting> +call_api(int prcnt, int drcnt, int mprcnt, int mdrcnt, + char *param, char *data, char **rparam, char **rdata); +</programlisting> +</para> + +<para> +This function is defined in client.c. It uses an SMB transaction to call a +remote api. +</para> + +<sect2> +<title>Parameters</title> + +<para>The parameters are as follows:</para> + +<orderedlist> +<listitem><para> + prcnt: the number of bytes of parameters being sent. +</para></listitem> +<listitem><para> + drcnt: the number of bytes of data being sent. +</para></listitem> +<listitem><para> + mprcnt: the maximum number of bytes of parameters which should be returned. +</para></listitem> +<listitem><para> + mdrcnt: the maximum number of bytes of data which should be returned. +</para></listitem> +<listitem><para> + param: a pointer to the parameters to be sent. +</para></listitem> +<listitem><para> + data: a pointer to the data to be sent. +</para></listitem> +<listitem><para> + rparam: a pointer to a pointer which will be set to point to the returned + parameters. The caller of call_api() must deallocate this memory. +</para></listitem> +<listitem><para> + rdata: a pointer to a pointer which will be set to point to the returned + data. The caller of call_api() must deallocate this memory.
+</para></listitem> +</orderedlist> + +<para> +These are the parameters which you ought to send, in the order of their +appearance in the parameter block: +</para> + +<orderedlist> + +<listitem><para> +An unsigned 16 bit integer API number. You should set this value with +SSVAL(). I do not know where these numbers are described. +</para></listitem> + +<listitem><para> +An ASCIIZ string describing the parameters to the API function as defined +in the LAN Manager documentation. The first parameter, which is the server +name, is omitted. This string is based upon the API function as described +in the manual, not the data which is actually passed. +</para></listitem> + +<listitem><para> +An ASCIIZ string describing the data structure which ought to be returned. +</para></listitem> + +<listitem><para> +Any parameters which appear in the function call, as defined in the LAN +Manager API documentation, after the "Server" and up to and including the +"uLevel" parameters. +</para></listitem> + +<listitem><para> +An unsigned 16 bit integer which gives the size in bytes of the buffer we +will use to receive the returned array of data structures. Presumably this +should be the same as mdrcnt. This value should be set with SSVAL(). +</para></listitem> + +<listitem><para> +An ASCIIZ string describing substructures which should be returned. If no +substructures apply, this string is of zero length. +</para></listitem> + +</orderedlist> + +<para> +The code in client.c always calls call_api() with no data. It is unclear +when a non-zero length data buffer would be sent. +</para> + +</sect2> + +<sect2> +<title>Return value</title> + +<para> +The returned parameters (pointed to by rparam), in their order of appearance +are:</para> + +<orderedlist> + +<listitem><para> +An unsigned 16 bit integer which contains the API function's return code. +This value should be read with SVAL().
+</para></listitem> + +<listitem><para> +An adjustment which tells the amount by which pointers in the returned +data should be adjusted. This value should be read with SVAL(). Basically, +the address of the start of the returned data buffer should have the returned +pointer value added to it and then have this value subtracted from it in +order to obtain the correct offset into the returned data buffer. +</para></listitem> + +<listitem><para> +A count of the number of elements in the array of structures returned. +It is also possible that this may sometimes be the number of bytes returned. +</para></listitem> +</orderedlist> + +<para> +When call_api() returns, rparam points to the returned parameters. The +first of these is the result code. It will be zero if the API call +succeeded. This value may be read with "SVAL(rparam,0)". +</para> + +<para> +The second parameter may be read as "SVAL(rparam,2)". It is a 16 bit offset +which indicates what the base address of the returned data buffer was when +it was built on the server. It should be used to correct pointers before +use. +</para> + +<para> +The returned data buffer contains the array of returned data structures. +Note that all pointers must be adjusted before use. The function +fix_char_ptr() in client.c can be used for this purpose. +</para> + +<para> +The third parameter (which may be read as "SVAL(rparam,4)") has something to +do with indicating the amount of data returned or possibly the amount of +data which can be returned if enough buffer space is allowed. +</para> + +</sect2> +</sect1> + +<sect1> +<title>Code character table</title> +<para> +Certain data structures are described by means of ASCIIZ strings containing +code characters.
These are the code characters: +</para> + +<orderedlist> +<listitem><para> +W a two byte little-endian unsigned integer +</para></listitem> +<listitem><para> +N a count of substructures which follow +</para></listitem> +<listitem><para> +D a four byte little-endian unsigned integer +</para></listitem> +<listitem><para> +B a byte (with optional count expressed as trailing ASCII digits) +</para></listitem> +<listitem><para> +z a four byte offset to a NULL terminated string +</para></listitem> +<listitem><para> +l a four byte offset to non-string user data +</para></listitem> +<listitem><para> +b an offset to data (with count expressed as trailing ASCII digits) +</para></listitem> +<listitem><para> +r pointer to returned data buffer??? +</para></listitem> +<listitem><para> +L length in bytes of returned data buffer??? +</para></listitem> +<listitem><para> +h number of bytes of information available??? +</para></listitem> +</orderedlist> + +</sect1> +</chapter> |
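Reviewer's sketch (not part of the patch): the behaviour the chapter describes for the byteorder.h accessors can be illustrated with portable, byte-wise C. This is a minimal sketch under stated assumptions, not the macro definitions from Samba's byteorder.h. Every sketch_* name is hypothetical and only mirrors the role of CVAL, SVAL, IVAL, SSVAL and RSVAL as the text describes them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; they mimic the documented behaviour only. */

/* Role of CVAL: the byte at offset pos, as an unsigned value. */
static inline uint8_t sketch_cval(const void *buf, size_t pos)
{
    return ((const uint8_t *)buf)[pos];
}

/* Role of SVAL: unsigned 16 bit little-endian integer at offset pos. */
static inline uint16_t sketch_sval(const void *buf, size_t pos)
{
    return (uint16_t)(sketch_cval(buf, pos) |
                      (sketch_cval(buf, pos + 1) << 8));
}

/* Role of IVAL: unsigned 32 bit little-endian integer at offset pos. */
static inline uint32_t sketch_ival(const void *buf, size_t pos)
{
    return (uint32_t)sketch_sval(buf, pos) |
           ((uint32_t)sketch_sval(buf, pos + 2) << 16);
}

/* Role of SSVAL: store val as 16 bit little-endian at offset pos. */
static inline void sketch_ssval(void *buf, size_t pos, uint16_t val)
{
    uint8_t *p = (uint8_t *)buf + pos;
    p[0] = (uint8_t)(val & 0xFF);  /* low byte first */
    p[1] = (uint8_t)(val >> 8);    /* high byte second */
}

/* Role of RSVAL: unsigned 16 bit big-endian integer at offset pos. */
static inline uint16_t sketch_rsval(const void *buf, size_t pos)
{
    return (uint16_t)((sketch_cval(buf, pos) << 8) |
                      sketch_cval(buf, pos + 1));
}

/* Small self-check: a little-endian store read back both ways. */
static inline void sketch_selfcheck(void)
{
    uint8_t buf[2] = {0, 0};
    sketch_ssval(buf, 0, 0x1234);
    assert(sketch_sval(buf, 0) == 0x1234);   /* same byte order */
    assert(sketch_rsval(buf, 0) == 0x3412);  /* bytes swapped */
}
```

Reading byte by byte keeps the sketch independent of host endianness and alignment, which is the property the chapter relies on when it says a value such as the API return code should be read with SVAL(rparam,0).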