From 544919436a3fcc1a21d5ffce61a666d151273136 Mon Sep 17 00:00:00 2001 From: Jelmer Vernooij Date: Thu, 29 Aug 2002 13:28:17 +0000 Subject: Add more documents to the developers guide (This used to be commit e05bdd9eab760b5dc6a4442dc89752080ff1d2c1) --- docs/docbook/devdoc/CodingSuggestions.sgml | 237 ++++++++++++++++ docs/docbook/devdoc/architecture.sgml | 184 ++++++++++++ docs/docbook/devdoc/debug.sgml | 321 +++++++++++++++++++++ docs/docbook/devdoc/dev-doc.sgml | 12 + docs/docbook/devdoc/internals.sgml | 439 +++++++++++++++++++++++++++++ docs/docbook/devdoc/parsing.sgml | 239 ++++++++++++++++ docs/docbook/devdoc/unix-smb.sgml | 311 ++++++++++++++++++++ 7 files changed, 1743 insertions(+) create mode 100644 docs/docbook/devdoc/CodingSuggestions.sgml create mode 100644 docs/docbook/devdoc/architecture.sgml create mode 100644 docs/docbook/devdoc/debug.sgml create mode 100644 docs/docbook/devdoc/internals.sgml create mode 100644 docs/docbook/devdoc/parsing.sgml create mode 100644 docs/docbook/devdoc/unix-smb.sgml diff --git a/docs/docbook/devdoc/CodingSuggestions.sgml b/docs/docbook/devdoc/CodingSuggestions.sgml new file mode 100644 index 0000000000..bdf6d3d17d --- /dev/null +++ b/docs/docbook/devdoc/CodingSuggestions.sgml @@ -0,0 +1,237 @@ + + + + SteveFrench + + + SimoSorce + + + AndrewBartlett + + + TimPotter + + + MartinPool + + + +Coding Suggestions + + +So you want to add code to Samba ... + + + +One of the daunting tasks facing a programmer attempting to write code for +Samba is understanding the various coding conventions used by those most +active in the project. These conventions were mostly unwritten and helped +improve either the portability, stability or consistency of the code. This +document will attempt to document a few of the more important coding +practices used at this time on the Samba project. The coding practices are +expected to change slightly over time, and even to grow as more is learned +about obscure portability considerations. 
Two existing documents +samba/source/internals.doc and +samba/source/architecture.doc provide +additional information. + + + +The loosely related question of coding style is very personal and this +document does not attempt to address that subject, except to say that I +have observed that eight-character tabs seem to be preferred in Samba +source. If you are interested in the topic of coding style, two oft-quoted +documents are: + + + +http://lxr.linux.no/source/Documentation/CodingStyle + + + +http://www.fsf.org/prep/standards_toc.html + + + +But note that coding style in Samba varies due to the many different +programmers who have contributed. + + + +Following are some considerations to keep in mind when adding new code to +Samba. First and foremost, remember that: + + + +Portability is a primary consideration in adding functionality, as is network +compatibility with de facto, existing, real-world CIFS/SMB implementations. +There are lots of platforms that Samba builds on, so use caution when adding +a call to a library function that is not invoked in existing Samba code. +Also note that there are many quite different SMB/CIFS clients that Samba +tries to support, not all of which follow the SNIA CIFS Technical Reference +(or the earlier Microsoft reference documents or the X/Open book on the SMB +Standard) perfectly.
+ + + +Here are some other suggestions: + + + + + + use d_printf instead of printf for display text + reason: enable auto-substitution of translated language text + + + + use SAFE_FREE instead of free + reason: reduce traps due to null pointers + + + + don't use bzero; use memset, or the ZERO_STRUCT and ZERO_STRUCTP macros + reason: not POSIX + + + + don't use strcpy and strlen (use safe_* equivalents) + reason: to avoid traps due to buffer overruns + + + + don't use getopt_long; use popt functions instead + reason: portability + + + + explicitly add const qualifiers to function parameters that are + input only (somewhat controversial, but const can be #defined away) + + + + when passing a va_list as an argument, or assigning one to another, + please use the VA_COPY() macro + reason: on some platforms, va_list is a struct that must be + initialized in each function; you can get a SEGV if you don't. + + + + discourage use of threads + reason: portability (also see architecture.doc) + + + + don't explicitly include new header files in C files; new .h files + should be included by adding them once to includes.h + reason: consistency + + + + don't explicitly extern functions (they are autogenerated by + "make proto" into proto.h) + reason: consistency + + + + use endian-safe macros when unpacking SMBs (see byteorder.h and + internals.doc) + reason: not everyone uses Intel + + + + Note the Unicode implications of charset handling (see internals.doc). See + the pull_*, push_* and convert_string functions. + reason: internationalization + + + + Don't assume English only + reason: See above + + + + Try to avoid using in/out parameters (functions that return data which + overwrites input parameters) + reason: Can cause stability problems + + + + Ensure copyright notices are correct; don't append Tridge's name to code + that he didn't write. If you did not write the code, make sure that it + can coexist with the rest of the Samba GPLed code.
+ + + + Consider using DATA_BLOBs for length-specified byte data. + reason: stability + + + + Take advantage of tdbs for database-like functionality + reason: consistency + + + + Don't access the SAM_ACCOUNT structure directly; its fields should be accessed + via the pdb_get...() and pdb_set...() functions. + reason: stability, consistency + + + + Don't check a password directly against the passdb; always use the + check_password() interface. + reason: long-term pluggability + + + + Try to use asprintf rather than pstrings and fstrings where possible. + + + + Use normal C comments (/* ... */) instead of C++ comments (// like + this). Although the C++ comment format is part of the C99 + standard, some older vendor C compilers do not accept it. + + + + Try to write documentation for API functions and structures + explaining the point of the code, the way it should be used, and + any special conditions or results. Mark these with a double-star + comment start (/**) so that they can be picked up by Doxygen, as in + this file. + + + + Keep the scope narrow. This means making functions/variables + static whenever possible. We don't want our namespace + polluted. Each module should have a minimal number of externally + visible functions or variables. + + + + Use function pointers to keep knowledge about particular pieces of + code isolated in one place. We don't want a particular piece of + functionality to be spread out across lots of places, which makes + for fragile, hard-to-maintain code. Instead, design an interface + and use tables containing function pointers to implement specific + functionality. This is particularly important for command + interpreters. + + + + Think carefully about what it will be like for someone else to add + to and maintain your code. If it would be hard for someone else to + maintain, then do it another way. + + + + + +The suggestions above are simply that, suggestions, but the information may +help in reducing the routine rework done on new code.
The preceding list +is expected to change routinely as new support routines and macros are +added. + + diff --git a/docs/docbook/devdoc/architecture.sgml b/docs/docbook/devdoc/architecture.sgml new file mode 100644 index 0000000000..312a63af97 --- /dev/null +++ b/docs/docbook/devdoc/architecture.sgml @@ -0,0 +1,184 @@ + + + + DanShearer + + November 1997 + + +Samba Architecture + + +Introduction + + +This document gives a general overview of how Samba works +internally. The Samba Team has tried to come up with a model which is +the best possible compromise between elegance, portability, security +and the constraints imposed by the very messy SMB and CIFS +protocol. + + + +It also tries to answer some of the frequently asked questions, such as: + + + + + Is Samba secure when running on Unix? The xyz platform? + What about the root privileges issue? + + +Pros and cons of multithreading in various parts of Samba + +Why not have a separate process for name resolution, WINS, and browsing? + + + + + + +Multithreading and Samba + + +People sometimes tout threads as a uniformly good thing. They are very +nice in their place but are quite inappropriate for smbd. nmbd is +another matter, and multi-threading it would be very nice. + + + +The short version is that smbd is not multithreaded, and alternative +servers that take this approach under Unix (such as Syntax, at the +time of writing) suffer tremendous performance penalties and are less +robust. nmbd is not threaded either, but this is because it is not +possible to do it while keeping code consistent and portable across 35 +or more platforms. (This drawback also applies to threading smbd.) + + + +The longer version is that there are very good reasons for not making +smbd multi-threaded. Multi-threading would actually make Samba much +slower, less scalable, less portable and much less robust. The fact +that we use a separate process for each connection is one of Samba's +biggest advantages.
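The process-per-connection model can be illustrated with a minimal POSIX sketch. This is not smbd's actual main loop: `serve_in_child()`, the `conn_handler` type and the two example handlers are hypothetical names invented for this illustration, and the integer passed in merely stands in for an accepted client socket. What it shows is the robustness argument: a client whose handler crashes takes down only its own process, and the parent carries on serving.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SERVE_ERROR (-1000) /* fork() or waitpid() failed */

/* Hypothetical per-connection handler; in a real server the argument
 * would be the accepted client socket. Its return value becomes the
 * child's exit status. */
typedef int (*conn_handler)(int conn_fd);

/* Run one connection in its own child process. Returns the child's exit
 * status, -(signal number) if the child was killed by a signal, or
 * SERVE_ERROR on failure. */
static int serve_in_child(conn_handler handler, int conn_fd)
{
    assert(handler != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return SERVE_ERROR;
    if (pid == 0) {
        /* Child: a crash here takes down only this one process. */
        _exit(handler(conn_fd));
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return SERVE_ERROR;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return SERVE_ERROR;
}

/* Example handlers, for illustration only. */
static int well_behaved_client(int conn_fd)
{
    return conn_fd == 7 ? 0 : 1;
}

static int crashing_client(int conn_fd)
{
    (void)conn_fd;
    abort(); /* simulate a fatal bug such as a segfault */
}
```

To keep the result checkable, the sketch waits for each child synchronously. A real server would go straight back to `accept()` and reap children asynchronously (for example from a `SIGCHLD` handler); each child could also switch to its client's uid once, instead of switching on every packet as a threaded design would have to.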
+ + + + + +Threading smbd + + +A few problems that would arise from a threaded smbd are: + + + + + It's not just a matter of creating threads instead of processes; you + would also have to check every variable to see whether it needs to be + thread-specific (currently they would be global). + + + + if one thread dies (e.g. a segfault) then all threads die. We can + immediately throw robustness out the window. + + + + many of the system calls we make are blocking. Non-blocking + equivalents of many calls are either not available or are awkward (and + slow) to use. So while we block in one thread, all clients are + waiting. Imagine that one share is a slow NFS filesystem and the others + are fast: we would end up slowing all clients to the speed of NFS. + + + + you can't run as a different uid in different threads. This means + we would have to switch uid/gid on _every_ SMB packet. It would be + horrendously slow. + + + + the per-process file descriptor limit would mean that we could only + support a limited number of clients. + + + + we couldn't use the system locking calls, as the locking context of + fcntl() is a process, not a thread. + + + + + + + +Threading nmbd + + +This would be ideal, but gets sunk by portability requirements. + + + +Andrew tried to write a test threads library for nmbd that used only +ANSI C constructs (using setjmp and longjmp). Unfortunately some OSes +defeat this by restricting longjmp to calling addresses that are +shallower than the current address on the stack (apparently AIX does +this). This makes a truly portable threads library impossible. So to +support all our current platforms we would have to code nmbd both with +and without threads, and as the real aim of threads is to make the +code clearer we would not have gained anything. (It is a myth that +threads make things faster.
Threading is like recursion: it can make +things clearer, but the same thing can always be done faster by some +other method.) + + + +Chris tried to spec out a general design that would abstract threading +vs separate processes (vs other methods?) and make them accessible +through some general API. This doesn't work because of the data +sharing requirements of the protocol (packets in the future depending +on packets now, etc.). Or at least, the code would work but would be very +clumsy, and besides, the fork()-type model would never work on Unix. (Is there an OS that it would work on, for nmbd?) + + + +A fork() is cheap, but not nearly cheap enough to do on every UDP +packet that arrives. Having a pool of processes is possible but is +nasty to program cleanly due to the enormous amount of shared data (in +complex structures) between the processes. We can't rely on each +platform having a shared memory system. + + + + + +nmbd Design + + +Originally Andrew used recursion to simulate a multi-threaded +environment, which used the stack enormously and made for really +confusing debugging sessions. Luke Leighton rewrote it to use a +queuing system that keeps state information on each packet. The +first version used a single structure which was used by all the +pending states. Because this structure was initialised by adding +arguments as the functionality developed, it got +pretty messy. So, it was replaced with a higher-order function +and a pointer to a user-defined memory block. This suddenly +made things much simpler: large numbers of functions could be +made static, and modularised. This is the same principle as that used +in NT's kernel, and achieves the same effect as threads, but in +a single process. + + + +Then Jeremy rewrote nmbd. The packet data in nmbd isn't what's on the +wire. It's a nice format that is very amenable to processing but still +keeps the idea of a distinct packet. See "struct packet_struct" in +nameserv.h.
It has all the detail but none of the on-the-wire +mess. This makes it ideal for use in disk or memory-based databases +for browsing and WINS support. + + + + diff --git a/docs/docbook/devdoc/debug.sgml b/docs/docbook/devdoc/debug.sgml new file mode 100644 index 0000000000..7e81cc825d --- /dev/null +++ b/docs/docbook/devdoc/debug.sgml @@ -0,0 +1,321 @@ + + + + ChrisHertel + + July 1998 + + +The Samba DEBUG system + + +New Output Syntax + + + The syntax of a debugging log file is represented as: + + + + <debugfile> :== { <debugmsg> } + + <debugmsg> :== <debughdr> '\n' <debugtext> + + <debughdr> :== '[' TIME ',' LEVEL ']' FILE ':' [FUNCTION] '(' LINE ')' + + <debugtext> :== { <debugline> } + + <debugline> :== TEXT '\n' + + + +TEXT is a string of characters excluding the newline character. + + + +LEVEL is the DEBUG level of the message (an integer in the range + 0..10). + + + +TIME is a timestamp. + + + +FILE is the name of the file from which the debug message was +generated. + + + +FUNCTION is the function from which the debug message was generated. + + + +LINE is the line number of the debug statement that generated the +message. + + +Basically, what that all means is: + + +A debugging log file is made up of debug messages. + + +Each debug message is made up of a header and text. The header is +separated from the text by a newline. + + +The header begins with the timestamp and debug level of the +message enclosed in brackets. The filename, function, and line +number at which the message was generated follow. The filename is +terminated by a colon, and the function name is terminated by the +parentheses which contain the line number. Depending upon the +compiler, the function name may be missing (it is generated by the +__FUNCTION__ macro, which is not universally implemented, dangit). + + +The message text is made up of zero or more lines, each terminated +by a newline.
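The header grammar above can be made concrete with a small formatter. This is a hedged sketch, not Samba's `dbghdr()`: `format_debug_header()` is a hypothetical name, and the `YYYY/MM/DD HH:MM:SS` timestamp layout is inferred from the sample output that follows. A missing function name (the no-`__FUNCTION__` case) is simply left empty, which yields headers such as `nmbd.c:(659)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Build one header line:
 *   '[' TIME ',' LEVEL ']' FILE ':' [FUNCTION] '(' LINE ')'
 * func may be NULL when the compiler has no __FUNCTION__.
 * Returns the snprintf() result (the untruncated length). */
static int format_debug_header(char *out, size_t outlen, const struct tm *tm,
                               int level, const char *file,
                               const char *func, int line)
{
    char stamp[32];

    assert(out != NULL && outlen > 0 && tm != NULL && file != NULL);
    if (strftime(stamp, sizeof stamp, "%Y/%m/%d %H:%M:%S", tm) == 0)
        stamp[0] = '\0'; /* should not happen with this fixed format */
    return snprintf(out, outlen, "[%s, %d] %s:%s(%d)", stamp, level, file,
                    func != NULL ? func : "", line);
}
```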
+ + + +Here's some example output: + + + [1998/08/03 12:55:25, 1] nmbd.c:(659) + Netbios nameserver version 1.9.19-prealpha started. + Copyright Andrew Tridgell 1994-1997 + [1998/08/03 12:55:25, 3] loadparm.c:(763) + Initializing global parameters + + + +Note that in the above example the function names are not listed on +the header line. That's because the example above was generated on an +SGI Indy, and the SGI compiler doesn't support the __FUNCTION__ macro. + + + + + +The DEBUG() Macro + + +Use of the DEBUG() macro is unchanged. DEBUG() takes two parameters. +The first is the message level, the second is the body of a function +call to the Debug1() function. + + +That's confusing. + +Here's an example which may help a bit. If you would write + + +printf( "This is a %s message.\n", "debug" ); + + + +to send the output to stdout, then you would write + + + +DEBUG( 0, ( "This is a %s message.\n", "debug" ) ); + + + +to send the output to the debug file. All of the normal printf() +formatting escapes work. + + + +Note that in the above example the DEBUG message level is set to 0. +Messages at level 0 always print. Basically, if the message level is +less than or equal to the global value DEBUGLEVEL, then the DEBUG +statement is processed. + + + +The output of the above example would be something like: + + + + [1998/07/30 16:00:51, 0] file.c:function(128) + This is a debug message. + + + +Each call to DEBUG() creates a new header *unless* the output produced +by the previous call to DEBUG() did not end with a '\n'. Output to the +debug file is passed through a formatting buffer which is flushed +every time a newline is encountered. If the buffer is not empty when +DEBUG() is called, the new input is simply appended. + + + +...but that's really just a Kludge. It was put in place because +DEBUG() has been used to write partial lines. 
Here's a simple (dumb) +example of the kind of thing I'm talking about: + + + + DEBUG( 0, ("The test returned " ) ); + if( test() ) + DEBUG(0, ("True") ); + else + DEBUG(0, ("False") ); + DEBUG(0, (".\n") ); + + + +Without the format buffer, the output (assuming test() returned true) +would look like this: + + + + [1998/07/30 16:00:51, 0] file.c:function(256) + The test returned + [1998/07/30 16:00:51, 0] file.c:function(258) + True + [1998/07/30 16:00:51, 0] file.c:function(261) + . + + +Which isn't much use. The format buffer kludge fixes this problem. + + + + + +The DEBUGADD() Macro + + +In addition to the kludgey solution to the broken line problem +described above, there is a clean solution. The DEBUGADD() macro never +generates a header. It will append new text to the current debug +message even if the format buffer is empty. The syntax of the +DEBUGADD() macro is the same as that of the DEBUG() macro. + + + + DEBUG( 0, ("This is the first line.\n" ) ); + DEBUGADD( 0, ("This is the second line.\nThis is the third line.\n" ) ); + + +Produces + + + [1998/07/30 16:00:51, 0] file.c:function(512) + This is the first line. + This is the second line. + This is the third line. + + + + + +The DEBUGLVL() Macro + + +One of the problems with the DEBUG() macro was that DEBUG() lines +tended to get a bit long. Consider this example from +nmbd_sendannounce.c: + + + + DEBUG(3,("send_local_master_announcement: type %x for name %s on subnet %s for workgroup %s\n", + type, global_myname, subrec->subnet_name, work->work_group)); + + + +One solution to this is to break it down using DEBUG() and DEBUGADD(), +as follows: + + + + DEBUG( 3, ( "send_local_master_announcement: " ) ); + DEBUGADD( 3, ( "type %x for name %s ", type, global_myname ) ); + DEBUGADD( 3, ( "on subnet %s ", subrec->subnet_name ) ); + DEBUGADD( 3, ( "for workgroup %s\n", work->work_group ) ); + + + +A similar, but arguably nicer approach is to use the DEBUGLVL() macro. 
+This macro returns True if the message level is less than or equal to +the global DEBUGLEVEL value, so: + + + + if( DEBUGLVL( 3 ) ) + { + dbgtext( "send_local_master_announcement: " ); + dbgtext( "type %x for name %s ", type, global_myname ); + dbgtext( "on subnet %s ", subrec->subnet_name ); + dbgtext( "for workgroup %s\n", work->work_group ); + } + + +(The dbgtext() function is explained below.) + +There are a few advantages to this scheme: + + +The test is performed only once. + + +You can allocate variables off of the stack that will only be used +within the DEBUGLVL() block. + + +Processing that is only relevant to debug output can be contained +within the DEBUGLVL() block. + + + + + + +New Functions + + +dbgtext() + +This function prints debug message text to the debug file (and +possibly to syslog) via the format buffer. The function uses a +variable argument list just like printf() or Debug1(). The +input is printed into a buffer using the vslprintf() function, +and then passed to format_debug_text(). + +If you use DEBUGLVL() you will probably print the body of the +message using dbgtext(). + + + + +dbghdr() + +This is the function that writes a debug message header. +Headers are not processed via the format buffer. Also note that +if the format buffer is not empty, a call to dbghdr() will not +produce any output. See the comments in dbghdr() for more info. + + + +It is not likely that this function will be called directly. It +is used by DEBUG() and DEBUGADD(). + + + + +format_debug_text() + +This is a static function in debug.c. It stores the output text +for the body of the message in a buffer until it encounters a +newline. When the newline character is found, the buffer is +written to the debug file via the Debug1() function, and the +buffer is reset. This allows us to add the indentation at the +beginning of each line of the message body, and also ensures +that the output is written a line at a time (which cleans up +syslog output). 
+ + + + diff --git a/docs/docbook/devdoc/dev-doc.sgml b/docs/docbook/devdoc/dev-doc.sgml index f84c129f00..76ad512add 100644 --- a/docs/docbook/devdoc/dev-doc.sgml +++ b/docs/docbook/devdoc/dev-doc.sgml @@ -1,5 +1,11 @@ + + + + + + ]> @@ -40,5 +46,11 @@ url="http://www.fsf.org/licenses/gpl.txt">http://www.fsf.org/licenses/gpl.txt &NetBIOS; +&Architecture; +&debug; +&CodingSuggestions; +&internals; +&parsing; +&unix-smb; diff --git a/docs/docbook/devdoc/internals.sgml b/docs/docbook/devdoc/internals.sgml new file mode 100644 index 0000000000..79524347b6 --- /dev/null +++ b/docs/docbook/devdoc/internals.sgml @@ -0,0 +1,439 @@ + + + + DavidChappell + +
David.Chappell@mail.trincoll.edu
+
+
+ 8 May 1996 +
+ +Samba Internals + + +Character Handling + +This section describes character set handling in Samba, as implemented in +Samba 3.0 and above. + + + +In the past Samba had very ad-hoc character set handling. Scattered +throughout the code were numerous calls which converted particular +strings to/from DOS codepages. The problem was that there was no way of +telling if a particular char* was in a DOS codepage or a unix +codepage. This led to a nightmare of code that tried to cope with +particular cases without handling the general case. + + + +The new functions + + +The new system works like this: + + + + + all char* strings inside Samba are "unix" strings. These are + multi-byte strings that are in the charset defined by the "unix + charset" option in smb.conf. + + + + there is no single fixed character set for unix strings, but any + character set that is used does need the following properties: + + + + + must not contain NULLs except for termination + + + + must be 7-bit compatible with C strings, so that a constant + string or character in C will be byte-for-byte identical to the + equivalent string in the chosen character set. + + + + uppercasing or lowercasing a string must not make it + longer than the original string + + + + must be able to correctly hold all characters that your client + will throw at it + + + + + For example, UTF-8 is fine, and most multi-byte Asian character sets + are fine, but UCS2 could not be used for unix strings as UCS2 strings + contain null bytes. + + + + + when you need to put a string into a buffer that will be sent on the + wire, or you need a string in a character set format that is + compatible with the client's character set, then you need to use a + pull_ or push_ function. The pull_ functions pull a string from a + wire buffer into a (multi-byte) unix string. The push_ functions + push a string out to a wire buffer. + + + + the two main pull_ and push_ functions you need to understand are + pull_string and push_string.
These functions take a base pointer + that should point at the start of the SMB packet that the string is + in. The functions will check the flags field in this packet to + automatically determine if the packet is marked as a unicode packet, + and they will choose whether to use unicode for this string based on + that flag. You may also force this decision using the STR_UNICODE or + STR_ASCII flags. For use in smbd/ and libsmb/ there are wrapper + functions clistr_ and srvstr_ that call the pull_/push_ functions + with the appropriate first argument. + + + + You may also call the pull_ascii/pull_ucs2 or push_ascii/push_ucs2 + functions if you know that a particular string is ascii or + unicode. There are also a number of other convenience functions in + charcnv.c that call the pull_/push_ functions with particularly + common arguments, such as pull_ascii_pstring() + + + + + The biggest thing to remember is that internal (unix) strings in Samba + may now contain multi-byte characters. This means you cannot assume + that characters are always 1 byte long. Often this means that you will + have to convert strings to ucs2 and back again in order to do some + (seemingly) simple task. For examples of how to do this see functions + like strchr_m(). I know this is very slow, and we will eventually + speed it up but right now we want this stuff correct not fast. + + + + all lp_ functions now return unix strings. The magic "DOS" flag on + parameters is gone. + + + + all vfs functions take unix strings. Don't convert when passing to them + + + + + + + +Macros in byteorder.h + + +This section describes the macros defined in byteorder.h. These macros +are used extensively in the Samba code. + + + +CVAL(buf,pos) + + +returns the byte at offset pos within buffer buf as an unsigned character. + + + + +PVAL(buf,pos) +returns the value of CVAL(buf,pos) cast to type unsigned integer. + + + +SCVAL(buf,pos,val) +sets the byte at offset pos within buffer buf to value val. 
+ + + +SVAL(buf,pos) + + returns the value of the unsigned short (16 bit) little-endian integer at + offset pos within buffer buf. An integer of this type is sometimes + referred to as "USHORT". + + + + +IVAL(buf,pos) +returns the value of the unsigned 32 bit little-endian integer at offset +pos within buffer buf. + + + +SVALS(buf,pos) +returns the value of the signed short (16 bit) little-endian integer at +offset pos within buffer buf. + + + +IVALS(buf,pos) +returns the value of the signed 32 bit little-endian integer at offset pos +within buffer buf. + + + +SSVAL(buf,pos,val) +sets the unsigned short (16 bit) little-endian integer at offset pos within +buffer buf to value val. + + + +SIVAL(buf,pos,val) +sets the unsigned 32 bit little-endian integer at offset pos within buffer +buf to the value val. + + + +SSVALS(buf,pos,val) +sets the signed short (16 bit) little-endian integer at offset pos within +buffer buf to the value val. + + + +SIVALS(buf,pos,val) +sets the signed 32 bit little-endian integer at offset pos within buffer +buf to the value val. + + + +RSVAL(buf,pos) +returns the value of the unsigned short (16 bit) big-endian integer at +offset pos within buffer buf. + + + +RIVAL(buf,pos) +returns the value of the unsigned 32 bit big-endian integer at offset +pos within buffer buf. + + + +RSSVAL(buf,pos,val) +sets the value of the unsigned short (16 bit) big-endian integer at +offset pos within buffer buf to value val. + + + +RSIVAL(buf,pos,val) +sets the value of the unsigned 32 bit big-endian integer at offset +pos within buffer buf to value val. + + + + + + +LAN Manager Samba API + + +This section describes the functions needed to make a LAN Manager RPC call. +This information has been obtained by examining the Samba code and the LAN +Manager 2.0 API documentation. It should not be considered entirely +reliable.
+ + + + +call_api(int prcnt, int drcnt, int mprcnt, int mdrcnt, + char *param, char *data, char **rparam, char **rdata); + + + + +This function is defined in client.c. It uses an SMB transaction to call a +remote API. + + + +Parameters + +The parameters are as follows: + + + + prcnt: the number of bytes of parameters being sent. + + + drcnt: the number of bytes of data being sent. + + + mprcnt: the maximum number of bytes of parameters which should be returned. + + + mdrcnt: the maximum number of bytes of data which should be returned. + + + param: a pointer to the parameters to be sent. + + + data: a pointer to the data to be sent. + + + rparam: a pointer to a pointer which will be set to point to the returned + parameters. The caller of call_api() must deallocate this memory. + + + rdata: a pointer to a pointer which will be set to point to the returned + data. The caller of call_api() must deallocate this memory. + + + + +These are the parameters which you ought to send, in the order of their +appearance in the parameter block: + + + + + +An unsigned 16 bit integer API number. You should set this value with +SSVAL(). I do not know where these numbers are described. + + + +An ASCIIZ string describing the parameters to the API function as defined +in the LAN Manager documentation. The first parameter, which is the server +name, is omitted. This string is based upon the API function as described +in the manual, not the data which is actually passed. + + + +An ASCIIZ string describing the data structure which ought to be returned. + + + +Any parameters which appear in the function call, as defined in the LAN +Manager API documentation, after the "Server" and up to and including the +"uLevel" parameters. + + + +An unsigned 16 bit integer which gives the size in bytes of the buffer we +will use to receive the returned array of data structures. Presumably this +should be the same as mdrcnt. This value should be set with SSVAL().
+ + + +An ASCIIZ string describing substructures which should be returned. If no +substructures apply, this string is of zero length. + + + + + +The code in client.c always calls call_api() with no data. It is unclear +when a non-zero-length data buffer would be sent. + + + + + +Return value + + +The returned parameters (pointed to by rparam), in their order of appearance, +are: + + + + +An unsigned 16 bit integer which contains the API function's return code. +This value should be read with SVAL(). + + + +An adjustment which tells the amount by which pointers in the returned +data should be adjusted. This value should be read with SVAL(). Basically, +the address of the start of the returned data buffer should have the returned +pointer value added to it and then have this value subtracted from it in +order to obtain the correct offset into the returned data buffer. + + + +A count of the number of elements in the array of structures returned. +It is also possible that this may sometimes be the number of bytes returned. + + + + +When call_api() returns, rparam points to the returned parameters. The +first of these is the result code. It will be zero if the API call +succeeded. This value may be read with "SVAL(rparam,0)". + + + +The second parameter may be read as "SVAL(rparam,2)". It is a 16 bit offset +which indicates what the base address of the returned data buffer was when +it was built on the server. It should be used to correct pointers before +use. + + + +The returned data buffer contains the array of returned data structures. +Note that all pointers must be adjusted before use. The function +fix_char_ptr() in client.c can be used for this purpose. + + + +The third parameter (which may be read as "SVAL(rparam,4)") has something to +do with indicating the amount of data returned or possibly the amount of +data which can be returned if enough buffer space is allowed.
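Reading the returned parameter block and fixing up a returned pointer, as described above, might look like the following sketch. The helper names (`le16()`, `put_le16()`, `parse_rap_reply()`, `fix_ptr()`) and the `rap_reply` structure are hypothetical stand-ins for SVAL()/SSVAL() and fix_char_ptr(), not Samba code. The byte-at-a-time accessors show why such macros are endian-safe. Treating a returned pointer as a 32-bit value from which the converter is subtracted follows the text above, but the exact width Samba uses is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Endian-safe 16-bit little-endian access, in the spirit of SVAL()/SSVAL():
 * byte-at-a-time, so host byte order and alignment do not matter. */
static uint16_t le16(const unsigned char *buf, size_t pos)
{
    return (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
}

static void put_le16(unsigned char *buf, size_t pos, uint16_t val)
{
    buf[pos]     = (unsigned char)(val & 0xFF);
    buf[pos + 1] = (unsigned char)(val >> 8);
}

/* The three 16-bit values at the start of the returned parameters. */
struct rap_reply {
    uint16_t status;    /* SVAL(rparam,0): zero on success */
    uint16_t converter; /* SVAL(rparam,2): server-side base of rdata */
    uint16_t count;     /* SVAL(rparam,4): entries (or possibly bytes) */
};

static struct rap_reply parse_rap_reply(const unsigned char *rparam)
{
    struct rap_reply r;
    r.status    = le16(rparam, 0);
    r.converter = le16(rparam, 2);
    r.count     = le16(rparam, 4);
    return r;
}

/* Map a pointer value found in rdata to a local address:
 * rdata + (ptr - converter). Returns NULL if that lies outside rdata. */
static const char *fix_ptr(const char *rdata, size_t rdata_len,
                           uint32_t ptr, uint16_t converter)
{
    if (ptr < converter || ptr - converter >= rdata_len)
        return NULL;
    return rdata + (ptr - converter);
}
```

Rejecting out-of-range results instead of trusting the server's value is a deliberate choice in this sketch: the converter and pointer values arrive over the network.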
+ + + + + + +Code character table + +Certain data structures are described by means of ASCIIZ strings containing +code characters. These are the code characters: + + + + +W a two-byte little-endian unsigned integer + + +N a count of substructures which follow + + +D a four-byte little-endian unsigned integer + + +B a byte (with optional count expressed as trailing ASCII digits) + + +z a four-byte offset to a NULL-terminated string + + +l a four-byte offset to non-string user data + + +b an offset to data (with count expressed as trailing ASCII digits) + + +r pointer to returned data buffer??? + + +L length in bytes of returned data buffer??? + + +h number of bytes of information available??? + + + +
diff --git a/docs/docbook/devdoc/parsing.sgml b/docs/docbook/devdoc/parsing.sgml new file mode 100644 index 0000000000..0121935d26 --- /dev/null +++ b/docs/docbook/devdoc/parsing.sgml @@ -0,0 +1,239 @@ + + + + ChrisHertel + + November 1997 + + +The smb.conf file + + +Lexical Analysis + + +Basically, the file is processed on a line-by-line basis. There are +four types of lines that are recognized by the lexical analyzer +(params.c): + + + + +Blank lines - Lines containing only whitespace. + + +Comment lines - Lines beginning with either a semicolon or a +pound sign (';' or '#'). + + +Section header lines - Lines beginning with an open square bracket ('['). + + +Parameter lines - Lines beginning with any other character. +(The default line type.) + + + + +The first two are handled exclusively by the lexical analyzer, which +ignores them. The latter two line types are scanned for + + + + + - Section names + + + - Parameter names + + + - Parameter values + + + + +These are the only tokens passed to the parameter loader +(loadparm.c). Parameter names and values are divided from one +another by an equal sign: '='. + + + +Handling of Whitespace + + +Whitespace is defined as all characters recognized by the isspace() +function (see ctype(3C)) except for the newline character ('\n'). +The newline is excluded because it identifies the end of the line. + + + + +The lexical analyzer scans past whitespace at the beginning of a line. + + + +Section and parameter names may contain internal whitespace. Any run of +whitespace within a name is compressed to a single space character. + + + +Internal whitespace within a parameter value is kept verbatim with +the exception of carriage return characters ('\r'), all of which +are removed. + + + +Leading and trailing whitespace is removed from names and values. + + + + + + + +Handling of Line Continuation + + +Long section header and parameter lines may be extended across +multiple lines by use of the backslash character ('\\').
Line +continuation is ignored for blank and comment lines. + + + +If the last (non-whitespace) character within a section header or on +a parameter line is a backslash, then the next line will be +(logically) concatenated with the current line by the lexical +analyzer. For example: + + + + param name = parameter value string \ + with line continuation. + + +Would be read as + + + param name = parameter value string with line continuation. + + + +Note that there are five spaces following the word 'string', +representing the one space between 'string' and '\\' in the top +line, plus the four preceding the word 'with' in the second line. +(Yes, I'm counting the indentation.) + + + +Line continuation characters are ignored on blank lines and at the end +of comments. They are *only* recognized within section and parameter +lines. + + + + + +Line Continuation Quirks + +Note the following example: + + + param name = parameter value string \ + \ + with line continuation. + + + +The middle line is *not* parsed as a blank line because it is first +concatenated with the top line. The result is + + + +param name = parameter value string with line continuation. + + +The same is true for comment lines. + + + param name = parameter value string \ + ; comment \ + with a comment. + + +This becomes: + + +param name = parameter value string ; comment with a comment. + + + +On a section header line, the closing bracket (']') is considered a +terminating character, and the rest of the line is ignored. The lines + + + + [ section name ] garbage \ + param name = value + + +are read as + + + [section name] + param name = value + + + + + + +Syntax + +The syntax of the smb.conf file is as follows: + + + <file> :== { <section> } EOF + <section> :== <section header> { <parameter line> } + <section header> :== '[' NAME ']' + <parameter line> :== NAME '=' VALUE NL + + +Basically, this means that + + + + a file is made up of zero or more sections, and is terminated by + an EOF (we knew that).
+ + + + A section is made up of a section header followed by zero or more + parameter lines. + + + + A section header is identified by an opening bracket and + terminated by the closing bracket. The enclosed NAME identifies + the section. + + + + A parameter line is divided into a NAME and a VALUE. The *first* + equal sign on the line separates the NAME from the VALUE. The + VALUE is terminated by a newline character (NL = '\n'). + + + + + +About params.c + + +The parsing of the config file is a bit unusual if you are used to +lex, yacc, bison, etc. Both lexical analysis (scanning) and parsing +are performed by params.c. Values are loaded via callbacks to +loadparm.c. + + + + diff --git a/docs/docbook/devdoc/unix-smb.sgml b/docs/docbook/devdoc/unix-smb.sgml new file mode 100644 index 0000000000..be79698857 --- /dev/null +++ b/docs/docbook/devdoc/unix-smb.sgml @@ -0,0 +1,311 @@ + + + + AndrewTridgell + + April 1995 + + +NetBIOS in a Unix World + + +Introduction + +This is a short document that describes some of the issues that +confront a SMB implementation on unix, and how Samba copes with +them. They may help people who are looking at unix<->PC +interoperability. + + + +It was written to help out a person who was writing a paper on unix to +PC connectivity. + + + + + +Usernames + +The SMB protocol has only a loose username concept. Early SMB +protocols (such as CORE and COREPLUS) have no username concept at +all. Even in later protocols clients often attempt operations +(particularly printer operations) without first validating a username +on the server. + + + +Unix security is based around username/password pairs. A unix box +should not allow clients to do any substantive operation without some +sort of validation. + + + +The problem mostly manifests itself when the unix server is in "share +level" security mode. 
This is the default mode as the alternative +"user level" security mode usually forces a client to connect to the +server as the same user for each connected share, which is +inconvenient in many sites. + + + +In "share level" security the client normally gives a username in the +"session setup" protocol, but does not supply an accompanying +password. The client then connects to resources using the "tree +connect" protocol, and supplies a password. The problem is that the +user on the PC types the username and the password in different +contexts, unaware that they need to go together to give access to the +server. The username is normally the one the user typed in when they +"logged onto" the PC (this assumes Windows for Workgroups). The +password is the one they chose when connecting to the disk or printer. + + + +The user often chooses a totally different username for their login than +for the drive connection. Often they also want to access different +drives as different usernames. The unix server needs some way of +divining the correct username to combine with each password. + + + +Samba tries to avoid this problem using several methods. These succeed +in the vast majority of cases. The methods include username maps, the +service%user syntax, the saving of session setup usernames for later +validation and the derivation of the username from the service name +(either directly or via the user= option). + + + + + +File Ownership + + +The commonly used SMB protocols have no way of saying "you can't do +that because you don't own the file". They have, in fact, no concept +of file ownership at all. + + + +This brings up all sorts of interesting problems. For example, when +you copy a file to a unix drive, and the file is world writeable but +owned by another user, the file will transfer correctly but will +receive the wrong date. This is because the utime() call under unix +only succeeds for the owner of the file, or root, even if the file is +world writeable.
For security reasons Samba does all file operations +as the validated user, not root, so the utime() fails. This can stuff +up shared development directories as programs like "make" will not get +file time comparisons right. + + + +There are several possible solutions to this problem, including +username mapping, and forcing a specific username for particular +shares. + + + + + +Passwords + + +Many SMB clients uppercase passwords before sending them. I have no +idea why they do this. Interestingly, WfWg uppercases the password only +if the server is running a protocol greater than COREPLUS, so +obviously it isn't just the data entry routines that are to blame. + + + +Unix passwords are case sensitive. So if users use mixed case +passwords, they are in trouble. + + + +Samba can try to cope with this by either using the "password level" +option which causes Samba to try the offered password with up to the +specified number of case changes, or by using the "password server" +option which allows Samba to do its validation via another machine +(typically a WinNT server). + + + +Samba supports the password encryption method used by SMB +clients. Note that the use of password encryption in Microsoft +networking leads to password hashes that are "plain text equivalent". +This means that it is *VERY* important to ensure that the Samba +smbpasswd file containing these password hashes is only readable +by the root user. See the documentation ENCRYPTION.txt for more +details. + + + + + +Locking + +The locking calls available under a DOS/Windows environment are much +richer than those available in unix. This means a unix server (like +Samba) choosing to use the standard fcntl() based unix locking calls +to implement SMB locking has to improvise a bit. + + + +One major problem is that dos locks can be in a 32 bit (unsigned) +range. Unix locking calls are 32 bits, but are signed, giving only a 31 +bit range.
Unfortunately, OLE2 clients use the top bit to select a +locking range used for OLE semaphores. + + + +To work around this problem Samba compresses the 32 bit range into 31 +bits by appropriate bit shifting. This seems to work but is not +ideal. In a future version a separate SMB lockd may be added to cope +with the problem. + + + +It also doesn't help that many unix lockd daemons are very buggy and +crash at the slightest provocation. They normally go mostly unused in +a unix environment because few unix programs use byte range +locking. The stress of huge numbers of lock requests from dos/windows +clients can kill the daemon on some systems. + + + +The second major problem is the "opportunistic locking" requested by +some clients. If a client requests opportunistic locking then it is +asking the server to notify it if anyone else tries to do something on +the same file, at which time the client will say if it is willing to +give up its lock. Unix has no simple way of implementing +opportunistic locking, and currently Samba has no support for it. + + + + + +Deny Modes + + +When a SMB client opens a file it asks for a particular "deny mode" to +be placed on the file. These modes (DENY_NONE, DENY_READ, DENY_WRITE, +DENY_ALL, DENY_FCB and DENY_DOS) specify what actions should be +allowed by anyone else who tries to use the file at the same time. If +DENY_READ is placed on the file, for example, then any attempt to open +the file for reading should fail. + + + +Unix has no equivalent notion. To implement this Samba uses either lock +files based on the file's inode and placed in a separate lock +directory or a shared memory implementation. The lock file method +is clumsy and consumes processing and file resources; +the shared memory implementation is vastly preferred and is turned on +by default for those systems that support it. + + + + + +Trapdoor UIDs + +A SMB session can run with several uids on the one socket.
This +happens when a user connects to two shares with different +usernames. To cope with this the unix server needs to switch uids +within the one process. On some unixes (such as SCO) this is not +possible. This means that on those unixes the client is restricted to +a single uid. + + + +Note that you can also get the "trapdoor uid" message for other +reasons. Please see the FAQ for details. + + + + + +Port numbers + +There is a convention that clients on sockets use high "unprivileged" +port numbers (>1023) and connect to servers on low "privileged" port +numbers. This is enforced in Unix as non-root users can't open a +socket for listening on port numbers less than 1024. + + + +Most PC based SMB clients (such as WfWg and WinNT) don't follow this +convention completely. The main culprit is the netbios nameserving on +udp port 137. Name query requests come from a source port of 137. This +is a problem when you combine it with the common firewalling technique +of not allowing incoming packets on low port numbers. This means that +these clients can't query a netbios nameserver on the other side of a +low port based firewall. + + + +The problem is more severe with netbios node status queries. I've +found that WfWg, Win95 and WinNT3.5 all respond to netbios node status +queries on port 137 no matter what the source port was in the +request. This works between machines that are both using port 137, but +it means it's not possible for a unix user to do a node status request +to any of these OSes unless they are running as root. The answer comes +back, but it goes to port 137 which the unix user can't listen +on. Interestingly, WinNT3.1 got this right - it sends node status +responses back to the source port in the request. + + + + + +Protocol Complexity + +There are many "protocol levels" in the SMB protocol.
It seems that +each time new functionality was added to a Microsoft operating system, +they added the equivalent functions in a new protocol level of the SMB +protocol to "externalise" the new capabilities. + + + +This means the protocol is very "rich", offering many ways of doing +each file operation. As a result, SMB servers need to be complex and +large. It also means it is very difficult to make them bug free. It is +not just Samba that suffers from this problem; other servers such as +WinNT don't support every variation of every call and it has almost +certainly been a headache for MS developers to support the myriad of +SMB calls that are available. + + + +There are about 65 "top level" operations in the SMB protocol (things +like SMBread and SMBwrite). Some of these include hundreds of +sub-functions (SMBtrans has at least 120 sub-functions, like +DosPrintQAdd and NetSessionEnum). All of them take several options +that can change the way they work. Many take dozens of possible +"information levels" that change the structures that need to be +returned. Samba supports all but 2 of the "top level" functions. It +supports only 8 (so far) of the SMBtrans sub-functions. Even NT +doesn't support them all. + + + +Samba currently supports up to the "NT LM 0.12" protocol, which is the +one preferred by Win95 and WinNT3.5. Luckily, this protocol level has a +"capabilities" field which specifies which super-duper new-fangled +options the server supports. This helps to make the implementation of +this protocol level much easier. + + + +There is also a problem with the SMB specifications. SMB is an X/Open +spec, but the X/Open book is far from ideal, and fails to cover many +important issues, leaving much to the imagination. Microsoft recently +renamed the SMB protocol CIFS (Common Internet File System) and have +published new specifications. These are far superior to the old +X/Open documents but there are still undocumented calls and features.
+This specification is actively being worked on by a CIFS developers +mailing list hosted by Microsoft. + + + + -- cgit