diff options
-rw-r--r-- | prog_guide.txt | 569 |
1 files changed, 569 insertions, 0 deletions
diff --git a/prog_guide.txt b/prog_guide.txt new file mode 100644 index 0000000000..c000f66226 --- /dev/null +++ b/prog_guide.txt @@ -0,0 +1,569 @@ +THIS IS INCOMPLETE! I'M ONLY COMMITING IT IN ORDER TO SOLICIT COMMENTS +FROM A FEW PEOPLE. DON'T TAKE THIS AS THE FINAL VERSION YET. + + + + +Samba4 Programming Guide +------------------------ + +The internals of Samba4 are quite different from previous versions of +Samba, so even if you are an experienced Samba developer please take +the time to read through this document. + +This document will explain both the broad structure of Samba4, and +some of the common coding elements such as memory management and +dealing with macros. + + +Coding Style +------------ + +In past versions of Samba we have basically let each programmer choose +their own programming style. Unfortunately the result has often been +that code that other members of the team find difficult to read. For +Samba version 4 I would like to standardise on a common coding style +to make the whole tree more readable. For those of you who are +horrified at the idea of having to learn a new style, I can assure you +that it isn't as painful as you might think. I was forced to adopt a +new style when I started working on the Linux kernel, and after some +initial pain found it quite easy. + +That said, I don't want to invent a new style, instead I would like to +adopt the style used by the Linux kernel. It is a widely used style +with plenty of support tools available. See Documentation/CodingStyle +in the Linux source tree. This is the style that I have used to write +all of the core infrastructure for Samba4 and I think that we should +continue with that style. + +I also think that we should most definately *not* adopt an automatic +reformatting system in cvs (or whatever other source code system we +end up using in the future). Such automatic formatters are, in my +experience, incredibly error prone and don't understand the necessary +exceptions. I don't mind if people use automated tools to reformat +their own code before they commit it, but please do not run such +automated tools on large slabs of existing code without being willing +to spend a *lot* of time hand checking the results. + +Finally, I think that for code that is parsing or formatting protocol +packets the code layout should strongly reflect the packet +format. That means ordring the code so that it parses in the same +order as the packet is stored on the while (where possible) and using +white space to align packet offsets so that a reader can immediately +map any line of the code to the corresponding place in the packet. + + +Static and Global Data +---------------------- + +The basic rule is "avoid static and global data like the plague". What +do I mean by static data? The way to tell if you have static data in a +file is to use the "size" utility in Linux. For example if we run: + + size libcli/raw/*.o + +in Samba4 then you get the following: + + text data bss dec hex filename + 2015 0 0 2015 7df libcli/raw/clikrb5.o + 202 0 0 202 ca libcli/raw/clioplock.o + 35 0 0 35 23 libcli/raw/clirewrite.o + 3891 0 0 3891 f33 libcli/raw/clisession.o + 869 0 0 869 365 libcli/raw/clisocket.o + 4962 0 0 4962 1362 libcli/raw/clispnego.o + 1223 0 0 1223 4c7 libcli/raw/clitransport.o + 2294 0 0 2294 8f6 libcli/raw/clitree.o + 1081 0 0 1081 439 libcli/raw/raweas.o + 6765 0 0 6765 1a6d libcli/raw/rawfile.o + 6824 0 0 6824 1aa8 libcli/raw/rawfileinfo.o + 2944 0 0 2944 b80 libcli/raw/rawfsinfo.o + 541 0 0 541 21d libcli/raw/rawioctl.o + 1728 0 0 1728 6c0 libcli/raw/rawnegotiate.o + 723 0 0 723 2d3 libcli/raw/rawnotify.o + 3779 0 0 3779 ec3 libcli/raw/rawreadwrite.o + 6597 0 0 6597 19c5 libcli/raw/rawrequest.o + 5580 0 0 5580 15cc libcli/raw/rawsearch.o + 3034 0 0 3034 bda libcli/raw/rawsetfileinfo.o + 5187 0 0 5187 1443 libcli/raw/rawtrans.o + 2033 0 0 2033 7f1 libcli/raw/smb_signing.o + +notice that the "data" and "bss" columns are all zero? That is +good. If there are any non-zero values in data or bss then that +indicates static data and is bad (as a rule of thumb). + +Lets compare that result to the equivalent in Samba3: + + text data bss dec hex filename + 3978 0 0 3978 f8a libsmb/asn1.o + 18963 0 288 19251 4b33 libsmb/cliconnect.o + 2815 0 1024 3839 eff libsmb/clidgram.o + 4038 0 0 4038 fc6 libsmb/clientgen.o + 3337 664 256 4257 10a1 libsmb/clierror.o + 10043 0 0 10043 273b libsmb/clifile.o + 332 0 0 332 14c libsmb/clifsinfo.o + 166 0 0 166 a6 libsmb/clikrb5.o + 5212 0 0 5212 145c libsmb/clilist.o + 1367 0 0 1367 557 libsmb/climessage.o + 259 0 0 259 103 libsmb/clioplock.o + 1584 0 0 1584 630 libsmb/cliprint.o + 7565 0 256 7821 1e8d libsmb/cliquota.o + 7694 0 0 7694 1e0e libsmb/clirap.o + 27440 0 0 27440 6b30 libsmb/clirap2.o + 2905 0 0 2905 b59 libsmb/clireadwrite.o + 1698 0 0 1698 6a2 libsmb/clisecdesc.o + 5517 0 0 5517 158d libsmb/clispnego.o + 485 0 0 485 1e5 libsmb/clistr.o + 8449 0 0 8449 2101 libsmb/clitrans.o + 2053 0 4 2057 809 libsmb/conncache.o + 3041 0 256 3297 ce1 libsmb/credentials.o + 1261 0 1024 2285 8ed libsmb/doserr.o + 14560 0 0 14560 38e0 libsmb/errormap.o + 3645 0 0 3645 e3d libsmb/namecache.o + 16815 0 8 16823 41b7 libsmb/namequery.o + 1626 0 0 1626 65a libsmb/namequery_dc.o + 14301 0 1076 15377 3c11 libsmb/nmblib.o + 24516 0 2048 26564 67c4 libsmb/nterr.o + 8661 0 8 8669 21dd libsmb/ntlmssp.o + 3188 0 0 3188 c74 libsmb/ntlmssp_parse.o + 4945 0 0 4945 1351 libsmb/ntlmssp_sign.o + 1303 0 0 1303 517 libsmb/passchange.o + 1221 0 0 1221 4c5 libsmb/pwd_cache.o + 2475 0 4 2479 9af libsmb/samlogon_cache.o + 10768 32 0 10800 2a30 libsmb/smb_signing.o + 4524 0 16 4540 11bc libsmb/smbdes.o + 5708 0 0 5708 164c libsmb/smbencrypt.o + 7049 0 3072 10121 2789 libsmb/smberr.o + 2995 0 0 2995 bb3 libsmb/spnego.o + 3186 0 0 3186 c72 libsmb/trustdom_cache.o + 1742 0 0 1742 6ce libsmb/trusts_util.o + 918 0 28 946 3b2 libsmb/unexpected.o + +notice all of the non-zero data and bss elements? Every bit of that +data is a bug waiting to happen. + +Static data is evil as it has the following consequences: + - it makes code much less likely to be thread-safe + - it makes code much less likely to be recursion-safe + - it leads to subtle side effects when the same code is called from + multiple places + +Static data is particularly evil in library code (such as our internal +smb and rpc libraries). If you can get rid of all static data in +libraries then you can make some fairly strong guarantees about the +behaviour of functions in that library, which really helps. + +Of course, it is possible to write code that uses static data and is +safe, it's just much harder to do that than just avoid static data in +the first place. We have been tripped up countless times by subtle +bugs in Samba due to the use of static data, so I think it is time to +start avoiding it in new code. Much of the core infrastructure of +Samba4 was specifically written to avoid static data, so I'm going to +be really annoyed if everyone starts adding lots of static data back +in. + +So, how do we avoid static data? The basic method is to use context +pointers. When reading the Samba4 code you will notice that just about +every function takes a pointer to a context structure as its first +argument. Any data that the function needs that isn't an explicit +argument to the function can be found by traversing that context. + +Note that this includes all of the little caches that we have lying +all over the code in Samba3. I'm referring to the ones that generally +have a "static int initialised" and then some static string or integer +that remembers the last return value of the function. Get rid of them! +If you are *REALLY* absolutely completely certain that your personal +favourite mini-cache is needed then you should do it properly by +putting it into the appropriate context rather than doing it the lazy +way by putting it inside the target function. I would suggest however +that the vast majority of those little caches are useless - don't +stick it in unless you have really firm benchmarking results that show +that it is needed and helps by a significant amount. + +Note that Samba4 is not yet completely clean of static data like +this. I've gotten the smbd/ directory down to 24 bytes of static data, +and libcli/raw/ down to zero. I've also gotten the ntvfs layer and all +backends down to just 8 bytes in ntvfs_base.c. The rest still needs +some more work. + +Also note that truly constant data is OK, and will not in fact show up +in the data and bss columns in "size" anyway (it will be included in +"text"). So you can have constant tables of protocol data. + + +Memory Contexts +--------------- + +We introduced the talloc() system for memory contexts during the 2.2 +development cycle and it has been a great success. It has greatly +simplified a lot of our code, particularly with regard to error +handling. + +In Samba4 we use talloc even more extensively to give us much finer +grained memory management. The really important thing to remember +about talloc in Samba4 is: + + "don't just use the first talloc context that comes to hand - use + the RIGHT talloc context" + +Just using the first talloc context that comes to hand is probably the +most common systematic bug I have seen so far from programmers that +have worked on the Samba4 code base. The reason this is vital is that +different talloc contexts have vastly different lifetimes, so if you +use a talloc context that has a long lifetime (such as one associated +with a tree connection) for data that is very short lived (such as +parsing an individual packet) then you have just introduced a huge +memory leak. + +In fact, it is quite common that the correct thing to do is to create +a new talloc context for some little function and then destroy it when +you are done. That will give you a memory context that has exactly the +right lifetime. + +You should also go and look at a new talloc function in Samba4 called +talloc_steal(). By using talloc_steal() you can move a lump of memory +from one memory context to another without copying the data. This +should be used when a backend function (such as a packet parser) +produces a result as a lump of talloc memory and you need to keep it +around for a longer lifetime than the talloc context it is in. You +just "steal" the memory from the short-lived context, putting it into +your long lived context. + + +Interface Structures +-------------------- + +One of the biggest changes in Samba4 is the universal use of interface +structures. Go take a look through include/smb_interfaces.h now to get +an idea of what I am talking about. + +In Samba3 many of the core wire structures in the SMB protocol were +never explicitly defined in Samba. Instead, our parse and generation +functions just worked directly with wire buffers. The biggest problem +with this is that is tied our parse code with out "business logic" +much too closely, which meant the code got extremely confusing to +read. + +In Samba4 we have explicitly defined interface structures for +everything in the protocol. When we receive a buffer we always parse +it completely into one of these structures, then we pass a pointer to +that structure to a backend handler. What we must *not* do is make any +decisions about the data inside the parse functions. That is critical +as different backends will need different portions of the data. This +leads to a golden rule for Samba4: + + "don't design interfaces that lose information" + +In Samba3 our backends often received "condensed" versions of the +information sent from clients, but this inevitably meant that some +backends could not get at the data they needed to do what they wanted, +so from now on we should expose the backends to all of the available +information and let them choose which bits they want. + +Ok, so now some of you will be thinking "this sounds just like our +msrpc code from Samba3", and while to some extent this is true there +are extremely important differences in the approach that are worth +pointing out. + +In the Samba3 msrpc code we used explicit parse strucrures for all +msrpc functions. The problem is that we didn't just put all of the +real variables in these structures, we also put in all the artifacts +as well. A good example is the security descriptor strucrure that +looks like this in Samba3: + +typedef struct security_descriptor_info +{ + uint16 revision; + uint16 type; + + uint32 off_owner_sid; + uint32 off_grp_sid; + uint32 off_sacl; + uint32 off_dacl; + + SEC_ACL *dacl; + SEC_ACL *sacl; + DOM_SID *owner_sid; + DOM_SID *grp_sid; +} SEC_DESC; + +The problem with this structure is all the off_* variables. Those are +not part of the interface, and do not appear in any real descriptions +of Microsoft security descriptors. They are parsing artifacts +generated by the IDL compiler that Microsoft use. That doesn't mean +they aren't needed on the wire - indeed they are as they tell the +parser where to find the following four variables, but they should +*NOT* be in the interface structure. + +In Samba3 there were unwritten rules about which variables in a +strucrure a high level caller has to fill in and which ones are filled +in by the marshalling code. In Samba4 those rules are gone, because +the redundent artifact variables are gone. The high level caller just +sets up the real variables and the marshalling code worries about +generating the right offsets. + +The same rule applies to strings. In many places in the SMB and MSRPC +protocols complex strings are used on the wire, with complex rules +about padding, format, alighment, termination etc. None of that +information is useful to a high level calling routine or to a backend +- its all just so much wire fluff. So, in Samba4 these strings are +just "char *" and are always in our internal multi-byte format (which +is usually UTF8). It is up to the parse functions to worry about +translating the format and getting the padding right. + +The one exception to this is the use of the WIRE_STRING type, but that +has a very good justification in terms of regression testing. Go and +read the comment in smb_interfaces.h about that now. + +So, here is another rule to code by. When writing an interface +structure think carefully about what variables in the structure can be +left out as they are redundent. If some length is effectively defined +twice on the wire then only put it once in the packet. If a length can +be inferred from a null termination then do that and leave the length +out of the structure completely. Don't put redundent stuff in +structures! + + +Async Design +------------ + +Samba4 has an asynchronous design. That affects *lots* of the code, +and the implications of the asynchronous design needs to be considered +just about everywhere. + +The first aspect of the async design to look at is the SMB client +library. Lets take a look at the following three functions in +libcli/raw/rawfile.c: + +struct cli_request *smb_raw_seek_send(struct cli_tree *tree, struct smb_seek *parms); +NTSTATUS smb_raw_seek_recv(struct cli_request *req, struct smb_seek *parms); +NTSTATUS smb_raw_seek(struct cli_tree *tree, struct smb_seek *parms); + +Go and read them now then come back. + +Ok, first notice there there are 3 separate functions, whereas the +equivalent code in Samba3 had just one. Also note that the 3rd +function is extremely simple - its just a wrapper around calling the +first two in order. + +The three separate functions are needed because we need to be able to +generate SMB calls asynchronously. The first call, which for smb calls +is always called smb_raw_XXXX_send(), constructs and sends a SMB +request and returns a "struct cli_request" which acts as a handle for +the request. The caller is then free to do lots of other calls if it +wants to, then when it is ready it can call the smb_raw_XXX_recv() +function to receive the reply. + +If all you want is a synchronous call then call the 3rd interface, the +one called smb_raw_XXXX(). That just calls the first two in order, and +blocks waiting for the reply. + +But what if you want to be called when the reply comes in? Yes, thats +possible. You can do things like this: + + struct cli_request *req; + + req = smb_raw_XXX_send(tree, params); + + req->async.fn = my_callback; + req->async.private = my_private_data; + +then in your callback function you can call the smb_raw_XXXX_recv() +function to receive the reply. Your callback will receive the "req" +pointer, which you can use to retrieve your private data. + +Then all you need to do is ensure that the main loop in the client +library gets called. You can either do that by polling the connection +using cli_transport_pending() and cli_request_receive_next() or you +can use transport->idle.func to setup an idle function handler to call +back to your main code. Either way, you can build a fully async +application. + +In order to support all of this we have to make sure that when we +write a piece of library code (SMB, MSRPC etc) that we build the +separate _send() and _recv() functions. It really is worth the effort. + +Now about async in smbd, a much more complex topic. + +The SMB protocol is inherently async. Some functions (such as change +notify) often don't return for hours, while hundreds of other +functions pass through the socket. Take a look at the RAW-MUX test in +the Samba4 smbtorture to see some really extreme examples of the sort +of async operations that Windows supports. + +In Samba3 we handled this stuff very badly. We had awful "pending +request" queues that allocated full 128k packet buffers, and even with +all that crap we got the semantics wrong. In Samba4 I intend to make +sure we get this stuff right. + +So, how do we do this? We now an async interface between smbd and the +NTVFS backends. Whenever smbd calls into a backend the backend has an +option of answer the request in a synchronous fashion if it wants to +just like in Samba3, but it also has the option of answering the +request asynchronously. The only backend that currently does this is +the CIFS backend, but I hope the other backends will soon do this to. + +To make this work you need to do things like this in the backend: + + req->control_flags |= REQ_CONTROL_ASYNC; + +that tells smbd that the backend has elected to reply later rather +than replying immediately. The backend must *only* do this if +req->async.send_fn is not NULL. If send_fn is NULL then it means that +smbd cannot handle this function being replied to in an async fashion. + +If the backend does this then it is up to the backend to call +req->async.send_fn() when it is ready to reply. It the meantime smbd +puts the call on hold and goes back to answering other requests on the +socket. + +Inside smbd you will find that there is code to support this. The most +obvious change is that smbd splits each SMB reply function into two +parts - just like the client library has a _send() and _recv() +function, so smbd has a _send() function and the parse function for +each SMB. + +Go and have a look at reply_getatr_send() and reply_getatr() in +smbd/reply.c. Read them? Good. + +Notice that reply_getatr() sets up the req->async structure to contain +the send function. Thats how the backend gets to do an async +reply. Also notice that reply_getatr() only does the parsing of the +request, and does not to the reply generation. That is done by the +_send() function. Nice and simple really. + + + +MSRPC +----- + + + + - ntvfs + - testing + - command line handling + - libcli structure + - posix reliance + - uid/gid handling + - process models + - static data + - msrpc + + + + +GMT vs TZ in printout of QFILEINFO timezones + +put in full UNC path in tconx + +test timezone handling by using a server in different zone from client + +don't just use any old TALLOC_CTX, use the right one! + +do {} while (0) system + +NT_STATUS_IS_OK() is NOT the opposite of NT_STATUS_IS_ERR() + +need to implement secondary parts of trans2 and nttrans in server and +client + +add talloc_steal() to move a talloc ptr from one pool to another + +document access_mask in openx reply + +check all capabilities and flag1, flag2 fields (eg. EAs) + +large files -> pass thru levels + +setpathinfo is very fussy about null termination of the file name + +the overwrite flag doesn't seem to work on setpathinfo RENAME_INFORMATION + +END_OF_FILE_INFORMATION and ALLOCATION_INFORMATION don't seem to work +via setpathinfo + +on w2k3 setpathinfo DISPOSITION_INFORMATION fails, but does have an +effect. It leaves the file with SHARING_VIOLATION. + +on w2k3 trans2 setpathinfo with any invalid low numbered level causes +the file to get into a state where DELETE_PENDING is reported, and the +file cannot be deleted until you reboot + +trans2 qpathinfo doesn't see the delete_pending flag correctly, but +qfileinfo does! + +get rid of pstring, fstring, strtok + +add programming documentation note about lp_set_cmdline() + +need to add a wct checking function in all client parsing code, +similar to REQ_CHECK_WCT() + +need to make sure that NTTIME is a round number of seconds when +converted from time_t + +not using a zero next offset in SMB_FILE_STREAM_INFORMATION for last +entry causes explorer exception under win2000 + + +if the server sets the session key the same for a second SMB socket as +an initial socket then the client will not re-authenticate, it will go +straight to a tconx, skipping session setup and will use all the +existing parameters! This allows two sockets with the same keys!? + + +removed blocking lock code, we now queue the whole request the same as +we queue any other pending request. This allows for things like a +close() while a pending blocking lock is being processed to operate +sanely. + +disabled change notify code + +disabled oplock code + + + +MILESTONES +========== + + +client library and test code +---------------------------- + + convert client library to new structure + get smbtorture working + get smbclient working + expand client library for all requests + write per-request test suite + gentest randomised test suite + separate client code as a library for non-Samba use + +server code +----------- + add remaining core SMB requests + add IPC layer + add nttrans layer + add rpc layer + fix auth models (share, server, rpc) + get net command working + connect CIFS backend to server level auth + get nmbd working + get winbindd working + reconnect printing code + restore removed smbd options + add smb.conf macro substitution code + add async backend notification + add generic timer event mechanism + +clustering code +--------------- + + write CIFS backend + new server models (break 1-1) + test clustered models + add fulcrum statistics gathering + +docs +---- + + conference paper + developer docs |