path: root/lib
Age | Commit message | Author | Files | Lines
2012-06-22dbwrap: don't ignore the result of dbwrap_parse_record in dbwrap_fetch_int32()Stefan Metzmacher1-1/+5
metze Autobuild-User(master): Stefan Metzmacher <metze@samba.org> Autobuild-Date(master): Fri Jun 22 17:10:52 CEST 2012 on sn-devel-104
2012-06-22dbwrap: initialize state.status in dbwrap_fetch_int32()Stefan Metzmacher1-0/+2
This might not be needed, but it makes it clearer that we won't use uninitialized memory if the callback was not triggered. metze
2012-06-22dbwrap: Convert fetch_int32 to dbwrap_parse_recordVolker Lendecke1-13/+24
Now that dbwrap_fetch_int32 is used in smbd/locking/posix.c, it is called a lot more often than before. Signed-off-by: Stefan Metzmacher <metze@samba.org>
2012-06-22dbwrap: Add dbwrap_fetch_int32Volker Lendecke2-3/+11
Signed-off-by: Stefan Metzmacher <metze@samba.org>
2012-06-22dbwrap: Add dbwrap_change_int32_atomicVolker Lendecke2-9/+21
Signed-off-by: Stefan Metzmacher <metze@samba.org>
2012-06-22tdb: don't use err.h in tests.Rusty Russell21-22/+1
It's not portable. While we could use ccan/err, it seems overkill since we actually only use it in one test (I obviously cut & paste the #include). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Autobuild-User(master): Rusty Russell <rusty@rustcorp.com.au> Autobuild-Date(master): Fri Jun 22 09:22:28 CEST 2012 on sn-devel-104
2012-06-22tdb: make TDB_NOSYNC merely disable sync.Rusty Russell3-15/+20
(As suggested by Stefan Metzmacher, based on the change to ntdb.) Since commit ec96ea690edbe3398d690b4a953d487ca1773f1c, we handle the case where a process dies during a transaction commit. Unfortunately, TDB_NOSYNC means this no longer works, as it disables the recovery area as well as the actual msync/fsync. We should do everything except the syncs. This also means we can do a complete test with $TDB_NO_FSYNC set; just to get more complete coverage, we disable it explicitly for one test (where we override the actual sync calls anyway). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22dbwrap: dbwrap_hash_size().Rusty Russell7-0/+12
Implemented for ntdb and tdb; falls back to 0 for others. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22dbwrap: dbwrap_name().Rusty Russell7-1/+11
Useful for debug messages: particularly once we start switching between .tdb and .ntdb files. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22dbwrap: dbwrap_transaction_start_nonblock().Rusty Russell4-0/+26
Implemented for ntdb and tdb; falls back to the blocking variant for others. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22dbwrap: dbwrap_fetch_locked_timeout().Rusty Russell4-0/+52
Implemented for ntdb and tdb; falls back to the non-timeout variant for others. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
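The "falls back to the non-timeout variant" pattern is an optional method on a backend table. The sketch below is a simplified illustration, not the dbwrap source: `struct backend`, the integer return codes and the sample backends are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, used only to show which path ran. */
enum { USED_BLOCKING = 1, USED_TIMEOUT = 2 };

/* Hypothetical backend table: the timeout variant is optional. */
struct backend {
	int (*fetch_locked)(const char *key);
	int (*fetch_locked_timeout)(const char *key, unsigned int secs);
};

static int plain_fetch_locked(const char *key)
{
	(void)key;
	return USED_BLOCKING;
}

static int timed_fetch_locked(const char *key, unsigned int secs)
{
	(void)key;
	(void)secs;
	return USED_TIMEOUT;
}

/* Wrapper: use the timeout variant when the backend provides one,
 * otherwise fall back to the blocking fetch and ignore the timeout. */
static int fetch_locked_timeout(const struct backend *be,
				const char *key, unsigned int secs)
{
	assert(be != NULL && be->fetch_locked != NULL);
	if (be->fetch_locked_timeout != NULL)
		return be->fetch_locked_timeout(key, secs);
	return be->fetch_locked(key);
}
```

Callers then use one entry point regardless of backend; the cost is that a backend without the optional method silently blocks for longer than the requested timeout.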
2012-06-22dbwrap: add dbwrap_check() function.Rusty Russell4-0/+31
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22dbwrap: dbwrap_local_open()Rusty Russell4-0/+92
This simply opens a tdb: it will eventually switch depending on the extension. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22dbwrap: remove get_flags().Rusty Russell5-24/+0
The flags returned were TDB-specific: this was only used for detecting the endianness of obsolete databases (the conversion code was put in in 2003, with reference to Samba 2.3). It's easier to remove it than to translate the NTDB flags to TDB flags, and it's a really weird thing to ask for anyway. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22util_tdb: move timeout chainlock variants from source3/lib/util/util_tdb.cRusty Russell3-1/+98
We're about to use them for dbwrap. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22util: util_ntdb ntdb_fetch_int32/ntdb_store_int32 and ntdb_add_int32_atomicRusty Russell2-0/+89
Similar to the util_tdb versions, but return the error code. ntdb_add_int32_atomic seems a clearer name than tdb_change_int32_atomic. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22util: util_ntdb.c gets NTDB_ERROR => NTSTATUS map.Rusty Russell2-0/+49
Very similar to the tdb version. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22util: util_ntdb.c gains bystring functions.Rusty Russell2-1/+85
Very similar to the util_tdb versions, but these return the error. I've only implemented those functions actually used. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22util: ntdb_new() supports NTDB_CLEAR_IF_FIRST.Rusty Russell2-1/+99
There are various issues with NTDB_CLEAR_IF_FIRST which make it better if we don't have to use it, but much of the code does, so we fake up support here. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22util: util_ntdb.cRusty Russell3-0/+168
The first function is ntdb_new: this is preferred over ntdb_open, as it makes the ntdb_context returned (and all NTDB_DATA returned from ntdb_fetch) valid talloc pointers. The API is very similar to tdb_wrap_open(). Note that we handle $TDB_NO_FSYNC here, since ntdb doesn't do that hack (and it's great for speeding up testing!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22ntdb: take advantage of direct access across expand.Rusty Russell1-33/+15
This means we no longer have to unmap if we want to compare a record. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22ntdb: test arbitrary operations during ntdb_parse_record().Rusty Russell2-0/+90
In particular, this tests that we can store enough records to make the database expand while we map the given record. We use a global lock for this, but it could happen in theory with another process. It also tests that we can recurse inside ntdb_parse_record(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22ntdb: make database read-only during ntdb_parse() callback.Rusty Russell6-24/+195
Since we have a readlock, any write will grab a write lock: if it happens to be on the same bucket, we'll fail. For that reason, enforce read-only so every write operation fails (even for NTDB_NOLOCK or NTDB_INTERNAL dbs), and document it! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
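A minimal sketch of enforcing read-only mode for the duration of a parse callback follows. It is not the ntdb implementation: `struct db`, `DB_RDONLY`, the error codes and the one-value "database" are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag and error codes for this sketch. */
#define DB_RDONLY 0x1u
enum { DB_OK = 0, DB_ERR_RDONLY = -1 };

/* A toy database holding a single value. */
struct db {
	int value;
	unsigned int flags;
};

/* Every write checks the read-only flag first. */
static int db_store(struct db *db, int value)
{
	if (db->flags & DB_RDONLY)
		return DB_ERR_RDONLY;
	db->value = value;
	return DB_OK;
}

/* Run the callback with the database forced read-only. The previous
 * flags are restored afterwards, so a nested parse inside the callback
 * leaves the outer parse still read-only. */
static int db_parse(struct db *db,
		    int (*cb)(struct db *db, int value, void *priv),
		    void *priv)
{
	unsigned int saved = db->flags;
	int ret;

	db->flags |= DB_RDONLY;
	ret = cb(db, db->value, priv);
	db->flags = saved;
	return ret;
}

/* Example callback that (incorrectly) tries to write. */
static int try_store_cb(struct db *db, int value, void *priv)
{
	(void)priv;
	return db_store(db, value + 1);
}
```

Failing every write, rather than only those that happen to hit the locked bucket, makes the misuse show up deterministically instead of depending on how keys hash.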
2012-06-22ntdb: allow direct access for NTDB_INTERNAL dbs during expansion.Rusty Russell1-14/+45
NTDB_INTERNAL databases need to malloc and copy to keep old versions around if we expand, in a similar way to the manner in which we keep old mmaps around. Of course, it only works for read-only accesses, since the two copies are not synced. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22ntdb: enhancement to allow direct access to the ntdb map during expansion.Rusty Russell5-33/+86
This means keeping the old mmap around when we expand the database. We could revert to read/write, except for platforms with incoherent mmap (ie. OpenBSD), where we need to use mmap for all accesses. Thus we keep a linked list of old maps, and unmap them when the last access finally goes away. This is required if we want ntdb_parse_record() callbacks to be able to expand the database. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
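The "linked list of old maps, unmapped when the last access goes away" idea can be sketched as below. This is an assumption-laden illustration, not the ntdb code: it uses malloc()/free() in place of mmap()/munmap(), and `struct db`, `struct old_map`, `db_access()`, `db_expand()` and `db_release()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A superseded mapping that still has outstanding direct accesses. */
struct old_map {
	struct old_map *next;
	void *addr;
	size_t len;
	unsigned int accessors;
};

struct db {
	void *map;			/* current "mapping" (malloc here) */
	size_t len;
	unsigned int accessors;		/* direct accesses into current map */
	struct old_map *old;		/* superseded maps still in use */
};

/* Hand out a direct pointer into the current map. */
static const void *db_access(struct db *db, size_t off)
{
	assert(off < db->len);
	db->accessors++;
	return (const char *)db->map + off;
}

/* Grow the database; park the old map if anyone still points into it. */
static int db_expand(struct db *db, size_t new_len)
{
	void *new_map;

	assert(new_len >= db->len);
	new_map = calloc(1, new_len);
	if (new_map == NULL)
		return -1;
	memcpy(new_map, db->map, db->len);
	if (db->accessors == 0) {
		free(db->map);
	} else {
		struct old_map *om = malloc(sizeof(*om));
		if (om == NULL) {
			free(new_map);
			return -1;
		}
		om->addr = db->map;
		om->len = db->len;
		om->accessors = db->accessors;
		om->next = db->old;
		db->old = om;
		db->accessors = 0;
	}
	db->map = new_map;
	db->len = new_len;
	return 0;
}

/* Drop a direct access; free an old map once its last user is gone. */
static void db_release(struct db *db, const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	struct old_map **omp;

	for (omp = &db->old; *omp != NULL; omp = &(*omp)->next) {
		struct old_map *om = *omp;
		uintptr_t start = (uintptr_t)om->addr;

		if (addr >= start && addr < start + om->len) {
			if (--om->accessors == 0) {
				*omp = om->next;
				free(om->addr);
				free(om);
			}
			return;
		}
	}
	assert(db->accessors > 0);
	db->accessors--;
}

/* Number of superseded maps still being kept alive. */
static size_t db_old_maps(const struct db *db)
{
	const struct old_map *om;
	size_t n = 0;

	for (om = db->old; om != NULL; om = om->next)
		n++;
	return n;
}
```

A pointer obtained before an expansion stays valid until it is released, which is what a parse-record callback needs if it is allowed to grow the database underneath itself.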
2012-06-22ntdb: don't munmap the database on every close.Rusty Russell2-13/+16
Since we can have multiple openers, we should leave the mmap in place for the other openers to use. Enhance the test to check the bug (it still works, because without mmap we fall back to read/write, but performance would be terrible!). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22ntdb: hand correct error code when alloc_read allocation fails.Rusty Russell1-1/+1
-ECUTNPASTE. This is not a usage error! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22autobuild: always set TDB_NO_FSYNC.Rusty Russell1-0/+4
Then we unset it inside the tdb test target itself. This means that new code can't accidentally forget it, and we can set it in the 'buildnice' script on sn-devel, for example. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22ntdb: respect TDB_NO_FSYNC flag for 'make test'Rusty Russell51-151/+187
This reduces test time from 31 seconds to 6, on my laptop. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-21Add --disable-ntdb option for building.Jelmer Vernooij1-0/+1
Autobuild-User(master): Jelmer Vernooij <jelmer@samba.org> Autobuild-Date(master): Thu Jun 21 19:59:57 CEST 2012 on sn-devel-104
2012-06-20ntdb: fix occasional abort in testing.Rusty Russell1-1/+7
Occasionally, the capability test inserts multiple used records and they clash, but our primitive test layout engine doesn't handle hash clashes and aborts. Force a seed value which we know doesn't clash. Reported-by: Andrew Bartlett <abartlet@samba.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Autobuild-User(master): Rusty Russell <rusty@rustcorp.com.au> Autobuild-Date(master): Wed Jun 20 16:50:20 CEST 2012 on sn-devel-104
2012-06-19ntdb: add autoconf support.Rusty Russell1-0/+41
This is copied from tdb; we build the utilities, but as nothing else links against it, we shouldn't be adding anything to the normal samba binary sizes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Autobuild-User(master): Rusty Russell <rusty@rustcorp.com.au> Autobuild-Date(master): Tue Jun 19 07:31:06 CEST 2012 on sn-devel-104
2012-06-19lib/tdb_wrap: use tdb directly, not tdb_compat.Rusty Russell3-4/+6
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19ldb: use tdb directly, not tdb_compat.Rusty Russell8-21/+24
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19lib/dbwrap: depend directly on tdb, not tdb_compat.Rusty Russell1-1/+1
Simple change, as we get rid of tdb_compat in favour of either ntdb directly or dbwrap. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19lib/util_tdb: depend directly on tdb, not tdb_compat.Rusty Russell2-5/+5
Simple change, as we get rid of tdb_compat in favour of tdb directly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19ntdb: update documentation.Rusty Russell6-5821/+505
Update the design.lyx file with the latest status and the change in hashing. Also, refresh and add examples to the TDB_porting.txt file. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19ntdb: optimize ntdb_fetch.Rusty Russell7-51/+59
We access the key on lookup, then access the data in the caller. It makes more sense to access both at once. We also put in a likely() for the case where the hash is not chained.

Before:
Adding 1000 records: 3644-3724(3675) ns (129656 bytes)
Finding 1000 records: 1596-1696(1622) ns (129656 bytes)
Missing 1000 records: 1409-1525(1452) ns (129656 bytes)
Traversing 1000 records: 1636-1747(1668) ns (129656 bytes)
Deleting 1000 records: 3138-3223(3175) ns (129656 bytes)
Re-adding 1000 records: 3278-3414(3329) ns (129656 bytes)
Appending 1000 records: 5396-5529(5426) ns (253312 bytes)
Churning 1000 records: 9451-10095(9584) ns (253312 bytes)
smbtorture results (--entries=1000)
ntdb speed 183881-191112(188223) ops/sec

After:
Adding 1000 records: 3590-3701(3640) ns (129656 bytes)
Finding 1000 records: 1539-1605(1566) ns (129656 bytes)
Missing 1000 records: 1398-1440(1413) ns (129656 bytes)
Traversing 1000 records: 1629-2015(1710) ns (129656 bytes)
Deleting 1000 records: 3118-3236(3163) ns (129656 bytes)
Re-adding 1000 records: 3235-3355(3275) ns (129656 bytes)
Appending 1000 records: 5335-5444(5385) ns (253312 bytes)
Churning 1000 records: 9350-9955(9494) ns (253312 bytes)
smbtorture results (--entries=1000)
ntdb speed 180559-199981(195106) ops/sec
2012-06-19ntdb: add -h arg to ntdbrestoreRusty Russell1-8/+28
Since our default hashsize is 8192 not 131, we look fat when we convert near-empty TDBs. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19ntdb: reduce default hashsize on ntdbtorture.Rusty Russell1-3/+10
Just like tdbtorture, having a hashsize of 2 stresses us much more! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19ntdb: add NTDB_ATTRIBUTE_HASHSIZERusty Russell2-2/+41
Since we've given up on expansion, let them frob the hashsize again. We have attributes, so we should use them for optional stuff like this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19ntdb: remove hash table trees.Rusty Russell36-1662/+1096
TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc.

It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB.

1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones.
2) Since we're 64 bits, our hash tables are already twice as expensive as TDB.
3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point.
4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries!
5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse.
6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets.
7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case.

On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long.

This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list.

Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results.

Benchmark details:
1) Compiled with -O2.
2) assert() was disabled in TDB2 and NTDB.
3) The "optimize fetch" patch was applied to NTDB.

10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM).

                       Insert  Re-ins   Fetch    Size    dbspeed
                       (nsec)  (nsec)  (nsec)    (Kb)  (ops/sec)
TDB (10000 hashsize):
      100 records:       3882    3320    1609      53     203204
     1000 records:       3651    3281    1571     115     218021
    10000 records:       3404    3326    1595     880     202874
   100000 records:       4317    3825    2097    8262     126811
  1000000 records:      11568   11578    9320   77005      25046

TDB2 (1024 hashsize, expandable):
      100 records:       3867    3329    1699      17     187100
     1000 records:       4040    3249    1639     154     186255
    10000 records:       4143    3300    1695    1226     185110
   100000 records:       4481    3425    1800   17848     163483
  1000000 records:       4055    3534    1878  106386     160774

NTDB (8192 hashsize)
      100 records:       4259    3376    1692      82     190852
     1000 records:       3640    3275    1566     130     195106
    10000 records:       4337    3438    1614     773     188362
   100000 records:       4750    5165    1746    9001     169197
  1000000 records:       4897    5180    2341   83838     121901

Analysis:
1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%.
2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB).
3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower).
4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1
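The trick of stealing the upper 8 bits of each offset for hash bits can be illustrated with a small C sketch. The exact bit layout and which hash bits ntdb stores are assumptions here; `OFF_BITS`, `encode_entry()`, `entry_may_match()` and `count_candidates()` are hypothetical names. With 8 stored bits, an unrelated record passes the filter about 1 time in 256 (roughly 0.4%), which is in line with the 99.5% figure quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for illustration: the low 56 bits hold the file
 * offset, the top 8 bits hold the top 8 bits of the record's hash. */
#define OFF_BITS 56
#define OFF_MASK ((UINT64_C(1) << OFF_BITS) - 1)

/* Pack an offset together with 8 bits taken from the 64-bit hash. */
static uint64_t encode_entry(uint64_t off, uint64_t hash)
{
	assert((off & ~OFF_MASK) == 0);	/* offset must fit in 56 bits */
	return off | (hash & ~OFF_MASK);
}

/* Recover the plain file offset from a packed entry. */
static uint64_t entry_offset(uint64_t entry)
{
	return entry & OFF_MASK;
}

/* Cheap filter: false means the record cannot match, so it need not
 * be loaded; true means it must be loaded and its key compared. */
static bool entry_may_match(uint64_t entry, uint64_t hash)
{
	return ((entry ^ hash) & ~OFF_MASK) == 0;
}

/* Scan a bucket's array and count the records that would actually
 * have to be read from the file for this hash. */
static size_t count_candidates(const uint64_t *entries, size_t n,
			       uint64_t hash)
{
	size_t i, candidates = 0;

	for (i = 0; i < n; i++) {
		if (entry_may_match(entries[i], hash))
			candidates++;
	}
	return candidates;
}
```

Because the filter only touches the contiguous array of 64-bit entries, a long bucket can be scanned within a few cache lines, whereas following a linked chain means reading each record header in turn.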
2012-06-19ntdb: special accessor functions for read/write of an offset.Rusty Russell3-50/+114
We also split off the NTDB_CONVERT case (where the ntdb is of a different endian) into its own io function.

NTDB speed:
Adding 10000 records: 3894-9951(8553) ns (815528 bytes)
Finding 10000 records: 1644-4294(3580) ns (815528 bytes)
Missing 10000 records: 1497-4018(3303) ns (815528 bytes)
Traversing 10000 records: 1585-4225(3505) ns (815528 bytes)
Deleting 10000 records: 3088-8154(6927) ns (815528 bytes)
Re-adding 10000 records: 3192-8308(7089) ns (815528 bytes)
Appending 10000 records: 5187-13307(11365) ns (1274312 bytes)
Churning 10000 records: 6772-17567(15078) ns (1274312 bytes)

NTDB speed in transaction:
Adding 10000 records: 1602-2404(2214) ns (815528 bytes)
Finding 10000 records: 456-871(778) ns (815528 bytes)
Missing 10000 records: 393-522(503) ns (815528 bytes)
Traversing 10000 records: 729-1015(945) ns (815528 bytes)
Deleting 10000 records: 1065-1476(1374) ns (815528 bytes)
Re-adding 10000 records: 1397-1930(1819) ns (815528 bytes)
Appending 10000 records: 2927-3351(3184) ns (1274312 bytes)
Churning 10000 records: 3921-4697(4378) ns (1274312 bytes)

smbtorture results:
ntdb speed 86581-191518(175666) ops/sec

Applying patch..increase-top-level.patch
2012-06-19ntdb: inline oob checkRusty Russell6-14/+24
The simple "is it in range" check can be inline; complex cases can be handed through to the normal or transaction handler.

NTDB speed:
Adding 10000 records: 4111-9983(9149) ns (815528 bytes)
Finding 10000 records: 1667-4464(3810) ns (815528 bytes)
Missing 10000 records: 1511-3992(3546) ns (815528 bytes)
Traversing 10000 records: 1698-4254(3724) ns (815528 bytes)
Deleting 10000 records: 3608-7998(7358) ns (815528 bytes)
Re-adding 10000 records: 3259-8504(7805) ns (815528 bytes)
Appending 10000 records: 5393-13579(12356) ns (1274312 bytes)
Churning 10000 records: 6966-17813(16136) ns (1274312 bytes)

NTDB speed in transaction:
Adding 10000 records: 916-2230(2004) ns (815528 bytes)
Finding 10000 records: 330-866(770) ns (815528 bytes)
Missing 10000 records: 196-520(471) ns (815528 bytes)
Traversing 10000 records: 356-879(800) ns (815528 bytes)
Deleting 10000 records: 505-1267(1108) ns (815528 bytes)
Re-adding 10000 records: 658-1681(1477) ns (815528 bytes)
Appending 10000 records: 1088-2827(2498) ns (1274312 bytes)
Churning 10000 records: 1636-4267(3785) ns (1274312 bytes)

smbtorture results:
ntdb speed 85588-189430(157110) ops/sec
2012-06-19ntdb: allocator attribute.Rusty Russell15-92/+311
This is designed to allow us to make ntdb_context (and NTDB_DATA returned from ntdb_fetch) a talloc pointer. But it can also be used for any other alternate allocator. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19ntdb: still prepare recovery area with NTDB_NOSYNC.Rusty Russell1-10/+8
NTDB_NOSYNC now just prevents the fsync/msync calls, which speeds testing while still providing full coverage. It also provides safety against processes dying during transaction commit (though obviously, not against the machine dying). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19ntdb: simply disallow NULL names.Rusty Russell3-10/+5
TDB allows this for internal databases, but it's a bad idea, since the name is useful for logging. They're a hassle to deal with, and we'd just end up putting "unnamed" in there, so let the user deal with it. If they don't, they get an informative core dump. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19ntdb: reduce transaction pagesize from 64k to 16k.Rusty Russell1-1/+1
The performance numbers for transaction pagesize are indeterminate: a larger pagesize means a smaller transaction array, and a better chance of having a contiguous record (more efficient for ntdb_parse_record and some internal operations inside a transaction). On the other hand, a large pagesize means more I/O even if we change only a few bytes. But it also controls the multiple by which we will enlarge the file, and hence the minimum db size. It's 4k for tdb1, but 16k seems reasonable in these modern times. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19ntdb: remove last block transaction logic.Rusty Russell1-44/+1
Now our database is always a multiple of NTDB_PGSIZE, we can remove the special handling for the last block. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-19ntdb: create initial database to be multiple of NTDB_PGSIZE.Rusty Russell8-88/+138
As copied from tdb1, there is logic in the transaction code to handle a non-PGSIZE multiple db, but in fact this only happens for a completely unused database: as soon as we add anything to it, it is expanded to a NTDB_PGSIZE multiple. If we create the database with a free record which pads it out to NTDB_PGSIZE, we can remove this last-page-is-different logic. Of course, the fake ntdbs we create in our tests now also need to be multiples of NTDB_PGSIZE, so we change some numbers there too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
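The padding arithmetic behind "create the database with a free record which pads it out to NTDB_PGSIZE" can be sketched as follows. This is a simplified illustration: `round_up_to_page()` and `initial_padding()` are hypothetical helpers, the page size is passed as a parameter rather than taken from the real NTDB_PGSIZE constant, and the minimum size a real free record needs for its own header is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Round a length up to the next multiple of the page size. */
static uint64_t round_up_to_page(uint64_t len, uint64_t pgsize)
{
	assert(pgsize != 0);
	return (len + pgsize - 1) / pgsize * pgsize;
}

/* Bytes the trailing free record must cover so that a freshly created
 * database ends exactly on a page boundary. */
static uint64_t initial_padding(uint64_t used_len, uint64_t pgsize)
{
	return round_up_to_page(used_len, pgsize) - used_len;
}
```

For example, with a 16k page and 1000 bytes of headers, the free record would cover the remaining 15384 bytes, so every later expansion starts from a whole number of pages and no special case for a short last page is needed.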