[SAGE] simple database problem



	I have an application which needs to maintain a mapping of
(name, md5sum) to pathname for 10k-1000k mappings.
Currently, we convert the key to a string and use gdbm (or ndbm).
(The application runs on Linux, FreeBSD, MacOSX, Solaris and Irix.)
30% of the time we add a single mapping, 20% of the time we delete
a mapping, and 50% of the time we print out all mappings.
Rarely, we add or delete a largish number of mappings.
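
A minimal Python sketch of the scheme described above, for illustration only. It assumes Python's `dbm` module, which picks whichever backend is installed (gdbm, ndbm, or a fallback). The NUL separator and the helper names `make_key`, `split_key`, `add_mapping`, `delete_mapping` and `all_mappings` are hypothetical.

```python
import dbm
import os
import tempfile

# Hypothetical separator: assumes names never contain a NUL byte.
# An md5sum is 32 hex digits, so it never contains one either.
SEP = "\0"


def make_key(name, md5sum):
    """Flatten the (name, md5sum) pair into a single string key."""
    return name + SEP + md5sum


def split_key(key):
    """Recover (name, md5sum) from a stored key (dbm returns bytes)."""
    name, md5sum = key.decode("utf-8").rsplit(SEP, 1)
    return name, md5sum


def add_mapping(db, name, md5sum, pathname):
    # dbm stores str keys and values as UTF-8 encoded bytes.
    db[make_key(name, md5sum)] = pathname


def delete_mapping(db, name, md5sum):
    del db[make_key(name, md5sum)]


def all_mappings(db):
    """Return every (name, md5sum, pathname) triple (the 'print all' case).

    Keys come back in the backend's internal order (hash order for
    gdbm/ndbm), so successive lookups may land anywhere in the file.
    """
    result = []
    for key in db.keys():
        name, md5sum = split_key(key)
        result.append((name, md5sum, db[key].decode("utf-8")))
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with dbm.open(os.path.join(tmp, "paths"), "c") as db:
            add_mapping(db, "report", "0cc175b9c0f1b6a831c399e269772661", "/data/report")
            for name, md5sum, path in all_mappings(db):
                print(name, md5sum, path)
```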

	The problem is that on Linux (actually, I could just stop here,
couldn't I?) the 'print all' operation can take 30 minutes or more on a
busy machine (busy here means lots of I/O), as opposed to the normal
2-3 seconds, apparently because of the random seeking around in the
database file. Performance is significantly helped by simply running
'wc db.dbm' just prior to using the database.
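
The 'wc' trick amounts to reading the whole file sequentially, so the kernel can read ahead and the later random lookups are served from the buffer cache. A minimal Python sketch of the same idea, assuming the database lives in a single file (true of gdbm; some ndbm implementations split it into .dir/.pag files). The name `warm_cache` and the 1 MiB chunk size are arbitrary, hypothetical choices.

```python
import os

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice


def warm_cache(path, chunk_size=CHUNK_SIZE):
    """Read `path` front to back and discard the data, as `wc` would.

    Returns the number of bytes read. This only helps if the file fits
    in memory and is still cached when the database is next used.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        # Hint that access is sequential, where the OS supports it
        # (os.posix_fadvise is not available on every platform).
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            block = f.read(chunk_size)
            if not block:  # empty read means end of file
                break
            total += len(block)
    return total
```

Calling `warm_cache("db.dbm")` immediately before opening the database for a 'print all' mirrors the 'wc db.dbm' workaround.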

	Is there a better way to implement this database that will not be prone
to this kind of 'failure' (and make no mistake, taking 30 minutes is for all
intents and purposes a failure)? Of course, this does not manifest itself on
any of our other platforms, but Linux performance has always been unusually
fragile with respect to the contents of the buffer cache.

----
Andrew Hume  (best -> Telework) +1 732-886-1886
andrew@research.att.com  (Work) +1 973-360-8651
AT&T Labs - Research; member of USENIX and SAGE