[SAGE] simple database problem
i have an application which needs to maintain a mapping of
(name,md5sum) to pathname for 10k-1000k mappings.
currently, we convert the key to a string, and use gdbm (or ndbm).
(the application runs on Linux, FreeBSD, MacOSX, Solaris and Irix.)
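
for illustration only, here is one way the key-to-string conversion could look. this is a sketch in Python, not the application's actual code: the helper names make_key and split_key and the NUL separator are assumptions, chosen because a NUL byte cannot occur in either a file name or a hex md5sum.

```python
def make_key(name, md5sum):
    """Flatten a (name, md5sum) pair into one byte string usable as a dbm key.

    Hypothetical encoding: name, a NUL byte, then the hex digest.
    """
    return name.encode("utf-8") + b"\0" + md5sum.encode("ascii")


def split_key(key):
    """Invert make_key: recover (name, md5sum) from the flattened key."""
    name, md5sum = key.rsplit(b"\0", 1)
    return name.decode("utf-8"), md5sum.decode("ascii")
```

the pathname would then be stored as the value under that key.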
30% of the time, we add a single mapping, 20% of the time we delete
a mapping, and 50% of the time, we print out all mappings.
rarely, we add or delete a largish number of mappings.
the problem is that on Linux (actually, i could just stop here, couldn't I?),
the 'print all' operation can take 30mins or more on a busy machine,
(busy here means lots of I/O) as opposed to the normal 2-3secs,
apparently because of the random seeking around in the database file.
performance is significantly helped by simply running 'wc db.dbm'
just prior to using the database.
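
the 'wc' trick presumably works because it reads the whole file sequentially, so the later random reads are served from the buffer cache. the same thing can be done in-process before opening the database. a minimal sketch in Python (the function name prewarm and the 1 MB chunk size are made up for illustration, and it assumes the file fits in memory that the kernel is willing to use for caching):

```python
def prewarm(path, chunk_size=1 << 20):
    """Read a file front to back and discard the data.

    The only purpose is the side effect: the kernel is likely to keep
    the pages it just read in its cache. Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            total += len(chunk)
    return total
```

calling prewarm('db.dbm') just before the 'print all' pass would stand in for the external 'wc'. it is a workaround, not a fix: on a machine busy enough to evict those pages again, the random seeks come back.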
is there a better way to implement this database that will not be prone
to this kind of 'failure' (and make no mistake, taking 30mins is, for all
intents and purposes, a failure)? of course, this does not manifest itself
on any of our other platforms, but Linux performance has always been
unusually fragile with respect to the contents of the buffer cache.
----
Andrew Hume (best -> Telework) +1 732-886-1886
andrew@research.att.com (Work) +1 973-360-8651
AT&T Labs - Research; member of USENIX and SAGE