RE: Reliability and assurances
If some subsystem is expected to fail, you have no guarantee that
it will be available at any specific moment... but just because you
expect some subsystem to be unavailable doesn't mean you should
assume that it will be unavailable all the time. Everything will be
unavailable at some point. Astronomers tell us that if we wait
long enough, even the sun will change what it delivers.
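To put a rough number on "available at some moment, but not every moment", here is a minimal sketch. It assumes a subsystem that alternates between working and being repaired, with a known mean time between failures (MTBF) and mean time to repair (MTTR). The function name and the figures are made up for illustration.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """Long-run fraction of time the subsystem is up.

    Assumes the subsystem simply alternates between up periods
    (average length MTBF) and repair periods (average length MTTR).
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: fails about every 1000 hours, takes 10 hours to fix.
availability = steady_state_availability(1000.0, 10.0)
downtime_hours_per_year = (1.0 - availability) * 365 * 24

print(f"availability: {availability:.4f}")
print(f"expected downtime per year: {downtime_hours_per_year:.1f} hours")
```

With these assumed figures the subsystem is up roughly 99% of the time: it is certain to fail eventually, yet it is far from useless.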
There is a long history of statistically based analysis and
prediction used to engineer reliable systems. Some of the statistics
are counterintuitive (at least to me). Go to any good bookstore
with a good engineering section (EE, MechE or CivilE) and you should
be able to find a decent textbook. There was a lot of good
research done in the 60's about how you can (with a lot of work)
build reliable systems out of components which will fail. I could
cite the lit, but Peter Neumann has a good report that touches on
most of the issues; it can be found at
http://www.csl.sri.com/neumann/arl-one.html
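As a small illustration of the "reliable systems out of components which will fail" idea, here is a sketch of the textbook series/parallel availability arithmetic. It assumes component failures are independent, which real systems often violate (shared power, shared software bugs). The function names and the 0.99 figure are mine, chosen for illustration, and are not taken from the report above.

```python
from math import prod


def series_availability(availabilities):
    """Every component must work: multiply the availabilities."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """Any one component is enough: 1 minus the chance that all fail."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical component that is up 99% of the time.
single = 0.99

redundant_pair = parallel_availability([single, single])
redundant_triple = parallel_availability([single, single, single])
chain_of_three = series_availability([single, single, single])

print(f"one component:      {single:.6f}")
print(f"two in parallel:    {redundant_pair:.6f}")
print(f"three in parallel:  {redundant_triple:.6f}")
print(f"three in series:    {chain_of_three:.6f}")
```

Under the independence assumption, redundancy pushes availability above that of any single part, while chaining dependencies pulls it below. This is one of the results I find counterintuitive at first.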
If you are thinking about building an HA system,
http://www.interlog.com/~resnick/HA.htm
might also provide some useful information / perspective.
--Mark Verber
Network Operations, Tellme Networks
http://www.verber.com/mark/
-----Original Message-----
From: owner-sage-members@usenix.org
[mailto:owner-sage-members@usenix.org]On Behalf Of Mark R. Lindsey
Sent: Saturday, January 08, 2000 10:35 AM
To: sage-members@usenix.org
Subject: Reliability and assurances
I'm working on a theory: if you can't be assured that a subsystem is going
to work all of the time, then you can't be assured that a subsystem is
going to work any of the time.
Does that seem reasonable?
When I say `subsystem' here, it's for lack of a better term to describe
something atomic; e.g., a source of power for computer X, or a database
server for application Y, &c. Obviously, everything depends on something
else, and I'm not talking here about an analysis that extends up a
reliability tree.