Re: [SAGE] number of eggs in a basket
At 5:32 PM -0500 2005-01-06, Jan Schaumann wrote:
> I'd like to get some opinions regarding best practices for mission
> critical systems with multiple services.
If it's mission critical, then I believe it should be replicated,
distributed, and set up for N+M
load-balancing/fail-over/fault-resilience, where N is how many
machines you need to handle your maximum projected peak load, and M
is the number of machines you need to be able to lose and still
handle that load.
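As a rough sketch of that N+M arithmetic (the function name, the load
figures, and the per-machine capacity below are all made up for
illustration; plug in your own measurements):

```python
import math

def machines_needed(peak_load, per_machine_capacity, spares):
    """Return (N, N + M) for an N+M layout.

    peak_load and per_machine_capacity must be in the same unit
    (e.g. requests/sec); spares is M, the number of machines you
    want to be able to lose and still carry the peak load.
    """
    if per_machine_capacity <= 0:
        raise ValueError("per_machine_capacity must be positive")
    if spares < 0:
        raise ValueError("spares must not be negative")
    # N: round up, since a fraction of a machine still means a whole machine.
    n = math.ceil(peak_load / per_machine_capacity)
    return n, n + spares

# Hypothetical numbers: 900 req/s projected peak, 250 req/s per machine,
# and we want to survive the loss of any 2 machines.
n, total = machines_needed(peak_load=900, per_machine_capacity=250, spares=2)
print(n, total)  # 4 6
```

With those assumed figures you would need N=4 machines for the load and
N+M=6 in total.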
> I have a system that basically is a single point of failure: if it's
> down, nothing goes. The services on that machine are WWW, NIS, NFS and
> mail. Mail is delivered to ~/.mail so mail can be read via NFS and need
> not be fetched.
Personally, I'd split these services.
Put the NFS stuff on a dedicated NetApp NFS cluster -- unless
you're AOL, you can't afford to build your own NFS boxes that can
provide better price/performance, nor can you make them easier to
manage. I have my problems with NetApp, but they're mostly to do
with the company, and there are relatively few criticisms that I can
level at the products -- so long as you stick to NFS.
From that cluster, mount the NFS home directories everywhere you need.
You can do mail on NFS -- Nick Christensen showed us how to do
that in a scalable and reliable fashion in his paper "A Highly
Scalable Electronic Mail Service Using Open Systems" at
<http://www.jetcafe.org/npc/doc/mail_arch.html>. I would not be
inclined to try to do IMAP on NFS, unless you're interested in
taking a shot at writing the next chapter in "Design and
Implementation of Highly Scalable E-mail Systems" at
<http://www.shub-internet.org/brad/papers/dihses/>.
> I do not like having all my eggs in this one basket, but on the other
> hand distributing the services to several machines seems to complicate
> things and increase the likeliness of one of the services failing.
If you can't afford to use a NetApp cluster, then you could do a
lower-cost cluster with a SAN and a cluster-aware filesystem, and
roll your own NFS cluster. It won't be as reliable or easy to manage
as a NetApp cluster, but you may be able to live with that, since the
result would probably still be an improvement over what you've got
now.
Once the NFS service is moved over to a cluster, the rest of the
parts are easily decomposed into their own groups for distribution
across clusters. Or, you could stick with the cluster-aware
filesystem over a SAN and dispense with NFS entirely, having the
other application servers read/write directly from/to the cluster
filesystem.
Keep in mind that you can use lower-end servers for the clusters,
and still wind up with higher overall system throughput. Of course,
if you try to cut too many corners with the cluster servers, you'll
wind up with excessive complexity, failures, and downtime, and you'll
wish you hadn't gone down that road. Stick with server-grade
components, whatever platform and OS you choose.
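To put a rough number on why a cluster of lesser machines can still
come out ahead, here is a minimal sketch. It assumes every machine
fails independently with the same availability, which shared power,
network, or SAN components will violate in practice, and the
availability figures are invented for the example:

```python
from math import comb

def cluster_availability(n_required, m_spare, machine_availability):
    """Probability that at least n_required of n_required + m_spare
    machines are up, assuming independent, identical machines
    (an idealization, not a measurement)."""
    total = n_required + m_spare
    p = machine_availability
    # Sum the binomial probabilities for k = N .. N+M machines being up.
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(n_required, total + 1)
    )

# Hypothetical comparison: one high-end box at 99.9% versus a 3+1
# cluster of lower-end boxes at 99% each.
single = cluster_availability(1, 0, 0.999)
cluster = cluster_availability(3, 1, 0.99)
print(f"single: {single:.5f}  3+1 cluster: {cluster:.5f}")
```

Under those assumptions the 3+1 cluster edges out the single box, but
the margin disappears quickly if the individual machines get much
flakier, which is the point about not cutting too many corners.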
> So... what are your comments/experiences? How many eggs do you keep in
> your basket(s)?
I prefer to have at least one basket for each egg.
--
Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.