Re: [SAGE] number of eggs in a basket



At 5:32 PM -0500 2005-01-06, Jan Schaumann wrote:

>  I'd like to get some opinions regarding best practices for mission
>  critical systems with multiple services.

	If it's mission critical, then I believe it should be replicated, 
distributed, and set up for N+M 
load-balancing/fail-over/fault-resilience, where N is how many 
machines you need to handle your maximum projected peak load, and M 
is the number of machines you need to be able to lose and still 
handle that load.
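
	To make the N+M arithmetic concrete, here is a minimal sketch in 
Python.  The function names and the example figures (4500 requests/sec 
at peak, 1000 requests/sec per machine) are made up for illustration, 
and it assumes load spreads evenly across identical machines, which 
real clusters only approximate.

```python
import math


def machines_needed(peak_load, per_machine_capacity, tolerated_failures):
    """Return (N, N + M) for an N+M cluster.

    N is the number of machines needed to carry the projected peak load;
    M (tolerated_failures) is how many machines may be lost while the
    remaining ones still carry that load. All names here are hypothetical.
    """
    if peak_load < 0 or per_machine_capacity <= 0 or tolerated_failures < 0:
        raise ValueError("load and failures must be >= 0, capacity must be > 0")
    # Round up: a fractional machine still means buying a whole one.
    n = max(1, math.ceil(peak_load / per_machine_capacity))
    return n, n + tolerated_failures


def survives(total_machines, n, failed):
    """True if the machines left after `failed` losses can still carry peak load."""
    return total_machines - failed >= n


# Illustrative figures only: 4500 req/s peak, 1000 req/s per box, M = 2.
n, total = machines_needed(4500, 1000, 2)
print(f"N = {n}, M = 2, total machines = {total}")
print("survives 2 failures:", survives(total, n, 2))
print("survives 3 failures:", survives(total, n, 3))
```

	The point of the exercise is that M is chosen separately from N: 
you size N from the load, then decide how many simultaneous losses 
you are willing to pay to survive.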

>  I have a system that basically is a single point of failure:  if it's
>  down, nothing goes.  The services on that machine are WWW, NIS, NFS and
>  mail.  Mail is delivered to ~/.mail so mail can be read via NFS and need
>  not be fetched.

	Personally, I'd split these services.

	Put the NFS stuff on a dedicated NetApp NFS cluster -- unless 
you're AOL, you can't afford to build your own NFS boxes that can 
provide better price/performance, nor can you make them easier to 
manage.  I have my problems with NetApp, but they're mostly to do 
with the company, and there are relatively few criticisms that I can 
level at the products -- so long as you stick to NFS.

	From that cluster, mount the NFS home directories everywhere you need.

	You can do mail on NFS -- Nick Christensen showed us how to do 
that in a scalable and reliable fashion in his paper "A Highly 
Scalable Electronic Mail Service Using Open Systems" at 
<http://www.jetcafe.org/npc/doc/mail_arch.html>.  I would not be 
inclined to try to do IMAP on NFS, unless you're interested in 
taking a shot at writing the next chapter in "Design and 
Implementation of Highly Scalable E-mail Systems" at 
<http://www.shub-internet.org/brad/papers/dihses/>.

>  I do not like having all my eggs in this one basket, but on the other
>  hand distributing the services to several machines seems to complicate
>  things and increase the likeliness of one of the services failing.

	If you can't afford to use a NetApp cluster, then you could do a 
lower-cost cluster with a SAN and a cluster-aware filesystem, and 
roll your own NFS cluster.  It won't be as reliable or easy to manage 
as a NetApp cluster, but you may be able to live with that, since the 
result would probably still be an improvement over what you've got 
now.

	Once the NFS service is moved over to a cluster, the rest of the 
parts are easily decomposed into their own groups for distribution 
across clusters.  Or, you could stick with the cluster-aware 
filesystem over a SAN and dispense with NFS entirely, having the 
other application servers read/write directly from/to the cluster 
filesystem.


	Keep in mind that you can use lower-end servers for the clusters, 
and still wind up with higher overall system throughput.  Of course, 
if you try to cut too many corners with the cluster servers, you'll 
wind up with excessive complexity, failures, and downtime, and you'll 
wish you hadn't gone down that road.  Stick with server-grade 
components, whatever platform and OS you choose.

>  So... what are your comments/experiences?  How many eggs do you keep in
>  your basket(s)?

	I prefer to have at least one basket for each egg.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.