
Re: [SAGE] number of eggs in a basket



At 9:40 PM -0700 2005-01-07, Ruth Milner wrote:

>  What I said was that it was "10x more likely that some sort of
>  hardware or *system software* problem will take out a service".

	I'm not convinced.  If you have N+M load-balanced/fail-over 
clusters, the probability of the entire service being taken out by a 
single hardware or system software failure should approach zero.
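
	A rough way to see this, as a sketch only: assume a service
needs N working nodes out of N+M, and that each node is down
independently with probability p in some window.  The function name
and the numbers below are hypothetical, chosen for illustration.  The
independence assumption is the important one; a system software bug
shared by every node would break it.

```python
from math import comb


def outage_probability(n, m, p):
    """Probability that more than m of the n+m nodes are down at once.

    Assumes (for illustration only) that each node is down independently
    with probability p. Correlated failures are not modelled.
    """
    total = n + m
    # Binomial tail: the service is out once fewer than n nodes remain,
    # i.e. when the number of failed nodes k exceeds the m spares.
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(m + 1, total + 1)
    )


if __name__ == "__main__":
    # Hypothetical figures: 4 nodes needed, 1% chance any one node is down.
    for spares in range(4):
        print(spares, outage_probability(4, spares, 0.01))
```

	Under these assumptions each extra spare cuts the outage
probability by a large factor, which is the sense in which it
"approaches zero".  It says nothing about failures that hit all
nodes at the same time.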

>  I didn't put a number on the overall *total* number of failures,
>  only that it would increase (which is absolutely the case). The
>  cause doesn't really matter, though: if you have 10x the number
>  of machines for sysadmin missteps to be made on, then the overall
>  total failure incidents are likewise going to increase - though
>  not necessarily linearly.

	With the right admin tools, a site with 100,000 machines may
have a lower overall probability of an "admin oops" taking out a
significant chunk of the system than a smaller site with just 100
machines, or even just one machine, that lacks those tools.

>  Well, I did say a little more than just that one quoted paragraph.
>  :-) My point in that bit was where the idea might come from that
>  the number of failures would increase by decentralizing, which at
>  least one respondent had questioned.

	The total potential number of failures may go up, but if the
system is designed correctly, those failures should be accounted for
and should not have a visible impact on the overall services being
provided.  You should be able to take a hit overnight (or over the
weekend), get notified by e-mail, and then fix it whenever you feel 
like getting around to it the next working day.  At least, for most 
types of hits.

>  This does not make decentralization bad; as everyone has been
>  saying (including me), it's a complex issue. The point is that
>  decentralization also has costs that shouldn't be glossed over,
>  especially in a small shop.

	Fair enough.  YMMV, definitely.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.