Re: [SAGE] Strategies for taking ownership of existing infrastructure?
Jesús,
I second all the advice so far.
Life may be stressful at first, especially if a machine goes down, you
lose power, or you have to reboot when you'd rather not... and the
infrastructure doesn't come all the way up.
Once something like this happens, your life won't be the same until you
know that you can freely reboot all your servers, and you know what
their interdependencies are. In a "loose" environment, many servers can
be put up, run for 400+ days, and things can get to the point where you
realize you don't exactly know how to shut things down and bring them up
in an orderly fashion, even when your own team set it all up!
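One way to keep track of this is to write the interdependencies down
in a form a script can sort. The following is only a minimal sketch:
the host names and the dependency map are hypothetical placeholders,
and it assumes Python 3.9 or later for the standard-library graphlib
module. It derives a bring-up order and reverses it for shutdown.

```python
from graphlib import TopologicalSorter

# Hypothetical example hosts, not from any real site.
# Each key lists the hosts that must already be up before it can start.
DEPENDS_ON = {
    "dns": set(),
    "nfs": {"dns"},
    "db": {"dns", "nfs"},
    "web": {"db"},
}


def boot_order(deps):
    """Return hosts in an order where every dependency comes up first.

    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    return list(TopologicalSorter(deps).static_order())


def shutdown_order(deps):
    """Return hosts in reverse boot order, so dependents go down first."""
    return list(reversed(boot_order(deps)))


if __name__ == "__main__":
    print("Bring up: ", " -> ".join(boot_order(DEPENDS_ON)))
    print("Shut down:", " -> ".join(shutdown_order(DEPENDS_ON)))
```

A circular dependency makes the sort fail with an error, which is
itself worth finding out before the next power outage rather than
during it.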
My advice to you is to tell management that you can't promise /anything/
until you have done some fire drills. Always inform your users and
management if you must reboot an unknown server, and tell them that you
will know better NEXT TIME how long it will be before things are back
on-line.
Also insist on carving out some time at night on the weekend for
scheduled downtime, should you need it. Even though many sites want
24x7 because they think they can have it, build a case for having
everything unavailable from 11 PM Saturday to 6 AM Sunday, for example,
then use it.
Remain calm if something fails to come up. Use your head. Talk things
through with a user of the system - even if they don't know system
admin, they may prove to be a useful sounding board as you try to
explain what isn't working. (They won't have anything better to do
anyway, if their server is down. :^)
Have fun!
John Miller
http://www.metro-region.org