[SAGE] Summary: Need help with performance diagnosis
- To: "Cyrus Vesuna" <cyrus.sage@gmail.com>
- Subject: [SAGE] Summary: Need help with performance diagnosis
- From: Paul Lussier <p.lussier@comcast.net>
- Date: Wed, 03 Jan 2007 10:24:54 -0500
- Cc: sage-members@sage.org
- In-Reply-To: <fdcccb10701021830p39f7eeaai973a65f12cc0f892@mail.gmail.com> (Cyrus Vesuna's message of "Tue, 2 Jan 2007 18:30:12 -0800")
- References: <87r6vk69fx.fsf@comcast.net> <fdcccb10701021830p39f7eeaai973a65f12cc0f892@mail.gmail.com> <200612011000.kB1A0kbJ019019@voyager.usenix.org>
- Sender: owner-sage-members@usenix.org
- User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)
"Cyrus Vesuna" <cyrus.sage@gmail.com> writes:
> Hi Paul,
> Did you get any closer to a solution for this issue?
Hi Cyrus,
Yes, we discovered a lot of things in a very short time:
a) We really need to upgrade the kernel.
The kernel we're using is very old (2.4.22) and doesn't manage
memory very well. We added 2GB of memory to the existing 2GB and
that made performance even worse. Ideally we should move up to a
2.6 kernel, but we don't have the downtime or the spare equipment
to do this.
b) Our NFS client mount options are horrible.
All of our systems mount this single 1TB partition with the
default Linux NFS options. Considering that (1) we have 300+
clients and (2) they're doing lots and lots of writes, this is
very bad. We really ought to be mounting with at least the
following options:
nfsvers=3,rw,noatime,rsize=8192,wsize=8192
rsize and wsize should probably be closer to 32k (32768) instead,
since NFSv3 supports that.
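For illustration only, here is a minimal Python sketch (not necessarily what anyone actually deployed) of rendering one /etc/fstab line with those options, so that every client gets an identical entry. The server name `nfs1`, the export `/export/scratch` and the mount point `/mnt/scratch` are hypothetical placeholders, and `rwsize` defaults to the 32k value suggested above.

```python
# Minimal sketch: render an /etc/fstab entry for an NFSv3 client mount using
# the options suggested above. Server, export and mount point are
# hypothetical placeholders, not real hosts or paths.

def nfs_fstab_entry(server: str, export: str, mountpoint: str,
                    rwsize: int = 32768) -> str:
    """Return one fstab line: device, mountpoint, type, options, dump, pass."""
    options = ",".join([
        "nfsvers=3",          # ask for NFS protocol version 3
        "rw",                 # read-write mount
        "noatime",            # don't update access times on every read
        f"rsize={rwsize}",    # read transfer size in bytes
        f"wsize={rwsize}",    # write transfer size in bytes
    ])
    return f"{server}:{export}\t{mountpoint}\tnfs\t{options}\t0 0"


if __name__ == "__main__":
    # Generating the same line for every client keeps 300+ machines consistent.
    print(nfs_fstab_entry("nfs1", "/export/scratch", "/mnt/scratch"))
```

Passing `rwsize=8192` reproduces the more conservative option set listed above.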
c) We really should have more than a single file system.
This is something I argued for when the system was designed, but I
lost. I've been regretting it ever since.
d) We're using our space extremely inefficiently.
This is a consequence of c) above. We have thousands of automated
tests which run nightly, all of which write into NFS as scratch
space (on a RAID5 array, no less). The tests really ought to be
refactored to write scratch data locally, then move it to NFS
later if it's deemed valuable. Unfortunately, here I'm fighting
against a group of developers who claim that this perceived
convenience is too important to change. (Ironically, they all
complain when they can't get work done exactly *because* of this
supposed convenience, but then blame "the system" as being
"the problem" :)
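The pattern proposed above (write scratch data locally, then move only the valuable results to NFS) could look roughly like the following Python sketch. The function name, the `keep` predicate and the directory layout are hypothetical, and it assumes the default temporary directory (typically /tmp) lives on local disk rather than on NFS.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_with_local_scratch(
    test: Callable[[Path], None],
    nfs_dir: Path,
    keep: Callable[[Path], bool],
    name: str,
) -> Optional[Path]:
    """Run `test` in a local scratch directory; copy its output to NFS only if
    `keep` says it is worth preserving. The local scratch is always removed.

    Assumption: the default temp dir (e.g. /tmp) is on local disk, not NFS.
    """
    with tempfile.TemporaryDirectory(prefix="scratch-") as tmp:
        scratch = Path(tmp)
        test(scratch)                 # all the heavy scratch I/O stays local
        if not keep(scratch):
            return None               # nothing worth keeping: NFS is never touched
        dest = nfs_dir / name
        # One bulk copy to the shared file system instead of many small writes;
        # the local copy is deleted when the with-block exits.
        shutil.copytree(scratch, dest)
        return dest
```

Here `keep` might, for example, return True only for failed runs, so that passing tests generate no NFS traffic at all.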
To alleviate some of the problems, we've done the following:
1. Dropped the main NFS server back to 2GB of RAM.
2. Moved the scratch space onto a RAID0 set on a different NFS
server and set the clients to mount this file system with the
above-mentioned options.
We plan to change the mount options for the main NFS server as
well, but we first wanted to see how moving to a new server with
more "correct" mount options affected the performance of the tests.
(I've been told things look really good.)
3. Planned/budgeted for a new NFS server.
We've decided to go with something more manageable/scalable than
what we currently have, and are getting an OnStor NFS appliance.
I apologize for the delay in providing a follow-up, and thanks to all
who assisted me. I learned a tremendous amount from this experience!
--
Seeya,
Paul
--
Key fingerprint = 1660 FECC 5D21 D286 F853 E808 BB07 9239 53F1 28EE