
[SAGE] Summary: Need help with performance diagnosis



"Cyrus Vesuna" <cyrus.sage@gmail.com> writes:

> Hi Paul,
> Did you get any closer to a solution for this issue?

Hi Cyrus,

Yes, we discovered a lot of things in a very short time:

 a) We really need to upgrade the kernel.

    The kernel we're using is very old (2.4.22) and doesn't manage
    memory very well.  We added 2GB of memory to the existing 2GB and
    that made performance even worse.  Ideally we should move up to a
    2.6 kernel, but we don't have the down-time or the spare equipment
    to do this.

 b) Our NFS client mount options are horrible.

    All of our systems mount this single 1TB partition with the
    default NFS options for Linux.  Considering that we have 300+
    clients and that they're doing lots and lots of writes, this is
    very bad.  We really ought to be mounting with at least the
    following options:

      nfsvers=3,rw,noatime,rsize=8192,wsize=8192

    rsize and wsize probably ought to be up around 32k instead, since
    NFSv3 supports that.

 c) We really should have more than a single file system.

    This is something I argued for when the system was designed, but I
    lost.  I've been regretting it ever since.

 d) We're using our space extremely inefficiently.

    This is a consequence of c) above.  We have thousands of automated
    tests which run nightly, all of which write into NFS as scratch
    space (on a RAID5 array no less).  The tests really ought to be
    re-factored to write scratch data locally, then move it to NFS
    later if it's deemed valuable.  Unfortunately here, I'm fighting
    against a group of developers who claim that this perceived
    convenience is too important to change. (Ironically, they all
    complain when they can't get work done exactly *because* of this
    supposed convenience, but then blame "the system" as being
    "the problem" :)

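The refactoring suggested in d) can be sketched roughly as follows.
This is only an illustration, not what our tests actually do: the
names local_scratch, run_test and the file names are made up, and it
assumes each test can say which of its output files are worth keeping.
Everything is written to a temporary directory on local disk, and only
the files explicitly marked are copied to the NFS results directory
once the test body finishes without raising.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def local_scratch(nfs_results_dir, local_root=None):
    """Yield (scratch_dir, keep); keep(path) marks a file for copying to NFS.

    Hypothetical helper: scratch_dir lives on local disk and is deleted
    when the block exits, whether or not the test body raised.
    """
    keepers = []
    with tempfile.TemporaryDirectory(dir=local_root) as tmp:
        scratch = Path(tmp)
        yield scratch, keepers.append
        # Reached only if the test body did not raise: copy the keepers.
        dest = Path(nfs_results_dir)
        dest.mkdir(parents=True, exist_ok=True)
        for path in keepers:
            shutil.copy2(path, dest / Path(path).name)


def run_test(nfs_results_dir):
    """Made-up test body: lots of scratch writes, one file worth keeping."""
    with local_scratch(nfs_results_dir) as (scratch, keep):
        (scratch / "intermediate.dat").write_text("noise\n")
        summary = scratch / "summary.txt"
        summary.write_text("all checks passed\n")
        keep(summary)
```

With this shape the NFS server sees one write per kept file instead of
every intermediate write, and a failed test leaves nothing behind on
the shared file system.
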
To alleviate some of the problems, we've done the following:

 1.  Dropped the main NFS server back to 2GB of RAM.

 2.  Moved the scratch space onto a RAID0 set on a different NFS
     server and set the clients to use the above-mentioned mount
     options for this file system.

     We plan to change the options for the main NFS server as well,
     but we wanted to see how the performance of the tests was affected
     first by moving to a new server with more "correct" mount options.
     (I've been told things look really good).

 3.  Planned/budgeted for a new NFS server.

     We've decided to go with something more manageable/scalable than
     what we currently have and are getting an OnStor NFS appliance.
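
With 300+ clients, it helps to be able to check which mounts are
still running with small transfer sizes or without noatime.  Below is
a minimal sketch of such a check, assuming the text comes from a
/proc/mounts-style listing.  The function names and the sample lines
are made up for illustration, and the exact option spellings a kernel
reports can differ from what was passed to mount, so treat it as a
starting point rather than a finished tool.

```python
def parse_options(opt_string):
    """Turn 'rw,noatime,rsize=8192' into a dict; flags map to None."""
    opts = {}
    for item in opt_string.split(","):
        key, _, value = item.partition("=")
        opts[key] = value or None
    return opts


def audit_nfs_mounts(mounts_text, min_io_size=8192):
    """Return {mount_point: [problems]} for NFS entries in mounts_text."""
    report = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, ...
        if len(fields) < 4 or fields[2] != "nfs":
            continue
        opts = parse_options(fields[3])
        problems = []
        if "noatime" not in opts:
            problems.append("noatime not set")
        for name in ("rsize", "wsize"):
            value = opts.get(name)
            if value is None or int(value) < min_io_size:
                problems.append(f"{name} below {min_io_size}")
        if problems:
            report[fields[1]] = problems
    return report


# Made-up sample input; the host names and paths are hypothetical.
SAMPLE = """\
filer:/export /data nfs rw,rsize=1024,wsize=1024 0 0
filer2:/scratch /scratch nfs rw,noatime,rsize=8192,wsize=8192 0 0
/dev/sda1 / ext3 rw 0 0
"""

for mount_point, problems in audit_nfs_mounts(SAMPLE).items():
    print(mount_point, "; ".join(problems))
```

On a real client the same function could be fed the contents of
/proc/mounts, with min_io_size raised to 32768 once the larger sizes
are in use.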

I apologize for the delay in providing a followup, and thanks to all
who assisted me. I learned a tremendous amount from this experience!

-- 
Seeya,
Paul
--
Key fingerprint = 1660 FECC 5D21 D286 F853  E808 BB07 9239 53F1 28EE